Understanding the Data Pump architecture will let you take advantage of its speed and features. Knowing the internals of a tool also helps you identify the use cases where it is a better (or worse) fit than the alternatives.
So without further ado, here we go:
The MASTER TABLE
This is a table created in the schema of the user who runs the Data Pump job. It contains the following details about the job:
- current state of every object being exported/imported
- locations of objects in the dump file set
- status of every worker process
- current set of dump files
- restart information, etc.
During export, the master table is built as the first step of the export job and written to the dump file set as the last step.
During import, the master table is loaded into the schema from the dump file set as the first step.
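The master table is a regular table named after the Data Pump job, so you can peek at it while the job runs. A minimal sketch, assuming a schema-mode export started by SCOTT that received the default job name SYS_EXPORT_SCHEMA_01 (the column layout varies by release, so SELECT * is used here):

    -- run as the user who started the job; the table name matches the job name
    SELECT *
      FROM sys_export_schema_01
     WHERE rownum <= 20;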
The PROCESSES
Client Process
The expdp and impdp sessions invoked by users are the client processes.
A client is not required once a job is underway; multiple clients may attach to and detach from a job as necessary for monitoring and control.
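For example, you can exit the client while the job keeps running and later re-attach from a new session with the ATTACH parameter. A minimal sketch, assuming the job was started by SYSTEM and received the default job name SYS_EXPORT_SCHEMA_01:

    expdp system/password attach=SYS_EXPORT_SCHEMA_01

    Export> STATUS
    Export> CONTINUE_CLIENT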
Shadow Process
A standard shadow (foreground) process is created whenever a client logs into the Oracle database, and the same is true for the Data Pump client processes. This process services Data Pump API requests. In other words:
- Upon receipt of a DBMS_DATAPUMP.OPEN request, the shadow process creates the job, which consists of creating the master table, the Advanced Queuing (AQ) queues used for communication among the processes, and the master control process.
- Once the Data Pump job is running, the main task of the shadow process is to service GET_STATUS requests from the client.
If the client process detaches, the shadow process also goes away.
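The expdp client is essentially a front end for these API calls, so the same shadow-process requests can be driven directly from PL/SQL. A minimal sketch of a schema-mode export via DBMS_DATAPUMP (the file name, the DATA_PUMP_DIR directory object and the SCOTT schema are assumptions for illustration):

    DECLARE
      h  NUMBER;
      js VARCHAR2(30);
    BEGIN
      -- OPEN: the shadow process creates the master table, the AQ queues and the MCP
      h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'SCHEMA');

      DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'scott.dmp',
                             directory => 'DATA_PUMP_DIR');
      DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR',
                                    value => 'IN (''SCOTT'')');

      -- START_JOB: the MCP spawns the worker processes
      DBMS_DATAPUMP.START_JOB(h);

      -- from here on, the shadow process mainly services status requests
      DBMS_DATAPUMP.WAIT_FOR_JOB(h, js);
      DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || js);
    END;
    /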
Master Control Process (MCP)
This is the captain of the ship, and just as there is only one captain per ship, there is only one MCP for each Data Pump job.
It takes care of the following:
- job state
- job description
- restart information in the master table
- dump file information in the master table
- starting/stopping worker processes
The job is divided into various phases of metadata and data loading/unloading, and each phase has associated worker processes. The MCP hands out work requests to the worker processes appropriate for the current phase; the majority of MCP processing is spent in this work-dispatch loop.
The MCP also has file management duties: maintaining the active dump file list and handing out file pieces as requested by the processes unloading/loading data.
The MCP process name is of the form <instance>_DMnn_<pid>.
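You can see the MCP and the other sessions attached to a running job from the data dictionary. A minimal sketch:

    -- one row per Data Pump job, with its current state and degree of parallelism
    SELECT owner_name, job_name, operation, job_mode, state, degree
      FROM dba_datapump_jobs;

    -- the sessions (shadow, MCP, workers) serving each job
    SELECT s.owner_name, s.job_name, v.program
      FROM dba_datapump_sessions s
      JOIN v$session v ON v.saddr = s.saddr;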
Worker Process
These are the petty officers and subordinates that actually do the work at the master's direction (cruel world, eh?).
Upon receipt of a START_JOB request, the MCP creates worker processes based on the PARALLEL parameter. A worker process performs the tasks assigned by the MCP (primarily loading/unloading metadata and data) and maintains the object rows that make up the bulk of the master table.
As object rows are loaded/unloaded, their status (pending/completed/failed etc.) is updated in the master table.
Workers also maintain TYPE COMPLETION ROWS, which track the type of objects being worked on: tables/views/indexes etc. These rows are used during restart.
A worker process has a name of the form <instance>_DWnn_<pid>.
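For example, starting an export with a higher PARALLEL value lets the MCP spawn more workers; the dump file specification should then include the %U substitution variable so each worker can write to its own file. A minimal sketch (directory object and file names are illustrative):

    expdp system/password schemas=scott directory=DATA_PUMP_DIR \
          dumpfile=scott_%U.dmp parallel=4 logfile=scott_exp.log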
Parallel Query Process
If the external tables data access method is chosen for loading/unloading a table or partition, the worker process creates parallel query processes that carry out the actual load/unload, with the worker process acting as the query coordinator. These are standard parallel query execution slaves.
In RAC, to take advantage of load-sharing opportunities, the PQ processes may be created on a different instance while all other processes operate on the initial instance.
Now let's move a little beyond the meta information and take a bird's-eye view of the actual work.
The DATA MOVEMENT
There are four data movement methods:
- Data File Copying (Transportable Tablespaces)
- Direct Path Load
- External Tables
- Conventional Path
Data file copying has some limitations: certain data types cannot be transported, and the character sets must be the same on both systems. Some types of data and some types of tables cannot be moved this way; for example, encrypted data cannot be moved.
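A minimal sketch of the data file copying route (the USERS tablespace and file names are illustrative): make the tablespace read-only, export just its metadata, then copy the data files and the dump file to the target system.

    SQL> ALTER TABLESPACE users READ ONLY;

    expdp system/password directory=DATA_PUMP_DIR dumpfile=tts_users.dmp \
          transport_tablespaces=users transport_full_check=y

    -- then copy the tablespace data files plus tts_users.dmp to the target
    -- and plug them in with: impdp ... transport_datafiles=...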
___________________________________________________
EXPDP: conventional/direct path loads write the dump in a binary stream that is at least 15% smaller than the original EXP representation.
___________________________________________________
The METADATA MOVEMENT
The DBMS_METADATA package is used by the worker processes. The old EXP utility used to store object definitions as SQL DDL; EXPDP, on the other hand, writes object definitions to the dump files as XML documents.
XML may take more space but allows for more features: an object's ownership, storage characteristics, tablespace, etc. can be changed during import. The COMPRESSION parameter can be added to reduce the size.
This also allows more flexibility in remapping the data into different schemas, changing object names during import, and so on.
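For example, because the definitions are stored as XML, impdp can rewrite them on the fly. A minimal sketch (schema and tablespace names are illustrative):

    impdp system/password directory=DATA_PUMP_DIR dumpfile=scott.dmp \
          remap_schema=scott:scott_copy \
          remap_tablespace=users:users_new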
The INTERPROCESS COMMUNICATION
The AQ mechanism is used by the Data Pump processes to communicate. There are two queues:
1. Command and control queue: all processes except the clients subscribe to this queue. All API commands, work requests and responses, file requests and log messages are processed through this queue.
2. Status queue: only the MCP writes to this queue, and only the shadow processes subscribe to it. It receives work-in-progress and error message information.
The SECURITY
All the I/O related to a Data Pump job is handled by Oracle server processes, so the OS user doing the I/O is 'oracle'. This would be a security risk, as 'oracle' is a privileged account.
Hence, all directory specifications are made using Oracle directory objects, with the read/write grants on the directory established by the DBA.
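A minimal sketch of the DBA-side setup (the path, directory name and grantee are illustrative):

    -- as a DBA: the directory object maps a logical name to an OS path
    CREATE DIRECTORY dp_dir AS '/u01/app/oracle/dpdump';
    GRANT READ, WRITE ON DIRECTORY dp_dir TO scott;

    -- the user then references only the directory object, never the OS path:
    -- expdp scott/tiger directory=dp_dir dumpfile=scott.dmp schemas=scott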
The INITIALIZATION PARAMETERS
DISK_ASYNCH_IO should be set to TRUE to allow asynchronous I/O.
DB_BLOCK_CHECKSUM should ideally be FALSE, but the impact of leaving it enabled is minimal (around 5%).
STREAMS_POOL_SIZE should be set sufficiently high, since the AQ queues used for interprocess communication are allocated from the Streams pool.
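A minimal sketch for checking and adjusting these (the size is illustrative; with automatic SGA management Oracle can also grow the Streams pool on demand):

    SQL> SHOW PARAMETER disk_asynch_io
    SQL> SHOW PARAMETER db_block_checksum
    SQL> SHOW PARAMETER streams_pool_size

    -- set a floor for the Streams pool if it is currently 0
    SQL> ALTER SYSTEM SET streams_pool_size = 128M SCOPE=BOTH;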