Data Pump Architecture and Getting Started Manual

Oracle Data Pump

Data Pump is often described as an upgraded version of the old exp/imp utilities. That depiction is inaccurate; it’s a bit like calling a modern smartphone a replacement for an old rotary-dial landline. Although the old utilities are dependable and work well, Data Pump encompasses that functionality while adding completely new dimensions to how data can be lifted and moved between environments. This post explains how Data Pump makes your current data transfer tasks easier and also shows how to move information and solve problems in ways that you didn’t think were possible.




Data Pump enables you to efficiently back up, replicate, secure, and transform large amounts of data and metadata. You can use Data Pump in a variety of ways:

  • Perform point-in-time logical backups of the entire database or subsets of data
  • Replicate entire databases or subsets of data for testing or development
  • Quickly generate DDL required to recreate objects
  • Upgrade a database by exporting from the old version and importing into the new version

Sometimes, DBAs cling to the exp/imp utilities with a Luddite-like attachment because they are familiar with the syntax of those utilities and can get the job done quickly. Even if the legacy utilities are easy to use, you should consider using Data Pump going forward. Data Pump offers substantial functionality beyond the old exp/imp utilities:

  • Performance with large data sets, allowing efficient export and import of gigabytes of data
  • Interactive command line utility, which lets you disconnect and then later attach to active Data Pump jobs
  • Ability to export large amounts of data from a remote database and import them directly into a local database without creating a dump file (see the sketch after this list)
  • Ability to make on-the-fly changes to schemas, tablespaces, data files, and storage settings from export to import
  • Sophisticated filtering of objects and data
  • Security-controlled (via database) directory objects
  • Advanced features, such as compression and encryption
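As a taste of the network capability mentioned above (covered in more detail later in this article), here is a minimal sketch of a network-mode import; it assumes a database link named old_db pointing at the source database and a directory object named dp_dir (created later in this article) for the log file:

$ impdp mv_maint/foo network_link=old_db tables=inv directory=dp_dir logfile=net_imp.log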

This article begins with a discussion on the Data Pump architecture. Subsequent topics include basic export and import tasks, moving data across networks, filtering data, and running Data Pump in legacy mode.

Data Pump Architecture

Data Pump consists of the following components:

  • The expdp and impdp command line utilities
  • The DBMS_DATAPUMP built-in PL/SQL package
  • The DBMS_METADATA built-in PL/SQL package

The expdp and impdp utilities use the DBMS_DATAPUMP and DBMS_METADATA packages when exporting and importing data and metadata. The DBMS_DATAPUMP package moves entire databases or subsets of data between database environments. The DBMS_METADATA package exports and imports information about database objects.

Note  You can call the DBMS_DATAPUMP and DBMS_METADATA packages independently (outside expdp and impdp) from SQL*Plus. I rarely call these packages directly from SQL*Plus, but you may have a specific scenario in which it’s desirable to interact directly with them. See the Oracle Database PL/SQL Packages and Types Reference Guide, which is available for download from the Technology Network area of the Oracle Web site (http://otn.oracle.com), for more details.
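To give you a feel for what a direct call looks like, the following is a minimal sketch of a schema-mode export driven entirely through DBMS_DATAPUMP. It assumes the DP_DIR directory object created later in this article, uses the MV_MAINT schema purely as an example, and omits the error handling you’d want in a real job:

DECLARE
  h          NUMBER;
  job_state  VARCHAR2(30);
BEGIN
  -- open a schema-mode export job
  h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'SCHEMA');
  -- write the dump file to the DP_DIR directory object
  DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'exp_api.dmp', directory => 'DP_DIR');
  -- limit the job to the MV_MAINT schema
  DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR', value => '= ''MV_MAINT''');
  -- start the job and wait for it to finish
  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.WAIT_FOR_JOB(h, job_state);
  DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || job_state);
END;
/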

When you start a Data Pump export or import job, a master OS process is initiated on the database server. This master process name has the format ora_dmNN_<SID>. On Linux/Unix systems, you can view this process from the OS prompt, using the ps command:

$ ps -ef | grep -v grep | grep ora_dm
oracle   14602     1  4 08:59 ?        00:00:03 ora_dm00_O12C

Depending on the degree of parallelism and the work specified, a number of worker processes are also started. If no parallelism is specified, then only one worker process is started. The master process coordinates the work between master and worker processes. The worker process names have the format ora_dwNN_<SID>.
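If you want to confirm the worker processes while a job is running, a similar ps command (shown here without sample output) does the trick:

$ ps -ef | grep -v grep | grep ora_dw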

Also, when a user starts an export or import job, a database status table (often referred to as the master table) is created, owned by the user that starts the job. This table exists only for the duration of the Data Pump job. The name of the status table depends on what type of job you’re running. The table is named with the format SYS_<OPERATION>_<JOB_MODE>_NN, where OPERATION is either EXPORT or IMPORT. JOB_MODE can be one of the following types:

  • FULL
  • SCHEMA
  • TABLE
  • TABLESPACE
  • TRANSPORTABLE

For example, if you’re exporting a schema, a table is created in your account with the name SYS_EXPORT_SCHEMA_NN, where NN is a number that makes the table name unique in the user’s schema. This status table contains information such as the objects exported/imported, start time, elapsed time, rows, and error count. The status table has more than 80 columns.
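While a job is running (or stopped but restartable), you can see it, and therefore the name of its status table, by querying the data dictionary; the job name matches the status table name. For example:

SQL> select owner_name, job_name, operation, job_mode, state from dba_datapump_jobs;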

Tip  The Data Pump status table is created in the default permanent tablespace of the user performing the export/import. Therefore, if the user has no privileges to create a table in the default tablespace, the Data Pump job will fail with an ORA-31633 error.

Data Pump drops the status table upon successful completion of an export or import job. If you kill a job with the KILL_JOB interactive command, the status table is also dropped. If you stop a job with the STOP_JOB interactive command, the table isn’t removed; it is used in the event you restart the job.

If your job terminates abnormally, the status table is retained. You can drop it manually if you don’t plan to restart the job.
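To restart a stopped job, attach to it by name and issue START_JOB at the interactive prompt. A minimal sketch, assuming the job (and thus the status table) is named SYS_EXPORT_SCHEMA_01:

$ expdp mv_maint/foo attach=SYS_EXPORT_SCHEMA_01

Export> start_job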

When Data Pump runs, it uses a database directory object to determine where to write and read dump files and log files. Usually, you specify which directory object you want Data Pump to use. If you don’t specify a directory object, a default directory is used. The default directory is a database directory object named DATA_PUMP_DIR, which is automatically created when the database is first created. On Linux/Unix systems, this directory object maps to the ORACLE_HOME/rdbms/log directory.
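You can check where DATA_PUMP_DIR points on your system with a quick dictionary query:

SQL> select directory_path from dba_directories where directory_name = 'DATA_PUMP_DIR';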

A Data Pump export creates an export file and a log file. The export file contains the objects being exported. The log file contains a record of the job activities. Figure 8-1 shows the architectural components related to a Data Pump export job.

Similarly, Figure 8-2 displays the architectural components of a Data Pump import job. The main difference between export and import is the direction in which the data flows. Export writes data out of the database, and import brings information into the database. Refer back to these diagrams as you work through Data Pump examples and concepts throughout this blog.

For each Data Pump job, you must ensure that you have access to a directory object. The basics of exporting and importing are described in the next few sections.

Tip  Because Data Pump internally uses PL/SQL to perform its work, there needs to be some memory available in the shared pool to hold the PL/SQL packages. If there is not enough room in the shared pool, Data Pump will throw an ORA-04031: unable to allocate ... bytes of shared memory error and abort. If you receive this error, set the database parameter SHARED_POOL_SIZE to at least 50M. See MOS note 396940.1 for further details.
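If you hit this error, here is a hedged example of bumping the shared pool; 50M is the floor suggested above, so adjust for your environment, and note that when automatic memory management is in use this value acts as a minimum rather than a fixed size:

SQL> alter system set shared_pool_size=50M scope=both;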

Getting Started

Now that you have an understanding of the Data Pump architecture, the next sections walk through a simple example: performing the required setup, exporting a table, dropping the table, and then importing it back into the database. This lays the foundation for all other Data Pump tasks covered in this blog.

A small amount of setup is required when you run a Data Pump export job. Here are the steps:

  1. Create a database directory object that points to an OS directory that you want to write/read Data Pump files to/from.
  2. Grant read and write privileges on the directory object to the database user running the export.
  3. From the OS prompt, run the expdp utility.

Before you run a Data Pump job, first create a database directory object that corresponds to a physical location on disk. This location will be used to hold the export and log files and should be a location where you know you have plenty of disk space to accommodate the amount of data being exported.

Use the CREATE DIRECTORY command to accomplish this task. This example creates a directory named dp_dir and specifies that it is to map to the /oradump physical location on disk:

SQL> create directory dp_dir as '/oradump';

To view the details of the newly created directory, issue this query:

SQL> select owner, directory_name, directory_path from dba_directories;

Here is some sample output:

OWNER      DIRECTORY_NAME  DIRECTORY_PATH
---------- --------------- --------------------
SYS        DP_DIR          /oradump

Keep in mind that the directory path specified has to physically exist on the database server. Furthermore, the directory has to be one that the oracle OS user has read/write access to. Finally, the user performing the Data Pump operations needs to be granted read/write access to the directory object (see step 2).
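If the OS directory doesn’t exist yet, create it and make sure the oracle OS user owns it. A minimal sketch, run as root (the oinstall group is an assumption; substitute the group that owns your Oracle installation):

# mkdir /oradump
# chown oracle:oinstall /oradump
# ls -ld /oradump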

If you don’t specify the DIRECTORY parameter when exporting or importing, Data Pump will attempt to use the default database directory object (as previously discussed, this maps to ORACLE_HOME/rdbms/log). I don’t recommend using the default directory for two reasons:

  • If you’re exporting large amounts of data, it’s better to use a location on disk that you’ve chosen deliberately, where you know you have enough room to accommodate your disk space requirements. If you use the default directory, you run the risk of inadvertently filling up the mount point associated with ORACLE_HOME and then potentially hanging your database.
  • If you grant privileges to non-DBA users to take exports, you don’t want them creating large dump files in a location associated with ORACLE_HOME. Again, you don’t want the mount point associated with ORACLE_HOME to become full to the detriment of your database.

Next, you need to grant permissions on the database directory object to any user that will use Data Pump. Use the GRANT statement to assign read and write privileges on the directory object. This example grants access to the directory object to a user named MV_MAINT:

SQL> grant read, write on directory dp_dir to mv_maint;

All directory objects are owned by the SYS user. If you’re using a user account that has the DBA role granted to it, then you already have the requisite read/write privileges on any directory objects. I usually perform Data Pump jobs with a user that has the DBA role granted to it (so that I don’t need to bother with granting access).
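To verify which users have been granted access to a directory object, you can query the data dictionary; directory grants show up in DBA_TAB_PRIVS, with the directory name appearing in the TABLE_NAME column:

SQL> select grantee, privilege from dba_tab_privs where table_name = 'DP_DIR';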

SECURITY ISSUES WITH THE OLD EXP UTILITY

The idea behind creating directory objects, and then granting specific I/O access to the physical storage location, is that you can securely control which users can read and write files in locations where they wouldn’t normally have OS permissions. With the legacy exp utility, any user that has access to the tool can, by default, write or read any file that the owner of the Oracle binaries (usually oracle) has access to. It’s conceivable that a malicious non-oracle OS user could run the exp utility to purposely overwrite a critical database file. For example, the following command can be run by any non-oracle OS user with execute access to the exp utility:

$ exp heera/foo file=/oradata04/SCRKDV12/users01.dbf

The exp process runs as the oracle OS user and therefore has read and write OS privileges on any oracle-owned data files. In this exp example, if the users01.dbf file is a live database data file, it’s overwritten and rendered worthless. This can cause catastrophic damage to your database.

To prevent such issues, with Oracle Data Pump you first have to create a database directory object that maps to a specific OS directory and then additionally assign read and write privileges on that directory object per user. Thus, Data Pump doesn’t have the security problems that exist with the old exp utility.

When the directory object and grants are in place, you can use Data Pump to export information from a database. The simple example in this section shows how to export a table. Later posts in this series describe in detail the various ways in which you can export data. The point here is to work through an example that will provide a foundation for understanding the more complex topics that follow.

As a non-SYS user, create a table, and populate it with some data:

SQL> create table inv(inv_id number);
SQL> insert into inv values (123);
SQL> commit;

Next, as a non-SYS user, export the table. This example uses the previously created directory, named DP_DIR. Data Pump uses the directory path specified by the directory object as the location on disk to which to write the dump file and log file:

$ expdp mv_maint/foo directory=dp_dir tables=inv dumpfile=exp.dmp logfile=exp.log

The expdp utility creates a file named exp.dmp in the /oradump directory, containing the information required to recreate the INV table and populate it with data as it was at the time the export was taken. Additionally, a log file named exp.log is created in the /oradump directory, containing logging information associated with this export job.

If you don’t specify a dump file name, Data Pump creates a file named expdat.dmp. If a file named expdat.dmp already exists in the directory, then Data Pump throws an error. If you don’t specify a log file name, then Data Pump creates one named export.log. If a log file named export.log already exists, then Data Pump overwrites it.
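If you want to rerun an export and overwrite an existing dump file rather than receive an error, you can add the REUSE_DUMPFILES parameter, as in this variation of the earlier command:

$ expdp mv_maint/foo directory=dp_dir tables=inv dumpfile=exp.dmp logfile=exp.log reuse_dumpfiles=y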

Tip  Although it’s possible to execute Data Pump as the SYS user, I don’t recommend it, for a couple of reasons. First, SYS is required to connect to the database with the AS SYSDBA clause. This requires a Data Pump parameter file with the USERID parameter and quotes around the associated connect string, which is unwieldy. Second, most tables owned by SYS cannot be exported (there are a few exceptions, such as AUD$). If you attempt to export a table owned by SYS, Data Pump will throw an ORA-39166 error and indicate that the table doesn’t exist, which is confusing.

One of the key reasons to export data is so that you can recreate database objects. You may want to do this as part of a backup strategy or to replicate data to a different database. Data Pump import uses an export dump file as its input and recreates database objects contained in the export file. The procedure for importing is similar to exporting:

  1. Create a database directory object that points to an OS directory that you want to read/write Data Pump files from.
  2. Grant read and write privileges on the directory object to the database user running the export or import.
  3. From the OS prompt, run the impdp command.

Steps 1 and 2 were covered previously, when taking the export, and therefore are not repeated here.

Before running the import job, drop the INV table that was created previously.

SQL> drop table inv purge;

Next, recreate the INV table from the export taken:

$ impdp mv_maint/foo directory=dp_dir dumpfile=exp.dmp logfile=imp.log

You should now have the INV table recreated and populated with data as it was at the time of the export. Now is a good time to inspect Figures 8-1 and 8-2 again. Make sure you understand which files were created by expdp and which files were used by impdp.
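A quick query confirms that the table and its row are back (the output shown is typical SQL*Plus default formatting):

SQL> select * from inv;

    INV_ID
----------
       123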

Instead of typing commands on the command line, in many situations it’s better to store the commands in a file and then reference the file when executing Data Pump export or import. Using parameter files makes tasks more repeatable and less prone to error. You can place the commands in a file once and then reference that file multiple times.

Additionally, some Data Pump commands (such as FLASHBACK_TIME) require the use of quotation marks; in these situations, it’s sometimes hard to predict how the OS will interpret these. Whenever a command requires quotation marks, it’s highly preferable to use a parameter file.
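For instance, a parameter file entry such as the following (the timestamp is just an illustration) sidesteps any shell escaping issues, because the quotation marks are read directly from the file rather than interpreted by the OS:

flashback_time="to_timestamp('04-02-2022 12:30:00','DD-MM-YYYY HH24:MI:SS')"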

To use a parameter file, first create an OS text file that contains the commands you want to use to control the behavior of your job. This example uses the Linux/Unix vi command to create a text file named exp.par:

$ vi exp.par

Now, place the following commands in the exp.par file:

userid=mv_maint/foo
directory=dp_dir
dumpfile=exp.dmp
logfile=exp.log
tables=inv
reuse_dumpfiles=y

Next, the export operation references the parameter file via the PARFILE command line option:

$ expdp parfile=exp.par

Data Pump processes the parameters in the file as if they were typed on the command line. If you find yourself repeatedly typing the same commands or using commands that require quotation marks, or both, then consider using a parameter file to increase your efficiency.

Tip  Don’t confuse a Data Pump parameter file with the database initialization parameter file. A Data Pump parameter file instructs Data Pump as to which user to connect to the database as, which directory locations to read/write files to and from, what objects to include in the operation, and so on. In contrast, a database parameter file establishes characteristics of the instance upon database startup.
