Using Data Pump Exports As Long-Term Storage

I recently helped a customer get the most out of Data Pump. Here’s how it all started.

“For legal reasons, we must keep a backup of our data for five years. We want to use Data Pump exports. Do you have any recommendations?”

Before someone flames me, I know that an export is not a backup, but we’re not talking about disaster recovery here.

[Meme: Captain Picard declaring that an export is not a backup]

Data Pump Export

Here are some of the Data Pump export parameters I would use.

Log File

logtime=all
metrics=yes
logfile=dataexport.log

Whenever you store data for the long term, I think it’s a good idea to store relevant metadata too. So, be sure to save the log file together with the dump files, and include diagnostic information using logtime and metrics.

Dump Files

dumpfile=dataexport_%T_%L.dmp
filesize=10000M
reuse_dumpfiles=yes

The dumpfile specification contains the %T wildcard, which translates into YYYYMMDD at runtime. It’s handy to have the date in the file names. Using %L allows the creation of multiple dump files.

I recommend breaking the dump files into smaller pieces using filesize. Smaller files are easier to handle and transport, and avoid some issues I’ve seen with object storage and huge files.
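
To illustrate, an export taken on 1 October 2025 (a made-up date) with the settings above would produce dump files named along these lines, with %L counting up as Data Pump adds files:

dataexport_20251001_01.dmp
dataexport_20251001_02.dmp
dataexport_20251001_03.dmp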

Content

schemas=myapp
tables=mytab
exclude=statistics

I would advise against doing a full export. It contains far too much information you won’t need, such as tablespace definitions, SQL management objects, and the audit trail. A schema or table export is more suitable.

Be sure to exclude statistics. They take time to export and have no value. To narrow down the objects you export, you can also use the include parameter.

The parameters include and exclude are mutually exclusive in Oracle Database 19c. That’s no longer the case in later releases.
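
For example, a schema export limited to a couple of tables could use a parameter file like this sketch (the table names are made up). On Oracle Database 19c, pick either include or exclude; from Oracle Database 21c onward, you can combine them as shown:

schemas=myapp
include=table:"IN ('MYTAB','MYTAB_HIST')"
exclude=statistics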

Compression

compression=all
compression_algorithm=high

Use compression to make the dump files smaller. I think it’s worth spending a few more CPU cycles on export and compressing as much as possible to reduce the dump file size, so use the high algorithm.

Creating a compressed export requires the Advanced Compression Option. However, importing a compressed dump file does not require the same option. Always check the license guide for up-to-date information.

If you don’t have a license for the Advanced Compression Option, you can compress the dump files using OS utilities instead. That doesn’t apply if you also encrypt the dump file, though; compressing an encrypted dump file gives no benefit.

Integrity

data_options=verify_stream_format
checksum=yes

Since you are storing data for the long term, it’s good to double-check that there’s no corruption in the data. Data Pump writes to the dump file in a streaming manner, and data_options=verify_stream_format verifies the stream format along the way.

I’d also recommend generating a checksum and storing that in the dump file using the checksum parameter.

You can periodically check your dump for corruption by using impdp ... verify_only=yes. It will re-calculate the dump file checksum and match that against what is stored in the dump file.

Note that the checksum parameter was introduced in Oracle Database 21c.

Usually, using the default checksum_algorithm is fine for checking data integrity. If you want to guard against someone tampering with the dump file, it’s better to use checksum_algorithm=sha512. But for proper protection of your data, use encryption.
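
A periodic verification could look like this sketch; the connect string, directory object, and dump file names are placeholders. It only reads and validates the dump files, it doesn’t import anything:

$ impdp system@pdb1 directory=DPDIR dumpfile=dataexport_20251001_%L.dmp verify_only=yes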

Encryption

encryption=all
encryption_mode=password

You can encrypt the data in the dump using the encryption parameter.

I prefer using a separate passphrase to protect the dump file, instead of using the database master encryption key. If you use the latter, you also need to have a backup of the database keystore.
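
As a sketch, you can let Data Pump prompt for the passphrase instead of putting it on the command line or in a parameter file (the rest of the command is omitted):

$ expdp ... encryption=all encryption_mode=password encryption_pwd_prompt=yes

On import, supply the same passphrase, for example by using encryption_pwd_prompt=yes with impdp as well.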

Using Data Pump encryption requires a license for Advanced Security Option. Always check the license guide for up-to-date information.

Miscellaneous

flashback_time=systimestamp
parallel=<n>

Remember to make a consistent export using flashback_time (or flashback_scn if you prefer to specify an SCN). Alternatively, export from a quiesced database or a snapshot standby.

Use whatever parallel degree you can afford. Normally, on x86-64 architectures, the optimal setting is twice the number of physical cores.
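
Putting it all together, a parameter file for this kind of export could look like the following sketch. The connect string, directory object, schema, and parallel degree are placeholders, and remember that checksum requires Oracle Database 21c or later, as noted above:

$ expdp system@pdb1 parfile=dataexport.par encryption_pwd_prompt=yes

# dataexport.par
directory=DPDIR
schemas=myapp
exclude=statistics
logtime=all
metrics=yes
logfile=dataexport.log
dumpfile=dataexport_%T_%L.dmp
filesize=10000M
reuse_dumpfiles=yes
compression=all
compression_algorithm=high
data_options=verify_stream_format
checksum=yes
encryption=all
encryption_mode=password
flashback_time=systimestamp
parallel=8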

Using OCI Object Storage

The customer also stated:

We plan on storing the dumps in OCI object storage.

To reduce the storage costs, use a cheaper storage tier, such as Infrequent Access or Archive storage, depending on how quickly you need to be able to retrieve the dump files. Keep in mind that objects in the Archive tier must be restored before you can download them.
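
As a sketch only, you could upload a dump file straight into a cheaper tier with the OCI CLI; the bucket name and file name below are made up, and you should check oci os object put --help for the exact options in your CLI version:

$ oci os object put --bucket-name longterm-dumps --file dataexport_20251001_01.dmp --storage-tier Archive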

Your Opinion

How would you store data for the long term? Do you have any experiences with the above Data Pump parameters?

Drop a comment and let me know.

Happy exporting!

Copy Data Pump Files Before the End of the Export

In Oracle Database 23ai, you can copy the dump files even before the export completes.

This saves time during your migration because you can start moving files to the target host while the export is in progress. Additionally, it potentially saves disk space because you can move the files away from the source host.

Which Files Can You Move

  1. After starting the Data Pump export, you must connect to the database as either:

    • The same user that started the export, or
    • A user with the DATAPUMP_EXP_FULL_DATABASE role
  2. Replace the job owner and name, and execute:

    set serverout on
    declare
       l_job_owner    varchar2(30) := 'DPUSER';
       l_job_name     varchar2(30) := 'SYS_EXPORT_FULL_01';
       l_handle       number;
       l_stsmask      integer := dbms_datapump.ku$_status_job_status;
       l_job_state    varchar2(30);
       l_status       ku$_status;
       l_dump_file    ku$_dumpfile;
    begin
       -- Attach to the running Data Pump job and fetch its current status
       l_handle := dbms_datapump.attach(l_job_name, l_job_owner);
       dbms_datapump.get_status(l_handle, l_stsmask, NULL, l_job_state, l_status);
       dbms_datapump.detach(l_handle);

       -- Loop over all dump files known to the job
       for i in l_status.job_status.files.first .. l_status.job_status.files.last loop

          l_dump_file := l_status.job_status.files(i);

          -- Skip the dump file template (the %L wildcard specification)
          if l_dump_file.file_type = dbms_datapump.ku$_dumpfile_type_template then
             continue;
          end if;

          -- A file is complete when Data Pump has written it to its full size
          if l_dump_file.file_bytes_written = l_dump_file.file_size then
             dbms_output.put_line('DONE: ' || l_dump_file.file_name);
          end if;

       end loop;
    end;
    /
    
    
  3. The code lists all the files that Data Pump has marked as completed. Data Pump no longer writes to those files, and you can safely start moving them to the target system.
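
As a sketch (the target host and paths are assumptions), you can then copy the reported files to the target system while the export keeps running, and re-run the PL/SQL block from time to time to pick up newly completed files:

$ scp /u01/app/oracle/dpdir/exp01.dmp /u01/app/oracle/dpdir/exp02.dmp target:/u01/app/oracle/dpdir/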

Prerequisites

  • You must use multiple dump files. You do that by including %L in your DUMPFILE specification.
    DUMPFILE=exp%L.dmp
    
  • You must specify a file size, so Data Pump rolls over to a new dump file when the current one is full. I suggest using 5G, which is also a good setting for most cloud migrations.
    FILESIZE=5G
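
Putting both prerequisites together, the export command could look like this sketch (the user, connect string, and directory object are placeholders):

$ expdp dpuser@pdb1 full=y directory=DPDIR dumpfile=exp%L.dmp filesize=5G logfile=exp.log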
    

What About

  • The code examines the max file size and bytes written to determine if the file is full. This is not the same as the physical size of the file in the operating system. You can’t use the file size information from the file system.

  • What about using rsync or similar tools? I guess that could work; however, we didn’t test that.

  • You could query the Data Pump control table for the information:

    SELECT file_name
    FROM <control_table>
    WHERE process_order = -21
    AND file_max_size = completed_bytes
    AND file_max_size <> 0
    ORDER BY file_name;
    
    • However, querying the control table is not a supported approach. Use the above PL/SQL.

Happy exporting!

Oracle Data Pump and Compression – Also Without a License

Whenever you use Data Pump to export from Oracle Database, you should use compression. It’s conveniently built into Data Pump.

Pros:

  • The dump file is much smaller:
    • Less disk space is needed.
    • Easier to transfer over the network.
  • Often it is faster to use compression when you measure the entire workflow (export, transfer, and import).
  • Imports are often faster because less data needs to be read from disk.

Cons:

  • It uses extra CPU during the export.
  • It requires a license for the Advanced Compression Option (more on that below).

How Do I Enable Data Pump Compression

You simply set the COMPRESSION option:

$ expdp ... compression=all

You use the COMPRESSION option only for exports. When you import, Data Pump handles decompression automatically.

You only need a license for Advanced Compression Option when you use compression during export. You don’t need a license to import a compressed dump file.

Medium Is a Good Compression Algorithm

I recommend you use the medium compression algorithm:

$ expdp ... compression=all compression_algorithm=medium

Our experience and tests show that it offers the best balance between compression ratio and CPU usage.

Here are the results of a test my team did:

Algorithm   File Size (MB)   Compression Ratio   Elapsed Time
NONE        5,800            1.0                 2m 33s
BASIC       705              8.2                 3m 03s
LOW         870              6.6                 3m 11s
MEDIUM      701              8.2                 3m 01s
HIGH        509              11.3                12m 16s

I would recommend the high algorithm only if you need to transfer the dump file over a really slow network.

But I Don’t Have a License

gzip

You can still compress the dump files, just not with Data Pump. Use OS utilities instead. In this case, I recommend splitting the dump into multiple files. They are easier to handle, and you can start transferring the dump files as they are compressed:

$ expdp ... filesize=5G dumpfile=myexp%L.dmp
$ gzip -r /u01/app/oracle/dpdir

Now, transfer the files, uncompress them, and import:

[target]$ gunzip -r /u01/app/oracle/dpdir
[target]$ impdp ...
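
If you want to start transferring files as soon as they are compressed, a minimal sketch could compress and send the dump files one at a time (the target host and path are assumptions):

$ cd /u01/app/oracle/dpdir
$ for f in myexp*.dmp; do gzip "$f" && scp "$f.gz" target:/u01/app/oracle/dpdir/; done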

rsync

Another option is to use rsync. It can compress the data over the network only, so the dump files are stored uncompressed on both ends:

$ expdp ... filesize=5G dumpfile=myexp%L.dmp
$ rsync -z ...
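
For reference, a fuller invocation could look like this sketch (the target host and path are assumptions); -z compresses the data in transit only, so the files arrive uncompressed and ready for import:

$ rsync -avz /u01/app/oracle/dpdir/ target:/u01/app/oracle/dpdir/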

Cheatsheet

If you have the proper license, use Data Pump compression during export:

$ expdp ... compression=all compression_algorithm=medium

If you don’t have a license, compress the dump file over the wire only:

$ rsync -z ....

Don’t combine Data Pump compression with gzip or rsync -z! Compressing data that is already compressed wastes CPU and gives little or no extra reduction.