Can I Use Data Pump for Incremental Backups of a Table?

I received this question from a customer:

We take backups of a critical table using Data Pump. The table is very large, so the export takes a very long time. How can we speed up the process by doing incremental exports in Data Pump?

My short answer is:

There’s no native functionality in Data Pump that can perform an incremental export of a table.

What would you expect of an incremental export?

  • Does it require an immutable table?
  • Does it include only the new rows?
  • Does it also include the updated rows?
  • What do you want to do about deleted rows?

At first glance, it might sound like a trivial requirement, but as always, the devil is in the details.

Speed Up The Export

The customer has a working solution today, but it is too slow. Let’s see if we can speed up the export.

  • Use the parallel parameter to allow Data Pump to export faster. What’s the best setting for the fastest export? Generally speaking:

    • On-prem, x86-64: 2 x physical cores
    • On-prem, other architectures: It depends :)
    • OCI, OCPU: Number of OCPUs
    • OCI, ECPU: Number of ECPUs / 4

    But run your own tests to see what works best for you. There's a combined example after this list.

  • Use multiple dump files by specifying the %L wildcard on the dumpfile parameter. Each Data Pump worker must have exclusive access to a dump file for optimal write speed. Just use the %L wildcard and Data Pump creates as many files as needed.

  • Use compression if your license allows. Compressing the data will burn CPU, but it will significantly reduce the amount of data that has to be written to the dump file. I’ve seen numerous cases where compressing an export speeds it up significantly.

  • If the table is partitioned, Data Pump can use multiple workers on the same table. All workers can also use parallel query to extract data – parallel within parallel. The performance gain from partitioning the table might be marginal, but give it a try.

  • If you have a standby database, you can temporarily convert it into a snapshot standby database and perform the export there. That will allow you to use all available resources without affecting the primary database.
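
To illustrate the first three tips, here's a combined sketch. The parallel degree, directory object, and file names are assumptions – adjust them to your environment:

    expdp system parfile=sales_exp.par

    Contents of sales_exp.par:
    tables=sales
    directory=DATA_PUMP_DIR
    dumpfile=sales_%L.dmp
    logfile=sales_exp.log
    parallel=8
    compression=all

And converting a standby database to a snapshot standby and back happens with the standby in MOUNT mode:

    alter database convert to snapshot standby;
    -- perform the export, then convert back:
    alter database convert to physical standby;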

Our Table Is Immutable, We Just Want The New Rows

Here are some ideas if your table is immutable, i.e., rows are only inserted, never updated or deleted.

A Flag On The Table

You can add a new column, IS_NEW, and set the default value to 1.
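
Adding the column might look like this (a minimal sketch):

    alter table sales add (is_new number(1) default 1);

You can then use the flag to select just the new rows: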

  1. Lock the table:
    lock table sales in exclusive mode;
    
  2. Export the table using the query parameter to select only the new records:
    expdp ... tables=sales query='sales:"WHERE is_new=1"'
    
  3. Update the rows. The commit releases the lock:
    update sales set is_new=0 where is_new=1;
    commit;
    

You must lock the table to avoid any concurrency issues. Without locking the table, you might end up in a situation where an uncommitted insert is not part of the export but still gets its flag updated by the post-export update.

The drawbacks:

  • Depending on the nature of the table, you effectively have downtime while you export.
  • You also have to add the new column, which might not be allowed.
  • Each export now generates redo during the update.
  • You have to design your table carefully so the flag won’t cause issues with row migration or prevent trailing null optimization.

Insert Timestamp

You could also add a column, insert_at, where the default value is SYSTIMESTAMP. This solution is similar to the one with a flag, but a timestamp can help avoid issues with uncommitted transactions.

When you export the rows, you export from the present time minus two hours. Why the two-hour lag? It ensures that rows inserted by transactions that were still uncommitted at export time are not silently skipped. If there are transactions running longer than two hours, there is still a risk of missing those rows, leading to logical corruption. So, you could make the lag even longer – perhaps 24 hours. But then you don't get the most up-to-date data when you export.
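
A sketch of what this could look like; the timestamp literal marking the previous export and the two-hour lag are illustrative:

    alter table sales add (insert_at timestamp default systimestamp not null);

    Contents of a parameter file for the incremental export:
    tables=sales
    dumpfile=sales_inc_%L.dmp
    query=sales:"WHERE insert_at > TIMESTAMP '2024-05-01 00:00:00' AND insert_at <= SYSTIMESTAMP - INTERVAL '2' HOUR"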

This solution could avoid the issues of locking and redo generation, but it still requires adding a column to the table.

Plus, it doesn’t prevent anyone from updating the insert_at column, which can destroy your solution and lead to logical corruption.

Monotonically Increasing Or Decreasing Immutable Key

A Sequence

Instead of a flag, you can use a monotonically increasing or decreasing immutable key. That is a column in the table that is constantly increasing (or decreasing) – something like a column with a sequence-generated value. Can you use that as an offset for your exports?

Sequences come with their own problems:

  1. You have to use sequences with the ORDER flag to avoid issues in Oracle RAC databases, but that can be a performance killer.
  2. You have to use NOCYCLE sequences, which is the default and probably won’t be an issue.
  3. If you use Application Continuity, you would also need to use the KEEP flag to avoid issues during a transaction replay.
  4. There is no way the database can guarantee that rows are inserted in the order of the sequence. That would require each session to commit in the order in which it got its sequence number – a concurrency killer.
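
For reference, a sequence with all three flags might be created like this (a sketch; the KEEP clause requires a recent database release):

    create sequence sales_seq order nocycle keep;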

A sequence-generated key might sound like a good idea, but when you look into the details and take concurrency into consideration, it should be evident that it is not.

A Word About ORA_ROWSCN Pseudocolumn

What’s ORA_ROWSCN?

ORA_ROWSCN reflects the system change number (SCN) of the most recent change to a row.

It’s a pseudocolumn and part of every table. OK, sounds interesting. Can we use that?

Well, the documentation states some pretty important caveats that you must take into consideration if you build something using ORA_ROWSCN.
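
For example, you could try selecting rows changed since the last export (a sketch; :last_scn is an SCN you would record at each export). Note that unless the table is created with ROWDEPENDENCIES, ORA_ROWSCN is tracked per block, not per row:

    select * from sales where ora_rowscn > :last_scn;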

Further, ORA_ROWSCN can only tell you that something changed in a row. It can't distinguish between a new and a modified row. Plus, it won't give you deleted rows. So, is this of any benefit at all?

Checking Table At Point In Time

If you want to be able to see how a table looked at a specific time, there are some good options.

RMAN And Recover Table

RMAN has the ability to recover a table. It will recover the table to an auxiliary database, and then perform a Data Pump export of just that table. You can import the data and check how it looked at that specific time.

  1. Put your table into a dedicated tablespace.
  2. Back up the relevant tablespace together with SYSTEM and SYSAUX tablespaces (check separate requirements for a table in a PDB).
  3. Use RMAN functionality to keep that backup for as long as required.
  4. Now, your data is safe, and, if needed, you can restore it to any point in time that your backups allow.
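
When you need the data back, the RMAN command might look like this (a sketch; the table name, point in time, and paths are assumptions, and NOTABLEIMPORT leaves you with just the dump file):

    recover table sh.sales
      until time "to_date('2024-05-01 12:00:00','yyyy-mm-dd hh24:mi:ss')"
      auxiliary destination '/u01/aux'
      datapump destination '/u01/dp'
      dump file 'sales_pit.dmp'
      notableimport;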

Flashback Time Travel

You can also use Flashback Time Travel (it used to be called Flashback Data Archive).

The database tracks all changes to a table and allows you to query the table using Flashback Query at any previous point in time.

However, this requires that you add Flashback Archives to your database. It requires space, and there is a slight overhead.
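
Setting it up could look like this (a sketch; the archive name, tablespace, and retention are assumptions):

    create flashback archive fda1 tablespace fda_ts retention 1 year;
    alter table sales flashback archive fda1;

    -- query the table as it looked at any time within the retention period
    select * from sales as of timestamp systimestamp - interval '1' day;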

Replicating Data

If you want to replicate your data to another database, for instance, a data warehouse, there are some good options.

What About Materialized Views?

The simplest approach is to use materialized views.

You can add materialized view logs to the production database to track the changes. In a remote database, you can incrementally apply those changes (a fast refresh) and maintain a complete, up-to-date copy of the table. Optionally, instead of replicating the entire table, you can replicate the outcome of a query or a view, even with joins.
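
A minimal sketch, assuming a database link called PROD_LINK from the remote database to production:

    -- on the production database
    create materialized view log on sales with primary key;

    -- on the remote database
    create materialized view sales_mv refresh fast on demand
      as select * from sales@prod_link;

    -- incrementally refresh whenever needed
    exec dbms_mview.refresh('SALES_MV', method => 'F');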

What About GoldenGate?

The ultimate replication technique is Oracle GoldenGate. It is an extensive product designed specifically for data replication. It will mine the redo to find all changes to your table and replicate them to another database. You can even offload the mining process to another database so your production database remains completely unaffected.

When I mention Oracle GoldenGate, I often hear:

Yeah, but it’s complex and expensive.

Perhaps. However, it is complex and expensive because data replication is a really big challenge. There are so many pitfalls and things to consider. Besides, if you’re in OCI, the price model for OCI GoldenGate is completely different and probably much more attractive.

If you try to make a do-it-yourself replication, you’re in for a big challenge, and the odds are against you.

Conclusion

Without knowing exactly which requirement an incremental export must satisfy, it’s impossible to come up with a solution. In this specific case, I would need more details to find a viable solution.

But I bet the solution I would come up with would use some of the above features – but not Data Pump. Trying to build incremental backups with Data Pump is a dangerous path to tread. It is simply too hard, and there are too many pitfalls, especially related to concurrency, to take into consideration.

Do you have a similar requirement? How did you solve the problem? Leave a comment below.

XTTS: Backup on Standby Database

When doing a cross-platform migration with transportable tablespaces and incremental backup, is it possible to perform backups on a standby database? If so, you could offload the work from a primary database.

The short answer is yes. You can use a physical standby database but not a snapshot standby database.

Using a standby database for the backups is an advantage in some situations:

  • The primary database does not have the capacity to perform the extra backups (CPU, disk space, or I/O).
  • You want the primary database to be completely unaffected by the migration.

How To

You follow the regular procedure described in the MOS note V4 Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backup (Doc ID 2471245.1) with a few changes. The changes are described in Using XTTs in a Data Guard Environment.

Be Careful

You must have an Active Data Guard license if you:

  • Open a physical standby database and enable redo apply.
  • Enable block change tracking on the standby database (for faster incremental backups).

Procedure

The primary database is called SRCPRI and runs on the host src-pri. The standby is called SRCSTDBY and runs on the host src-stdby.

  • Ensure allowstandby=1 in xtt.properties.
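
    A minimal xtt.properties sketch (the tablespace name and platform ID are assumptions; 13 is Linux x86-64):
    tablespaces=SALES_TS
    platformid=13
    allowstandby=1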

  • Whenever you need to perform a backup on the physical standby database – level 0 or incremental:

    • Cancel redo apply and open the physical standby database:
    SRCSTDBY SQL> alter database recover managed standby database cancel;
    SRCSTDBY SQL> alter database open;
    
    • Perform the backup:
    [oracle@src-stdby]$ $ORACLE_HOME/perl/bin/perl xttdriver.pl --backup
    
    • Bring the standby database back into MOUNT mode and re-enable redo apply:
    SRCSTDBY SQL> shutdown immediate
    SRCSTDBY SQL> startup mount
    SRCSTDBY SQL> alter database recover managed standby database disconnect from session;
    
  • When it is time to perform the final incremental backup:

    • Set tablespaces in READ ONLY mode on the primary database:
    SRCPRI SQL> alter tablespace ... read only;
    
    • Archive the current log and ensure it is applied on the standby database:
    SRCPRI SQL> alter system archive log current;
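
    • One way to confirm the redo is applied (a sketch) is to check the last applied log sequence on the standby:
    SRCSTDBY SQL> select max(sequence#) from v$archived_log where applied='YES';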
    
    • When you have confirmed the redo is applied on the standby database, cancel redo apply and open it:
    SRCSTDBY SQL> alter database recover managed standby database cancel;
    SRCSTDBY SQL> alter database open;
    
    • Verify tablespaces are READ ONLY:
    SRCSTDBY SQL> select tablespace_name, status from dba_tablespaces;
    
    • Perform the final incremental backup:
    [oracle@src-stdby]$ $ORACLE_HOME/perl/bin/perl xttdriver.pl --backup
    
    • Perform the Data Pump export on the primary database:
    [oracle@src-pri]$ expdp system ... full=y transportable=always ...
    

Active Data Guard

If you have a license for Active Data Guard, you can simply keep the physical standby database open with redo apply running. You don't need to switch between OPEN and MOUNT mode.
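
In that case, the sequence on the standby might look like this (a sketch):

    SRCSTDBY SQL> alter database open read only;
    SRCSTDBY SQL> alter database recover managed standby database disconnect from session;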

Conclusion

Being able to perform the backups on a standby database is a huge advantage. But it makes the process slightly more complicated, so I would recommend it only if really needed. In addition, if you don’t follow the procedure strictly, you might accidentally use a feature that requires Active Data Guard.

The MOS note V4 Reduce Transportable Tablespace Downtime using Cross Platform Incremental Backup (Doc ID 2471245.1) has some additional details about using a standby database. You can get those details by opening a Service Request and asking for them.
