Can I Use Data Pump for Incremental Backups of a Table?

I received this question from a customer:

We take backups of a critical table using Data Pump. The table is very large, so the export takes a very long time. How can we speed up the process by doing incremental exports in Data Pump?

My short answer is:

There’s no native functionality in Data Pump that can perform an incremental export of a table.

What would you expect of an incremental export?

  • Does it require an immutable table?
  • Does it include only the new rows?
  • Does it also include the updated rows?
  • What do you want to do about deleted rows?

At first glance, it might sound like a trivial requirement, but as always, the devil is in the details.

Speed Up The Export

The customer has a working solution today, but it is too slow. Let’s see if we can speed up the export.

  • Use the parallel parameter to allow Data Pump to export faster. What’s the best setting for the fastest export? Generally speaking:

    • On-prem, x86-64: 2 x physical cores
    • On-prem, other architectures: It depends :)
    • OCI, OCPU: Number of OCPUs
    • OCI, ECPU: Number of ECPUs / 4

    But make your own tests to see what works best for you.

  • Use multiple dump files by specifying the %L wildcard in the dumpfile parameter. Each Data Pump worker must have exclusive access to a dump file for optimal write speed; with the %L wildcard, Data Pump creates as many files as needed (see the sketch after this list).

  • Use compression if your license allows. Compressing the data will burn CPU, but it will significantly reduce the amount of data that has to be written to the dump file. I’ve seen numerous cases where compressing an export speeds it up significantly.

  • If the table is partitioned, Data Pump can use multiple workers on the same table. Each worker can additionally use parallel query to extract data, so you get parallel on top of parallel. The performance gain you can get from partitioning the table might be marginal, but give it a try.

  • If you have a standby database, you can temporarily convert it into a snapshot standby database and perform the export there. That will allow you to use all available resources without affecting the primary database.
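
Putting the first three suggestions together, an export command could look something like this. It is only a sketch: the table name, directory object, dump file name, and parallel degree are placeholders you must adapt, and compression=all requires the Advanced Compression option.

    expdp ... tables=sales directory=dp_dir dumpfile=sales_%L.dmp \
          parallel=16 compression=all compression_algorithm=medium \
          logfile=sales_exp.log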

Our Table Is Immutable, We Just Want The New Rows

Here are some ideas if your table is immutable, i.e., you just insert new rows.

A Flag On The Table

You can add a new column, IS_NEW, with a default value of 1, and use it to select just the new rows (a sketch of the DDL follows below the steps):

  1. Lock the table:
    lock table sales in exclusive mode;
    
  2. Export the table using the query parameter to select only the new records:
    expdp ... tables=sales query=sales:"where is_new=1"
    
  3. Update the rows. The commit releases the lock:
    update sales set is_new=0 where is_new=1;
    commit;
    

You must lock the table to avoid any concurrency issues. Without locking the table, you might end up in a situation where an uncommitted insert is not part of the export but gets its flag updated by the post-export update.
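
For reference, adding the flag column could look like this. It is just a sketch; the exact data type and the optional index are my assumptions:

    -- new rows get is_new = 1 by default, meaning "not yet exported"
    alter table sales add (is_new number(1) default 1 not null);

    -- optional: an index helps the export and the post-export update find the new rows
    create index sales_is_new_ix on sales (is_new);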

The drawbacks:

  • Depending on the nature of the table, you effectively have downtime while you export.
  • You also have to add the new column, which might not be allowed.
  • Each export now generates redo during the update.
  • You have to design your table carefully so the flag won’t cause issues with row migration or prevent trailing null optimization.

Insert Timestamp

You could also add a column, insert_at, where the default value is SYSTIMESTAMP. This solution is similar to the one with a flag, but a timestamp can help avoid issues with uncommitted transactions.

When you export the rows, you export everything up until the present time minus two hours. Why the two-hour buffer? It ensures that all rows inside the export window are committed and visible when you export. If there are transactions running longer than two hours, there is still a risk of missing those records, leading to logical corruption. So, you could make the buffer even longer, perhaps 24 hours, but then you don’t get the most up-to-date data when you export.
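
A rough sketch of that approach, using a parameter file to avoid shell quoting problems. The boundary timestamps are placeholders; your export job must track them from run to run and replace the lower boundary with the upper boundary of the previous run:

    alter table sales add (insert_at timestamp default systimestamp not null);

    -- contents of incr.par
    tables=sales
    query=sales:"where insert_at > timestamp '2025-01-01 00:00:00' and insert_at <= systimestamp - interval '2' hour"

    expdp ... parfile=incr.par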

This solution could avoid the issues of locking and redo generation, but it still requires adding a column to the table.

Plus, it doesn’t prevent anyone from updating the insert_at column, which can destroy your solution and lead to logical corruption.

Monotonically Increasing Or Decreasing Immutable Key

A Sequence

Instead of a flag, you can use a monotonically increasing or decreasing immutable key. That is a column in the table whose values are constantly increasing (or decreasing), for example a column populated from a sequence. Can you use that as an offset for your exports?

Sequences come with their own problems:

  1. You have to use sequences with the ORDER flag to avoid issues in Oracle RAC databases, but that can be a performance killer.
  2. You have to use NOCYCLE sequences, which is the default and probably won’t be an issue.
  3. If you use Application Continuity, you would also need to use the KEEP flag to avoid issues during a transaction replay.
  4. There is no way the database can guarantee that rows are inserted into the database in the order of the sequence. That would require each session to commit in the order in which it got its sequence number, which would be a concurrency killer.
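
For the record, a sequence with the options from the list above would be declared something like this (the sequence name is made up, and the KEEP clause requires a recent release):

    create sequence sales_seq order nocycle keep;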

A sequence-generated key might sound like a good idea, but when you look into the details and take concurrency into consideration, it should be evident that it is not.

A Word About ORA_ROWSCN Pseudocolumn

What’s ORA_ROWSCN?

ORA_ROWSCN reflects the system change number (SCN) of the most recent change to a row.

It’s a pseudocolumn and part of every table. OK, sounds interesting. Can we use that?
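
A quick look at what it returns, using the sales table from the earlier examples (note that scn_to_timestamp only works for fairly recent SCNs):

    select s.ora_rowscn,
           scn_to_timestamp(s.ora_rowscn) as approx_change_time,
           s.*
      from sales s;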

Well, the documentation states some pretty important caveats that you must take into consideration if you build something using ORA_ROWSCN. Most notably, unless the table is created with ROWDEPENDENCIES, the SCN is tracked per block rather than per row, so all rows in a block report the SCN of the block’s most recent change.

Further, ORA_ROWSCN can only tell you that something has changed in a row. It can’t distinguish between a new and a modified row, and it won’t show you deleted rows. So, is this of any benefit at all?

Checking Table At Point In Time

If you want to be able to see how a table looked at a specific time, there are some good options.

RMAN And Recover Table

RMAN has the ability to recover a table. It will recover the table to an auxiliary database, and then perform a Data Pump export of just that table. You can import the data and check how it looked at that specific time.

  1. Put your table into a dedicated tablespace.
  2. Back up the relevant tablespace together with SYSTEM and SYSAUX tablespaces (check separate requirements for a table in a PDB).
  3. Use RMAN functionality to keep that backup for as long as required.
  4. Now, your data is safe, and, if needed, you can restore it to any point in time that your backups allow.
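
As an illustration, a table-level recovery could look like the following in RMAN. The schema, timestamp, and directories are made up, and for a table in a PDB the syntax differs slightly:

    recover table sh.sales
      until time "to_date('2025-01-01 12:00:00','YYYY-MM-DD HH24:MI:SS')"
      auxiliary destination '/u01/app/oracle/aux'
      datapump destination '/u01/app/oracle/dpdir'
      dump file 'sales_prev.dmp'
      notableimport;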

Flashback Time Travel

You can also use Flashback Time Travel (it used to be called Flashback Data Archive).

The database tracks all changes to a table and allows you to query the table using Flashback Query at any previous point in time.

However, this requires that you add Flashback Archives to your database. It requires space, and there is a slight overhead.
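
A minimal sketch of enabling it for the table; the archive name, tablespace, and retention are assumptions:

    create flashback archive fba_sales tablespace fba_ts retention 1 year;
    alter table sales flashback archive fba_sales;

    -- later, query the table as it looked at an earlier point in time
    select * from sales as of timestamp systimestamp - interval '7' day;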

Replicating Data

If you want to replicate your data to another database, for instance, a data warehouse, there are some good options.

What About Materialized Views?

The simplest approach is to use materialized views.

You can add materialized view logs to the production database so it can track changes to the table. In a remote database, you can incrementally apply those changes (a fast refresh) and maintain a complete, up-to-date copy of the same table. Optionally, instead of replicating the entire table, you can replicate the result of a query or a view, even one with joins.
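
A minimal sketch, assuming a database link named prod_link pointing from the data warehouse back to the production database:

    -- on the production database
    create materialized view log on sales with primary key;

    -- on the data warehouse, via the database link back to production
    create materialized view sales_mv
      refresh fast on demand
      as select * from sales@prod_link;

    -- apply the changes incrementally whenever needed
    exec dbms_mview.refresh('SALES_MV', method => 'F')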

What About GoldenGate?

The ultimate replication technique is Oracle GoldenGate. It is an extensive product designed specifically for data replication. It will mine the redo to find all changes to your table and replicate them to another database. You can even offload the mining process to another database so your production database remains completely unaffected.

When I mention Oracle GoldenGate, I often hear:

Yeah, but it’s complex and expensive.

Perhaps. However, it is complex and expensive because data replication is a really big challenge. There are so many pitfalls and things to consider. Besides, if you’re in OCI, the price model for OCI GoldenGate is completely different and probably much more attractive.

If you try to make a do-it-yourself replication, you’re in for a big challenge, and the odds are against you.

Conclusion

Without knowing exactly which requirements an incremental export must satisfy, it’s impossible to come up with a solution. In this specific case, I would need more details to find a viable one.

But I bet that the solution I would come up with would use some of the features above, but not Data Pump. Trying to build incremental backups with Data Pump is a very dangerous path to tread. It is simply too hard, and there are too many pitfalls, especially related to concurrency, to take into consideration.

Do you have a similar requirement? How did you solve the problem? Leave a comment below.
