SecureFile LOBs are the best LOB type, partly because they allow parallel access to and from the LOB. That parallel access lets Data Pump unload and load data faster.
The following query lists the SecureFile LOBs in your database:
select owner, table_name, column_name
from dba_lobs
where securefile='YES';
There is another blog post for Data Pump and BasicFile LOBs.
How Data Pump Works
First of all, you must allow Data Pump to work in parallel. Only then does Data Pump assign one worker – and only one worker – per table data object, that is, a table, partition, or subpartition.
If the table data object is big enough, that one worker uses parallel query (PQ) to unload the data. The PQ workers hand the data over to the Data Pump worker, which writes it to the dump file.
How much is big enough? By default, Data Pump examines the object statistics, and if the segment is larger than 250 MB, it uses parallel query. You can change this threshold using the parallel_threshold parameter.
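For example, you can lower the threshold so that smaller segments also qualify for parallel query. This is a sketch only; the schema, directory, file names, and the 100 MB value are illustrative, and the threshold value is given in bytes:

```shell
expdp appuser/password \
   directory=dp_dir \
   dumpfile=t1_%L.dmp \
   tables=t1 \
   parallel=8 \
   parallel_threshold=104857600
```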
To export a SecureFile LOB in parallel:
- You must allow Data Pump to do parallel work using the parallel parameter.
- The table data object must be of a certain size.

Data Pump determines the size by using either:
- Optimizer statistics (default) – estimate=statistics
- Calculation – estimate=blocks
Why doesn’t Data Pump use parallel query on all objects? Because it takes time to start PQ workers and keep track of them. On small table data objects it is a waste and will often be slower. Parallel query makes sense only when a table data object reaches a certain size.
How the Database Stores LOBs
With LOBs you can potentially store a lot of data.
If a LOB (in a single row) is less than 4000 bytes, it is stored in-row together with the rest of the data. Having small LOBs as part of the table is efficient. However, if a LOB is larger than 4000 bytes, it is inefficient to store it in-row. Instead, the database stores the entire LOB in a separate LOB segment. This is called out-of-row.
In Oracle Database 23c, you can optionally store up to 8000 bytes in-row.
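You can check whether in-row storage is enabled for a LOB column via the in_row column of dba_lobs (the APPUSER/T1 names are illustrative):

```sql
select column_name, in_row, securefile
from dba_lobs
where owner = 'APPUSER'
and   table_name = 'T1';
```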
When Statistics Are Not Enough
Update, 6 May 2024: The issue described below is solved in the 19.23.0 Data Pump Bundle Patch
If a LOB is stored in-row, the space it uses is added to the total segment space of the table. However, if the LOB is stored out-of-row, the LOB space is added to the LOB segment space usage.
The table statistics (dba_tab_statistics) show only the blocks (i.e., space) used by the table. The size of the LOB segment does not affect the size of the table in the optimizer statistics. This is by design, because the purpose of statistics is to enable the optimizer to come up with good plans, and the size of the LOB segment is irrelevant in that situation.
In other words, you can have a table with 100 rows and 1 TB of LOB data. If all that LOB data is stored out-of-row, the table looks really small.
If Data Pump determines the size of the table based on the statistics, that table with 1 TB LOBs looks really small. As a result, Data Pump will not use parallel query on it.
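You can see the discrepancy yourself by comparing the table statistics with the actual size of the LOB segment. A sketch, with illustrative schema and table names:

```sql
-- Blocks known to the optimizer vs. actual LOB segment size
select t.blocks as stat_blocks,
       s.bytes/1024/1024 as lob_segment_mb
from   dba_tab_statistics t
join   dba_lobs l
       on l.owner = t.owner and l.table_name = t.table_name
join   dba_segments s
       on s.owner = l.owner and s.segment_name = l.segment_name
where  t.owner = 'APPUSER'
and    t.table_name = 'T1';
```

A small stat_blocks value next to a large lob_segment_mb value shows exactly the situation where Data Pump underestimates the table.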
This applies also to partitioned tables. In this case, Data Pump looks at the partition statistics instead of the table statistics.
Faster Exports
Bundle Patch
The patch to fix the above issue is included in the 19.23.0 Data Pump Bundle Patch.
Don’t Estimate By Statistics
You can change the way Data Pump determines the size of a table from statistics to a proper calculation:
expdp ... estimate=blocks
- Estimating by blocks is slower than using statistics. You will see that the startup phase in Data Pump takes longer.
- Due to a bug in Data Pump, this will not work unless your database is on 19.18.0 with the Data Pump bundle patch.
Fake Statistics
Since Data Pump only looks at the table data object statistics, you can fake the statistics. Trick Data Pump into believing the table itself is huge by setting statistics to high values:
begin
   dbms_stats.set_table_stats (
      ownname => 'APPUSER',
      tabname => 'T1',
      numrows => 10000000,
      numblks => 1000000);
end;
/
- You must do this for all tables with big LOBs.
- It requires testing to get the best result.
- The fake statistics will influence the choices made by the optimizer, so only do this in your maintenance window. Also, note that setting statistics invalidates cursors in the library cache.
- Ensure that the statistics gathering job doesn’t overwrite your fake stats.
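One way to keep the statistics gathering job from overwriting the fake values is to lock the statistics after setting them. A sketch, using the same illustrative schema and table; note that changing statistics on a locked table later requires force => true:

```sql
begin
   -- Lock the statistics so the automatic statistics
   -- gathering job skips this table
   dbms_stats.lock_table_stats (
      ownname => 'APPUSER',
      tabname => 'T1');
end;
/
-- After the export, unlock and regather:
-- exec dbms_stats.unlock_table_stats('APPUSER', 'T1')
-- exec dbms_stats.gather_table_stats('APPUSER', 'T1')
```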
Use Partitioning
If you partition a table, Data Pump uses one worker per partition or subpartition. Depending on the number of partitions and subpartitions and the available parallel processes, Data Pump might also use parallel query on a single partition or subpartition – but only if the partition statistics reflect that it is significant in size. Otherwise, you run into the same problem as described above.
An advantage of having multiple Data Pump workers on the same partitioned table is that more workers can write to dump files simultaneously. Not only can Data Pump read faster from the database, it can also write faster to the dump files.
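Since Data Pump looks at partition-level statistics in this case, you can verify them per partition (owner and table name are illustrative):

```sql
select partition_name, num_rows, blocks
from   dba_tab_statistics
where  owner = 'APPUSER'
and    table_name = 'T1'
and    partition_name is not null
and    subpartition_name is null;
```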
Final Words
Our developers solved the issue in the 19.23.0 Data Pump Bundle Patch. The best solution is of course to stay current and apply the Data Pump Bundle Patch.
Finally, as a reminder, always convert your LOBs to SecureFile LOBs on import:
impdp ... transform=lob_storage:securefiles