Exporting BasicFile LOBs can be quite a pain. If your maintenance window is too short, here is a way to make the export faster.
You can identify the BasicFile LOBs in your database with the following query:

```sql
select owner, table_name, column_name
from   dba_lobs
where  securefile = 'NO';
```
Background
The old BasicFile LOBs do not offer any parallel access methods. The new, improved SecureFile LOBs offer superior functionality, including full parallel access to the LOB. This is a good reason to migrate any old BasicFile LOBs in your database.
When Data Pump starts to export a table with a BasicFile LOB, only one worker will be assigned (due to the limited parallel access methods). If you have a large table, it will take a lot of time for that one worker to process the table.
Solution
Instead of one Data Pump job with one worker processing the table with the LOB, the idea is to start multiple Data Pump exports. Each export has one worker working on a dedicated part of the table.
Imagine you have a table with 100 rows. You start four concurrent Data Pump sessions:
| Job | Rows |
|---|---|
| Data Pump job 1 | 1-25 |
| Data Pump job 2 | 26-50 |
| Data Pump job 3 | 51-75 |
| Data Pump job 4 | 76-100 |
To do this, you need to use the Data Pump query parameter.
Generating the Predicates
Now, you need a way to split the table into chunks and generate a predicate for each of the concurrent Data Pump sessions. It is important that:
- All rows are exported.
- No rows are exported more than once.
Further, you should ensure that:
- The rows are spread evenly across the Data Pump jobs.
ROWIDs
An elegant solution is to use modulus and the block number that holds the row. You can get the block number using rowid and dbms_rowid:
| Job | Predicate |
|---|---|
| Data Pump job 1 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 0 |
| Data Pump job 2 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 1 |
| Data Pump job 3 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 2 |
| Data Pump job 4 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 3 |
This might not be the fastest option, but it is a generic one that works without any knowledge of the structure of the table you are exporting.
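Before you start the exports, it is worth verifying that the chunks cover all rows and are reasonably even. A sketch of such a sanity check, using the table name from the export examples below; every row falls in exactly one bucket because mod() of the block number always returns 0, 1, 2, or 3:

```sql
-- Rows per chunk for 4 concurrent Data Pump jobs
select mod(dbms_rowid.rowid_block_number(rowid), 4) as chunk,
       count(*)                                     as num_rows
from   appuser.table_with_lob
group  by mod(dbms_rowid.rowid_block_number(rowid), 4)
order  by chunk;
```

If one chunk is much larger than the others (for instance after mass deletes), consider splitting on a key instead.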
Keys
You can also split the table on a primary/unique key or any other column that offers an even distribution of values:
| Job | Predicate |
|---|---|
| Data Pump job 1 | where mod(--primary_key_column--, 4) = 0 |
| Data Pump job 2 | where mod(--primary_key_column--, 4) = 1 |
| Data Pump job 3 | where mod(--primary_key_column--, 4) = 2 |
| Data Pump job 4 | where mod(--primary_key_column--, 4) = 3 |
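If the key is not numeric, or its values are skewed, the built-in ora_hash function can serve the same purpose, since it hashes any expression into a fixed number of buckets. A sketch (not from the original procedure); the second argument is the highest bucket number, so 3 gives four buckets numbered 0 through 3:

```sql
where ora_hash(--primary_key_column--, 3) = 0   -- Data Pump job 1
where ora_hash(--primary_key_column--, 3) = 1   -- Data Pump job 2
where ora_hash(--primary_key_column--, 3) = 2   -- Data Pump job 3
where ora_hash(--primary_key_column--, 3) = 3   -- Data Pump job 4
```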
More Workers
If you need more sessions working on the table, you can change the modulus operation and add more Data Pump jobs.
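For a larger number of jobs, writing the parameter files by hand gets tedious. Here is a minimal sketch that generates one parameter file per worker; the job count, table name, and file names are assumptions you should adjust to your setup:

```shell
#!/bin/sh
# Generate one Data Pump parameter file per worker.
JOBS=8
TABLE=appuser.table_with_lob

i=0
while [ "$i" -lt "$JOBS" ]; do
  # Each parfile gets its own log/dump file and its own modulus remainder.
  cat > "exp_lob_${i}.par" <<EOF
logfile=exp_lob_${i}.log
dumpfile=exp_lob_${i}.dmp
tables=${TABLE}
query="where mod(dbms_rowid.rowid_block_number(rowid), ${JOBS}) = ${i}"
EOF
  i=$((i + 1))
done
```

Running it produces exp_lob_0.par through exp_lob_7.par, each selecting a disjoint set of blocks.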
How to Export With ROWID
Here is the first of the Data Pump parameter files:
```
logfile=exp_lob_0.log
dumpfile=exp_lob_0.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 0"
```
The second one. Notice how the serial number in the logfile and dumpfile parameters changes, as does the remainder in the modulus predicate:
```
logfile=exp_lob_1.log
dumpfile=exp_lob_1.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 1"
```
The third one:
```
logfile=exp_lob_2.log
dumpfile=exp_lob_2.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 2"
```
The fourth one:
```
logfile=exp_lob_3.log
dumpfile=exp_lob_3.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 3"
```
Now, you can start four concurrent Data Pump jobs, for example from four different terminals:
```
expdp ... parfile=exp_lob_0.par
expdp ... parfile=exp_lob_1.par
expdp ... parfile=exp_lob_2.par
expdp ... parfile=exp_lob_3.par
```
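The four exports can also be launched from a single terminal as background jobs. A sketch, assuming the parameter files above exist; the `...` credentials placeholder must be filled in, and the stub function is only there so the script can be dry-run on a machine without the Oracle client:

```shell
#!/bin/sh
# Launch all exports concurrently and wait for every one of them.
# Dry-run stub for illustration: if expdp is not on the PATH, just
# echo the command instead of running it.
if ! command -v expdp >/dev/null 2>&1; then
  expdp() { echo "expdp $*"; }
fi

for i in 0 1 2 3; do
  # & puts each job in the background; each job logs to its own file.
  expdp ... parfile="exp_lob_${i}.par" > "exp_lob_${i}.out" 2>&1 &
done
wait   # block until all four exports finish
echo "all exports done"
```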
How to Import
When importing, you change the LOB to a SecureFile LOB, which allows parallel access. Then, take the Data Pump jobs one at a time.

The first job does a regular schema import of the first dump file. It also creates the table itself:

```
impdp ... \
   dumpfile=exp_lob_0.dmp \
   logfile=imp_lob_0.log \
   transform=lob_storage:securefile \
   parallel=4
```

You should always convert the LOB to a SecureFile LOB during import, even if it originally was a BasicFile LOB. The conversion happens on-the-fly and allows for parallel import of the data.

Now, proceed with the other dump files in serial. Since you already converted the LOB to a SecureFile LOB (previous step), you can perform the remaining imports using native Data Pump parallelism:

```
impdp ... \
   dumpfile=exp_lob_1.dmp \
   logfile=imp_lob_1.log \
   parallel=4 \
   table_exists_action=append

impdp ... \
   dumpfile=exp_lob_2.dmp \
   logfile=imp_lob_2.log \
   parallel=4 \
   table_exists_action=append

impdp ... \
   dumpfile=exp_lob_3.dmp \
   logfile=imp_lob_3.log \
   parallel=4 \
   table_exists_action=append
```

Notice how the dumpfile and logfile parameters change for each job. Run the jobs in serial; each job uses Data Pump parallelism internally.
That’s it!
A Few Words of Caution

- You must figure out how to handle indexes, if any are present on the table. Ideally, postpone index creation until the last job has loaded its rows. Otherwise, you'll have expensive index maintenance happening while you load data.
- Data Pump uses Advanced Queuing, which relies on the streams pool in the SGA. Be sure that streams_pool_size is set high enough when you start multiple Data Pump sessions. In most situations, setting it to 2G should be more than enough:

  ```
  alter system set streams_pool_size=2G scope=memory;
  ```

- Do this in a maintenance window only, with no users connected.
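For the index caution above, one approach (a sketch, not part of the original procedure) is to keep indexes out of every chunked import by adding the standard Data Pump exclude parameter to each import parameter file:

```
exclude=index
```

After the last chunk has loaded, the indexes can be created in one pass, for example with a metadata-only import from the first dump file (`impdp ... dumpfile=exp_lob_0.dmp include=index content=metadata_only`), or by extracting the DDL beforehand with dbms_metadata.get_ddl and running it manually.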
Here's a slightly more esoteric option to consider, which might squeeze some more performance out of the operation.
If we know the location of each extent for the source table, we can create Data Pump query clauses so that each worker scans a specific set of extents. Thus for (say) 4 workers, each only needs to scan 1/4 of the physical data on disk.
We can use DBA_EXTENTS to map out each extent's file/starting_block pair, and use NTILE to carve that up into (in this case) 4 groups of extents in file/block order. We then take the low and high boundaries of each group to construct starting and terminating rowids. To generate the full rowid, we'll first need the object ID:
```
SQL> select data_object_id
  2  from user_objects
  3  where object_name = 'T';

DATA_OBJECT_ID
--------------
        144240
```

and then build our rowid pairs using the DBA_EXTENTS logic above:
```
SQL> select
  2  'rowid between '''||
  3  dbms_rowid.rowid_create(1,144240,start_file,start_block,0)||''' and '''||
  4  dbms_rowid.rowid_create(1,144240,end_file,end_block+ext_end_blks,10000)||'''' query
  5  from
  6  (
  7  select
  8    nt,
  9    trunc(ext_start/1e12) start_file,
 10    mod(ext_start,1e12) start_block,
 11    trunc(ext_end/1e12) end_file,
 12    mod(ext_end,1e12) end_block,
 13    ext_end_blks
 14  from
 15  (
 16  select
 17    nt,
 18    min(pos) ext_start,
 19    max(pos) ext_end,
 20    max(blocks) keep (dense_rank last order by pos) ext_end_blks
 21  from (
 22    select pos, blocks, ntile(4) over ( order by pos) nt
 23    from (
 24      select file_id*1e12+block_id as pos, blocks
 25      from dba_extents
 26      where segment_name = 'T'
 27      and owner = user
 28    )
 29  )
 30  group by nt
 31  )
 32  );

QUERY
-----------------------------------------------------------
rowid between 'AAAjNwAAMAAAACQAAA' and 'AAAjNwAAMAAAU6ACcQ'
rowid between 'AAAjNwAAMAAAV8AAAA' and 'AAAjNwAAMAAAWyACcQ'
rowid between 'AAAjNwAAMAAAWyAAAA' and 'AAAjNwAAMAAAgYACcQ'
rowid between 'AAAjNwAAMAAAgoAAAA' and 'AAAjNwAAMAAAtkACcQ'
```
(You’d need a slight variant for a partitioned table – that is left as a reader exercise :-))
Wow, Connor!
You are so next level. That’s a very elegant and cool solution. Great for performance that each chunk scans only the dedicated part of the table. Wow.
Thanks for sharing that tip.
Daniel
Hi Daniel! Great trick, very helpful!!! Thanks a lot!
Hi Vadin,
You’re welcome. Happy migrating.
Daniel
Based on this script, I developed a script to do expdp with parallelism in Standard Edition.
Hi César,
That’s a quite clever idea. I like that. Kudos.
Regards,
Daniel
Daniel:
Sorry to bother you in this post. I like DB are fun very much. I want to have all new posts sent to my personal permanent email. How can I do it? Or can you add my personal permanent email to your distribution list? Please tell me how.
Thanks for your help and consistent contribution to DBA all over the world.
Frank
Hi Frank,
Thanks for the nice feedback. Much appreciated. I’m glad you find my blog useful.
You can subscribe via e-mail. On the front page, simply enter your e-mail address in the upper right corner and you'll get notified about new posts. If the box doesn't show up on the front page, simply log out of WordPress (upper right corner) and you should see it.
If that doesn’t work, send me your e-mail (daniel.overby.hansen oracle.com) and I’ll add it manually.
Regards,
Daniel