Exporting BasicFile LOBs can be quite a pain. If your maintenance window is too short, here is a way to make the export faster.
You can identify the BasicFile LOBs in your database with the following query:

```sql
select owner, table_name, column_name
from   dba_lobs
where  securefile = 'NO';
```
Background
The old BasicFile LOBs do not offer any parallel access methods. The new, improved SecureFile LOBs offer superior functionality, including full parallel access to the LOB. This is a good reason to migrate any old BasicFile LOBs in your database.
When Data Pump starts to export a table with a BasicFile LOB, only one worker will be assigned (due to the limited parallel access methods). If you have a large table, it will take a lot of time for that one worker to process the table.
Solution
Instead of one Data Pump job with one worker processing the table with the LOB, the idea is to start multiple Data Pump exports. Each export has one worker working on a dedicated part of the table.
Imagine you have a table with 100 rows. You start four concurrent Data Pump sessions:
| Job | Rows |
|---|---|
| Data Pump job 1 | 1-25 |
| Data Pump job 2 | 26-50 |
| Data Pump job 3 | 51-75 |
| Data Pump job 4 | 76-100 |
To do this, you need to use the Data Pump query parameter.
Generating the Predicates
Now, you need a way to split the table into chunks and generate a predicate for each of the concurrent Data Pump sessions. It is important that:
- All rows are exported.
- No rows are exported more than once.
Further, you should ensure that:
- The rows are spread evenly across the Data Pump jobs.
ROWIDs
An elegant solution is to use modulus and the block number that holds the row. You can get the block number using rowid and dbms_rowid:
| Job | Predicate |
|---|---|
| Data Pump job 1 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 0 |
| Data Pump job 2 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 1 |
| Data Pump job 3 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 2 |
| Data Pump job 4 | where mod(dbms_rowid.rowid_block_number(rowid), 4) = 3 |
This might not be the fastest option, but it is a generic one that works without any knowledge of the structure of the table you are exporting.
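Before you start the exports, it is worth verifying that the chunks cover all rows and are reasonably even. A sketch of such a sanity check, using the table name from the export examples below; every row falls in exactly one bucket because mod() of the block number always returns 0, 1, 2, or 3:

```sql
-- Rows per chunk for 4 concurrent Data Pump jobs
select mod(dbms_rowid.rowid_block_number(rowid), 4) as chunk,
       count(*)                                     as num_rows
from   appuser.table_with_lob
group  by mod(dbms_rowid.rowid_block_number(rowid), 4)
order  by chunk;
```

If one chunk is much larger than the others (for instance after mass deletes), consider splitting on a key instead.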
Keys
You can also split the table on a primary/unique key or any other column that offers an even distribution of values:
| Job | Predicate |
|---|---|
| Data Pump job 1 | where mod(--primary_key_column--, 4) = 0 |
| Data Pump job 2 | where mod(--primary_key_column--, 4) = 1 |
| Data Pump job 3 | where mod(--primary_key_column--, 4) = 2 |
| Data Pump job 4 | where mod(--primary_key_column--, 4) = 3 |
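If the key is not numeric, or its values are skewed, the built-in ora_hash function can serve the same purpose, since it hashes any expression into a fixed number of buckets. A sketch (not from the original procedure); the second argument is the highest bucket number, so 3 gives four buckets numbered 0 through 3:

```sql
where ora_hash(--primary_key_column--, 3) = 0   -- Data Pump job 1
where ora_hash(--primary_key_column--, 3) = 1   -- Data Pump job 2
where ora_hash(--primary_key_column--, 3) = 2   -- Data Pump job 3
where ora_hash(--primary_key_column--, 3) = 3   -- Data Pump job 4
```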
More Workers
If you need more sessions working on the table, you can change the modulus operation and add more Data Pump jobs.
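For a larger number of jobs, writing the parameter files by hand gets tedious. Here is a minimal sketch that generates one parameter file per worker; the job count, table name, and file names are assumptions you should adjust to your setup:

```shell
#!/bin/sh
# Generate one Data Pump parameter file per worker.
JOBS=8
TABLE=appuser.table_with_lob

i=0
while [ "$i" -lt "$JOBS" ]; do
  # Each parfile gets its own log/dump file and its own modulus remainder.
  cat > "exp_lob_${i}.par" <<EOF
logfile=exp_lob_${i}.log
dumpfile=exp_lob_${i}.dmp
tables=${TABLE}
query="where mod(dbms_rowid.rowid_block_number(rowid), ${JOBS}) = ${i}"
EOF
  i=$((i + 1))
done
```

Running it produces exp_lob_0.par through exp_lob_7.par, each selecting a disjoint set of blocks.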
How to Export With ROWID
Here is the first of the Data Pump parameter files:
```
logfile=exp_lob_0.log
dumpfile=exp_lob_0.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 0"
```
The second one. Notice how the serial number in the logfile and dumpfile parameters changes, as does the remainder in the modulus predicate:
```
logfile=exp_lob_1.log
dumpfile=exp_lob_1.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 1"
```
The third one:
```
logfile=exp_lob_2.log
dumpfile=exp_lob_2.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 2"
```
The fourth one:
```
logfile=exp_lob_3.log
dumpfile=exp_lob_3.dmp
tables=appuser.table_with_lob
query="where mod(dbms_rowid.rowid_block_number(rowid), 4) = 3"
```
Now, you can start four concurrent Data Pump jobs, for example from four different terminals:
```
expdp ... parfile=exp_lob_0.par
expdp ... parfile=exp_lob_1.par
expdp ... parfile=exp_lob_2.par
expdp ... parfile=exp_lob_3.par
```
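The four exports can also be launched from a single terminal as background jobs. A sketch, assuming the parameter files above exist; the `...` credentials placeholder must be filled in, and the stub function is only there so the script can be dry-run on a machine without the Oracle client:

```shell
#!/bin/sh
# Launch all exports concurrently and wait for every one of them.
# Dry-run stub for illustration: if expdp is not on the PATH, just
# echo the command instead of running it.
if ! command -v expdp >/dev/null 2>&1; then
  expdp() { echo "expdp $*"; }
fi

for i in 0 1 2 3; do
  # & puts each job in the background; each job logs to its own file.
  expdp ... parfile="exp_lob_${i}.par" > "exp_lob_${i}.out" 2>&1 &
done
wait   # block until all four exports finish
echo "all exports done"
```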
How to Import
When importing, you change the LOB to a SecureFile LOB, which allows parallel access. Then, take the Data Pump jobs one at a time.

The first job does a regular schema import of the first dump file. It also creates the table itself:

```
impdp ... \
   dumpfile=exp_lob_0.dmp \
   logfile=imp_lob_0.log \
   transform=lob_storage:securefile \
   parallel=4
```

You should always convert the LOB to a SecureFile LOB during import, even if it originally was a BasicFile LOB. The conversion happens on-the-fly and allows for parallel import of the data.

Now, proceed with the other dump files in serial. Since you already converted the LOB to a SecureFile LOB (previous step), you can perform the remaining imports using native Data Pump parallelism:

```
impdp ... \
   dumpfile=exp_lob_1.dmp \
   logfile=imp_lob_1.log \
   parallel=4 \
   table_exists_action=append

impdp ... \
   dumpfile=exp_lob_2.dmp \
   logfile=imp_lob_2.log \
   parallel=4 \
   table_exists_action=append

impdp ... \
   dumpfile=exp_lob_3.dmp \
   logfile=imp_lob_3.log \
   parallel=4 \
   table_exists_action=append
```

Notice how the dumpfile and logfile parameters change for each job. Run the jobs in serial; each job uses Data Pump parallelism internally.
That’s it!
A Few Words of Caution

- You must figure out how to handle indexes, if any are present on the table. Ideally, postpone index creation until the last job has loaded its rows. Otherwise, you'll have expensive index maintenance happening while you load data.
- Data Pump uses Advanced Queuing, which relies on the streams pool in the SGA. Be sure that streams_pool_size is set high enough when you start multiple Data Pump sessions. In most situations, setting it to 2G should be more than enough:

  ```
  alter system set streams_pool_size=2G scope=memory;
  ```

- Do this in a maintenance window only, with no users connected.
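For the index caution above, one approach (a sketch, not part of the original procedure) is to keep indexes out of every chunked import by adding the standard Data Pump exclude parameter to each import parameter file:

```
exclude=index
```

After the last chunk has loaded, the indexes can be created in one pass, for example with a metadata-only import from the first dump file (`impdp ... dumpfile=exp_lob_0.dmp include=index content=metadata_only`), or by extracting the DDL beforehand with dbms_metadata.get_ddl and running it manually.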
Here's a slightly more esoteric option to consider, which might squeeze some more performance out of the operation.
If we know the location of each extent for the source table, we can create Data Pump query clauses so that each worker scans a specific set of extents. Thus for (say) 4 workers, each only needs to scan 1/4 of the physical data on disk.
We can use DBA_EXTENTS to map out each extent's file/starting_block pair, and use NTILE to carve that up into (in this case) 4 groups of extents in file/block order. We then take the low and high boundaries of each group to construct starting and terminating rowids. To generate the full rowid, we'll first need the object ID:
```
SQL> select data_object_id
  2  from user_objects
  3  where object_name = 'T';

DATA_OBJECT_ID
--------------
        144240
```

and then build our rowid pairs using the DBA_EXTENTS logic above:
```
SQL> select
  2  'rowid between '''||
  3  dbms_rowid.rowid_create(1,144240,start_file,start_block,0)||''' and '''||
  4  dbms_rowid.rowid_create(1,144240,end_file,end_block+ext_end_blks,10000)||'''' query
  5  from
  6  (
  7  select
  8    nt,
  9    trunc(ext_start/1e12) start_file,
 10    mod(ext_start,1e12) start_block,
 11    trunc(ext_end/1e12) end_file,
 12    mod(ext_end,1e12) end_block,
 13    ext_end_blks
 14  from
 15  (
 16  select
 17    nt,
 18    min(pos) ext_start,
 19    max(pos) ext_end,
 20    max(blocks) keep (dense_rank last order by pos) ext_end_blks
 21  from (
 22    select pos, blocks, ntile(4) over ( order by pos) nt
 23    from (
 24      select file_id*1e12+block_id as pos, blocks
 25      from dba_extents
 26      where segment_name = 'T'
 27      and owner = user
 28    )
 29  )
 30  group by nt
 31  )
 32  );

QUERY
-----------------------------------------------------------
rowid between 'AAAjNwAAMAAAACQAAA' and 'AAAjNwAAMAAAU6ACcQ'
rowid between 'AAAjNwAAMAAAV8AAAA' and 'AAAjNwAAMAAAWyACcQ'
rowid between 'AAAjNwAAMAAAWyAAAA' and 'AAAjNwAAMAAAgYACcQ'
rowid between 'AAAjNwAAMAAAgoAAAA' and 'AAAjNwAAMAAAtkACcQ'
```
(You’d need a slight variant for a partitioned table – that is left as a reader exercise :-))
Wow, Connor!
You are so next level. That’s a very elegant and cool solution. Great for performance that each chunk scans only the dedicated part of the table. Wow.
Thanks for sharing that tip.
Daniel
Hi Daniel! Great trick, very helpful!!! Thanks a lot!
Hi Vadin,
You’re welcome. Happy migrating.
Daniel
Based on this script, I developed a script to do expdp with parallelism in Standard Edition.
Hi César,
That’s a quite clever idea. I like that. Kudos.
Regards,
Daniel
Daniel:
Sorry to bother you in this post. I like DB are fun very much. I want to have all new posts sent to my personal permanent email. How can I do it? Or can you add my personal permanent email to your distribution list? Please tell me how.
Thanks for your help and consistent contribution to DBA all over the world.
Frank
Hi Frank,
Thanks for the nice feedback. Much appreciated. I’m glad you find my blog useful.
You can subscribe via e-mail. On the front page, simply enter your e-mail address in the upper right corner and you'll get notified about new posts. If the box doesn't show up on the front page, simply log out of WordPress (upper right corner) and you should see it.
If that doesn’t work, send me your e-mail (daniel.overby.hansen oracle.com) and I’ll add it manually.
Regards,
Daniel