Short answer: If the data was validated originally, it must be validated (again) during import. If you are short on time, you can add the constraints as enabled but not validated (ENABLE NOVALIDATE) instead. But be aware of the drawbacks.
The Problem
I got a question from a colleague importing customer data as part of a migration. Data Pump was spending a lot of time in TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT.
We attached to the Data Pump session and used the status command:
$ impdp attach=<job_name>
Import> status
...
Worker 1 Status:
...
State: EXECUTING
Object Schema: SH
Object Name: C_CUST2_EMAIL
Object Type: TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Worker Parallelism: 1
In another session, we ran a new Data Pump import, but this time into a SQL file so we could examine the metadata for the object C_CUST2_EMAIL:
$ impdp parfile=<par_file_name> sqlfile=ddl.sql
$ cat ddl.sql | grep C_CUST2_EMAIL
ALTER TABLE "SH"."CUSTOMERS2" ADD CONSTRAINT "C_CUST2_EMAIL" CHECK (cust_email like '%@%') ENABLE;
Data Pump adds the check constraint and enables it with the ENABLE clause. But there is a missing detail: the constraint is also validated, even though there is no VALIDATE clause. Validation is the default.
What Happens
In the source database, the check constraint was enabled and validated. During import, Data Pump adds the constraint with the same attributes. The only way to add a validated constraint is to validate it.
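You can see those attributes in the data dictionary. A query along these lines (adjust owner and table name to your case) shows whether a constraint is enabled and whether it is validated:
SELECT constraint_name, status, validated
  FROM dba_constraints
 WHERE owner = 'SH'
   AND table_name = 'CUSTOMERS2'
   AND constraint_type = 'C';
A validated constraint shows STATUS = ENABLED and VALIDATED = VALIDATED.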
But if the constraint was validated during export, then we know data in the dump file is validated as well. When importing data that we know is validated, why do we need to validate it again?
Because we know it, but the database doesn't. For a check constraint to be marked as validated, the database must ensure that this is truly the case. It trusts no one and insists on validating the data.
And when you import data from a dump file, can you actually be sure all data is good? What if …
- someone tampered with the dump file
- data was changed during export using the remap option
- data was changed as part of character set conversion
- data was changed as part of timezone file conversion
Why Does It Take So Long To Validate A Constraint
To answer that, let’s trace:
SQL> ALTER SESSION SET TRACEFILE_IDENTIFIER = "CONST";
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 4';
SQL> ALTER TABLE sh.customers2 ADD CONSTRAINT c_cust2_email
     CHECK (cust_email like '%@%')
     ENABLE VALIDATE;
Run the trace file through tkprof and have a look at the first recursive statement after the ALTER TABLE ... ADD CONSTRAINT statement:
select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3
from "SH"."CUSTOMERS2" A
where not ( cust_email like '%@%')
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 49.03 66.61 3103448 3103510 1 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 49.03 66.62 3103448 3103510 1 0
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100 (recursive depth: 3)
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 TABLE ACCESS FULL CUSTOMERS2 (cr=3103510 pr=3103448 pw=0 time=66618486 us starts=1 cost=2 size=39 card=1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 24274 0.20 22.45
PGA memory operation 1 0.00 0.00
The database checks the table for any existing data that violates the constraint. It does so by issuing a SELECT that negates the check condition and searches for offending rows. This results in a full table scan, and that is what takes so long.
If you have multiple check constraints on the same table, Data Pump adds them one by one, and that results in multiple full table scans.
Since large full table scans rarely benefit from the buffer cache, you end up with a lot of disk reads as well. You can verify that in the row source part of the example above: physical reads (pr) are almost the same as consistent gets (cr), so nearly every block had to be read from disk.
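If you want to spot these recursive validation queries while an import is running, one option is to look for them in V$SQL. A rough sketch; the hint text is taken from the trace above and could differ between versions:
SELECT sql_id, executions, round(elapsed_time/1000000) AS elapsed_seconds, sql_text
  FROM v$sql
 WHERE sql_text LIKE 'select /*+ all_rows ordered dynamic_sampling(2) */%';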
But I Know Data Is Good
If you are sure that the data is good and you want to avoid validating it, you can create the check constraints as enabled but not validated. Existing data is not checked, but new data will be validated.
There are two ways of doing that:
1. Change Constraints in Source
Before exporting the data, in the source database, change all validated check constraints to NOVALIDATE:
ALTER TABLE "SH"."CUSTOMERS2"
MODIFY CONSTRAINT "C_CUST2_EMAIL"
ENABLE NOVALIDATE;
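If there are many constraints, you can generate those statements from the data dictionary instead of writing them by hand. A sketch, assuming everything lives in the SH schema; review the output before running it:
SELECT 'ALTER TABLE "' || owner || '"."' || table_name ||
       '" MODIFY CONSTRAINT "' || constraint_name ||
       '" ENABLE NOVALIDATE;'
  FROM dba_constraints
 WHERE owner = 'SH'
   AND constraint_type = 'C'
   AND status = 'ENABLED'
   AND validated = 'VALIDATED';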
But be aware that you are changing the source database. If you need to use it for other purposes or as a fallback, this might not be a good idea.
2. Change Constraints During Import
First, import the data, but exclude the constraints. This will be much faster because no time is spent on validating constraints:
$ impdp ... exclude=constraint
Next, extract the definition of all constraints from the Data Pump dump file and into a SQL file:
$ impdp ... include=constraint sqlfile=ddl.sql
Now, edit the SQL file and add the NOVALIDATE clause to all the ADD CONSTRAINT statements:
ALTER TABLE "SH"."CUSTOMERS2"
ADD CONSTRAINT "C_CUST2_EMAIL"
CHECK (cust_email like '%@%')
ENABLE NOVALIDATE;
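Then run the edited SQL file against the target database, for example with SQL*Plus (the placeholders are just examples, and ddl.sql is the file name used above):
$ sqlplus <dba_user>@<target_db> @ddl.sql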
But There Is a Catch!
Regardless of which method you choose, there is a catch. Validated constraints are very useful to the database because they enable the optimizer to perform query rewrites and potentially improve query performance. Also, an index access path might become available instead of a full table scan.
You want to get those constraints validated. But you don’t have to do it during the import. Validating an enabled, not validated constraint does not require a lock on the table, so you can do it while the database is open for business.
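When you get to it, validating the constraint is a single statement; for the example constraint it would look like this:
ALTER TABLE "SH"."CUSTOMERS2"
  MODIFY CONSTRAINT "C_CUST2_EMAIL"
  VALIDATE;
The database still scans the table to check the existing rows, but it does so without blocking DML on the table.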
What About Rely
If you follow option 2 and create the constraints after import as enabled and not validated, you could also add the RELY clause:
ALTER TABLE "SH"."CUSTOMERS2"
ADD CONSTRAINT "C_CUST2_EMAIL"
CHECK (cust_email like '%@%')
RELY ENABLE NOVALIDATE;
Rely tells the database that you know the data is good. The optimizer still doesn't trust you until you set the parameter QUERY_REWRITE_INTEGRITY to TRUSTED. Then the optimizer can benefit from some query rewrite options, but not all of them.
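The parameter can be set at session or system level; for example:
ALTER SESSION SET query_rewrite_integrity = TRUSTED;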
Nothing beats a truly validated constraint!
Conclusion
Importing validated check constraints takes time. If you want to save time in your maintenance window, instruct Data Pump to exclude the constraints, and add them yourself as enabled and not validated. This will save a lot of time that Data Pump otherwise would have to spend on validating the constraints.
The database benefits from validated constraints, so you should validate them. The database can validate constraints without a lock on the table if the constraints are already enabled. Thus, you can postpone the validation to a later time in your maintenance window, and you can perform other activities at the same time. Perhaps you can validate constraints while users are testing the database. Or wait until the next maintenance window. Just be aware that you might see some SQL statements degrade until all constraints are validated.
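As a starting point for that later validation, a small PL/SQL sketch like the one below (assuming the constraints live in the SH schema) finds the enabled, not validated check constraints and validates them one by one:
BEGIN
   -- Find check constraints that are enabled but not yet validated
   FOR c IN (SELECT owner, table_name, constraint_name
               FROM dba_constraints
              WHERE owner = 'SH'
                AND constraint_type = 'C'
                AND status = 'ENABLED'
                AND validated = 'NOT VALIDATED')
   LOOP
      -- Validate each constraint; this scans the table but does not block DML
      EXECUTE IMMEDIATE 'ALTER TABLE "' || c.owner || '"."' || c.table_name ||
                        '" MODIFY CONSTRAINT "' || c.constraint_name || '" VALIDATE';
   END LOOP;
END;
/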