Patching Oracle Grid Infrastructure And Oracle Data Guard

How do you patch Oracle Grid Infrastructure 19c (GI) when Oracle Data Guard protects your Oracle Database?

I had a talk with Ludovico Caldara, the product manager for Oracle Data Guard, about it.

To provide more details, I will use the following setup as an example:

  • Data Guard setup with two databases.
  • Each database is a 2-node RAC database.
  • Sites are called copenhagen and aarhus.

Patching Oracle Grid Infrastructure Only

  1. Prepare new GI homes on all nodes in both sites (copenhagen and aarhus).
  2. Disable Fast-Start Failover (FSFO) for the reasons described below. You can leave the observer running.
  3. Start with the standby site, aarhus.
  4. Complete the patching process by switching to the new GI home in a rolling manner on all nodes at the aarhus site.
  5. If you use Active Data Guard and have read-only sessions in your standby database, you should ensure that instances are properly drained before restarting the GI stack (via root.sh).
  6. Proceed with the primary site, copenhagen.
  7. Complete the patching process by switching to the new GI home in a rolling manner on all nodes at the copenhagen site.
  8. Be sure to handle draining properly to ensure there are no interruptions.
  9. Re-enable FSFO.
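
In broker terms, steps 2 and 9 are a pair of DGMGRL commands around the actual patching. Here is a minimal sketch, assuming you can connect to the broker with OS authentication (dgmgrl /). The DRY_RUN switch and the broker helper are my own illustration, not Oracle tooling; by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch: bracket GI patching with FSFO disable/enable.
# DRY_RUN and broker() are hypothetical helpers, not part of any Oracle tool.
DRY_RUN="${DRY_RUN:-1}"

broker() {
    # Print the broker command in dry-run mode; otherwise hand it to dgmgrl.
    if [ "$DRY_RUN" = "1" ]; then
        echo "dgmgrl> $1"
    else
        dgmgrl -silent / "$1"
    fi
}

fsfo_off() {
    broker "DISABLE FAST_START FAILOVER"
}

fsfo_on() {
    broker "ENABLE FAST_START FAILOVER"
    broker "SHOW FAST_START FAILOVER"   # confirm the resulting state
}

fsfo_off
echo "# patch aarhus (standby) node by node, then copenhagen (primary)"
fsfo_on
```

Run it with DRY_RUN=0 only after you have checked the printed commands against your own configuration.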

Later, when you want to patch the database, you can follow the standby-first method described in Oracle Patch Assurance – Data Guard Standby-First Patch Apply (Doc ID 1265700.1). If the database patches you install are RAC Rolling Installable (like Release Updates), you should choose option 1 in phase 3 to avoid any downtime or brownout.

Alternative Approach

If you have many nodes in your cluster and an application that doesn’t behave well during draining, consider switching over to the standby site instead of patching the primary site in a rolling manner. When you switch over, there is only one interruption, whereas a rolling patch apply causes an interruption for every node you restart.

  1. Patch standby site, aarhus.
  2. Switch over to aarhus.
  3. Patch former primary, copenhagen.
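
The switchover in step 2 is a broker operation. The sketch below only prints the DGMGRL commands I would expect to use, in order; it does not run anything. The function name switchover_plan is hypothetical, and you should verify the commands against the broker documentation for your release.

```shell
#!/bin/sh
# Sketch: print the broker commands for the switchover variant.
# Nothing is executed; switchover_plan is a hypothetical helper.
switchover_plan() {
    target="$1"
    echo "SHOW CONFIGURATION"          # check the overall state first
    echo "VALIDATE DATABASE $target"   # check that the target is ready
    echo "SWITCHOVER TO $target"
}

switchover_plan aarhus
```

Each printed line can be passed to dgmgrl as a single command once the standby site is fully patched.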

What If You Want to Patch the Database At the Same Time?

Out-of-place SwitchGridHome

You get complete control over the process with Out-of-place SwitchGridHome. It is my preferred method. There are more commands to execute, but it doesn’t matter if you automate it.

Here is an overview of the process. You can use many of the commands from this blog post:

  1. Prepare new GI homes using gridSetup. Be sure to apply the needed patches. Do it on one node in both primary (copenhagen) and standby site (aarhus). The process will copy the new GI home to all other nodes in the cluster. Do not execute root.sh.
  2. Prepare new database homes. Be sure to apply the needed patches. Here is an example. Do it on one node in both primary (copenhagen) and standby site (aarhus). The process will copy the new database home to all other nodes in the cluster. Remember to execute root.sh.
  3. Disable FSFO.
  4. Start with the standby site, aarhus.
  5. Configure the standby database to start in the new database home:
    $ $OLD_ORACLE_HOME/bin/srvctl modify database \
         -db $STDBY_ORACLE_UNQNAME \
         -oraclehome $NEW_ORACLE_HOME
    
  6. If you use Active Data Guard and have read-only sessions connected, drain the instance.
  7. Switch to the new GI home using gridSetup.sh -switchGridHome ... and root.sh.
    1. root.sh restarts the entire GI stack. When it restarts the database, the database instance runs in the new database home.
    2. Repeat the process on all nodes in the standby site (aarhus).
  8. Proceed with the primary site, copenhagen.
  9. Configure the primary database to start in the new database home:
    $ $OLD_ORACLE_HOME/bin/srvctl modify database \
         -db $PRMY_ORACLE_UNQNAME \
         -oraclehome $NEW_ORACLE_HOME
    
  10. Be sure to drain the instance.
  11. Switch to the new GI home using gridSetup.sh -switchGridHome ... and root.sh.
    1. root.sh restarts the entire GI stack. When it restarts the database, the database instance runs in the new database home.
    2. Repeat the process on all nodes in the primary site (copenhagen).
  12. Execute datapatch -verbose on one of the primary database instances to finish the patch apply.
  13. Re-enable FSFO.
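
To make the ordering easier to see, here is a small sketch that prints the sequence per site. It is a planning aid only and executes nothing. The node names are hypothetical, "drain sessions" stands for whatever draining method you use, and the "..." after -switchGridHome is left exactly as in the steps above.

```shell
#!/bin/sh
# Sketch: print the per-site order of operations for Out-of-place SwitchGridHome.
# site_plan and the node names are hypothetical; nothing is executed.
site_plan() {
    site="$1"; db="$2"; shift 2
    # The database home is changed once per site, before the first node restarts.
    echo "[$site] srvctl modify database -db $db -oraclehome \$NEW_ORACLE_HOME"
    for node in "$@"; do
        echo "[$site/$node] drain sessions"
        echo "[$site/$node] gridSetup.sh -switchGridHome ..."
        echo "[$site/$node] root.sh"
    done
}

echo "[broker] disable FSFO"
site_plan aarhus "\$STDBY_ORACLE_UNQNAME" aarhus1 aarhus2              # standby first
site_plan copenhagen "\$PRMY_ORACLE_UNQNAME" copenhagen1 copenhagen2   # then primary
echo "[copenhagen] datapatch -verbose (once, on one primary instance)"
echo "[broker] re-enable FSFO"
```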

Out-of-place OPatchAuto

Out-of-place OPatchAuto is a convenient way of patching because it also automates the database operations. However, I still recommend using the Out-of-place SwitchGridHome method because it gives you more control over draining.

Here is an overview of the process:

  1. Deploy new GI and database homes using opatchauto apply ... -prepare-clone. Do it on all nodes in both primary (copenhagen) and standby site (aarhus). Since you want to patch GI and database homes, you should omit the -oh parameter.
  2. Disable FSFO.
  3. Start with the standby site, aarhus.
  4. Complete patching of all nodes in the standby site (aarhus) using opatchauto apply -switch-clone.
    1. When OPatchAuto completes the switch on a node, it takes down the entire GI stack on that node, including the database instance.
    2. GI restarts using the new GI home. But the database instance still runs in the old database home.
    3. On the last node, after the GI stack has been restarted, all database instances restart again to switch to the new database home. This means that each database instance will restart two times.
  5. Proceed with the primary site, copenhagen.
  6. Complete patching of all nodes in the primary site (copenhagen) using opatchauto apply -switch-clone.
    1. The procedure is the same as on the standby site.
    2. In addition, OPatchAuto executes Datapatch to complete the database patching.
  7. Re-enable FSFO.
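
The difference in restarts between the two methods is simple arithmetic. The sketch below counts instance restarts per site, based only on the behaviour described in this post (one restart per instance with SwitchGridHome, two with OPatchAuto). The function name and method labels are made up for the example.

```shell
#!/bin/sh
# Sketch: count instance restarts per site for the two out-of-place methods,
# using the behaviour described in the text (1 vs. 2 restarts per instance).
restarts() {
    method="$1"; nodes="$2"
    case "$method" in
        switchgridhome) per_instance=1 ;;
        opatchauto)     per_instance=2 ;;
        *) echo "unknown method: $method" >&2; return 1 ;;
    esac
    echo $((nodes * per_instance))
}

echo "SwitchGridHome, 2 nodes: $(restarts switchgridhome 2) instance restarts"
echo "OPatchAuto, 2 nodes: $(restarts opatchauto 2) instance restarts"
```

Every restart is a point where sessions must be drained, which is why the count matters for applications that don’t handle draining well.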

Fast-Start Failover

When you perform maintenance operations, like patching, consider what to do about Fast-Start Failover (FSFO).

If you have one standby database

  • Single instance standby: I recommend disabling FSFO. If something happens to the primary database while you are patching the standby site, you don’t want to switch over or fail over automatically. Since the standby site is being patched, the standby database might restart at any moment. You should evaluate the situation and determine what to do rather than rely on FSFO to handle it.
  • RAC standby: I recommend disabling FSFO for the same reasons as above. Now, you could argue that the standby database is up all the time if you perform rolling patching. That’s correct, but nodes are being restarted as part of the patching process, and services are being relocated. Having sessions switch over or fail over while you are in the middle of a rolling patch apply is a somewhat delicate situation. Technically, it works; the Oracle stack can handle it. But I prefer to evaluate the situation before switching or failing over, unless you have a super-cool application that can handle it transparently.

Nevertheless, leaving FSFO enabled when you patch GI or a database is fully supported.

If you have more standby databases

I recommend keeping FSFO enabled if you have multiple standby databases.

When you patch one standby database, you can set FastStartFailoverTarget to the other standby database. When patching completes, you can set FastStartFailoverTarget to the first standby database and continue patching the second standby database. This keeps your primary database protected at all times.
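
Here is a sketch of that rotation as broker commands. It only prints them. The second standby, odense, is a hypothetical name that is not part of the example setup, and whether you can change the property while FSFO is enabled depends on your release, so check the broker documentation first.

```shell
#!/bin/sh
# Sketch: print the broker commands that move the FSFO target away from
# the standby being patched. "odense" is a hypothetical second standby.
set_target() {
    primary="$1"; target="$2"
    echo "EDIT DATABASE '$primary' SET PROPERTY FastStartFailoverTarget = '$target'"
}

set_target copenhagen odense   # protect the primary with odense
echo "# ... patch aarhus ..."
set_target copenhagen aarhus   # aarhus is patched; protect with aarhus
echo "# ... patch odense ..."
```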

The Easy Way

As shown above, you can patch Oracle Grid Infrastructure even when you have Oracle Data Guard configured. But why not take the easy way and use Oracle Fleet Patching and Provisioning (FPP)?

FPP automatically detects the presence of Data Guard and executes the commands in the appropriate order, including invoking Datapatch when needed.

If you need to know more, you can reach out to Philippe Fierens, product manager for FPP. He is always willing to get you started.

Happy Patching

Can I Run Datapatch When Users Are Connected

The short answer is: Yes! The longer answer is: Yes, but on very busy systems or in certain situations, you might experience a few hiccups.

The obvious place to look for the answer would be in the documentation. Unfortunately, there is no Patching Guide similar to the Upgrade Guide. The information in this blog post is pieced together from many different sources.

A few facts about patching with Datapatch:

  • The database must be open in read write mode.
  • You can’t run Datapatch on a physical standby database – even if it’s open (Active Data Guard).
  • A patch is not fully installed until you have executed Datapatch successfully.
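
One way to see whether Datapatch has completed is to look at what it recorded in the database. The sketch below prints a query against DBA_REGISTRY_SQLPATCH and, only if you set RUN_SQL=1, pipes it to SQL*Plus. It assumes OS authentication and a correctly set environment; the RUN_SQL switch and the function name are my own.

```shell
#!/bin/sh
# Sketch: show what Datapatch has recorded in the database.
# RUN_SQL and patch_status_sql are hypothetical helpers.
# Running the query assumes ORACLE_HOME and ORACLE_SID are set and "/ as sysdba" works.
patch_status_sql() {
    cat <<'SQL'
set linesize 200 pagesize 100
select patch_id, action, status, action_time, description
  from dba_registry_sqlpatch
 order by action_time;
SQL
}

if [ "${RUN_SQL:-0}" = "1" ]; then
    patch_status_sql | sqlplus -s / as sysdba
else
    patch_status_sql   # default: just print the query
fi
```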

How To

First, let me state that it is fully supported to run Datapatch on a running database with users connected.

The procedure:

  1. Install a new Oracle Home and use OPatch to apply the desired patches.
  2. Shut down the database.
  3. Restart the database in the new, patched Oracle Home.
  4. Downtime is over! Users are allowed to connect to the database.
  5. Execute ./datapatch -verbose.
  6. End of procedure. The patch is now fully applied.

Often, users move step 4 to the end of the procedure, so users can’t connect until Datapatch has completed. That’s of course also perfectly fine, but it extends the downtime and is often unnecessary.
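
The sketch below prints the procedure with the downtime window marked, so you can see that Datapatch sits outside it. It assumes the new Oracle Home is already installed and patched (step 1) and that the database is registered with Grid Infrastructure or Oracle Restart, so srvctl applies. The database name and home paths are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Sketch: print the out-of-place procedure with the downtime window marked.
# patch_plan, the database name and the home paths are hypothetical.
patch_plan() {
    db="$1"; old_home="$2"; new_home="$3"
    echo "--- downtime starts"
    echo "$old_home/bin/srvctl stop database -db $db"
    echo "$old_home/bin/srvctl modify database -db $db -oraclehome $new_home"
    echo "$new_home/bin/srvctl start database -db $db"
    echo "--- downtime ends, users connect"
    echo "$new_home/OPatch/datapatch -verbose"
}

patch_plan SALES /u01/app/oracle/product/19/db_old /u01/app/oracle/product/19/db_new
```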

What About RAC and Data Guard

The above procedure is exactly what happens in a rolling patch apply on a RAC database. When you perform a rolling patch apply on a RAC database, there is no downtime at all. You use opatchauto to patch a RAC database. opatchauto restarts all instances of the database in the patched Oracle Home in a rolling manner. Finally, it executes datapatch on the last node. Individual instances are down temporarily, but the database is always up.

It is a similar situation when you use the Standby First Patch Apply. First, you restart all standby databases in the patched Oracle Home. Then, you perform a switchover and restart the former primary database in the patched Oracle Home. Finally, you execute datapatch to complete the patch installation. You must execute datapatch on the primary database.

Either way, don’t use Datapatch until all databases or instances run on the new, patched Oracle Home.

That’s It?

Yes, but I did write initially that there might be hiccups.

Waits

Datapatch connects to the database like any other session to make changes inside the database. These changes could be:

  • Creating new tables
  • Altering existing tables
  • Creating or altering views
  • Recreating PL/SQL packages like DBMS_STATS

Imagine this scenario:

  1. Database is restarted in patched Oracle Home.
  2. A user connects and starts to use DBMS_STATS.
  3. You execute datapatch.
    1. DBMS_STATS must be recreated to fix a bug.
    2. Datapatch executes CREATE OR REPLACE PACKAGE SYS.DBMS_STATS .....
    3. The Datapatch session will go into a wait.
  4. User is done with DBMS_STATS.
  5. The Datapatch session will come out of wait and replace the package.

In this scenario, the patching procedure was prolonged due to the wait. But it was completed eventually.

Hangs

From time to time, we are told that Datapatch hangs. Most likely, it is not a real hang, but just a wait on a lock. You can connect to the database and identify the blocker. You might even want to kill the blocking session to allow Datapatch to do its work.
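
A minimal sketch of how you might look for the blocker. It prints a query against V$SESSION and only runs it if you set RUN_SQL=1, assuming OS authentication works. On RAC you would probably query GV$SESSION instead. The helper names are my own, and the kill statement is generated for review, not executed.

```shell
#!/bin/sh
# Sketch: SQL to find sessions that are blocked and who blocks them.
# RUN_SQL, blockers_sql and kill_sql are hypothetical helpers.
blockers_sql() {
    cat <<'SQL'
select sid, serial#, username, program, event,
       blocking_session, final_blocking_session
  from v$session
 where blocking_session is not null;
SQL
}

kill_sql() {
    # Build the statement that terminates one session (arguments: sid, serial#).
    echo "alter system kill session '$1,$2' immediate;"
}

if [ "${RUN_SQL:-0}" = "1" ]; then
    blockers_sql | sqlplus -s / as sysdba
else
    blockers_sql   # default: just print the query
fi
```

For example, kill_sql 123 4567 prints the statement for the session with SID 123 and serial number 4567.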

Timeouts

What will happen in the above scenario if the user never releases the lock on DBMS_STATS? After a while, the DDL statement executed by Datapatch will error out:

ORA-04021: timeout occurred while waiting to lock object

To resolve this problem, restart Datapatch and ensure that there are no blocking sessions.
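
If you automate patching, you could wrap the restart in a small retry loop. This is a sketch, not something Datapatch provides: retry_on_lock_timeout is a hypothetical helper that reruns a command while its output contains ORA-04021. It does not remove the blocking session for you.

```shell
#!/bin/sh
# Sketch: rerun a command while its output contains ORA-04021.
# retry_on_lock_timeout and RETRY_SLEEP are hypothetical, not part of Datapatch.
# Usage: retry_on_lock_timeout MAX_ATTEMPTS command [args...]
retry_on_lock_timeout() {
    max="$1"; shift
    log="$(mktemp)"
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" >"$log" 2>&1
        rc=$?
        if ! grep -q 'ORA-04021' "$log"; then
            # No lock timeout: print the output and pass on the command's exit code.
            cat "$log"; rm -f "$log"
            return "$rc"
        fi
        echo "attempt $attempt: ORA-04021 seen, check for blocking sessions" >&2
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$max" ]; then
            sleep "${RETRY_SLEEP:-60}"
        fi
    done
    # Still timing out after the last attempt: print the last output and fail.
    cat "$log"; rm -f "$log"
    return 1
}
```

You would call it like this: retry_on_lock_timeout 3 "$ORACLE_HOME/OPatch/datapatch" -verbose.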

Really Busy Databases

I recommend patching at off-peak hours to reduce the likelihood of hitting the above problems.

If possible, you can also limit the activity in the database while you perform the patching. If your application is using e.g. DBMS_STATS and locking on that object is often a problem, you can hold off these sessions for a little while.

Similarly, if Advanced Queuing is causing problems, perhaps it helps to temporarily set aq_tm_processes to 0. Or, in the case of the scheduler, job_queue_processes.
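
Here is a sketch of what that could look like. It generates the ALTER SYSTEM statements to pause both and to restore them afterwards. The restore values are hypothetical; record your own current values (for example, from V$PARAMETER) before you change anything. The function names are my own, and nothing is executed.

```shell
#!/bin/sh
# Sketch: generate SQL that pauses AQ and scheduler background activity,
# and SQL that restores the values you recorded beforehand.
# pause_sql and restore_sql are hypothetical helpers; nothing is executed.
pause_sql() {
    echo "alter system set aq_tm_processes=0 scope=memory;"
    echo "alter system set job_queue_processes=0 scope=memory;"
}

restore_sql() {
    # Arguments: previous aq_tm_processes, previous job_queue_processes.
    echo "alter system set aq_tm_processes=$1 scope=memory;"
    echo "alter system set job_queue_processes=$2 scope=memory;"
}

pause_sql
echo "-- run datapatch -verbose here"
restore_sql 1 1000   # hypothetical previous values
```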

If nothing helps your situation, you can patch in restricted mode. But that means downtime:

  1. SQL> startup restrict
  2. ./datapatch -verbose
  3. SQL> alter system disable restricted session;

I don’t recommend starting in upgrade mode. To get out of upgrade mode, a database restart is needed, which extends the downtime window.

Datapatch And Resources

How many resources does Datapatch need? Should I be worried about Datapatch depleting the system?

No, you should not. The changes that Datapatch needs to make are not resource-intensive. However, a consequence of the DDL statements might be object invalidation. But even here, you should not worry. Datapatch automatically recompiles any ORACLE_MAINTAINED object that was invalidated by the patch apply. The recompilation happens serially, so it needs fewer resources.

Of course, if your system is running at 99% capacity, it might be a problem. On the other hand, if your system is at 99%, patching problems are probably the least of your worries.

What About OJVM

If you are using OJVM and you apply the OJVM bundle patch, things are a little different.

Release | RAC Rolling | Standby-First | Datapatch
Oracle Database 21c | Fully | No | No Datapatch downtime.
Oracle Database 19c + 18c | Partial | No | No Datapatch downtime, but the Java system is patched, which requires a ~10 second outage. Connected clients using Java will receive ORA-29548.
Oracle Database 12.2 + 12.1 | No | No | Datapatch must execute in upgrade mode.
Oracle Database 11.2.0.4 | No | No | Similar to 12.2 and 12.1, except you don’t use Datapatch.

Mike Dietrich also has a good blog that you might want to read: Do you need STARTUP UPGRADE for OJVM?

What About Oracle GoldenGate

You should stop Oracle GoldenGate when you execute datapatch. When datapatch is done, you can restart Oracle GoldenGate.

If you are manually recompiling objects after datapatch, I recommend that you restart Oracle GoldenGate after the recompilation.

The above applies even if the patches being applied do not contain any Oracle GoldenGate-specific patches.

Oracle GoldenGate uses several objects owned by SYS. When datapatch is running it might change some of those objects. In that case, unexpected errors might occur.

Recommendations

  • Before starting the patching procedure and downtime, I recommend you recompile invalid objects.
    SQL> @?/rdbms/admin/utlrp
    
  • Always execute Datapatch with the -verbose flag. This will give you much better information about what is going on.
    $ $ORACLE_HOME/OPatch/datapatch -verbose
    
  • Always use the latest OPatch.
  • Always use out-of-place patching, even for RAC databases.

Conclusion

Go ahead and patch your database with Datapatch while users are connected.

Further Reading