A Word about Zero Downtime Oracle Grid Infrastructure Patching

In previous blog posts, I have shown you how to patch Oracle Grid Infrastructure 19c (GI) in a rolling manner. However, all those methods require a shutdown of the GI stack on each node. The database as a whole remains available, but the instances on the node being patched go down for a period.

You can change that with Zero Downtime Oracle Grid Infrastructure Patching (ZDOGIP):

Zero-downtime Oracle Grid Infrastructure patching enables patching of Oracle Grid Infrastructure without interrupting database operations. Patches are applied out-of-place and in a rolling fashion, with one node being patched at a time, while the database instances on this node remain operational. Zero-downtime Oracle Grid Infrastructure patching supports Oracle Real Application Clusters (Oracle RAC) databases on clusters with two or more nodes.

ZDOGIP achieves this with a bit of trickery:

When using Zero Downtime Patching, only the binaries in the Oracle Grid Infrastructure user space are patched. Additional Oracle Grid Infrastructure OS system software, kernel modules and system commands including ACFS, AFD, OLFS, and OKA, are not updated. These commands continue to run the version previous to the patch version.

Questions and Answers

Several questions came to my mind when I read about ZDOGIP. Here’s a summary:

How Can the Database Survive Without the ASM Instance?

ZDOGIP uses out-of-place patching and switches to a new, patched home. The database remains up while the GI stack restarts, and the ASM instance restarts as well. How can the database survive that? The answer is Oracle Flex ASM. While the local GI stack restarts, the database instance connects to an ASM instance on another node and continues to access the shared storage directly.

This is an extract of the alert log. It shows how a database switches to a remote ASM instance (+ASM2) during a zero downtime patching session:

2023-02-24T09:18:03.663616+00:00
ALTER SYSTEM RELOCATE CLIENT TO '+ASM2'
2023-02-24T09:18:09.855879+00:00
NOTE: ASMB (9427) relocating from ASM instance +ASM1 to +ASM2 (User initiated)
NOTE: ASMB (index:0) registering with ASM instance as Flex client 0x1409d79b6620c71c (reg:163770215) (startid:1129292367) (reconnect)
NOTE: ASMB (index:0) (9427) connected to ASM instance +ASM2, osid: 53135 (Flex mode; client id 0x1409d79b6620c71c)
NOTE: ASMB (9427) rebuilding ASM server state for all pending groups
NOTE: ASMB (9427) rebuilding ASM server state for group 2 (RECO)
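
If you want to find such relocation events in your own alert log, a small filter can help. The following is a minimal sketch, assuming the alert log uses the same message format as the extract above. The function name asm_relocations is hypothetical and not part of any Oracle tool.

```shell
#!/usr/bin/env bash
# Read alert log lines from stdin and print "<from> -> <to>" for each
# ASMB relocation message. Assumes the message format shown in the
# extract above; other formats are silently ignored.
asm_relocations() {
  sed -n 's/^NOTE: ASMB ([0-9]*) relocating from ASM instance \([^ ]*\) to \([^ ]*\).*/\1 -> \2/p'
}
```

You would feed it the alert log of the database instance, for example asm_relocations < alert.log, where the file name is a placeholder for your own alert log.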

What Are the Minimum Requirements?

Can I Use It with ASM Filter Driver and ASM Cluster File System?

Yes, but if you use ASM Filter Driver (AFD) or ASM Cluster File System (ACFS) and the patch you are applying updates these components, you need to pay special attention. The kernel drivers can’t be updated while GI is running, yet they must be updated eventually. With ZDOGIP you can postpone the update of the kernel drivers, and thus postpone the restart of the entire GI stack, including the database instances that it manages.

… running with the older version of drivers is not supported for an extended period (e.g. restart should be completed in 24 hours of patching).

In short, if you are using AFD or ACFS, you more or less lose the benefit of ZDOGIP because the database must restart anyway.

This feature is recommended for the configurations that do not have (ACFS/AFD/OKA/OLFS).

You should expect that every Release Update contains patches for AFD and ACFS.

If you use AFD or ACFS, I recommend relying on rolling patch installation instead and investing your time in Application Continuity.

The quotes are from MOS note Zero-Downtime Oracle Grid Infrastructure Patching (ZDOGIP). (Doc ID 2635015.1).

How Do I Use Zero Downtime Oracle Grid Infrastructure Patching Together with ASM Filter Driver and ASM Cluster File System?

After patching with ZDOGIP, you must restart the entire GI stack, including the local database instance. You must do this shortly after applying the patch.

The procedure involves executing root.sh -updateosfiles. You will find the full details in the documentation.

How Can I Tell Whether ASM Filter Driver or ACFS Is Installed?

To see whether your system uses AFD:

$ORACLE_HOME/bin/asmcmd afd_state

To see whether your system uses ACFS:

$ORACLE_HOME/bin/crsctl query driver activeversion -all
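
If you run these checks on several nodes, it can be convenient to wrap both commands in one function. This is a minimal sketch that only runs the two commands shown above and prints their raw output. The function name check_gi_drivers and its optional argument are hypothetical, and the sketch assumes that ORACLE_HOME (or the argument) points to the GI home.

```shell
#!/usr/bin/env bash
# Run the AFD and ACFS checks from this post against a GI home.
# The home defaults to $ORACLE_HOME; pass a path to override it.
# Prints the raw command output; interpreting it is left to you.
check_gi_drivers() {
  local gi_home="${1:-$ORACLE_HOME}"
  local asmcmd="$gi_home/bin/asmcmd"
  local crsctl="$gi_home/bin/crsctl"

  # Stop early if the binaries are not where we assume they are.
  if [ ! -x "$asmcmd" ] || [ ! -x "$crsctl" ]; then
    echo "asmcmd or crsctl not found under '$gi_home/bin'" >&2
    return 1
  fi

  echo "== AFD check =="
  "$asmcmd" afd_state
  echo "== ACFS check =="
  "$crsctl" query driver activeversion -all
}
```

Call it without arguments to use the current ORACLE_HOME, or pass the path of the GI home as the first argument.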

Does It Work for Single Instance Databases as Well?

No, this feature is for Oracle RAC databases only.

Can I Use Zero Downtime Oracle Grid Infrastructure Patching with Oracle Fleet Patching and Provisioning?

Yes, you can. There is a simple command line parameter that you can use, which tells Oracle Fleet Patching and Provisioning (FPP) to use ZDOGIP:

rhpctl move gihome ... -tgip

Patching can become slightly more complicated when you use ZDOGIP. You can alleviate that complexity by using FPP.

What Do All the Abbreviations Mean?

When you read the documentation and the MOS notes, you will come across several abbreviations. Here’s a handy list of some of them:

Abbreviation Meaning
ACFS ASM Cluster File System
ADVM ASM Dynamic Volume Manager
AFD ASM Filter Driver
OKA Oracle Kernel Accelerator
OLFS Oracle Layered File System


2 thoughts on “A Word about Zero Downtime Oracle Grid Infrastructure Patching”

  1. Thank you very much for checking out the Zero Downtime Patching.
    As is often the case, new features sound good at first, but when you dig deeper and test them, you realise there are too many limitations and perhaps unknown pitfalls ahead.
    It would be nice to know how many DBAs really use zero downtime patching.


  2. Hi Peter,

    Since ZDOGIP is still a fairly new feature, my guess is that it hasn’t been widely adopted yet.

    I understand what you say about the limitations. Many use AFD and ACFS and get no benefit from it, but I also believe that there is a substantial user base for whom this is a great benefit.
    Also, bear in mind that new features are often delivered in pieces. I don’t know the road map for ZDOGIP but there could be more enhancements on the way. ZDOGIP started out supporting one-off patches only, and now you can use it for RUs. Who knows what the future brings.

    Regards,
    Daniel
