A Word about Zero Downtime Oracle Grid Infrastructure Patching

In previous blog posts, I have shown you how to patch Oracle Grid Infrastructure 19c (GI) in a rolling manner. However, all those methods require a shutdown of the GI stack on each node. The database remains up all the time, but individual nodes will go down for a period.

You can change that with Zero Downtime Oracle Grid Infrastructure Patching (ZDOGIP):

Zero-downtime Oracle Grid Infrastructure patching enables patching of Oracle Grid Infrastructure without interrupting database operations. Patches are applied out-of-place and in a rolling fashion, with one node being patched at a time, while the database instances on this node remain operational. Zero-downtime Oracle Grid Infrastructure patching supports Oracle Real Application Clusters (Oracle RAC) databases on clusters with two or more nodes.

ZDOGIP achieves this with a bit of trickery:

When using Zero Downtime Patching, only the binaries in the Oracle Grid Infrastructure user space are patched. Additional Oracle Grid Infrastructure OS system software, kernel modules and system commands including ACFS, AFD, OLFS, and OKA, are not updated. These commands continue to run the version previous to the patch version.

Questions and Answers

Several questions came to my mind when I read about ZDOGIP. Here’s a summary:

How Can the Database Survive Without the ASM Instance?

ZDOGIP uses out-of-place patching and switches to a new, patched home. The database remains up while the GI stack restarts. The ASM instance restarts as well. How can the database survive that? The answer is Oracle Flex ASM. While the GI stack restarts, the database can access an ASM instance on another hub and access the shared storage directly.

This is an extract of the alert log. It shows how a database switches to a remote ASM instance (+ASM2) during a zero downtime patching session:

2023-02-24T09:18:03.663616+00:00
ALTER SYSTEM RELOCATE CLIENT TO '+ASM2'
2023-02-24T09:18:09.855879+00:00
NOTE: ASMB (9427) relocating from ASM instance +ASM1 to +ASM2 (User initiated)
NOTE: ASMB (index:0) registering with ASM instance as Flex client 0x1409d79b6620c71c (reg:163770215) (startid:1129292367) (reconnect)
NOTE: ASMB (index:0) (9427) connected to ASM instance +ASM2, osid: 53135 (Flex mode; client id 0x1409d79b6620c71c)
NOTE: ASMB (9427) rebuilding ASM server state for all pending groups
NOTE: ASMB (9427) rebuilding ASM server state for group 2 (RECO)
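
If you want to check your own alert log for the same relocation, a small filter is enough. This is only a sketch: it assumes the message wording matches the extract above, and the function name and log path are made up for illustration.

```shell
# Print the ASM client relocation messages from a database alert log.
# Assumption: the messages are worded as in the extract above.
# Usage: asm_relocations /path/to/alert_<SID>.log   (hypothetical path)
asm_relocations() {
  grep -E 'RELOCATE CLIENT TO|ASMB \([0-9]+\) relocating from ASM instance|connected to ASM instance' "$1"
}
```

The filter prints the relocate command, the relocation note, and the note confirming the connection to the new ASM instance. The timestamps sit on separate lines in the alert log, so they are not included.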

What Are the Minimum Requirements?

Can I Use It with ASM Filter Driver and ASM Cluster File System?

Yes, but if you are using ASM Filter Driver (AFD) or ASM Cluster File System (ACFS) and the patch you are applying updates these components, special attention is needed. You can’t update the kernel drivers when GI is running. However, the kernel drivers must be updated. With ZDOGIP you can postpone the update of the kernel drivers, and thus postpone the restart of the entire GI stack including the database that it manages.

… running with the older version of drivers is not supported for an extended period (e.g. restart should be completed in 24 hours of patching).

In short, if you are using AFD or ACFS, you more or less lose the benefit of ZDOGIP because the database must restart anyway.

This feature is recommended for the configurations that do not have (ACFS/AFD/OKA/OLFS).

You should expect that every Release Update contains patches for AFD and ACFS.

If you use AFD or ACFS, I recommend relying on rolling patch installation instead and investing your time in Application Continuity.

The quotes are from the MOS note Zero-Downtime Oracle Grid Infrastructure Patching (ZDOGIP) (Doc ID 2635015.1).

How Do I Use Zero Downtime Oracle Grid Infrastructure Patching Together with ASM Filter Driver and ASM Cluster File System?

After patching with ZDOGIP, you must restart the entire GI stack, including the local database instance. You must do this shortly after the patch apply.

The procedure involves executing root.sh -updateosfiles. You will find the full details in the documentation.

How Can I Tell Whether ASM Filter Driver or ACFS Is Installed?

To see whether your system uses AFD:

$ORACLE_HOME/bin/asmcmd afd_state

To see whether your system uses ACFS:

$ORACLE_HOME/bin/crsctl query driver activeversion -all
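
If you run these checks often, you can wrap them in one function. This is a minimal sketch that only calls the two commands shown above; the function name is my own, and it assumes you pass the Grid home as the first argument or have ORACLE_HOME pointing to it.

```shell
# Run both driver checks against one Grid home and label the output.
# Assumption: $1 (or ORACLE_HOME) points to the Grid home.
show_gi_driver_state() {
  gi_home="${1:-${ORACLE_HOME:-}}"
  if [ ! -x "$gi_home/bin/asmcmd" ] || [ ! -x "$gi_home/bin/crsctl" ]; then
    echo "asmcmd or crsctl not found under '$gi_home/bin'" >&2
    return 1
  fi
  echo "== AFD state =="
  "$gi_home/bin/asmcmd" afd_state
  echo "== Driver versions =="
  "$gi_home/bin/crsctl" query driver activeversion -all
}
```

Call it as show_gi_driver_state "$ORACLE_HOME" on each node you plan to patch.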

Does It Work for Single Instance Databases as Well?

No, this feature is for Oracle RAC databases only.

Can I Use Zero Downtime Oracle Grid Infrastructure Patching with Oracle Fleet Patching and Provisioning?

Yes, you can. There is a simple command line parameter that you can use, which tells Oracle Fleet Patching and Provisioning (FPP) to use ZDOGIP:

rhpctl move gihome ... -tgip

Patching can become slightly more complicated when you use ZDOGIP. You can alleviate that complexity by using FPP.

What Do All the Abbreviations Mean?

When you read the documentation and the MOS notes, you will come across several abbreviations. Here’s a handy list of some of them:

Abbreviation   Meaning
ACFS           ASM Cluster File System
ADVM           ASM Dynamic Volume Manager
AFD            ASM Filter Driver
OKA            OS Kernel extensions
OLFS           Oracle Layered File System

Why You Need to Use Oracle Fleet Patching and Provisioning

First, what is Oracle Fleet Patching and Provisioning (FPP)?

Oracle Fleet Patching & Provisioning (formerly known as Oracle Rapid Home Provisioning) is the recommended solution for performing lifecycle operations (provisioning, patching & upgrades) across entire Oracle Grid Infrastructure and Oracle RAC Database fleets and the default solution used for Oracle Database Cloud services.

Oracle Fleet Patching and Provisioning automates your lifecycle activities

In my own words: a central server that provisions, patches, and upgrades your Oracle Databases, Oracle Grid Infrastructure, and Oracle Exadata stack.

With FPP, the lifecycle operations are automated and handled centrally. The more systems you have, the greater the benefit of FPP.

If you are constantly busy trying to keep up with all the patching, or struggling to maintain your tooling, then it is time to look into FPP!

Many Good Reasons To Use Oracle Fleet Patching and Provisioning

Oracle Fleet Patching and Provisioning:

  • Uses a gold image approach to provision new Oracle Database and Grid Infrastructure homes.
  • Manages your software centrally. Install a new database or GI home based on a gold image centrally with one command.
  • Patches your database or GI home, including running datapatch.
  • Uses out-of-place patching to minimize downtime.
  • Makes it easy to use Zero Downtime Oracle Grid Infrastructure Patching (ZDOGIP).
  • Patches your entire Oracle Exadata stack, including the nodes, storage cells, and RoCE and InfiniBand network.
  • Upgrades your databases using different approaches. Of course, FPP uses AutoUpgrade underneath the hood.
  • Creates new databases.
  • Adds nodes to your Oracle RAC cluster.
  • Detects missing bug fixes. If a database or GI home doesn’t have the expected patches prior to patching, you are notified. Optionally, you can add the missing fixes to your gold image.
  • Checks for configuration drift. Compares the bug fixes in a gold image to the homes deployed.
  • Adheres to the MAA best practices. Provides options to control session draining, and perfectly integrates with Transparent Application Continuity. It will always use the latest HA and MAA features.
  • Does it the same way every time. No more human errors.

All of the above is done centrally from the FPP server.

Imagine how much time you spend on patching your Oracle stack. Now, take into consideration that you are doing it every quarter. Now, take Monthly Recommended Patches into account. Now, take all the other tasks into account. You can really save a lot of time using Oracle Fleet Patching and Provisioning.

But …

I need to learn a new tech!

You’re right. But there are so many resources to get you started:

It Requires a License

True, but it comes included in your Oracle RAC license. If you don’t have that, you must license the Enterprise Manager Lifecycle Management pack. Check the license guide for details.

My Database Is On Windows

OK, bummer. Then you can’t use FPP. It supports Linux, Solaris, and AIX only.

What’s the Result?

Settling the score is easy. There are many more pros than cons.

I recommend using Oracle Fleet Patching and Provisioning when you manage more than a few systems. It will make your life so much easier.

I interviewed Philippe Fierens, the product manager for Oracle Fleet Patching and Provisioning, on the benefits of using FPP.

Patching Oracle Grid Infrastructure 19c – Beginner’s Guide

This is the start of a blog post series on patching Oracle Grid Infrastructure 19c (GI). It is supposed to be easy to follow, so I may have skipped a detail here and there.

I know my way around database patching. I have done it countless times. When it comes to GI, it’s the other way around. I have never really done it in the real world (i.e., before joining Oracle) and my knowledge was limited. I told my boss, Mike, and he gave me a challenge: Learn about it by writing a blog post series.

Why Do I Need to Patch Oracle Grid Infrastructure

Like any other piece of software, GI needs patching to fix security vulnerabilities and other bugs.

You should keep the GI and Oracle Database patch level in sync. This means that you need to patch GI and your Oracle Database at the same cadence. Ideally, that cadence is quarterly.

It is supported to run GI and Oracle Database at different patch levels as long as they are on the same release. GI is also certified to run some of the older Oracle Database releases. This is useful in upgrade projects. Check Oracle Clusterware (CRS/GI) – ASM – Database Version Compatibility (Doc ID 337737.1) for details.

A few examples:

GI        Database   Supported
19.18.0   19.18.0    Yes – recommended
19.16.0   19.18.0    Yes
19.18.0   19.16.0    Yes
19.18.0   11.2.0.4   Yes – used during upgrade, for instance
19.18.0   21.9.0     No
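
The examples boil down to a simple rule of thumb, which you can sketch as a pre-check. This is my own simplification, not an official compatibility matrix. It only compares the release number (the first field of the version), and for older database releases it sends you to Doc ID 337737.1 because GI is certified with some, not all, of them. The function name is hypothetical.

```shell
# Rough pre-check for a GI/database version pair.
# Prints: yes   - same release; the patch levels may differ
#         check - database is on an older release; consult Doc ID 337737.1
#         no    - database release is newer than the GI release
gi_db_supported() {
  gi_release="${1%%.*}"   # e.g. 19 from 19.18.0
  db_release="${2%%.*}"
  if [ "$db_release" -gt "$gi_release" ]; then
    echo "no"
  elif [ "$db_release" -eq "$gi_release" ]; then
    echo "yes"
  else
    echo "check"
  fi
}
```

For instance, gi_db_supported 19.18.0 19.16.0 prints yes, while gi_db_supported 19.18.0 21.9.0 prints no.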

If possible and not too cumbersome, I recommend that you first patch GI and then Oracle Database. Some prefer to patch the two components in two separate operations, while others do it in one operation.

Which Patches Should You Apply to Oracle Grid Infrastructure

You should apply:

Whether you download the bundle patches individually or go with the combo patch is a matter of personal preference. Ultimately, they contain the same patches.

Some prefer an N-1 approach: When the April Release Update comes, they patch with the previous one from January, always staying one quarter behind. For stability reasons, I assume.
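
If you script your patch selection, the N-1 rule is simple arithmetic on the version number. The sketch below assumes that the second field of a 19c Release Update version increases by one every quarter (for example, 19.18.0 followed by 19.19.0). The function name is made up.

```shell
# Print the N-1 Release Update for a given RU version.
# Assumption: the second field increases by one with every quarterly RU.
previous_ru() {
  release="${1%%.*}"      # e.g. 19 from 19.19.0
  rest="${1#*.}"          # e.g. 19.0
  ru="${rest%%.*}"        # e.g. 19
  if [ "$ru" -le 0 ]; then
    echo "no earlier Release Update for $1" >&2
    return 1
  fi
  echo "$release.$((ru - 1)).0"
}
```

With the latest Release Update at 19.19.0, previous_ru 19.19.0 prints 19.18.0.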

What about OJVM patches for GI? The short answer is no.

Which Method Do I Use For Patching

You can patch in two ways:

  • In-place patching
  • Out-of-place patching

In-place patching:

  • You apply patches to an existing Grid Home.
  • You need disk space for the patches.
  • You patch the existing Grid Home. When you start patching a node, GI drains all connections and moves services to other nodes. The node is down during patching.
  • Longer node downtime.
  • No changes to profile and scripts.

Out-of-place patching (my recommended method):

  • You apply patches to a new Grid Home.
  • You need disk space for a brand new Grid Home and the patches.
  • You create and patch a new Grid Home without downtime. You complete patching by switching to the new Grid Home. The node is down only during switching.
  • Shorter node downtime.
  • Profile, scripts and the like must be updated to reflect the new Grid Home.
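
Updating profiles and scripts after an out-of-place switch is easy to automate. Here is a minimal sketch, assuming the old Grid Home path appears literally in the file. The function name and the paths in the usage example are hypothetical, so adapt them to your environment.

```shell
# Replace the old Grid Home path with the new one in a profile or script.
# A backup of the original is kept as <file>.bak.
# Note: the old path is used as a sed pattern, so a dot matches any character.
switch_grid_home_in_file() {
  file="$1"
  old_home="$2"
  new_home="$3"
  cp "$file" "$file.bak" || return 1
  # '|' as the sed delimiter avoids escaping the slashes in the paths.
  sed "s|$old_home|$new_home|g" "$file.bak" > "$file"
}
```

For example: switch_grid_home_in_file ~/.bash_profile /u01/app/19.16.0/grid /u01/app/19.18.0/grid. Review the result before you log in again, and keep the backup until the switch is complete on all nodes.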

Note: When I write node downtime, it does not mean database downtime. I discuss it shortly.

In other words:

In-place patching replaces the Oracle Clusterware software with the newer version in the same Grid home. Out-of-place upgrade has both versions of the same software present on the nodes at the same time, in different Grid homes, but only one version is active.

Oracle Fleet Patching and Provisioning

When you have more systems to manage, it is time to consider Fleet Patching and Provisioning (FPP).

Oracle Fleet Patching & Provisioning is the recommended solution for performing lifecycle operations (provisioning, patching & upgrades) across entire Oracle Grid Infrastructure and Oracle RAC Database fleets and the default solution used for Oracle Database Cloud services

It will make your life so much easier; more about that in a later blog post.

Zero Downtime Oracle Grid Infrastructure Patching

As of 19.16.0 you can also do Zero Downtime Oracle Grid Infrastructure Patching (ZDOGIP).

Use the zero-downtime Oracle Grid Infrastructure patching method to keep your Oracle RAC database instances running and client connections active during patching.

ZDOGIP is an extension of out-of-place patching. However, ZDOGIP does not update the operating system drivers and does not bring down the Oracle stack (database instance, listener, etc.). The new GI home takes over control of the Oracle stack without users noticing. You must still update the operating system drivers by taking down the node, but you can postpone that to a later point in time.

More details about ZDOGIP in a later blog post.

What about Oracle Database Downtime

When you patch GI on a node, the node is down. You don’t need to restart the operating system itself, but you do shut down the entire GI stack, including everything GI manages (database, listeners etc.).

What does that mean for Oracle Database?

Single Instance

If you have a single instance database managed by GI, your database is down during patching. Your users will experience downtime. By using out-of-place patching, you can reduce downtime.

Data Guard

If you have a Data Guard configuration, you can hide the outage from the end users.

First, you patch GI on your standby databases, then perform a switchover, and finally patch GI on the former primary database.

The only interruption is the switchover; a brownout period while the database switches roles. In the brownout period, the database appears to hang, but underneath the hood, you wait for the role switch to complete and connect to the new primary database.

If you have configured your application properly, it will not encounter any ORA-errors. Your users experience a short hang and continue as if nothing had happened.

RAC

If you have a RAC database, you can perform the patching in a rolling manner – node by node.

When you take down a node for patching, GI tells connections to drain from the affected instances and connect to other nodes.

If your application is properly configured, it will react to the drain events and connect seamlessly to another instance. The end users will not experience any interruption nor receive any errors.

If you haven’t configured your application properly or your application doesn’t react in due time, the connections will be forcefully terminated. How that will affect your users depends on the application. But it won’t look pretty.

Unless you configure Application Continuity. If so, the database can replay any in-flight transaction. From a user perspective, all looks fine. They won’t even notice that they have connected to a new instance and that the database replayed their transaction.

Happy Patching!
