How to Roll Back Oracle Grid Infrastructure 19c Using SwitchGridHome

Let me show you how I roll back a patch from Oracle Grid Infrastructure 19c (GI) using the out-of-place method and the -switchGridHome parameter.

My demo system:

  • Is a 2-node RAC (nodes copenhagen1 and copenhagen2).
  • Runs Oracle Linux.
  • Was patched from 19.17.0 to 19.19.0. I patched both GI and database. Now I want GI back on 19.17.0.

I only roll back the GI home. See the appendix for a few thoughts on rolling back the database as well.

This method works if you applied the patch out-of-place – regardless of whether you used the OPatchAuto or SwitchGridHome method.

Preparation

  • I use the term old Oracle Home for the original, lower patch level Oracle Home.

    • It is my 19.17.0 Oracle Home
    • It is stored in /u01/app/19.0.0.0/grid
    • I refer to this home using the environment variable OLD_GRID_HOME
    • This is the Oracle Home that I want to roll back to
  • I use the term new Oracle Home for the higher patch level Oracle Home.

    • It is my 19.19.0 Oracle Home
    • It is stored in /u01/app/19.19.0/grid
    • I refer to this home using the environment variable NEW_GRID_HOME
    • This is the Oracle Home that I want to roll back from

Both GI homes are present in the system already.
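
For reference, this is how I could set the two environment variables in a session on each node; the paths match my demo system:

    [grid@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [grid@copenhagen1]$ export NEW_GRID_HOME=/u01/app/19.19.0/grid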

How to Roll Back Oracle Grid Infrastructure 19c

1. Sanity Checks

I execute the following checks on both nodes, copenhagen1 and copenhagen2. I show the commands for one node only.

  • I verify that the active GI home is the new GI home:

    [grid@copenhagen1]$ export ORACLE_HOME=$NEW_GRID_HOME
    [grid@copenhagen1]$ $ORACLE_HOME/srvm/admin/getcrshome
    
  • I verify that the cluster upgrade state is NORMAL:

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl query crs activeversion -f
    
  • I verify all CRS services are online:

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl check cluster
    
  • I verify that the cluster patch level is 19.19.0 – the new patch level:

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl query crs releasepatch
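
If I prefer not to repeat every check by logging on to each node, crsctl can check the whole cluster from one node, and the remaining commands can be run remotely over ssh. A small sketch (it assumes passwordless ssh is set up for the grid user):

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl check cluster -all
    [grid@copenhagen1]$ ssh copenhagen2 /u01/app/19.19.0/grid/bin/crsctl query crs releasepatch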
    

2. Cluster Verification Utility

  • I use Cluster Verification Utility (CVU) to verify that my cluster meets all prerequisites for a patch/rollback. I do this on one node only:
    [grid@copenhagen1]$ $CVU_HOME/bin/cluvfy stage -pre patch
    
    • You can find CVU in the GI home, but I recommend always getting the latest version from My Oracle Support.
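
The CVU_HOME variable simply points at wherever cluvfy lives. A small sketch of the two options; the standalone path is just an example of where I might have unzipped the download:

    [grid@copenhagen1]$ export CVU_HOME=$NEW_GRID_HOME   # CVU shipped with the GI home
    [grid@copenhagen1]$ export CVU_HOME=/u01/app/cvu     # or the standalone CVU from My Oracle Support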

3. Roll Back Node 1

The GI stack (including database instances, listener, etc.) must restart on each node. But I do the rollback in a rolling manner, so the database as a whole stays up all the time.

  • I drain connections from the first node, copenhagen1 (see the drain sketch at the end of this step).

  • I unlock the old GI home as root:

    [root@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [root@copenhagen1]$ cd $OLD_GRID_HOME/crs/install
    [root@copenhagen1]$ ./rootcrs.sh -unlock -crshome $OLD_GRID_HOME
    
    • This is required because the next step (gridSetup.sh) runs as grid and must have access to the GI home.
    • Later on, when I run root.sh, the script will lock the GI home.
  • I switch to old GI home as grid:

    [grid@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [grid@copenhagen1]$ export ORACLE_HOME=$OLD_GRID_HOME
    [grid@copenhagen1]$ export CURRENT_NODE=$(hostname)
    [grid@copenhagen1]$ $ORACLE_HOME/gridSetup.sh \
       -silent -switchGridHome \
       oracle.install.option=CRS_SWONLY \
       ORACLE_HOME=$ORACLE_HOME \
       oracle.install.crs.config.clusterNodes=$CURRENT_NODE \
       oracle.install.crs.rootconfig.executeRootScript=false
    
  • I complete the switch by running root.sh as root:

    [root@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [root@copenhagen1]$ $OLD_GRID_HOME/root.sh
    
    • This step restarts the entire GI stack, including resources it manages (databases, listener, etc.). This means downtime on this node only. The remaining nodes stay up.
    • In that period, GI marks the services as OFFLINE so users can connect to other nodes.
    • If my database listener runs out of the Grid Home, GI will move it to the Grid Home I am switching to – here, the old Grid Home – including copying listener.ora.
    • In the end, GI restarts the resources (databases and the like).
  • I update any profiles (e.g., .bashrc) and other scripts referring to the GI home.

  • I verify that the active GI home is the old GI home:

    [grid@copenhagen1]$ $OLD_GRID_HOME/srvm/admin/getcrshome
    
  • I verify that the cluster upgrade state is ROLLING PATCH:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl query crs activeversion -f
    
  • I verify all CRS services are online:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl check cluster
    
  • I verify all resources are online:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl stat resource -t
    
  • I verify that the GI patch level is 19.17.0 – the old patch level:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl query crs releasepatch
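
About the draining mentioned in the first bullet of this step: below is a minimal sketch of how connections could be drained from copenhagen1 with srvctl before the switch, and started again afterwards. The database name DB1 and the service name APPSVC are made-up for the example, and the timeout is arbitrary:

    [oracle@copenhagen1]$ srvctl stop service -db DB1 -service APPSVC \
        -node copenhagen1 -drain_timeout 300 -stopoption IMMEDIATE
    [oracle@copenhagen1]$ # After the switch, start the service on the node again
    [oracle@copenhagen1]$ srvctl start service -db DB1 -service APPSVC -node copenhagen1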
    

4. Roll Back Node 2

  • I roll back the second node, copenhagen2, using the same process as the first node, copenhagen1.
    • I double-check that the CURRENT_NODE environment variable gets updated to copenhagen2.
    • When I use crsctl query crs activeversion -f to check the cluster upgrade state, it will now be back in NORMAL mode, because copenhagen2 is the last node in the cluster.
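
Since CURRENT_NODE is set with $(hostname), it picks up the right value automatically when I run the commands on copenhagen2. A trivial sanity check before launching gridSetup.sh:

    [grid@copenhagen2]$ export CURRENT_NODE=$(hostname)
    [grid@copenhagen2]$ echo $CURRENT_NODE   # should print copenhagen2 (or its fully qualified name)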

5. Cluster Verification Utility

  • I use Cluster Verification Utility (CVU) again. Now I perform a post-rollback check. I do this on one node only:
    [grid@copenhagen1]$ $CVU_HOME/bin/cluvfy stage -post patch
    

That’s it!

My cluster is now operating at the previous patch level.

Appendix

SwitchGridHome Does Not Have Dedicated Rollback Functionality

OPatchAuto has dedicated rollback functionality that will revert the previous patch operation. Similar functionality does not exist when you use the SwitchGridHome method.

This is described in Steps for Minimal Downtime Grid Infrastructure Out of Place (OOP) Patching using gridSetup.sh (Doc ID 2662762.1). To roll back, simply switch back to the previous GI home using the same method as for the patch.

There is no real rollback option as this is a switch from OLD_HOME to NEW_HOME. To return to the old version, you need to recreate another new home and switch to that.

Should I Roll Back the Database as Well?

This post describes rolling back the GI home only. Usually, I recommend keeping the database and GI patch level in sync. If I roll back GI, should I also roll back the database?

The short answer is no!

Keeping the GI and database patch in sync is a good idea. But when you need to roll back, you are in a contingency. Only roll back the component that gives you problems. Then, you will be out of sync for a period of time until you can get a one-off patch or move to the next Release Update. Being in this state for a shorter period is perfectly fine – and supported.

Other Blog Posts in This Series

Patching Oracle Grid Infrastructure 19c – Beginner’s Guide

This is the start of a blog post series on patching Oracle Grid Infrastructure 19c (GI). It is supposed to be easy to follow, so I may have skipped a detail here and there.

I know my way around database patching. I have done it countless times. When it comes to GI, it’s the other way around. I have never really done it in the real world (i.e., before joining Oracle) and my knowledge was limited. I told my boss, Mike, and he gave me a challenge: Learn about it by writing a blog post series.

Why Do I Need to Patch Oracle Grid Infrastructure

Like any other piece of software, you need to patch GI to close security vulnerabilities and fix bugs.

You should keep the GI and Oracle Database patch level in sync. This means that you need to patch GI and your Oracle Database at the same cadence. Ideally, that cadence is quarterly.

It is supported to run GI and Oracle Database at different patch levels as long as they are on the same release. GI is also certified to run some of the older Oracle Database releases. This is useful in upgrade projects. Check Oracle Clusterware (CRS/GI) – ASM – Database Version Compatibility (Doc ID 337737.1) for details.

A few examples:

GI        Database   Supported
19.18.0   19.18.0    Yes – recommended
19.16.0   19.18.0    Yes
19.18.0   19.16.0    Yes
19.18.0   11.2.0.4   Yes – used during upgrade, for instance
19.18.0   21.9.0     No
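
To see which patch level GI and a database are actually running, I can ask each home directly. A quick sketch; GRID_HOME and DB_HOME stand for the respective installation directories:

    [grid@copenhagen1]$ $GRID_HOME/bin/crsctl query crs releasepatch
    [oracle@copenhagen1]$ $DB_HOME/OPatch/opatch lspatches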

If possible and not too cumbersome, I recommend that you first patch GI and then Oracle Database. Some prefer to patch the two components in two separate operations, while others do it in one operation.

Which Patches Should You Apply to Oracle Grid Infrastructure

You should apply the latest Release Update (RU) to GI. Whether you download the bundle patches individually or go with the combo patch is a matter of personal preference. Ultimately, they contain the same patches.

Some prefer an N-1 approach: when the April Release Update comes out, they apply the previous one from January – always one quarter behind, presumably for stability reasons.

What about OJVM patches for GI? The short answer is no. OJVM patches apply to the database home, not to the GI home.

Which Method Do I Use For Patching

You can patch in two ways:

  • In-place patching
  • Out-of-place patching

In-place patching:

  • You apply patches to the existing Grid Home.
  • You need disk space for the patches.
  • When you start patching a node, GI drains all connections and moves services to other nodes. The node is down during patching.
  • Longer node downtime.
  • No changes to profiles and scripts.

Out-of-place patching:

  • You apply patches to a new Grid Home.
  • You need disk space for a brand new Grid Home and the patches.
  • You create and patch the new Grid Home without downtime. You complete patching by switching to the new Grid Home. The node is down only during the switch.
  • Shorter node downtime.
  • Profiles, scripts and the like must be updated to reflect the new Grid Home.
  • My recommended method.

Note: When I write node downtime, it does not mean database downtime. I discuss it shortly.

In other words:

In-place patching replaces the Oracle Clusterware software with the newer version in the same Grid home. Out-of-place upgrade has both versions of the same software present on the nodes at the same time, in different Grid homes, but only one version is active.

Oracle Fleet Patching and Provisioning

When you have more systems to manage, it is time to consider Fleet Patching and Provisioning (FPP).

Oracle Fleet Patching & Provisioning is the recommended solution for performing lifecycle operations (provisioning, patching & upgrades) across entire Oracle Grid Infrastructure and Oracle RAC Database fleets and the default solution used for Oracle Database Cloud services.

It will make your life so much easier; more about that in a later blog post.

Zero Downtime Oracle Grid Infrastructure Patching

As of 19.16.0 you can also do Zero Downtime Oracle Grid Infrastructure Patching (ZDOGIP).

Use the zero-downtime Oracle Grid Infrastructure patching method to keep your Oracle RAC database instances running and client connections active during patching.

ZDOGIP is an extension to out-of-place patching. ZDOGIP does not update the operating system drivers and does not bring down the Oracle stack (database instances, listeners, etc.); the new GI takes over control of the Oracle stack without users noticing. You must still update the operating system drivers eventually, which requires taking down the node, but you can postpone that to a later point in time.

More details about ZDOGIP in a later blog post.

What about Oracle Database Downtime

When you patch GI on a node, the node is down. You don’t need to restart the operating system itself, but you do shut down the entire GI stack, including everything GI manages (database, listeners etc.).

What does that mean for Oracle Database?

Single Instance

If you have a single instance database managed by GI, your database is down during patching. Your users will experience downtime. By using out-of-place patching, you can reduce downtime.

Data Guard

If you have a Data Guard configuration, you can hide the outage from the end users.

First, you patch GI on your standby databases, then perform a switchover, and finally patch GI on the former primary database.

The only interruption is the switchover: a brownout period while the database switches roles. During the brownout, the database appears to hang, but under the hood the connections wait for the role switch to complete and then connect to the new primary database.

If you have configured your application properly, it will not encounter any ORA-errors. Your users experience a short hang and continue as if nothing had happened.
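
To make the sequence concrete, here is a rough sketch of the switchover step using the Data Guard broker; the connect string and the standby's db_unique_name (CDB1_CPH2) are made-up for the example:

    [oracle@copenhagen1]$ dgmgrl sys@CDB1_CPH1
    DGMGRL> switchover to CDB1_CPH2;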

RAC

If you have a RAC database, you can perform the patching in a rolling manner – node by node.

When you take down a node for patching, GI tells connections to drain from the affected instances and connect to other nodes.

If your application is properly configured, it will react to the drain events and connect seamlessly to another instance. The end users will not experience any interruption nor receive any errors.

If you haven’t configured your application properly or your application doesn’t react in due time, the connections will be forcefully terminated. How that affects your users depends on the application. But it won’t look pretty.

Unless you configure Application Continuity. In that case, the database can replay any in-flight transactions. From a user perspective, all looks fine. They won’t even notice that they connected to a new instance and that the database replayed their transaction.
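
For completeness, a rough sketch of what enabling Application Continuity on a service could look like; the database name DB1 and the service name APPSVC are made-up, and the values are examples rather than recommendations:

    [oracle@copenhagen1]$ srvctl modify service -db DB1 -service APPSVC \
        -failovertype TRANSACTION -commit_outcome TRUE \
        -drain_timeout 300 -stopoption IMMEDIATE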

Happy Patching!
