Which Grid Infrastructure Should I Install on My Brand New Exadata?

I received a question from a customer:

We just got a brand-new Exadata. We will use it for critical databases and stay on Oracle Database 19c for the time being. Which Grid Infrastructure should we install: 19c or 23ai?

I recommend installing Oracle Grid Infrastructure 19c (GI 19c).

The Reason

GI 19c has been out since 2019. It is currently at the 23rd Release Update (19.26) and is used on many thousands of systems – including some of the most critical systems you can find.

GI 19c is a proven release that has reached a very stable state. Proven and stable – two attributes that are extremely valuable for a mission-critical system.

Additionally, I would apply the latest Release Update – at the time of writing that’s 19.26. Also, I would include fixes from Oracle Database 19c Important Recommended One-off Patches (Doc ID 555.1).

Further, I would ensure the databases are on the same Release Update, 19.26. If that’s impossible, at least keep the database within two Release Updates of Grid Infrastructure – so 19.24 at a minimum.
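
If you want to check how far apart GI and the database currently are, comparing the installed patches in each home is a quick way. A minimal sketch – the home paths are examples only and must be adjusted to your system:

    # Grid Infrastructure home (example path)
    $ /u01/app/19.0.0.0/grid/OPatch/opatch lspatches
    # Database home (example path)
    $ /u01/app/oracle/product/19.0.0.0/dbhome_1/OPatch/opatch lspatches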

In this case, the customer is migrating the databases from a different platform onto the new Exadata system. The source system is already running GI 19c, and keeping the same GI release on the new system means there’s one less change to deal with.

Why Not Oracle Grid Infrastructure 23ai?

First, there’s absolutely nothing wrong with the quality of Oracle Grid Infrastructure 23ai (GI 23ai).

When I recommend GI 19c over GI 23ai, it is a matter of choosing between two good options.

But GI 23ai has been out for Exadata for a little over half a year – much less than GI 19c, which is about to reach six years of general availability.

Every piece of software has a few rough edges to grind off, and I would expect that for GI 23ai as well.

For a mission-critical system, there’s no need to take any chances, which is why I recommend GI 19c.

When To Use Oracle Grid Infrastructure 23ai

If the customer wants to use Oracle Database 23ai – either now or in the foreseeable future – then they should install GI 23ai. No doubt about that.

For less critical systems, including test and development systems, I would also recommend GI 23ai.

Why Not Both?

I added this after a repost on LinkedIn by my colleague, Alex Blyth.

Alex agrees with my recommendation but adds the following:

What I also would have said is, you can have your cake, and you can eat it too. This means Exadata can do more than one thing at a time. With virtualization, you can have a VM cluster with 19c Database and GI for critical databases, and another VM cluster (up to 50 per DB server with X10M / X11M and the latest Exadata System Software) that is running DB and GI 23ai. What’s more, you could also deploy Exadata Exascale and take advantage of the high-performance shared storage for the VM images, and the awesome instant database snapshot and cloning capabilities for DB 23ai.

He raises a really good point.

Exadata is the world’s best database platform, and the flexibility it offers with virtualization would give this customer the stability they need for their mission-critical databases while also letting them get started with the many new features in 23ai.

The best of both worlds!

Final Words

Although I work in the upgrade team and love upgrades, I don’t recommend them at any cost.

For mission-critical systems, stability and maturity are paramount, and that influences my recommendation of GI 19c.

But get started with Oracle Grid Infrastructure and Database 23ai today. Install it in your lab and then on your less important systems. There are many great enhancements to explore in Oracle Database 23ai.

Prepare yourself for the next release in due time.

How to Roll Back Oracle Grid Infrastructure 19c Using SwitchGridHome

Let me show you how I roll back a patch from Oracle Grid Infrastructure 19c (GI) using the out-of-place method and the -switchGridHome parameter.

My demo system:

  • Is a 2-node RAC (nodes copenhagen1 and copenhagen2).
  • Runs Oracle Linux.
  • Was patched from 19.17.0 to 19.19.0. I patched both GI and database. Now I want GI back on 19.17.0.

I only roll back the GI home. See the appendix for a few thoughts on rolling back the database as well.

This method works if you applied the patch out-of-place – regardless of whether you used the OPatchAuto or SwitchGridHome method.

Preparation

  • I use the term old Oracle Home for the original, lower patch level Oracle Home.

    • It is my 19.17.0 Oracle Home
    • It is stored in /u01/app/19.0.0.0/grid
    • I refer to this home using the environment variable OLD_GRID_HOME
    • This is the Oracle Home that I want to roll back to
  • I use the term new Oracle Home for the higher patch level Oracle Home.

    • It is my 19.19.0 Oracle Home
    • It is stored in /u01/app/19.19.0/grid
    • I refer to this home using the environment variable NEW_GRID_HOME
    • This is the Oracle Home that I want to roll back from

Both GI homes are present in the system already.
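
Throughout the commands, I set the two environment variables to the paths of my demo system, for example in the grid user’s profile on both nodes:

    [grid@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [grid@copenhagen1]$ export NEW_GRID_HOME=/u01/app/19.19.0/grid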

How to Roll Back Oracle Grid Infrastructure 19c

1. Sanity Checks

I execute the following checks on both nodes, copenhagen1 and copenhagen2. I show the commands for one node only.

  • I verify that the active GI home is the new GI home:

    [grid@copenhagen1]$ export ORACLE_HOME=$NEW_GRID_HOME
    [grid@copenhagen1]$ $ORACLE_HOME/srvm/admin/getcrshome
    
  • I verify that the cluster upgrade state is NORMAL:

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl query crs activeversion -f
    
  • I verify all CRS services are online:

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl check cluster
    
  • I verify that the cluster patch level is 19.19.0 – the new patch level:

    [grid@copenhagen1]$ $ORACLE_HOME/bin/crsctl query crs releasepatch
    

2. Cluster Verification Utility

  • I use Cluster Verification Utility (CVU) to verify that my cluster meets all prerequisites for a patch/rollback. I do this on one node only:
    [grid@copenhagen1]$ $CVU_HOME/bin/cluvfy stage -pre patch
    
    • You can find CVU in the GI home, but I recommend always getting the latest version from My Oracle Support.
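    • In the command above, $CVU_HOME points to the directory where I unzipped the standalone CVU. The path below is only an example – adjust it to wherever you staged the tool:

      [grid@copenhagen1]$ export CVU_HOME=/u01/app/grid/cvu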

3. Roll Back Node 1

The GI stack (including database instances, listener, etc.) needs to restart on each node. But I do the rollback in a rolling manner, so the database stays up all the time.

  • I drain connections from the first node, copenhagen1.
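
    • How to drain depends on your setup. As a rough sketch – the database name CDB1 and service name SALES are made up for illustration – I could stop the services on this node with a drain timeout and let the sessions move to the other node:

      [oracle@copenhagen1]$ srvctl stop service -db CDB1 -service SALES -node copenhagen1 -drain_timeout 300
      [oracle@copenhagen1]$ srvctl status service -db CDB1 -service SALES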

  • I unlock the old GI home as root:

    [root@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [root@copenhagen1]$ cd $OLD_GRID_HOME/crs/install
    [root@copenhagen1]$ ./rootcrs.sh -unlock -crshome $OLD_GRID_HOME
    
    • This is required because the next step (gridSetup.sh) runs as grid and must have access to the GI home.
    • Later on, when I run root.sh, the script will lock the GI home.
  • I switch to old GI home as grid:

    [grid@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [grid@copenhagen1]$ export ORACLE_HOME=$OLD_GRID_HOME
    [grid@copenhagen1]$ export CURRENT_NODE=$(hostname)
    [grid@copenhagen1]$ $ORACLE_HOME/gridSetup.sh \
       -silent -switchGridHome \
       oracle.install.option=CRS_SWONLY \
       ORACLE_HOME=$ORACLE_HOME \
       oracle.install.crs.config.clusterNodes=$CURRENT_NODE \
       oracle.install.crs.rootconfig.executeRootScript=false
    
  • I complete the switch by running root.sh as root:

    [root@copenhagen1]$ export OLD_GRID_HOME=/u01/app/19.0.0.0/grid
    [root@copenhagen1]$ $OLD_GRID_HOME/root.sh
    
    • This step restarts the entire GI stack, including resources it manages (databases, listener, etc.). This means downtime on this node only. The remaining nodes stay up.
    • In that period, GI marks the services as OFFLINE so users can connect to other nodes.
    • If my database listener runs out of the Grid Home, GI will move it to the Grid Home I am switching to, including copying listener.ora.
    • In the end, GI restarts the resources (databases and the like).
  • I update any profiles (e.g., .bashrc) and other scripts referring to the GI home.
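
    • A quick way to find leftover references to the 19.19.0 home (the path is the one from my demo system):

      [grid@copenhagen1]$ grep -n "/u01/app/19.19.0/grid" ~/.bashrc ~/.bash_profile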

  • I verify that the active GI home is now the old GI home:

    [grid@copenhagen1]$ $OLD_GRID_HOME/srvm/admin/getcrshome

  • I verify that the cluster upgrade state is ROLLING PATCH:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl query crs activeversion -f

  • I verify all CRS services are online:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl check cluster

  • I verify all resources are online:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl stat resource -t

  • I verify that the GI patch level is 19.17.0 – the old patch level:

    [grid@copenhagen1]$ $OLD_GRID_HOME/bin/crsctl query crs releasepatch

4. Roll Back Node 2

  • I roll back the second node, copenhagen2, using the same process as the first node, copenhagen1.
    • I double-check that the CURRENT_NODE environment variable gets updated to copenhagen2.
    • When I use crsctl query crs activeversion -f to check the cluster upgrade state, it will now be back in NORMAL mode, because copenhagen2 is the last node in the cluster.

5. Cluster Verification Utility

  • I use Cluster Verification Utility (CVU) again. Now I perform a post-rollback check. I do this on one node only:
    [grid@copenhagen1]$ $CVU_HOME/bin/cluvfy stage -post patch
    

That’s it!

My cluster is now operating at the previous patch level.

Appendix

SwitchGridHome Does Not Have Dedicated Rollback Functionality

OPatchAuto has dedicated rollback functionality that will revert the previous patch operation. Similar functionality does not exist when you use the SwitchGridHome method.

This is described in Steps for Minimal Downtime Grid Infrastructure Out of Place (OOP) Patching using gridSetup.sh (Doc ID 2662762.1). To roll back, simply switch back to the previous GI home using the same method as for the patch.

There is no real rollback option as this is a switch from OLD_HOME to NEW_HOME. To return to the old version, you need to recreate another new home and switch to that.

Should I Roll Back the Database as Well?

This post describes rolling back the GI home only. Usually, I recommend keeping the database and GI patch level in sync. If I roll back GI, should I also roll back the database?

The short answer is no!

Keeping the GI and database patch in sync is a good idea. But when you need to roll back, you are in a contingency. Only roll back the component that gives you problems. Then, you will be out of sync for a period of time until you can get a one-off patch or move to the next Release Update. Being in this state for a shorter period is perfectly fine – and supported.


Patching Oracle Grid Infrastructure 19c – Beginner’s Guide

This is the start of a blog post series on patching Oracle Grid Infrastructure 19c (GI). It is supposed to be easy to follow, so I may have skipped a detail here and there.

I know my way around database patching. I have done it countless times. When it comes to GI, it’s the other way around. I have never really done it in the real world (i.e., before joining Oracle) and my knowledge was limited. I told my boss, Mike, and he gave me a challenge: Learn about it by writing a blog post series.

Why Do I Need to Patch Oracle Grid Infrastructure

Like any other piece of software, you need to patch GI to get rid of security vulnerabilities and fix bugs.

You should keep the GI and Oracle Database patch level in sync. This means that you need to patch GI and your Oracle Database at the same cadence. Ideally, that cadence is quarterly.

It is supported to run GI and Oracle Database at different patch levels as long as they are on the same release. GI is also certified to run some of the older Oracle Database releases. This is useful in upgrade projects. Check Oracle Clusterware (CRS/GI) – ASM – Database Version Compatibility (Doc ID 337737.1) for details.

A few examples:

GI        Database   Supported
19.18.0   19.18.0    Yes – recommended
19.16.0   19.18.0    Yes
19.18.0   19.16.0    Yes
19.18.0   11.2.0.4   Yes – used during upgrade, for instance
19.18.0   21.9.0     No

If possible and not too cumbersome, I recommend that you first patch GI and then Oracle Database. Some prefer to patch the two components in two separate operations, while others do it in one operation.

Which Patches Should You Apply to Oracle Grid Infrastructure

You should apply:

  • The latest Release Update for Grid Infrastructure
  • Relevant fixes from Oracle Database 19c Important Recommended One-off Patches (Doc ID 555.1)

Whether you download the bundle patches individually or go with the combo patch is a matter of personal preference. Ultimately, they contain the same fixes.

Some prefer an N-1 approach: when the April Release Update comes, they patch with the previous one from January – always one quarter behind, for stability reasons, I assume.

What about OJVM patches for GI? The short answer is no – OJVM patches apply to the database home, not to the GI home.

Which Method Do I Use For Patching

You can patch in two ways:

  • In-place patching
  • Out-of-place patching

In-place patching:

  • You apply patches to an existing Grid Home.
  • You need disk space for the patches.
  • You patch the existing Grid Home. When you start patching a node, GI drains all connections and moves services to other nodes. The node is down during patching.
  • Longer node downtime.
  • No changes to profile and scripts.

Out-of-place patching:

  • You apply patches to a new Grid Home.
  • You need disk space for a brand new Grid Home and the patches.
  • You create and patch a new Grid Home without downtime. You complete patching by switching to the new Grid Home. The node is down only during switching.
  • Shorter node downtime.
  • Profile, scripts and the like must be updated to reflect the new Grid Home.
  • My recommended method.

Note: When I write node downtime, it does not necessarily mean database downtime. I discuss that shortly.

In other words:

In-place patching replaces the Oracle Clusterware software with the newer version in the same Grid home. Out-of-place upgrade has both versions of the same software present on the nodes at the same time, in different Grid homes, but only one version is active.

Oracle Fleet Patching and Provisioning

When you have more systems to manage, it is time to consider Fleet Patching and Provisioning (FPP).

Oracle Fleet Patching & Provisioning is the recommended solution for performing lifecycle operations (provisioning, patching & upgrades) across entire Oracle Grid Infrastructure and Oracle RAC Database fleets and the default solution used for Oracle Database Cloud services

It will make your life so much easier; more about that in a later blog post.

Zero Downtime Oracle Grid Infrastructure Patching

As of 19.16.0 you can also do Zero Downtime Oracle Grid Infrastructure Patching (ZDOGIP).

Use the zero-downtime Oracle Grid Infrastructure patching method to keep your Oracle RAC database instances running and client connections active during patching.

ZDOGIP is an extension to out-of-place patching. But ZDOGIP will not update the operating system drivers and will not bring down the Oracle stack (database instance, listener, etc.). The new GI takes over control of the Oracle stack without users noticing. However, you must eventually update the operating system drivers by taking down the node, but you can postpone that to a later point in time.

More details about ZDOGIP in a later blog post.

What about Oracle Database Downtime

When you patch GI on a node, the node is down. You don’t need to restart the operating system itself, but you do shut down the entire GI stack, including everything GI manages (database, listeners etc.).

What does that mean for Oracle Database?

Single Instance

If you have a single instance database managed by GI, your database is down during patching. Your users will experience downtime. By using out-of-place patching, you can reduce downtime.

Data Guard

If you have a Data Guard configuration, you can hide the outage from the end users.

First, you patch GI on your standby databases, then perform a switchover, and finally patch GI on the former primary database.

The only interruption is the switchover: a brownout period while the database switches roles. During the brownout, the database appears to hang, but under the hood, the connection waits for the role switch to complete and then connects to the new primary database.

If you have configured your application properly, it will not encounter any ORA-errors. Your users experience a short hang and continue as if nothing had happened.
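
As a minimal sketch of the switchover step – the connection and database names (CDB1_PRIM, CDB1_STBY) are made up for illustration and assume an existing Data Guard broker configuration:

    [oracle@copenhagen1]$ dgmgrl sys@CDB1_PRIM
    DGMGRL> validate database 'CDB1_STBY';
    DGMGRL> switchover to 'CDB1_STBY';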

RAC

If you have a RAC database, you can perform the patching in a rolling manner – node by node.

When you take down a node for patching, GI tells connections to drain from the affected instances and connect to other nodes.

If your application is properly configured, it will react to the drain events and connect seamlessly to another instance. The end users will not experience any interruption nor receive any errors.

If you haven’t configured your application properly or your application doesn’t react in due time, the connections will be forcefully terminated. How that affects your users depends on the application. But it won’t look pretty.

That is, unless you configure Application Continuity. In that case, the database can replay any in-flight transactions. From a user perspective, all looks fine. They won’t even notice that they connected to a new instance and that the database replayed their transactions.
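
As a minimal sketch – the database name CDB1 and service name SALES are made up for illustration – a service prepared for draining and Application Continuity could be configured like this:

    [oracle@copenhagen1]$ srvctl modify service -db CDB1 -service SALES \
        -failovertype TRANSACTION -commit_outcome TRUE \
        -drain_timeout 300 -stopoption IMMEDIATE

In addition, the application must connect through that service and use a replay-capable driver.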

Happy Patching!
