I just finished my presentation at the UKOUG conference. This time, it was held at the Oracle office in Reading. Two intense days full of learning experiences.
It’s the 40th anniversary of UKOUG – that’s truly amazing. The community started when I was just a little child and still lives on today. What a change tech has undergone since then.
Congratulations to the board and the entire community on the 40th anniversary.
The Slides
Patch Me If You Can – Grid Infrastructure Edition
This is a modification of an existing talk about database patching, but this version focuses mostly on Oracle Grid Infrastructure. Since Oracle Database and Grid Infrastructure go hand in hand, it also covers some database topics.
You should flip through the slides if you work with Oracle Grid Infrastructure. And remember – always patch out-of-place.
Help! My Database is still on 8i!
I also had the opportunity to close the conference with my 8i talk. I really like this talk because it is a walk down memory lane. Plus, it includes demos using Oracle 8i Database. It’s cool to be old school.
For a little laugh, you can find a comparison of Oracle Database releases and mobile phones of the same age.
Thanks
Thanks to the board of UKOUG and the organizers for pulling off yet another successful conference. Thanks to the sponsors making it all possible and to everyone who attended my sessions or the conference in general.
It keeps impressing me how much you can learn in such a short time. My head is full. Luckily, the weekend is coming up.
P.S. The chocolate fountain was amazing!
When you have created a new GI home and applied all the necessary patches, you can turn it into a golden image. Later on, you can deploy from that golden image and avoid updating OPatch and applying patches every time.
How to Create a Golden Image
First, only create a golden image from a freshly installed Oracle Home. Never use an Oracle Home that is already in use. As soon as you start to use an Oracle Home you taint it with various files and you don’t want to carry those files around in your golden image. The golden image must be completely clean.
Then, you create a directory where you can store the golden image:
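For example, assuming /u01/app/images as the storage location (adjust the path to your environment):

mkdir -p /u01/app/images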
Be sure to do this before you start to use the new GI home.
The installer creates the golden image as a zip file in the specified directory. The name of the zip file is unique and printed on the console. You can also use the secret parameter -name to specify a name for the zip file. To name the zip file gi_19_20_0.zip:
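A sketch, assuming ORACLE_HOME points to the freshly installed GI home and using the directory created above:

$ORACLE_HOME/gridSetup.sh -silent -createGoldImage -destinationLocation /u01/app/images -name gi_19_20_0.zip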
No software must be running out of the Oracle Home when you create the golden image. Don’t use a production Oracle Home. I recommend using a test or staging server instead.
When you patch your Oracle Grid Infrastructure 19c (GI) using the out-of-place method, you should also remove the old GI homes.
I recommend that you keep the old GI home for a while. At least until you are convinced that a rollback is not needed. Once you are comfortable with the new GI home, you can safely get rid of the old one.
How to Remove an Oracle Grid Infrastructure 19c Home
I set the path to my old GI home as an environment variable:
export REMOVE_ORACLE_HOME=/u01/app/19.0.0.0/grid
Optionally, I take a backup of the GI home for safekeeping:
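A minimal sketch, assuming /u01/app/backup as a backup location (a hypothetical path) and the GI home path from above. As root:

tar -czf /u01/app/backup/gi_19_home.tar.gz -C /u01/app/19.0.0.0/grid .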
I verify that the GI home is not the active one. This command returns the active GI home. It must not return the path of the GI home that I want to delete. As grid:
$REMOVE_ORACLE_HOME/srvm/admin/getcrshome
I double-check that the GI home to remove is not the active one. The XML tag returned must not contain a CRS="true" attribute. As grid:
export ORA_INVENTORY_XML=/u01/app/oraInventory/ContentsXML/inventory.xml
grep "$REMOVE_ORACLE_HOME" $ORA_INVENTORY_XML
# This is good
# <HOME NAME="OraGrid190" LOC="/u01/app/19.0.0.0/grid" TYPE="O" IDX="1"/>
# This is bad
# <HOME NAME="OraGrid190" LOC="/u01/app/19.0.0.0/grid" TYPE="O" IDX="1" CRS="true"/>
I run the deinstall tool. I switch to my home directory to ensure I am not interfering with the deinstallation. As grid:
cd ~
$REMOVE_ORACLE_HOME/deinstall/deinstall
The script:
Detects the nodes in my cluster.
Prints a summary and prompts for confirmation.
Deinstalls the GI home on all nodes.
Instructs me to run a script as root on all nodes.
At the end, prints a summary including any manual tasks.
I verify that the GI home is marked as deleted in the inventory. The XML tag should have a Removed="T" attribute. As grid:
export ORA_INVENTORY_XML=/u01/app/oraInventory/ContentsXML/inventory.xml
grep "$REMOVE_ORACLE_HOME" $ORA_INVENTORY_XML
# This is good
# <HOME NAME="OraGrid190" LOC="/u01/app/19.0.0.0/grid" TYPE="O" IDX="1" Removed="T"/>
Often the deinstall tool can’t remove some files because of missing permissions. In that case, I remove the GI home manually. As root on all nodes:
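For example, using the GI home path from above. Double-check the path before running rm -rf:

rm -rf /u01/app/19.0.0.0/grid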
You can find CVU in the GI home, but I recommend always getting the latest version from My Oracle Support.
3. Roll Back Node 1
The GI stack (including database, listener, etc.) needs to restart on each node. But I do the rollback in a rolling manner, so the database stays up all the time.
I drain connections from the first node, copenhagen1.
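How you drain depends on your services. As a sketch, assuming a hypothetical service named SALES and a five-minute drain timeout:

srvctl stop service -db CDB1_COPENHAGEN -service SALES -node copenhagen1 -drain_timeout 300 -stop_option immediate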
This step restarts the entire GI stack, including resources it manages (databases, listener, etc.). This means downtime on this node only. The remaining nodes stay up.
In that period, GI marks the services as OFFLINE so users can connect to other nodes.
If my database listener runs out of the GI home, GI will move it to the new GI home, including copying listener.ora.
In the end, GI restarts the resources (databases and the like).
I update any profiles (e.g., .bashrc) and other scripts referring to the GI home.
I verify that the active GI home is the new GI home:
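Using the getcrshome utility shown earlier; as grid (assuming ORACLE_HOME points to the current GI home):

$ORACLE_HOME/srvm/admin/getcrshome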
I roll back the second node, copenhagen2, using the same process as the first node, copenhagen1.
I double-check that the CURRENT_NODE environment variable gets updated to copenhagen2.
When I use crsctl query crs activeversion -f to check the cluster upgrade state, it will now be back in NORMAL mode, because copenhagen2 is the last node in the cluster.
5. Cluster Verification Utility
I use Cluster Verification Utility (CVU) again. Now I perform a post-rollback check. I do this on one node only:
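A sketch, assuming cluvfy runs from the CVU installation downloaded from My Oracle Support:

cluvfy stage -post crsinst -n all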
My cluster is now operating at the previous patch level.
Appendix
SwitchGridHome Does Not Have Dedicated Rollback Functionality
OPatchAuto has dedicated rollback functionality that will revert the previous patch operation. Similar functionality does not exist when you use the SwitchGridHome method.
There is no real rollback option as this is a switch from OLD_HOME to NEW_HOME.
To return to the old version you need to recreate another new home and switch to that.
Keeping the GI and database patch in sync is a good idea. But when you need to roll back, you are in a contingency. Only roll back the component that gives you problems.
Then, you will be out of sync for a period of time until you can get a one-off patch or move to the next Release Update. Being in this state for a shorter period is perfectly fine – and supported.
Like with Oracle Database, I strongly recommend patching Oracle Grid Infrastructure using the out-of-place method. It has many advantages over in-place patching.
The two methods also move listener.ora as part of the process.
If you use opatchauto, you should note that the tool moves listener.ora when preparing the new GI home using opatchauto apply ... -prepare-clone. You can run that command hours or days before you move the listener. If you add things to listener.ora in between, you must also add them to listener.ora in the new GI home.
Conclusion
There is really nothing to worry about when you patch Oracle Grid Infrastructure out-of-place. The above-mentioned two tools will take care of it for you.
Standby-First Patch Apply allows you to minimize downtime to the time it takes to perform a Data Guard switchover. Further, it allows you to test the apply mechanism on the standby database by temporarily converting it into a snapshot standby database.
The scenario:
Oracle Grid Infrastructure 19c and Oracle Database 19c
Patching from Release Update 19.17.0 to 19.19.0
Vertical patching – GI and database at the same time
Data Guard setup with two RAC databases
Cluster 1: copenhagen1 and copenhagen2
Cluster 2: aarhus1 and aarhus2
DB_NAME: CDB1
DB_UNIQUE_NAME: CDB1_COPENHAGEN and CDB1_AARHUS
Using Data Guard broker
Patching GI using SwitchGridHome method
Let’s get started!
Step 1: Prepare
I can make the preparations without interrupting the database.
I can also test my application on the standby database.
At the end of my testing, I revert the standby database to a physical standby database. The database automatically reverts all the changes made during testing:
DGMGRL> convert database CDB1_AARHUS to physical standby;
Step 4: Switchover
I can perform the previous steps without interrupting my users. This step requires a maintenance window because I am doing a Data Guard switchover.
I check that my standby database is ready to become primary. Then, I start a Data Guard switchover:
DGMGRL> connect sys/<password> as sysdba
DGMGRL> validate database CDB1_AARHUS;
DGMGRL> switchover to CDB1_AARHUS;
A switchover does not have to mean downtime.
If my application is configured properly, the users will experience a brownout: a short hang while the connections switch to the new primary database.
Step 5: Restart New Standby in New Oracle Homes
Now, the primary database runs on aarhus1 and aarhus2. Next, I can move the new standby hosts, copenhagen1 and copenhagen2, to the new GI and database homes.
I repeat step 2 (Restart Standby In New Oracle Homes) but this time for the new standby hosts, copenhagen1 and copenhagen2.
Step 6: Complete Patching
Now, both databases in my Data Guard configuration run out of the new Oracle Homes.
Only proceed with this step once all databases run out of the new Oracle Home.
I need to run this step as soon as possible after completing the previous step.
I complete the patching by running Datapatch on the primary database (CDB1_AARHUS). I add the recomp_threshold parameter to ensure Datapatch recompiles all objects that the patching invalidated:
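A sketch, with ORACLE_HOME set to the new database home; the threshold value here is an assumption, pick one high enough to cover all invalid objects:

$ORACLE_HOME/OPatch/datapatch -verbose -recomp_threshold 10000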
If you use Active Data Guard and have read-only sessions in your standby database, you should ensure that instances are properly drained before restarting the GI stack (via root.sh).
Later, when you want to patch the database, you can follow the standby-first method described in Oracle Patch Assurance – Data Guard Standby-First Patch Apply (Doc ID 1265700.1). If the database patches you install are RAC Rolling Installable (like Release Updates), you should choose option 1 in phase 3 to avoid any downtime or brownout.
Alternative Approach
If you have many nodes in your cluster and an application that doesn’t behave well during draining, consider switching over to the standby site instead of patching the primary site in a rolling manner.
When you switch over, there is only one interruption, whereas a rolling patch apply causes many interruptions.
Patch standby site, aarhus.
Switch over to aarhus.
Patch former primary, copenhagen.
What If You Want to Patch the Database At the Same Time?
Out-of-place SwitchGridHome
You get complete control over the process with Out-of-place SwitchGridHome. It is my preferred method. There are more commands to execute, but it doesn’t matter if you automate it.
Here is an overview of the process. You can use many of the commands from this blog post:
Prepare new GI homes using gridSetup. Be sure to apply the needed patches. Do it on one node in both the primary (copenhagen) and standby site (aarhus). The process will copy the new GI home to all other nodes in the cluster. Do not execute root.sh. See the sketch after this list.
Prepare new database homes. Be sure to apply the needed patches. Here is an example. Do it on one node in both primary (copenhagen) and standby site (aarhus). The process will copy the new database home to all other nodes in the cluster. Remember to execute root.sh.
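As a sketch of the GI prepare phase, assuming the new GI home is unzipped to /u01/app/19.19.0/grid, the Release Update to /u01/patches/ru19190, and a response file prepared in advance (all paths are hypothetical):

/u01/app/19.19.0/grid/gridSetup.sh -silent -applyRU /u01/patches/ru19190 -responseFile /tmp/gridsetup.rsp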
Out-of-place OPatchAuto is a convenient way of patching because it also automates the database operations. However, I still recommend the Out-of-place SwitchGridHome method because it gives you more control over draining.
Here is an overview of the process:
Deploy new GI and database homes using opatchauto apply ... -prepare-clone. Do it on all nodes in both primary (copenhagen) and standby site (aarhus). Since you want to patch GI and database homes, you should omit the -oh parameter.
Complete patching of all nodes in the standby site (aarhus) using opatchauto apply -switch-clone.
When OPatchAuto completes the switch on a node, it takes down the entire GI stack on that node, including database instance.
GI restarts using the new GI home, but the database instance still runs on the old database home.
On the last node, after the GI stack has been restarted, all database instances restart again to switch to the new database home. This means that each database instance restarts twice.
Proceed with the primary site, copenhagen.
Complete patching of all nodes in the primary site (copenhagen) using opatchauto apply -switch-clone.
The procedure is the same as on the standby site.
In addition, OPatchAuto executes Datapatch to complete the database patching.
When you perform maintenance operations, like patching, consider what to do about Fast-Start Failover (FSFO).
If you have one standby database
Single instance standby
I recommend disabling FSFO. If something happens to the primary database while you are patching the standby site, you don’t want to switch over or fail over automatically. Since the standby site is being patched, the standby database might restart shortly. You should evaluate the situation and determine what to do rather than relying on FSFO handling it.
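Disabling FSFO is a single broker command:

DGMGRL> disable fast_start failover;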
RAC standby
I recommend disabling FSFO for the same reasons as above. Now, you could argue that the standby database is up all the time if you perform rolling patching. That’s correct, but nodes are being restarted as part of the patching process, and services are being relocated. Having sessions switch over or fail over while you are in the middle of a rolling patch apply is a slightly delicate situation. Technically, it works; the Oracle stack can handle it. But I prefer to evaluate the situation before switching or failing over, unless you have a super-cool application that can handle it transparently.
Nevertheless, leaving FSFO enabled when you patch GI or a database is fully supported.
If you have more standby databases
I recommend keeping FSFO enabled if you have multiple standby databases.
When you patch one standby database, you can set FastStartFailoverTarget to the other standby database. When patching completes, you can set FastStartFailoverTarget to the first standby database and continue patching the second standby database. This keeps your primary database protected at all times.
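A sketch, assuming a hypothetical second standby named CDB1_STOCKHOLM; while you patch CDB1_AARHUS, you point FSFO at the other standby:

DGMGRL> edit database CDB1_COPENHAGEN set property FastStartFailoverTarget='CDB1_STOCKHOLM';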
The Easy Way
As shown above, you can patch Oracle Grid Infrastructure even when you have Oracle Data Guard configured. But why not take the easy way and use Oracle Fleet Patching and Provisioning (FPP)?
FPP automatically detects the presence of Data Guard and executes the commands in the appropriate order, including invoking Datapatch when needed.
If you need to know more, you can reach out to Philippe Fierens, product manager for FPP. He is always willing to get you started.
The inventory registers the GI Release Updates as OCW RELEASE UPDATE. In this example, GI is running on 19.17.0.
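You can check with OPatch; as grid, assuming ORACLE_HOME points to the GI home:

$ORACLE_HOME/OPatch/opatch lspatches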
Sometimes critical one-off patches are delivered as merge patches with the GI Release Update. It can mess up the patch description. This example is from a Base Database Service in OCI:
The patch description no longer contains the name of the Release Update. In this case, you can trawl through MOS to find the individual patches in the merge patch to identify which Release Update it contains. Or, you can often look at the ACFS patch instead:
I have shown you a few ways to patch Oracle Grid Infrastructure 19c (GI). Which one should you choose? Here’s an overview of the pros and cons of each method.
Just Grid Infrastructure Or Also The Database
You can patch:
Just GI and later on the database
Or GI and the database at the same time
If possible, I recommend patching GI and the database in separate maintenance operations. Proceed with the database when you are confident the new GI runs fine. If you do it a week apart, you should have enough time to kick the tires on the new GI.
I like to keep things separate. If there is a problem, I can quickly identify whether the GI or database patches are causing problems. The more patches you put in simultaneously, the more changes come in, and the harder it is to keep things apart.
The downside is that you now have two maintenance operations: one for GI and one for the database. But if your draining strategy works and/or you are using Application Continuity, you can completely hide the outage from your end users.
If you have a legacy application or draining is a nightmare for you, then it does make sense to consider patching GI and database at the same time.
I recommend out-of-place patching. There are multiple ways of doing that. Choose the one that suits you best. My personal favorite is the SwitchGridHome method.
Happy Patching
There are even more methods than I have shown in this blog post series. I have demonstrated the methods that most people would consider. Evaluate the pros and cons yourself and choose what works best for you.
What’s your favorite? Why did you choose a specific method? Leave a comment and let me know.
If you decide to patch GI and database at the same time, be aware of the following. Each database instance will need to restart two times. First, each instance goes down to switch to the new GI. The second restart happens when you switch on the last node: all database instances are brought down again in a rolling manner and restarted in the new Oracle Home.
If you want to control draining yourself, don’t use this method. The second round of database restarts happens completely automatically, one instance after the other, without any possibility for you to intervene and control draining.
I started this blog post series to learn about patching Oracle Grid Infrastructure 19c (GI). After spending quite some time patching GI, I got a pretty good feeling about the pros and cons. Here’s my advice on patching Oracle Grid Infrastructure.
Use out-of-place patching.
Out-of-place patching allows you to prepare in advance and keep node downtime minimal. In addition, it makes fallback so much easier. Using the SwitchGridHome method, you can install multiple patches in one operation.
Use Application Continuity.
It is such a cool feature. You can completely hide node outages from your end users. No more late-night patching. Patch your systems when it suits you.
Patch GI and database in separate maintenance operations.
My personal preference is to keep things separate. First, you patch GI. When you know it works fine, you proceed with the database, for instance, the week after.
Keep GI and database patch level in sync.
This means that you must patch GI and your Oracle Database at the same cadence. Ideally, that cadence is quarterly. If you follow advice no. 5, your patch levels will be out of sync for a week or so, but that’s perfectly fine as long as you patch GI first.
Complete a rolling patch installation as quickly as possible.
Regardless of whether you install patches to GI or the database, you should complete the rolling patch operation as quickly as possible.
I have heard of customers that patch one node a day. In an 8-node RAC, it will take more than a week to complete the patch operation. While the cluster is in a rolling state, certain GI features are disabled. I strongly recommend that you complete the patch on all nodes as soon as possible.
Special Guest Star – Anil Nair
I had a chat with Mr. RAC himself, Anil Nair. Anil is Distinguished Product Manager at Oracle and responsible for RAC.
Anil’s top tips for patching Oracle Grid Infrastructure
I made it really far, and I learned a lot while writing this blog post series. I received good feedback on Twitter and LinkedIn, plus a lot of comments on my blog. Thank you so much. This feedback is really helpful, and I can use it to make the content so much better.
Also, a big thank-you to my colleagues who answered all my questions and helped me on my learning journey.