In previous blog posts, I have shown you have to patch Oracle Grid Infrastructure 19c (GI). When you patch GI, it happens in a rolling manner, node by node. The database is always up, but each instance is temporarily down.
If my application is connected to an instance going down, how do I ensure my application is not interrupted?
My demo system:
- Is a 2-node RAC. The nodes are named copenhagen1 and copenhagen2.
- Database is called CDB_COPENHAGEN
- PDB is called SALES
Connect Using A Custom Service
-
First, I create a dedicated service that my application must use:
[oracle@copenhagen1]$ srvctl add service \ -db CDB1_COPENHAGEN \ -pdb SALES \ -service SALESGOLD \ -preferred CDB11,CDB12 \ -failover_restore AUTO \ -failoverretry 1 \ -failoverdelay 3 \ -commit_outcome TRUE \ -failovertype AUTO \ -replay_init_time 600 \ -retention 86400 \ -notification TRUE \ -drain_timeout 300 \ -stopoption IMMEDIATE
- I want to use all nodes in my cluster for this service, so I set
preferred
to all instances of my database. - I set
failover_restore AUTO
to enable Transparent Application Continuity for the service. - When I stop the service, the sessions have 5 minutes to move to another instance as specified by
drain_timeout
. - If the sessions don’t drain in due time, the
stopoption
parameter tells that sessions are to be killed immediately.
- I want to use all nodes in my cluster for this service, so I set
-
Now I start the service:
[oracle@copenhagen1]$ srvctl start service \ -db CDB1_COPENHAGEN \ -service SALESGOLD
-
I add an alias called SALESGOLD to tnsnames.ora:
SALESGOLD = (DESCRIPTION = (CONNECT_TIMEOUT=90)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3) (ADDRESS_LIST = (LOAD_BALANCE=on) (ADDRESS = (PROTOCOL = TCP)(HOST=copenhagen-scan)(PORT=1521)) ) (CONNECT_DATA= (SERVICE_NAME = SALESGOLD) ) )
- I connect to SALESGOLD service using the
SERVICE_NAME
attribute. - I connect via the SCAN listener.
- I connect to SALESGOLD service using the
-
I configure my application to connect via the SALESGOLD alias.
You should always configure your application to connect to a custom service. The database has default services, but you should not use them for your application. Custom services are the way to go if you want a highly available system.
Let’s Patch
It doesn’t matter whether you use in-place or out-of-place patching. If you are using out-of-place patching, do as much of the preparations as possible. This procedure starts right before a node goes down.
- First, I want to patch GI on copenhagen1.
- I stop my service SALESGOLD on that node only:
[oracle@copenhagen1]$ srvctl stop service \ -db CDB1_COPENHAGEN \ -service SALESGOLD \ -instance CDB11 \ -force
- I use the
instance
parameter to stop the service only on the node I am about to patch. In this case, CDB11 is the database instance name of my database CDB_COPENHAGEN running on copenhagen1. - GI marks the service as OFFLINE on copenhagen1, and new connections will be directed to the other node.
- GI sends out a FAN event telling the instance is going down. The Oracle driver on the clients detects the event.
- When I created the service, I set the drain timeout to 300. This gives database sessions 5 minutes to finish their work.
- If desired, I can override the value using
-drain_timeout
.
- If desired, I can override the value using
- If a session becomes idle before the timeout, the driver on the client automatically reconnects to the other node. The session continues on the other node; no error message is displayed.
- If the session is still active after the timeout, it will be killed immediately. This I defined when I created the service using
stopoption
parameter.- I enabled Transparent Application Continuity (TAC) on my database service using the
failover_restore
parameter. TAC detects the session is killed and attempts to intervene. - If my application is fit for TAC, a new connection is made, any in-flight transaction is replayed in the new session; no error message is displayed.
- If my application is not fit for TAC or if I was doing something that is not replayable, my session is terminated, and my application receives an error.
- I enabled Transparent Application Continuity (TAC) on my database service using the
- I use the
- Now, all sessions using SALESGOLD service have been drained from copenhagen1. I can bring down the GI stack to patch.
- I use
opatchauto apply
for in-place patching. - Or, I use
opatchauto apply -switch-clone
for out-of-place patching. - Or, I use
root.sh
for out-of-place patching withswitchGridHome
. - Or, whatever method suits my environment. It doesn’t matter.
- I use
- When the GI stack is restarted, my service SALESGOLD is automatically restarted. Now, my application can connect to the instance on copehagen1 again.
- I ensure that the service is started:
[grid@copenhagen1]$ crsctl stat resource -t
- I repeat steps 2-5 for the second node, copenhagen2.
My Application Can’t Use Application Continuity
If, for some reason, you are not ready to use Application Continuity or Tranparent Application Continuity yet, you can still use this procedure.
You can change the failover_restore
setting on the service to something that suits your application. You can still perform draining, but you need to find another method for dealing with those sessions that don’t drain in due time.
One option is to set the drain_timeout
parameter high enough to allow everyone to finish their work and reconnect to another node. Just be aware that the other node must be able to handle the entire workload during that period.
If draining doesn’t work for you either (or the patch is non-rolling), you must take an outage to patch. You can follow the Minimum Downtime approach described in Rolling Patch – OPatch Support for RAC (Doc ID 244241.1):
- Shut down GI stack on node 1
- Patch node 1
- Shut down GI stack on node 2
- Patch node 2
- Shut down GI stack on node 3
- Start up node 1 and node 2
- Patch node 3
- Start up node 3
Happy Patching!
Appendix
Other Blog Posts in This Series
- Introduction
- How to Patch Oracle Grid Infrastructure 19c Using In-Place OPatchAuto
- How to Patch Oracle Grid Infrastructure 19c Using Out-Of-Place OPatchAuto
- How to Patch Oracle Grid Infrastructure 19c Using Out-Of-Place SwitchGridHome
- How to Patch Oracle Grid Infrastructure 19c Using Zero Downtime Oracle Grid Infrastructure Patching
- Which Method Should I Choose When Patching Oracle Grid Infrastructure 19c
- How to Avoid Interruptions When You Patch Oracle Grid Infrastructure 19c
- Patching Oracle Grid Infrastructure And Oracle Data Guard
- Use Cluster Verification Utility (cluvfy) and Avoid Surprises
- A Word about Zero Downtime Oracle Grid Infrastructure Patching
- Why You Need to Use Oracle Fleet Patching and Provisioning
- My Best Advice on Patching Oracle Grid Infrastructure
- Pro Tips
Further Reading
- Product page: Oracle Application Continuity
- Documentation: Managing Planned Maintenance Without User Interruption, Real Application Clusters Administration and Deployment Guide 19c
- Documentation: Support for Oracle Application Continuity and Transparent Application Continuity, 19c
- Technical Brief: Continuous Availability – Application Continuity for the Oracle Database
- YouTube: Transparent Application Continuity with Oracle Active Data Guard – Demo
- Blog post: Database Outage? Who cares???
- Blog post: Get started with Oracle Application Continuity
Hi Daniels. Thanks for the blog.
One question. Do we need both settings in the service definition and in the tnsnames.ora?
And second question. Does Application Continuity is the same as TAC in your demo?.
LikeLike
Hi Williams,
Thanks for the feedback. Much appreciated.
In the service definition, you are turning “things” on. In the connect string (tnsnames.ora) you define certain timeout values. The critical part is the service definition. There are defaults for the timeouts in the connect string, but you should change them. The technical brief I am referencing has more details on the settings.
AC should be able to do the same as TAC. First, I am not an expert on this, so please take that into account. With AC you have much more control over the replay when it happens. With TAC things are handled for you – but gives you less control – so it might not be suitable for all application. Again – have a look in the technical brief. It discusses the two features.
Regards,
Daniel
LikeLike