Avoid Problems on the Primary Database by Testing on a Snapshot Standby

One of the advantages of standby-first patch apply, is that I can test the patches in a production-like environment (the standby) before applying them to the primary. Should I find any issues with the patches, I can stop the process and avoid impacting the primary database.

Here’s an overview of the process.

For demo purposes, my Data Guard configuration consists of two databases:

  • SID: SALES
  • Databases: SALES_COPENHAGEN and SALES_AARHUS
  • Hosts: copenhagen and aarhus
  • Primary database: SALES_COPENHAGEN running on copenhagen

How To

This procedure starts right after I’ve patched the standby (SALES_AARHUS). It runs out of the target Oracle home, whereas the primary database (SALES_COPENHAGEN) still runs on the source Oracle home.

  • Convert the patched standby to a snapshot standby:

    DGMGRL> convert database SALES_AARHUS to snapshot standby;
    
  • Test the patch apply by running Datapatch on the standby:

    [oracle@aarhus] $ORACLE_HOME/OPatch/datapatch
    
    • One always runs Datapatch on the primary database and the changes made by the patches goes into redo to the standby.
    • But, since I converted to a snapshot standby, it is now opened like a normal database and I can run Datapatch on it.
    • If Datapatch completes without problems on the standby, I can be pretty sure it will do so on the primary as well. The standby is after all an exact copy of the primary database.
  • Optionally, perform additional testing on the standby.

    • I can connect any application and perform additional tests.
    • I can use SQL Performance Analyzer to check for regressing SQL statements.
    • I can make changes to any data in the standby. It is protected by a restore point.
  • When done, convert the snapshot standby back to a physical standby:

    DGMGRL> convert database SALES_AARHUS to physical standby;
    
    • This implicitly shuts down the standby, flashes back to the restore point and re-opens the database as a physical standby.
    • All changes made when it was a snapshot standby, including the Datapatch run, are undone.

Continue the patching procedure on the primary database as described elsewhere.

Is It Safe?

Sometimes, when I suggest using the standby for testing, people are like: Huh! Seriously?

What Happens If I Need to Switch Over or Fail Over?

I can still perform a switchover or a failover. However, they will take a little bit longer.

When I convert to snapshot standby:

  • Redo transport is still active.
  • Redo apply is turned off.

So, the standby receives all redo from the primary but doesn’t apply it. Since you normally test for 10-20 minutes, this would be the maximum apply lag. On a well-oiled standby, it shouldn’t take more than a minute or two to catch up.

When performing a switchover or failover on a snapshot standby, you should expect an increase with the time it takes to:

  • Shut down
  • Flashback
  • Apply redo

I’d be surprised if that would be more than 5 minutes. If your RTO doesn’t allow for a longer period:

  • Get a second standby.
  • Consider the reduction in risk you get when you test on the standby. Perhaps a short increase in RTO could be allowed after all.

What Happens If Datapatch Fails

If Datapatch fails on my snapshot standby, I should be proud of myself. I just prevented the same problem from hitting production.

  • I grab all the diagnostic information I need, so I can work with Oracle Support on the issue.
  • Convert back to physical standby, which will undo the failed Datapatch run.
  • If I expect to solve the issue quickly, leave the standby running in the target Oracle home. Otherwise, put it back into the source Oracle home.

So, yes, it’s safe to use!

Happy testing

Appendix

Other Blog Posts in the Series

6 thoughts on “Avoid Problems on the Primary Database by Testing on a Snapshot Standby

  1. Hey Daniel,

    Indeed snapshot standby databases are an overlooked treasure. I would only suggest to make a reminder to set the DB_RECOVERY_FILE_DEST_SIZE large enough. I’ve seen too many clients with a small FRA (“because it only needs to apply redo from the primary database”) and then the instance hangs when using it in snapshot standby mode.

    Kind regards,

    Michael

    Like

    1. Hi Michael,
      That’s a good point. But I guess such customers would run into problems after a switchover as well. I guess they don’t use that standby database very much.
      Thanks for sharing,
      Daniel

      Like

  2. Hey Daniel,

    your articles are helpful and reeeeally appreciated!

    One thing I like to add: Can you perhaps add some more info about licensing requirements for the examples you mention?

    E.g. Data Guard Snapshot Standby: what license required? Or SQL Performance Analyzer: Tuning Pack required??

    Any feature usage is recorded in the database. One thing customers really like to avoid is running into the trap and pay a lot of extra money after the next license audit (which happened). If unsure about that we definitely leave hands off.

    Best regards
    Stephan Reinhold

    Like

    1. Hi Stephan,

      Thanks for your positive feedback. Much appreciated! This motivates me to write even more blog posts.

      I think you raise a really good point here. I’m aware of this situation, and I’d hate if a customer ran into problems because of a recommendation, I gave.

      In my position, I don’t work with licensing at all. Also, I know there are multiple ways where licensing are sorted with customers, so I hesitate to comment on something that I don’t know much about. If I start to mention which features are licensable, I need to double-check that everytime, and what if I forget at one point? Will a customer come after me?

      Every customer has the responsibility to stay complaint and the License Guide (found in the documentation) contains a current list.

      I wish it could be different, but that’s the position I’ve decided for my personal blog. When I write something **officially** for Oracle, then it’s of course different.

      I know it’s not the answer you hoped for, but now you at least know why I decided to do it this way.

      Regards,
      Daniel

      Like

  3. Hi Daniel,

    After introducing the bug 3064205.1 and fixing it “professionally” via 37290880, your statement sadly is no longer valid :(
    “If Datapatch completes without problems on the standby, I can be pretty sure it will do so on the primary as well”

    Regards, Nariman

    Like

    1. Hi,
      I don’t see how that changes the general behavior? This seems to be related to specific workload which you of course don’t have on a standby database (unless you run with ADG). There might be a specific problem with this Release Update in some configurations, but generally you can still expect to find most issues (except such types of concurrency issues) when patching the snapshot standby database.
      Regards,
      Daniel

      Like

Leave a reply to Daniel Overby Hansen Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.