Hey, Let Me Kill Your Network!

This short story is about the awesomeness of AutoUpgrade and refreshable clone PDBs.

Colleagues of mine were testing upgrades to Oracle Database 23ai using refreshable clone PDBs. They wanted to see how fast AutoUpgrade would clone the PDB and how that affected the source system.

The Systems

The source and target systems were identical:

  • Exadata X10M
  • 2-node RAC
  • 190 CPU/node
  • 25 Gbps network/node

The database:

  • 1 TB in size
  • All data files on ASM

The Results

The source database was Oracle Database 19c. They configured AutoUpgrade to upgrade to Oracle Database 23ai using refreshable clone PDBs. However, this test measured only the initial copy of the data files – the CLONEDB stage in AutoUpgrade.

Parallel degree   Time     Throughput   Source CPU %
Default           269 s    3.6 GB/s     3%
4                 2060 s   0.47 GB/s    1%
8                 850 s    1.14 GB/s    1%
16                591 s    1.65 GB/s    2%

A few observations:

  • With the default parallel degree, the 1 TB database was cloned in under 5 minutes (269 seconds).
  • The clone had very little effect on CPU and I/O on the source. It was entirely network-bound.
  • The throughput could scale almost up to the limit of the network.
  • By the way, this corresponds with reports we’ve received from other customers.
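
To put the throughput numbers next to the network speed, here is a small sketch that converts the 25 Gbps link speed to GB/s and compares it with the measured values. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead. The test does not say how the traffic was spread across the two nodes, so the sketch prints the share of one link and of both links combined. The function names are just illustrative.

```python
# Sanity check: compare measured clone throughput with the network line rate.
# Assumptions (mine, not from the test): decimal units (1 GB = 10**9 bytes),
# no protocol overhead.

LINK_GBPS = 25   # network speed per node, from the system description
NODES = 2        # 2-node RAC

# Throughput in GB/s per parallel setting, copied from the results table
MEASURED = {"default": 3.6, "4": 0.47, "8": 1.14, "16": 1.65}


def gbps_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def share_of_link(gbytes_per_s: float, link_gbps: float) -> float:
    """Fraction of a link's line rate that a given throughput represents."""
    return gbytes_per_s / gbps_to_gbytes(link_gbps)


if __name__ == "__main__":
    print(f"one link:   {gbps_to_gbytes(LINK_GBPS):.3f} GB/s")
    print(f"both links: {gbps_to_gbytes(LINK_GBPS * NODES):.3f} GB/s")
    for setting, gbytes in MEASURED.items():
        one = share_of_link(gbytes, LINK_GBPS)
        both = share_of_link(gbytes, LINK_GBPS * NODES)
        print(f"parallel {setting:>7}: {gbytes:.2f} GB/s, "
              f"{one:.0%} of one link, {both:.0%} of both links")
```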

Learnings

  • The initial cloning of the database is very fast and efficient.
  • Be prepared for the load on the source system. Because the network is a shared resource, the clone might affect other databases on the source system, too.
  • The target CDB determines the default parallel degree based on its own CPU_COUNT. If the target system is much more powerful than the source, the default degree may put even more load on the source system and its network.
  • Use the AutoUpgrade config file entry parallel_pdb_creation_clause to select a specific parallel degree. Since the initial copy happens before the downtime, you might want to set it low enough to prevent overloading the source system.
  • Be careful. Don’t kill your network!
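
As a rough illustration of the last few points, the sketch below picks the highest parallel degree from the table above that stays within a network budget, and estimates how long the initial copy would take. The throughput numbers come from this one test only and will not transfer directly to other systems, so treat them as placeholders for your own measurements. The prefix `upg1`, the PDB name `PDB1`, and the per-PDB form of the config entry are assumptions on my part. Check the AutoUpgrade documentation for the exact syntax.

```python
# Pick a parallel degree that keeps the clone within a network budget.
# The throughput figures are from the single test in this post; replace
# them with measurements from your own environment.

MEASURED_GBYTES_PER_S = {4: 0.47, 8: 1.14, 16: 1.65}  # degree -> GB/s


def pick_degree(budget_gbytes_per_s, measured=MEASURED_GBYTES_PER_S):
    """Return the highest measured degree whose throughput fits the budget.

    Returns None if even the lowest measured degree exceeds the budget.
    """
    fitting = [d for d, rate in measured.items() if rate <= budget_gbytes_per_s]
    return max(fitting) if fitting else None


def estimated_copy_seconds(size_gb, degree, measured=MEASURED_GBYTES_PER_S):
    """Estimate the initial copy time for a database of size_gb gigabytes."""
    return size_gb / measured[degree]


def config_line(prefix, pdb, degree):
    """Build the AutoUpgrade config entry.

    ASSUMPTION: the entry is set per PDB as prefix.parameter.pdb=value;
    prefix and pdb are hypothetical names, verify against the documentation.
    """
    return f"{prefix}.parallel_pdb_creation_clause.{pdb}={degree}"


if __name__ == "__main__":
    degree = pick_degree(1.2)  # allow the clone at most 1.2 GB/s
    if degree is None:
        print("No measured degree fits the budget")
    else:
        minutes = estimated_copy_seconds(1000, degree) / 60
        print(config_line("upg1", "PDB1", degree))
        print(f"Estimated initial copy of 1000 GB: about {minutes:.0f} minutes")
```

Since the initial copy happens before the downtime, a longer copy at a lower degree usually costs nothing but elapsed time.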

Happy upgrading!

4 thoughts on “Hey, Let Me Kill Your Network!”

  1. Unfortunately, we killed our network and caused an outage. Don’t underestimate how few processes it takes to saturate the network. In earlier times the IPC was the limit, but now it looks like it’s not the bottleneck anymore.

    1. Hi,

      Thanks for sharing your experience. Three things come to mind.

      1) Oh dear – I hope it didn’t cause too much drama (more than an outage normally does).
      2) Wow. It’s also kinda cool that it is so efficient that it eats every available bit on the network.
      3) I need to keep this in mind when I present the feature. It’s really important to know upfront.

      Thanks again for sharing your knowledge. Sharing is caring!

      Daniel
