Some Intel QAT Backup Compression Results

Introduction

With the release of SQL Server 2022 RC0, the Intel QAT Backup Compression feature has become publicly available. I have been doing extensive testing of this feature on a dedicated test machine, and I wanted to share some Intel QAT backup compression results.

I wanted to use a relatively affordable mainstream client desktop system for my testing. A noisy rackmount server (with slower CPUs) was not an option.

I built and configured this system specifically to maximize database backup performance. My main limiting factor is the total number of PCIe 4.0 lanes available from the CPU and the Z690 chipset. The second limiting factor is the performance of the Samsung 980 PRO M.2 PCIe 4.0 NVMe TLC SSDs.

A final bottleneck is the single-threaded and multi-threaded performance of the Intel Core i7-12700K processor. This is a very high-performance client CPU, but with its E-cores disabled (see the hardware details below), it only has eight physical cores. It will still smoke any current Intel or AMD server CPU for single-threaded performance.

Intel Core i7-12700K CPU-Z Result

I have much more detailed information on the test system later in the post.

Test Methodology

This test sequence includes 37 full database backups of a 434GB database that has two data files and one log file. For each backup, I record the elapsed time and total system CPU utilization. I also record the CPU temperature and the temperature of the two backup drives. There is a short break between backups to let the backup drive temperatures come back down.

This is the sequence of database backups:

  • One backup to one NUL device
    • This shows how fast we can read the database files
    • Adding additional NUL devices has a negligible effect with this system
  • Nine backups with no backup compression
    • One backup file on one drive
    • Two backup files on two drives
    • Four backup files on two drives
    • Six backup files on two drives
    • Eight backup files on two drives
    • Ten backup files on two drives
    • Twelve backup files on two drives
    • Fourteen backup files on two drives
    • Sixteen backup files on two drives
  • Nine backups with native SQL Server backup compression (MS_XPRESS)
    • Same pattern of backups
    • This is the legacy native backup compression from SQL Server 2008
  • Nine backups with Intel QAT software mode backup compression (QAT_DEFLATE)
    • Same pattern of backups
    • This mode does NOT require QAT hardware
    • This mode is supported on SQL Server 2022 Standard Edition and above
  • Nine backups with Intel QAT hardware mode backup compression (QAT_DEFLATE)
    • Same pattern of backups
    • This mode requires QAT hardware and SQL Server 2022 Enterprise Edition
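To make the run order concrete, here is a small Python sketch that enumerates the same 37-backup matrix: one NUL baseline, plus four compression modes times nine file counts. It only builds the list of runs; the mode labels are descriptive strings chosen for illustration, not SQL Server syntax.

```python
"""Enumerate the backup test matrix: 1 NUL baseline + 4 modes x 9 file counts."""

# File counts tested for each mode; one file uses one drive, the rest use two.
FILE_COUNTS = [1, 2, 4, 6, 8, 10, 12, 14, 16]

# Descriptive labels only (hypothetical names, not T-SQL keywords).
MODES = [
    "no compression",
    "legacy native compression",
    "QAT_DEFLATE software mode",
    "QAT_DEFLATE hardware mode",
]


def build_test_matrix():
    """Return (mode, file_count, drive_count) tuples in run order."""
    # The NUL baseline reads the database but writes to no physical drive.
    runs = [("NUL device baseline", 1, 0)]
    for mode in MODES:
        for files in FILE_COUNTS:
            drives = 1 if files == 1 else 2
            runs.append((mode, files, drives))
    return runs


if __name__ == "__main__":
    matrix = build_test_matrix()
    for number, (mode, files, drives) in enumerate(matrix, start=1):
        print(f"{number:2d}: {mode}, {files} file(s), {drives} drive(s)")
    print(f"Total backups: {len(matrix)}")
```

Running it prints one line per backup, and the total comes out to 1 + 4 x 9 = 37.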

By design, I am reading all of the database files from the P: drive, which is connected directly to the CPU. The database backup files are evenly spread between the Q: and R: drives, which are connected to the Z690 chipset.
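One simple way to get that even spread is round-robin assignment: file 1 goes to Q:, file 2 to R:, file 3 back to Q:, and so on. This Python sketch assumes a hypothetical Backups folder and file naming pattern; it is not the script used for these tests.

```python
def stripe_paths(db_name, file_count, drives=("Q:", "R:"), folder="Backups"):
    """Spread backup files round-robin across drives.

    With one file, only the first drive is used, matching the
    'one backup file on one drive' test. The folder name and the
    file naming pattern are hypothetical.
    """
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    paths = []
    for i in range(file_count):
        drive = drives[i % len(drives)]  # alternate Q:, R:, Q:, R:, ...
        paths.append(f"{drive}\\{folder}\\{db_name}_{i + 1}of{file_count}.bak")
    return paths


if __name__ == "__main__":
    for path in stripe_paths("TestDB", 4):
        print(path)
```

With four files, this puts files 1 and 3 on Q: and files 2 and 4 on R:, so each drive receives the same share of the writes.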

Even with a 434GB database, most of these backups are quite fast with my hardware and storage configuration. Even so, running 37 full database backups and recording the metrics takes some time.

Some Intel QAT Backup Compression Results

So, after all of that preamble, let’s see some results! For this run, the system was idle (except for the backups), and MAXTRANSFERSIZE was set to 4194304 (4MB). The script I used for this, with the complete results, is here. I’ll just present the more interesting results in this post.
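For reference, here is a Python sketch that assembles the kind of BACKUP DATABASE statement used in these tests, with MAXTRANSFERSIZE = 4194304 and one DISK clause per stripe. The database name and paths are hypothetical. As far as I know, the T-SQL keyword for the legacy algorithm is MS_XPRESS, and the choice between QAT software mode and hardware mode is made at the server level (with ALTER SERVER CONFIGURATION SET HARDWARE_OFFLOAD), not in the BACKUP statement itself, so both QAT modes produce the same statement text here.

```python
ALGORITHMS = (None, "MS_XPRESS", "QAT_DEFLATE")  # None means NO_COMPRESSION


def backup_statement(db_name, paths, algorithm=None, max_transfer_size=4194304):
    """Return a T-SQL BACKUP DATABASE statement as a string (not executed).

    Assumes a simple database name that needs no bracket escaping.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    # MAXTRANSFERSIZE must be a multiple of 64KB, up to 4MB.
    if max_transfer_size % 65536 or not 65536 <= max_transfer_size <= 4194304:
        raise ValueError("MAXTRANSFERSIZE must be a multiple of 65536, up to 4194304")
    if not paths:
        raise ValueError("at least one backup device is required")
    # One DISK clause per stripe; writing to 'NUL' discards the output.
    targets = ",\n   ".join(f"DISK = N'{path}'" for path in paths)
    if algorithm is None:
        compression = "NO_COMPRESSION"
    else:
        compression = f"COMPRESSION (ALGORITHM = {algorithm})"
    return (
        f"BACKUP DATABASE [{db_name}]\n"
        f"TO {targets}\n"
        f"WITH {compression}, MAXTRANSFERSIZE = {max_transfer_size}, INIT;"
    )


if __name__ == "__main__":
    print(backup_statement("TestDB", ["NUL"]))
    print(backup_statement(
        "TestDB",
        [r"Q:\Backups\TestDB_1of2.bak", r"R:\Backups\TestDB_2of2.bak"],
        algorithm="QAT_DEFLATE",
    ))
```

The function only builds text; you would still run the resulting statement yourself, for example from SSMS or sqlcmd.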

One Backup File to One Drive

The most common database backup scenario that I see is running the backup to a single file on a single logical drive. Running to NUL just reads the data, with no compression and without actually writing a backup file anywhere. This also gives you an aspirational goal: how close can an actual backup get to the speed of a backup to NUL?

One Backup File Results

With just one backup file, using no compression gives you the fastest backup on my system, but also uses the most disk space. Most production database servers will not have 4,900 MB/sec of sequential write performance on the backup drive, so using some sort of backup compression is usually faster than not using backup compression.
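A rough way to see why is to treat a backup as a pipeline and take its slowest stage as a lower bound on elapsed time. The stages are reading the data files, compressing the data, and writing the (possibly smaller) backup. This Python sketch is a simplified model, not a measurement, and it ignores overlap inefficiencies and other overheads. The 6,888 MB/sec read and 4,929 MB/sec write figures come from the drives in this system; the 500 MB/sec backup drive and the 0.4 compressed-size fraction are made-up values for illustration.

```python
import math


def min_backup_seconds(db_mb, read_mbps, write_mbps,
                       compressed_fraction=1.0, compress_mbps=math.inf):
    """Lower bound on backup time: the slowest of read, compress, and write.

    compressed_fraction: backup size divided by data size (1.0 = uncompressed).
    compress_mbps: input rate the CPU or QAT card can compress (inf = no limit).
    """
    read_s = db_mb / read_mbps
    compress_s = db_mb / compress_mbps
    write_s = db_mb * compressed_fraction / write_mbps
    return max(read_s, compress_s, write_s)


if __name__ == "__main__":
    db_mb = 434 * 1024  # 434GB database, treating 1GB as 1024MB
    # Fast local NVMe backup drive from this post: write-bound, about 90 s.
    print(round(min_backup_seconds(db_mb, 6888, 4929)))
    # Hypothetical 500 MB/sec backup drive, uncompressed vs 0.4 size fraction.
    print(round(min_backup_seconds(db_mb, 6888, 500)))
    print(round(min_backup_seconds(db_mb, 6888, 500, compressed_fraction=0.4)))
```

In this model, compression only pays off when writing is the bottleneck and the compression stage is fast enough not to become the new bottleneck.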

Even with only one backup file, QAT_DEFLATE – Software is more than twice as fast as the legacy MS_XPRESS backup compression, while using the same amount of CPU. QAT_DEFLATE – Hardware is slower than MS_XPRESS, but it only uses 3% of the CPU to do a compressed backup. The elapsed time gap between QAT_DEFLATE – Software and QAT_DEFLATE – Hardware would be smaller on a slower CPU, such as the processors in current server systems.

Two Backup Files to Two Drives

This is a striped backup, with two backup files. In this case, each file is on a separate physical drive, but you also get a benefit from striping to just one logical drive (as long as that one logical drive is not limited by its sequential write performance).

Having two backup devices makes every actual backup type significantly faster. It also pushes up CPU utilization when using software backup compression on my system. Notice that QAT_DEFLATE – Hardware is now almost as fast as MS_XPRESS, with much lower CPU utilization.

Two Backup Files Results

Four Backup Files to Two Drives

Going to four backup files spread evenly across two physical drives lets QAT_DEFLATE – Software catch up to no compression, albeit with much higher CPU utilization. Even so, QAT_DEFLATE – Software has lower CPU utilization than MS_XPRESS. All forms of compressed backups are significantly faster with four files than with two files.

Four Backup Files Results

Eight Backup Files to Two Drives

This configuration seems to be a recurring sweet spot on my test system. My system has 8C/16T and having one physical core per backup file appears to be very beneficial.

Of course, software backup compression is using 100% of the CPU in this situation, but QAT_DEFLATE – Software is over twice as fast as legacy MS_XPRESS. If your main goal is to finish a backup as fast as possible, and you don’t care how much CPU is used, then QAT_DEFLATE – Software is your best choice on a system like this.

If you have SQL Server 2022 Enterprise Edition and QAT-capable hardware, then QAT_DEFLATE – Hardware minimizes the CPU impact of running a compressed backup, while being much faster than MS_XPRESS software compression.

Eight Backup Files Results

Test Hardware Details

When I built this system, the AMD Ryzen Threadripper 5000 Series processors were not available for DIY use. Even if they were, they are frightfully expensive. For example, a 32C/64T AMD Ryzen Threadripper PRO 5975WX is currently $3,299.99 at Micro Center. This is a great CPU for professional content creators, but it was more than I wanted to spend on a lab system.

So, after some extensive research, I decided to build an Intel Z690 mainstream client desktop system with an Intel Core i7-12700K CPU. My objective was to try to eliminate any hardware or storage related bottlenecks as much as possible with the resources I had available. I also wanted to control thermal throttling so I would have consistent test results.

Lab Test System

Here are the details about my test system and how it is configured:

  • Open Benchtable
  • ASUS Prime Z690-P D4 motherboard, BIOS version 1402
    • This is a relatively affordable Z690 motherboard with three M.2 PCIe 4.0 slots
    • It has a Realtek 2.5Gbps NIC that works with Windows Server 2022
  • Intel Core i7-12700K CPU “Alder Lake”
    • This is the best Intel Alder Lake SKU for this purpose from a price/perf perspective
    • The Intel Core i9-12900K is slightly faster, but is nearly double the cost
  • All-core enhancement is enabled
  • Disabled the E-cores in the BIOS, so we have 8C/16T, all P-cores
    • E-cores are great for laptops running Windows 11
    • Unfortunately, E-cores are terrible for SQL Server usage
  • Using the Intel UHD 770 integrated graphics (no discrete GPU)
    • This lets me use the primary PCI_E1 slot for the Intel QAT 8970 card
    • It also reduces the cost and power usage of the system
  • 128GB of DDR4-3200 CL16 RAM, with XMP enabled
  • There is a PCIe 4.0 x8 link between the CPU and Z690 chipset
    • The Intel Z690 was the only desktop chipset that had this capability
    • This lets two M.2 PCIe 4.0 x4 NVMe SSDs have four PCIe 4.0 lanes each
  • C: 1TB Samsung 870 EVO in SATA A (Z690 chipset)
    • The OS and SQL Server are installed here (by design)
    • I wanted all of my M.2 slots and lanes available for backups and restores.
  • P: 1TB Samsung 980 PRO in M2_1 slot (CPU, PCIe 4.0 x4)
    • All of the database files are on this drive
    • It can do 6,888 MB/sec of sequential reads
  • Q: 1TB Samsung 980 PRO in M2_2 slot (Z690 chipset, PCIe 4.0 x4)
    • It can do 4,929 MB/sec of sequential writes
  • R: 1TB Samsung 980 PRO in M2_3 slot (Z690 chipset, PCIe 4.0 x4)
    • It can do 4,913 MB/sec of sequential writes
  • Intel QAT 8970 in PCI_E1 slot (CPU, PCIe 5.0 x16)
    • The Intel QAT 8970 is a PCIe 3.0 x16 card, so it runs at PCIe 3.0 x16 in this slot
  • Noctua NH-D15 chromax.black CPU cooler
    • This keeps the CPU from thermal throttling while using software backup compression
  • Be quiet! MC1 Pro M.2 heatsink on Q: and R: drives
    • This helps keep the M.2 drives from thermal throttling during large, sustained writes

Here is an example of what the 1TB Samsung 980 PRO can do in CrystalDiskMark. This is the P: drive, where all of the database files are located.

1TB Samsung 980 PRO CDM Results

This is the Q: drive, which is an available backup target.

1TB Samsung 980 PRO CDM Results

This is the R: drive, which is another available backup target. I can stripe backups to both drives at the same time and still get the full PCIe 4.0 x4 bandwidth (which is limited by the actual drive performance) on both drives.

1TB Samsung 980 PRO CDM Results

This is because the Intel Z690 chipset has a PCIe 4.0 x8 DMI link between the chipset and the CPU. You can see that link in the Z690 block diagram.

Intel Z690 Chipset Block Diagram

These three CDM tests were run simultaneously to confirm that the three Samsung 980 PRO drives were not being limited by PCIe bandwidth or by the CPU’s ability to consume that bandwidth.
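The link arithmetic behind this can be sketched in Python. PCIe 3.0 runs at 8 GT/s per lane and PCIe 4.0 at 16 GT/s, both with 128b/130b encoding; the sketch converts that to an approximate per-direction MB/sec figure. It ignores packet and protocol overhead, so real usable bandwidth is somewhat lower, and it treats the DMI link as equivalent to a PCIe 4.0 x8 link, as described above.

```python
# Transfer rate per lane, in gigatransfers per second, by PCIe generation.
GT_PER_SEC = {3: 8.0, 4: 16.0}


def pcie_mbps(gen, lanes):
    """Approximate per-direction link bandwidth in MB/sec (decimal units).

    Accounts only for 128b/130b line encoding, not protocol overhead.
    """
    payload_gbps = GT_PER_SEC[gen] * 128 / 130  # usable Gbit/s per lane
    return payload_gbps * 1000 / 8 * lanes      # Gbit/s -> MB/s, times lanes


if __name__ == "__main__":
    two_drive_writes = 4929 + 4913  # CDM sequential writes for Q: and R:
    dmi = pcie_mbps(4, 8)           # PCIe 4.0 x8 link between CPU and Z690
    print(f"PCIe 4.0 x4 per M.2 slot: {pcie_mbps(4, 4):,.0f} MB/sec")
    print(f"PCIe 4.0 x8 DMI link:     {dmi:,.0f} MB/sec")
    print(f"Q: + R: writes:           {two_drive_writes:,} MB/sec")
    print(f"Headroom on DMI link:     {dmi - two_drive_writes:,.0f} MB/sec")
```

By this estimate, the x8 link offers roughly 15,754 MB/sec in each direction, comfortably above the combined 9,842 MB/sec of sequential writes from Q: and R:. Because the database files are read from P:, which hangs off the CPU lanes, the chipset link only has to carry the backup writes.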

Analysis and Caveats

My lab system has faster local storage (in terms of sequential bandwidth) than most production database servers. It is also configured (by design) to favor backup performance over restore performance and database performance. The system can do about 9,800 MB/sec for writes to the backup files and about 6,900 MB/sec for reads from the database data files.

This means that uncompressed database backups are extremely fast. If your backup drive(s) have much lower sequential write performance, backup compression will usually be faster than uncompressed backups.

The Intel Core i7-12700K processor in my lab system is much faster (for single-threaded performance) than any currently released server processor from AMD or Intel. This makes software backup compression very fast, since those fast Alder Lake cores are doing the compression work. It also makes the Intel QAT 8970 card look worse than it is compared to a slower server processor.

Overcoming Objections

You might be thinking, “This is interesting, but I can’t use it because of xxx”. Let me try to overcome some common objections and misconceptions about Intel QAT backup compression.

  • “We are running in the cloud”
    • If you are in a PaaS environment, then this is a problem. Database backups are typically abstracted by the cloud provider, so this is out of your control
    • If you are in an IaaS environment, you can always use QAT_DEFLATE Software. Some cloud providers will have QAT capable processors in 2023
  • “We are running virtualized”
    • You can always use QAT_DEFLATE Software on the guest OS
    • If the host machine has QAT-capable hardware, it might be exposed to the guest OS
  • “This requires a processor that has a QAT accelerator built in”
    • No, it does not. You can use an Intel QAT 8970 card in any system. Intel may have a newer replacement for the QAT 8970 in the near future
  • “This requires an Intel processor”
    • No, it does not. You can use an Intel QAT 8970 card in any system. Intel may have a newer replacement for the QAT 8970 in the near future that would work with any processor
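  • “We only have SQL Server Standard Edition”
    • You can still use QAT_DEFLATE Software, which is supported on SQL Server 2022 Standard Edition and above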
  • “We run SQL Server on Linux”
    • OK, that is a showstopper right now. QAT backups are only supported on SQL Server 2022 on Windows

Final Words

If you care about the elapsed times of your database backups, then striped backups are your friend! Using QAT backup compression (either with software mode or hardware mode) in SQL Server 2022 also makes a huge difference in elapsed times compared to legacy backup compression.

With SQL Server 2022 Enterprise Edition, you can use either hardware mode (if you have QAT hardware) or software mode. Hardware mode shields your CPU from the load you would otherwise see while running a compressed database backup, especially with a high stripe count.

If you only have SQL Server 2022 Standard Edition, you can only use QAT software mode. This is true even if you have QAT capable hardware.
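The edition and platform rules from this post can be summarized as a small decision function. This Python sketch encodes those rules only (SQL Server 2022, Standard or Enterprise Edition). It does not inspect a real server, and the function name, parameters, and option labels are hypothetical.

```python
def compression_options(edition, platform="Windows", has_qat_hardware=False):
    """List backup compression choices under the rules described in this post.

    Only Standard and Enterprise Edition of SQL Server 2022 are modeled.
    """
    if edition not in ("Standard", "Enterprise"):
        raise ValueError("only Standard and Enterprise are modeled here")
    options = ["NO_COMPRESSION", "MS_XPRESS"]  # legacy native compression
    if platform != "Windows":
        return options  # QAT backup compression is Windows-only for now
    options.append("QAT_DEFLATE software mode")  # Standard Edition and above
    if edition == "Enterprise" and has_qat_hardware:
        # Hardware mode needs both Enterprise Edition and QAT hardware.
        options.append("QAT_DEFLATE hardware mode")
    return options


if __name__ == "__main__":
    print(compression_options("Standard", has_qat_hardware=True))
    print(compression_options("Enterprise", has_qat_hardware=True))
    print(compression_options("Enterprise", platform="Linux"))
```

Note that Standard Edition with QAT hardware installed still only gets software mode, which matches the point above.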

It seems to me that if legacy backup compression makes sense with your data and infrastructure, then QAT backup compression should make even more sense.

I have two related posts about this feature:

Here is some additional background about Intel QAT:

If you have any questions about this post, please ask me here in the comments or on Twitter. I am pretty active on Twitter as GlennAlanBerry. Thanks for reading!
