Estimating Data Compression Savings in SQL Server

Introduction

SQL Server Data Compression was first introduced in SQL Server 2008. Unfortunately, it was an Enterprise Edition only feature until SQL Server 2016 SP1. This feature lets you compress individual row store indexes with either ROW or PAGE compression. Data compression can be a huge win for many SQL Server workloads. It can reduce both your required disk I/O and your memory usage at the cost of some added CPU usage in some scenarios. This post gives you a script you can use for estimating data compression savings in SQL Server. You can get the script from here.

By design, there is no “Compress Entire Database” command in SQL Server.

By design, there is no “Compress Entire Database” command in SQL Server. Instead, you need to evaluate individual indexes, based on their size, estimated compression savings and volatility. The ideal case is a large table that shows very good compression savings that is read-only. Often, you might have large audit or logging tables that only see INSERTS, which are usually great data compression candidates. A terrible candidate is a small table, that does not show much compression savings, with very volatile data.

I long ago got tired of manually running the sp_estimate_data_compression_savings system stored procedure for each index in a database. Instead, I decided to write some T-SQL that would somewhat automate the process. If you set the schema name, table name, and desired data compression type in the variable declarations at the top of the script, you will get some pretty detailed information about all of the indexes in that table.

Running my query will help you make a more informed decision about which indexes to compress.

In SQL Server 2019, sp_estimate_data_compression_savings was enhanced by adding the COLUMNSTORE and COLUMNSTORE_ARCHIVE compression types as options. SQL Server 2022 will add XML compression as an option.

**Estimating Data Compression Savings in SQL Server**

You should run the entire query at once, after you have supplied your own values. It may take some time to run, depending on your hardware, storage and on how large your tables are. You could also use some of this code in a stored procedure that you could call for each table in a database. Then, it could also write the results out to a table that you could easily query later.

I also have a video that demonstrates these queries.

Estimating Data Compression Savings in SQL Server

Final Words

Since the Data Compression feature is now available in SQL Server Standard Edition, I am seeing more organizations use it. Many organizations are moving from legacy versions of SQL Server to a modern version, so this makes perfect sense.

Modern processors are also significantly more powerful than they were back in 2008, so the CPU overhead of data compression is much less of an issue than it used to be. I have a couple of blog posts that describe the best current processors for SQL Server usage.

If you have any questions about this post, please ask me here in the comments or on Twitter. I am pretty active on Twitter as GlennAlanBerry. Thanks for reading!

2 thoughts on “Estimating Data Compression Savings in SQL Server”

Eitan Blumin December 4, 2020 — 11:55 pm

Hi Glenn, thank you for sharing this post.
I wrote a similar script a while ago (for similar reasons to your own), and published a blog post about it here:

https://eitanblumin.com/2020/02/18/ultimate-compression-savings-estimation-script-entire-database/

In writing that script, I’ve purposefully set out to implement the “ultimate” and “all-encompassing” compression savings estimation script, based on research and findings published by several people on the topic.

It takes everything into consideration:
Rate of updates vs. rate of selects.
Percentage of compressible data.
Row vs. Page estimation.

I’m interested to hear more about the new COLUMNSTORE and COLUMNSTORE_ARCHIVE compression types added in SQL 2019, and how the compression savings estimation procedure works with those.

At the time of this writing, my “ultimate” script does not yet check columnstore compression types.

Obviously, there would be additional considerations to using columnstore indexes besides the compression itself.
So, I’d be interested to learn more about that and consider how/whether I could integrate such considerations into my existing script.

I’d love to hear your thoughts.
Thanks,
Eitan

Loading...

Pingback: T-SQL Tuesday #136 – My Favorite Data Type

Estimating Data Compression Savings in SQL Server

Introduction