Microsoft SQL Server 2008 Data Compression

May 28, 2009

We’re all feeling the financial crunch these days, and an increasing number of companies are trying to figure out how to get the most out of their IT dollars. One of the cost-saving ideas being floated around IT departments is to extend the replacement cycle of hardware. This can be done by extending your hardware manufacture warranty for one or more years. But extending the hardware lifecycle can only do so much. What can you do when you are running low on hardware resources like disk space?

SQL Server 2008 introduces some new features to help deal with disk capacity issues without making IT go out and buy more disk drives. Some of SQL Server’s new compression options can help you reclaim some of your valuable disk space and postpone the purchase of additional disk capacity for a while, maybe even years. In this article, I will discuss table and backup compression options available in SQL Server 2008 to help you reduce your database disk space usage.

Data Compression was first introduced in SQL Server 2005 with Service Pack 2 where Microsoft introduced a new storage format called vardecimal to store decimal and numeric data types as variable length columns. In SQL Server 2008 this concept is applicable even to fixed length data types such as integer, char and float. The main advantage of using Data Compression is that it reduces the storage cost and increases the query performance by reducing I/O.  Database Administrator needs to be very careful before using Database Compression as it is more CPU intensive activity. If you are not facing CPU crunch and you have more I/O based activities then it is a good idea to implement data compression, which can give you better query performance

Compressing Tables and Indexes
SQL Server 2008 allows you to compress both tables and indexes. Compression happens at the row or page level. No application changes are required to implement table or index compression. Data compression is supported for the following table objects:

  • Tables stored as a heap
  • Tables stored as a clustered index
  • Non-clustered indexes
  • Indexes views
  • Each partition of a partitioned table

When a clustered index is compressed the non-clustered indexes are not automatically compressed. You need to set up compression on non-clustered indexes when they are created or altered. System tables cannot be compressed.

Row and Page Compression
What is the difference between Row and Page level compression? Row compression happens on each individual row, where as Page compressions considers all row in a page and compresses a page based on common strings of data throughout a page.

Row compression takes all fixed length fields and compresses them by making them variable length fields. When I say fixed length fields I don’t mean only character data, but also numeric data. The numeric data type fields are evaluated, and are compressed into the smallest possible amount of space. For instance, a BIGINT column normally takes 8 bytes. But if a BIGINT column only contains, say, the value “1” then it would be compressed to use only 1 byte, instead of 8 bytes. In addition to compressing these fixed length fields row compression reduces the space required for metadata in each row.

Page level compression looks at an entire page when it does compression and compresses common strings of data. This is done by using row compression plus prefix and dictionary compression techniques. Prefix compression is a process of removing commonly occurring prefixes from column values. These prefixes are placed in a Compression Information (CI) structure in the header of the page, and then the column values are changed so they reference the CI information in the header. Dictionary compression is done after prefix compression. Dictionary compression is the process of finding repeating occurrences of values in the page and then placing these values in the CI structure and replacing the values with appropriate index to the CI information.

The diagrams below, taken from SQL Server 2008 Books Online, visually represent how prefix and dictionary compression are done at the page level.

Page before compression Page after Prefix Compression Page after Dictionary Compression
clip_image001 clip_image002 clip_image003

When the first row is added to a page, page compression is not performed, only row compression is done. When the second row is added Page compression is undertaken. If the SQL Server engine determines that the space saving of page compression is not significant then page compress is not performed.

By using Row and Page compression SQL Server can save considerable amount of disk space. Let’s look at examples of how to set up compression.

Considerations for Compression
There are a few things you need to know about compression. First, compression is only available in the Enterprise and Developer editions of SQL Server 2008.

Compressing data allows more rows to be included in a page. For each I/O performed by the database engine the more rows of data are being read or written. Keep in mind that just because you are compressing rows, the maximum row size remains the same. SQL Server will not enable a table for compression if the maximum row size and the compression overhead space combined are greater than 8060. This is because SQL Server needs to ensure that a row can be stored in a single page even if each column is filled to it maximum capacity. If your table has multiple partitions you can enable data compression by partition, not all partitions need to be compressed. Indexes are not automatically compressed just because you compressed the table. You need to explicitly compress indexes if you want them compressed. There is one exception to this rule. If you define your table to use heap storage that is compressed, then when you create a clustered index for your table it will inherit the compression settings from the heap. If you change the compression settings of a heap, then all non-clustered index on that heap need to be rebuilt. This is so the row pointers in the non-clustered index can point to the new location for the row in the heap. If you store large record type data outside the row the large record data is not compressed.

Compressing data does have some impact on the internal workings of SQL Server. When a page is written to disk it is compressed and when it is read back it is uncompressed. This means additional CPU overhead is used to do this compression, which might be a concern if your machine does not have extra CPU capacity to burn.

How to Estimate Compression Savings
Microsoft has provided a stored procedure (SP) named “sp_estimate_data_compression_savings.” This SP helps you identify how much disk space will be saved if you implement compression on a table. With this SP you identify the object you want to compress and the type of compression you what to perform. The SP then takes your object and creates it in “tempdb” and then compresses it to determine how much space a compressed table will take. The amount of used space for the original table and the expected compressed file size are output from this SP. By using this SP you can determine which tables and compression types are best for your environment. Here is an example execution of this SP:

use AdventureWorks2008;


EXEC sp_estimate_data_compression_savings

@schema_name = ‘Production’

, @object_name = ‘TransactionHistory’

, @index_id = null

, @partition_number = null

, @data_compression = ‘ROW’;

In this example I’m checking how much space I can save by compressing the AdventureWorks200.Production.TransactionHistory table and all of its indexes using ROW compression. Here is the output I get from this SP:


From this output we can see that the AdventureWorks2008.Production.TransactionHistory clustered index is currently using 6,352 KB. The “sp_estimate_data_compression_savings” SP has identified that by using ROW compression the compressed size for clustered index will be 4,176 KB. This is a space savings of almost 35 percent. As you can see performing ROW compress on the other non-clustered indexes on this table is not so significant.

I can also check how much disk space will be saved if I use PAGE compression on the AdventureWorks.Production.TransactionHistory table by running the following command:

use AdventureWorks2008;


EXEC sp_estimate_data_compression_savings

@schema_name = ‘Production’

, @object_name = ‘TransactionHistory’

, @index_id = null

, @partition_number = null

, @data_compression = ‘PAGE’;

When I run this command I get the following output:


Here you can see the space savings by using PAGE compression is considerable more than using ROW compression for each index in the AdventureWorks.Production.TransactionHistory table. By using PAGE level compression you can see that index = 1 (the Clustered Index) saves around 65 percent of the currently used disk space for that index. The two non-clustered indexes do not have a huge amount of space savings, but still save some space is saved by using PAGE

How to Set Up Table Compression
There are a number of different ways to compress a table. Let me show you a few of these methods.


In SQL Server 2008 there is an inbuilt wizard namely Data Compression Wizard which enables you to estimate the savings in storage space when compression is enabled for a particular table or index in a database.
A DBA needs to first get connected to the SQL Server 2008 Instance and then expand the Databases Node | AdventureWorks Database | Tables, and right click Production.TransactionHistory table, select Storage and finally select Manage Compression as shown in the below snippet.
This will launch Data Compression Wizard Screen. Click Next to continue


Click the “Next” button, which will bring up this screen:


Here you can use the “Use same compression type for all partitions” checkbox option, and then the grayed out drop down box next to this label will become available to enter “None, “Row,” or “Page” compression option. This is useful when you have multiple partitions for your table. In my case The AdventureWorks2008.Production.TransactionHistory table resides on a single partition, so I can use the drop down under the “Compression” type column to select my compression type. I’ve decided to use “Page” compression. Once you have selected the compression type you might want to click on the “Calculate” button to determine the amount of space that will be used once the table is compress. The screenshot below shows what the wizard has determined will be my space savings in the “Requested compressed space” column:

Different Compression Type Options
Once you select the compression type option as Row and click the calculate button on the wizard, it will calculate and let you know the amount of storage space required when row compression type is used. 
Storage space required when row compression is used as compression type
If you select the compression type option as Page and click the calculate button on the wizard, it will calculate and let you know the amount of storage space required when page compression type is used. 


After reviewing the compressed space needed I can click on the “Next” button, which brings up this window:


Here I have multiple options for how to save the output of the wizard. I can create a script that can be used to compress my table, run the compression immediately, or schedule it. Click on “Next” or “Finish” will take you to a Data Compression Wizard Summary screen where you can click on the “Finish” button to complete whatever data compression output option selected. In my case I chose the “Run immediately” option.

You can also use T-SQL to compress an existing table. To do this you use the “ALTER TABLE” statement. Here is the script I would use to compress my AdventureWorks2008.Production.TransactionHistory table with the same compression options I selected using the wizard:

USE [AdventureWorks2008];


ALTER TABLE [Production].[TransactionHistory] REBUILD PARTITION = ALL


You don’t have to first create a table and then use the “ALTER TABLE” statement to compress it. You can also define that you want a table compressed as part of the CREATE TABLE statement. Here I have created an AuditLog table and requested that it use PAGE compression:


ID int identity(1,1),

LoginName varchar(128),

TransactionDescription varchar(100),

ApplicationName varchar(100),

TransactionDate date,

TransactionTime date



If your table has multiple partitions, each partition can have a different compression type. Here is an example of how to create a table where each partition has a different compression type:

CREATE TABLE OrderTracking

(ID int identity,

OrderID int,

OrderDate Date,

OrderTime Time,

DateSent Date,

TimeSent Time)

ON OrderDateRange (OrderDate)







In this code, partition 1 is not compressed, partition 2 is using ROW compression, and partition 3 uses PAGE compression.

Cost Savings
The cost and time saving advantages of using SQL Server 2008 compression feature should allow companies to immediately obtain some cost savings returns for their investment in upgrading to SQL Server 2008. If you are running up against space constraints on your server than potentially compressing tables and/or backups might allow you to postpone or eliminate an expensive disk acquisition in the future. If you are already using a third-party tool to compress your backups then you might also eliminate the need for this tool. If your backup window at night is not long enough to accomplish all your backups and you have CPU to burn, then compressing your database backups you should deliver a reduction in the time it takes to backup your SQL Server 2008 databases. The disk cost savings and shorter backup time related to using SQL Server 2008 compression features need to be factored in when you consider acquiring SQL Server 2008.