Basecamp Classic’s database is actually split across two sets of servers. One set contains a single table that is approximately 430 GB, more than half of the entire volume of data across both sets.

Two years ago we separated this table because of its growth and size compared to the other tables. By separating the table we could scale the database hardware more closely to data growth, and we kept InnoDB buffer pool evictions to a minimum, which made performance more stable.

Recently our monitoring showed some less than desirable metrics for this database pair, not the least of which was that free storage would be exhausted in about 90 days. There were also a number of slow queries caused by insufficient buffer pool space and by data being read from disk. We had already exhausted the normal tuning approaches, and we needed a solution to these problems that didn’t involve a significant expenditure of time or money.

There are two common methods for keeping a growing MySQL database performing optimally: buying new hardware, or reducing the volume of data so that roughly 80% of it fits in memory. Buying new hardware is expensive and usually carries a high cost in time and staffing. And in most situations reducing the amount of data is impossible, because the database is growing through active use.

What if there was another solution that didn’t involve extra spending or awkward contortions to reduce the volume of data?


Our investigation starts with table structure. In this case, we identified a single text column which appeared to contain most of the data. We decided it would be a good idea to test what type of space savings we might see (1):

mysql> select avg(length(body)) as uncompressed, avg(length(compress(body))) as compressed, avg(length(body))/avg(length(compress(body))) as ratio from recording_bodies where id > 156800158\G
*************************** 1. row ***************************
uncompressed: 1409.9729
compressed: 600.2354
ratio: 2.34895825

So the uncompressed version of the data is an average of roughly 2.35 times the size of the compressed version. In other words, we would probably see somewhere around a 57.5% reduction in the size of our data! These numbers encouraged us to move forward with exploring table-based compression.

(To test on your own data set, pick a few random id ranges a few thousand rows in size. We found a handful of samples of around 10,000 ids each to be sufficiently close to reality.)
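
For example, here’s a rough sketch that samples a contiguous range starting at a random offset (the table, column and id ceiling are just the ones from our example above; substitute your own):

mysql> set @start := floor(rand() * 156000000);
mysql> select avg(length(body)) as uncompressed, avg(length(compress(body))) as compressed, avg(length(body))/avg(length(compress(body))) as ratio from recording_bodies where id between @start and @start + 10000\G

Run it a handful of times and the ratios should settle into a fairly tight range.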

Since we had significantly limited disk space, we were unable to use pt-online-schema-change (2) to compress the table in place. Instead we slaved a new server (with plenty of extra storage) to the master and ran the migration there. Once replication caught up, we did the same with the existing slave database and then used our mysql_role_swap tool to accomplish a zero-downtime master-to-slave failover. (After this role swap we rebuilt the old master accordingly.)
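
As a quick sanity check, here’s one way to confirm a slave has caught up before a swap like this:

mysql> show slave status\G
(Look for Seconds_Behind_Master: 0, meaning the slave has applied everything it has received from the master.)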

What were the actual results?


The outcome of using the default 8 KB block size was a table size of 172 GB, a 60% reduction in table size!
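
If you want to approximate a table’s footprint on your own system before and after, information_schema gives a rough answer (a sketch using the table from our earlier example; data_length and index_length are estimates, not exact byte counts):

mysql> select round((data_length + index_length) / 1024 / 1024 / 1024, 1) as total_gb from information_schema.tables where table_schema = database() and table_name = 'recording_bodies'\G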

Since there was significantly less data to hold in memory and on disk, slow queries were nearly eliminated. Finally, the need for additional storage was dramatically reduced: the existing data on disk had shrunk, and new records would be written at roughly 40% of their uncompressed size thanks to compression.

Before you run out and compress all your InnoDB tables, here’s what you need to know:


InnoDB table compression works best on tables with large text/blob columns that are filled with strings that compress well. It also works best under workloads that are read-heavy rather than write-heavy. InnoDB table compression is not entirely free: the documentation warns that increased CPU utilization is the price paid for the compress/decompress operations. (Having shared this warning, we’ve actually seen reduced CPU utilization with compressed tables.)

If you are still interested in testing table compression:


You need to be using the InnoDB storage engine.

mysql> show variables like 'storage_engine'\G
*************************** 1. row ***************************
Variable_name: storage_engine
        Value: InnoDB
1 row in set (0.00 sec)


Or…

mysql> show create table my_table_name;
(Look for the string "ENGINE=InnoDB" on the last line.)
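
Or, to sweep a whole schema at once, a quick sketch like this will list any tables that are not InnoDB:

mysql> select table_name, engine from information_schema.tables where table_schema = database() and engine <> 'InnoDB';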


You need to be using innodb_file_per_table.

mysql> show variables like 'innodb_file_per_table'\G
*************************** 1. row ***************************
Variable_name: innodb_file_per_table
        Value: ON
1 row in set (0.00 sec)

If you do not see this, you should set “innodb_file_per_table = 1” in your my.cnf.
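
One related prerequisite worth noting: on MySQL 5.5 compressed tables also require the Barracuda file format. A sketch of both settings, either in my.cnf or applied on the fly:

[mysqld]
innodb_file_per_table = 1
innodb_file_format = Barracuda

mysql> set global innodb_file_per_table = 1;
mysql> set global innodb_file_format = 'Barracuda';

(These settings only affect tables created or rebuilt after they are in place; existing tables keep their current tablespace and format until an alter rebuilds them.)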


You need to identify tables that are good candidates for compression.

mysql> show create table bigfatcats\G
*************************** 1. row ***************************
       Table: bigfatcats
Create Table: CREATE TABLE `bigfatcats` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cat_name` varchar(255) DEFAULT NULL,
  `huge_serialized_furballs` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=373737813 DEFAULT CHARSET=utf8

Again, you want to find tables with heavy text or blob column types which contain strings that compress well. Then you want to test compression using the commands earlier in this article.
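
If you have a lot of tables to sift through, a query against information_schema can help surface candidates (a sketch; adjust the data types and schema to taste):

mysql> select table_name, column_name, data_type from information_schema.columns where table_schema = database() and data_type in ('text', 'mediumtext', 'longtext', 'blob', 'mediumblob', 'longblob');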

If you are able to identify a table that would benefit from compression you are a single alter table away from reclaiming some space:

mysql> alter table bigfatcats ROW_FORMAT=COMPRESSED;
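
That statement uses the default 8 KB block size. You can spell it out explicitly, and it is worth checking for warnings afterwards, since MySQL falls back to an uncompressed row format with only a warning if the prerequisites above are not met (unless innodb_strict_mode is enabled, in which case the alter fails):

mysql> alter table bigfatcats ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
mysql> show warnings;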


But I still have a few questions…


How well does this work with IO constrained databases?


We’ve observed compression to work well in such environments. The hardware supporting the aforementioned database pair is a pair of older Dell R710s with 64 GB of RAM, two Intel E5530 processors @ 2.40GHz, and a six-drive SSD RAID behind the integrated PERC 6/i card, running Percona’s 5.5 build with a 50 GB InnoDB buffer pool.


How can you select the right block size with a high degree of accuracy?


You can’t. Try 8kb to start.


How can you measure the approximate fit/success of your block size selection?


The information_schema database has an INNODB_CMP table that lets you compare the number of compression operations that succeeded against the total number attempted. In the case described above that’s about 98.82%:

mysql> select compress_ops_ok/compress_ops as compress_ratio from INFORMATION_SCHEMA.INNODB_CMP where page_size = 8192\G
*************************** 1. row ***************************
compress_ratio: 0.9882
1 row in set (0.00 sec)


Remember that you want a number as close to 1 (or 100%) as possible here; otherwise the 8 KB block size may be too small and your table may not be a good fit for compression.
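
To see how every block size in use is faring at once, drop the page_size filter (a sketch; rows for block sizes you are not using will show zero operations and a NULL ratio):

mysql> select page_size, compress_ops, compress_ops_ok, compress_ops_ok/compress_ops as compress_ratio from INFORMATION_SCHEMA.INNODB_CMP;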

Where can I share my experience using InnoDB compression?


Maybe this is a solution to a MySQL problem you are facing? We were excited by how easy InnoDB compression was to implement and how much value it provided in our situation. Do you have additional questions or comments? Please take a moment and share below.

If you’d like some deeper level information:


http://dev.mysql.com/doc/refman/5.5/en/innodb-compression.html
http://dev.mysql.com/doc/innodb/1.1/en/innodb-information-schema-examples-compression-sect.html
http://dev.mysql.com/doc/refman/5.5/en/innodb-cmp-table.html
http://dev.mysql.com/doc/refman/5.5/en/innodb-cmpmem-table.html

Notes:


(1) Hat tip to Percona for this old article that explains this in more detail.
(2) What’s that? You’ve never used this tool? Oh man you’ve been missing out!