In 2009 we ran into some problems when failing over to the Basecamp slave database. Basecamp relies on keeping a large working set of recently accessed data in InnoDB's buffer pool for speed. Normal MySQL replication sends only writes, not reads, to the slave. How could we ensure that the data in the slave's cache stays up to date?

We contracted Percona to build a solution into their Maatkit toolset, based on their experiments with SELECT query mirroring: a clever use of tcpdump to capture SELECT queries on the master and replay them against the slave database.

Here’s the resulting command:

/usr/bin/mk-query-digest --statistics --iterations 4 --run-time 15m --type tcpdump \
  --filter '$event->{arg} && $event->{arg} =~ m/^SELECT/i' \
  --execute "h=db-slave,P=3306,u=slave,p=password,D=production" \
  --execute-throttle 70,30,5

The tcpdump utility captures MySQL traffic on the master and feeds it into the mk-query-digest script, which keeps only the SELECT queries and executes them against the slave database. The --execute-throttle arguments set the maximum percentage of time the script should spend executing queries on the slave, how often to check that value, and the percentage probability by which queries will be skipped when the threshold is exceeded.
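The filter-and-throttle behavior can be sketched in Python. This is an illustrative reimplementation based on the description above, not Maatkit's actual code; the class and function names are invented, and the interpretation of the 70,30,5 values follows the paragraph above:

```python
import random
import re

def is_select(query):
    """Keep only SELECT statements, mirroring the --filter regex."""
    return bool(re.match(r"^SELECT", query.strip(), re.IGNORECASE))

class ExecuteThrottle:
    """Sketch of --execute-throttle 70,30,5: cap the share of time spent
    replaying queries, re-checking the observed rate at a fixed interval."""

    def __init__(self, max_pct=70, check_interval=30, skip_step=5):
        self.max_pct = max_pct                # max % of time spent executing
        self.check_interval = check_interval  # seconds between rate checks
        self.skip_step = skip_step            # % skip probability added per breach
        self.skip_pct = 0                     # current probability of skipping

    def check(self, busy_pct):
        # Called every check_interval seconds with the observed busy %.
        if busy_pct > self.max_pct:
            self.skip_pct = min(100, self.skip_pct + self.skip_step)
        else:
            self.skip_pct = 0

    def should_execute(self):
        # Skip a query with probability skip_pct when over the threshold.
        return random.uniform(0, 100) >= self.skip_pct

throttle = ExecuteThrottle()
throttle.check(busy_pct=29.84)  # below the 70% cap, as in our stats below
queries = ["SELECT * FROM posts", "UPDATE posts SET x=1", "select 1"]
replayable = [q for q in queries if is_select(q) and throttle.should_execute()]
# With the throttle idle, both SELECTs (and nothing else) get replayed.
```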

Here’s some sample statistical output:

# execute_executed      124668
# throttle_checked_rate     29
# throttle_rate_avg      29.84
# throttle_rate_ok          29

According to these values, the script stayed well below the 70% query execution threshold we set, and our queries are executing on the slave cleanly.
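That sanity check can also be done programmatically. Here's a hypothetical Python sketch that parses the sample statistics lines above; the key names come from the output shown, and the 70% figure is our --execute-throttle setting:

```python
SAMPLE = """\
# execute_executed      124668
# throttle_checked_rate     29
# throttle_rate_avg      29.84
# throttle_rate_ok          29
"""

def parse_stats(text):
    """Turn '# key  value' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        parts = line.lstrip("# ").split()
        if len(parts) == 2:
            stats[parts[0]] = float(parts[1])
    return stats

stats = parse_stats(SAMPLE)
# Every throttle check passed, and the average execution rate is well
# under the 70% cap, so no replayed queries needed to be skipped.
assert stats["throttle_rate_ok"] == stats["throttle_checked_rate"]
assert stats["throttle_rate_avg"] < 70
```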

Since we began using this tool, we've switched production database servers without any reduction in performance.