In 2009 we ran into some problems when failing over to the Basecamp secondary database. Basecamp relies on keeping a large working set of recently accessed data in its InnoDB buffer pool for speed. Normal MySQL replication only sends writes, not reads, to the secondary. How could we ensure the data in the secondary’s cache is up to date?
We contracted Percona to build a solution into their Maatkit toolset based on their experiments with SELECT query mirroring. It makes clever use of tcpdump to capture SELECT queries on the primary and replay them against the secondary database.
Here’s the resulting command.
/usr/bin/mk-query-digest --statistics --iterations 4 --run-time 15m --type tcpdump \
  --filter '$event->{arg} && $event->{arg} =~ m/^SELECT/i' \
  --execute "h=db-secondary,P=3306,u=secondary,p=password,D=production" \
  --execute-throttle 70,30,5
The tcpdump utility captures MySQL traffic on the primary and feeds it into the mk-query-digest script, which filters out everything except SELECT queries and executes them on the secondary database. The --execute-throttle argument (70,30,5 above) sets the maximum percentage of time the script should spend executing queries on the secondary, how often to check that value, and the probability that queries will be skipped when the threshold is exceeded.
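For reference, here’s a rough sketch of how the capture side can be wired up on the primary, using tcpdump flags along the lines of what the mk-query-digest documentation suggests; the network interface is a placeholder and our exact production invocation may differ.

# Hypothetical pipeline: tcpdump dumps the MySQL packets as text and
# mk-query-digest reads them from stdin, replaying the SELECTs on the secondary.
tcpdump -s 65535 -x -nn -q -tttt -i eth0 port 3306 \
  | /usr/bin/mk-query-digest --statistics --iterations 4 --run-time 15m --type tcpdump \
      --filter '$event->{arg} && $event->{arg} =~ m/^SELECT/i' \
      --execute "h=db-secondary,P=3306,u=secondary,p=password,D=production" \
      --execute-throttle 70,30,5

The tool will also read a saved tcpdump output file, which is handy for testing the filter before pointing --execute at a live secondary.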
Here’s some sample statistical output:
# execute_executed 124668
# throttle_checked_rate 29
# throttle_rate_avg 29.84
# throttle_rate_ok 29
According to these values, the execution rate averaged just under 30%, well below the 70% threshold we set, and every throttle check passed. Our queries are executing on the secondary cleanly.
Since we began using this tool, we’ve switched production database servers without any performance reduction.
Note: This blog post was originally entitled “Keeping your slave warm”, and used the master/slave language throughout. We updated this to use the primary/secondary language in December of 2019, as the offensive nature of the original wording came to our attention.
seimen
on 12 Apr 10
sounds like an ugly hack to me
James Byers
on 12 Apr 10
I’ve found maatkit to be invaluable. For every awkward, time-consuming, or seemingly impossible MySQL administration task, it seems there’s an mk-something that eases the pain. Thanks for helping support Percona in building maatkit.
Pablo Torres
on 12 Apr 10
You guys are known for loving the Mac. Do you use it for sysadminning too?
Ben Tucker
on 12 Apr 10
Out of curiosity, why not have reads always distributed between the two DB servers from the clients and just have writes going to the primary DB? Seems this would solve the problem of keeping both caches warm with the added benefit of distributing the load. We’ve had great success using mysql-proxy for this task.
Anonymous Coward
on 12 Apr 10
@37signals
I thought you were using Amazon EC2 for your entire deployment environment.
Unfortunately, I don’t recall ever seeing a follow-up to the article linked above about how well using EC2 went.
Anonymous Coward
on 12 Apr 10
+1 Ben Tucker
I don’t understand why you want to “replicate” reads across your environment.
Only writes should matter.
And then like Ben said, simply have multiple read-only servers.
Sounds like your current architecture needs some REWORK
MI
on 12 Apr 10
Ben, the reason we don’t use mysql-proxy for that is that the last benchmarks I saw (admittedly quite old) showed it adding a very significant amount of latency: http://www.mysqlperformanceblog.com/2009/06/09/mysql-proxy-urgh-performance-and-scalability/
Splitting reads and writes is definitely something we could do, but it’s not particularly valuable if your read load can be handled by a single well tuned server. It also means you need at least three database servers instead of two to maintain N+1 and ensure that you can lose a server without performance impact.
wk
on 12 Apr 10
Best. Title. Ever.
Alexis
on 13 Apr 10
I like this idea. However, what happens if a select statement is inside an insert or update?
MI
on 13 Apr 10
Alexis, if you look closely you’ll notice that the SELECT regular expression pattern is bound to the beginning of the string (ie: ^SELECT) so it will only match queries that begin with SELECT. We also lock down the user who does the slave warming such that it only has SELECT privileges.
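For anyone curious, here’s a minimal sketch of that kind of grant, assuming the same secondary user and production database shown in the command above; the host pattern and password are placeholders, not our actual settings.

# Hypothetical read-only account for the warming script, created on the secondary (MySQL 5.x era syntax).
mysql -e "GRANT SELECT ON production.* TO 'secondary'@'%' IDENTIFIED BY 'password';"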
markd
on 13 Apr 10
This title is why most companies would have an approval process for blog posts. I hope you guys can scale and stay classy.
DL
on 13 Apr 10
Nice workaround to your issue. I agree with @seimen, but at the same time the solution has a simple elegance to it that is very 37signals.
I love the title.
George Anderson
on 13 Apr 10
I’m genuinely surprised by these comments: “Best. Title. Ever.” and “I love the title.” Sure, you have the right to title your posts as you see fit, but “Keeping your slave warm” seems downright insensitive. Clever in some respect, maybe, but insensitive nonetheless given the innuendo.
Adam
on 14 Apr 10
Lighten up, George.
markd
on 14 Apr 10
Looking at it again, I’m not sure it’s clear that my previous comment was negative. This title is pathetic, and completely not in keeping with 37signals’ character.
Mark French
on 14 Apr 10
I’m not a database guy, but I’m familiar with the concept of “master” and “slave” when it comes to electronic musical devices, and I found it shocking then too. But I realize that to the database guys, the title is kinda funny. I have a question though: is there any other synonymous terminology in usage for this process that isn’t so… emotionally loaded?
Thanks.
Matthew Savage
on 17 Apr 10
Wow, are we really so wound up and PC that we can’t even use the words ‘master’ and ‘slave’ in a neutral context any more? I suppose if this is the case then we should also ban the name Adolf… right?
Come on people, sure slavery is bad, but let’s think before we take things out of context. Perhaps Twitter has changed our thought processing to the point where we say it first, regret it later?