Making Database Maintenance Fun: mysql_role_swap

We’ve come a long way in the last year in the way we operate our sites. We’ve stabilized our applications, improved their response time, and increased their availability.

To accomplish these improvements we’ve performed a series of database maintenance operations, ranging from hardware upgrades, to new database servers, to configuration changes that required a restart. In each of these operations we had one common goal: minimize the interruption to our customers.

Today we are releasing a small script that has made our lives, and our customers’ lives, a whole lot better. We use this script to change the roles of our databases from replication masters to slaves, and vice versa. The gain comes from the script performing every step a human used to perform, only faster and more reliably.
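To give a feel for the kind of steps such a swap involves, here is a minimal sketch in Python. It is not the released script: the step order is an assumption based on a typical planned MySQL master/slave swap, the function and host names are hypothetical, and it only builds a plan rather than touching a database. Re-pointing the old master as a slave is deliberately left out.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    host: str       # server the step applies to (hypothetical names)
    statement: str  # SQL statement, or a note for a non-SQL action


def build_swap_plan(old_master: str, new_master: str) -> List[Step]:
    """Return the ordered steps for a planned role swap.

    Assumed sequence, not taken from the released script. This only
    builds the plan; it never connects to a database.
    """
    if old_master == new_master:
        raise ValueError("old and new master must be different hosts")
    return [
        # 1. Stop accepting writes on the current master.
        Step(old_master, "SET GLOBAL read_only = ON"),
        # 2. Record the master's final binary log position.
        Step(old_master, "SHOW MASTER STATUS"),
        # 3. Check that the slave has caught up to that position.
        Step(new_master, "SHOW SLAVE STATUS"),
        # 4. Stop replication on the server being promoted.
        Step(new_master, "STOP SLAVE"),
        # 5. Allow writes on the new master.
        Step(new_master, "SET GLOBAL read_only = OFF"),
        # 6. Site-specific, non-SQL: send application traffic to the new master.
        Step(new_master, "-- redirect application traffic here (not SQL)"),
    ]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for step in build_swap_plan("db-one.example", "db-two.example"):
        print(f"{step.host}: {step.statement}")
```

Keeping the plan as plain data makes it easy to print and review before anything runs against production, which is where a human used to lose time and make mistakes.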

Without this script we used to spend minutes accomplishing these maintenance tasks. With the script we’ve swapped databases under production load with no user-noticeable interruption!

The script has lots of hard-coded paths, users, and other assumptions. But this is too good to keep to ourselves. We’re sharing it with you in the hope that it will improve your operations experience, and that you will contribute back changes that make it even better.

Taylor wrote this on Nov 30 2012. There are 10 comments.

Nick

on 30 Nov 12

Why MIT as the license as opposed to BSD?

Davy

I love seeing posts like this. Could you explain how your setup works as far as the proxy? When it is paused and restarted, what is occurring behind the scenes and what are you using for a proxy?

Mikael

You may have left one of your DB passwords in the README file on GitHub, check database_one.slave_password.

Taylor

@Davy No proxy. We just point the apps at the VIP. When we run the script we just restart the app / reset DB connections.

@Mikael No security issue there. Made them the same.

Joseph

@Taylor I was wondering how you prevent dropped writes. Is that even possible? What’s “the damage” when running this? A minute or two of errors? What’s the /tmp/hold part doing?

on 01 Dec 12

@Joseph We usually just accept that a few writes will fail. In most cases that’s a number between 1 and 10 (usually closer to 1). The /tmp/hold part is for another blog post ;)

@Taylor Looking forward to the post about /tmp/hold. Hope you won’t keep us waiting too long :)

Aaron

Are you using XtraDB Cluster? Or just “standard” master-master replication? Or something else?

Valerie Parham-Thompson

on 03 Dec 12

I wonder if you used or considered MMM failover and found it didn’t work in your environment.

“When we run the script we just restart the app / reset DB connections.” Do you mean you manually restart the app? The use of vIPs should make that seamless.

@Taylor I think dropping writes for a short time span is completely acceptable. Looking forward to the /tmp/hold story! I’m currently looking at products like Tungsten to help accomplish the same.

This discussion is closed.
