As a part of our ongoing Nuts & Bolts series I asked for questions from readers about the kinds of things they’d like to see covered. One of the topics that came up several times was how we manage our database servers.
All of our applications, with the exception of Basecamp, follow a pretty similar model: We take a pair of Dell R710 servers, load them up with memory and disks, and set up a master/slave pair of MySQL servers. We use the excellent Percona Server for all of our MySQL instances and couldn’t be happier with it.
Here’s an example of one of our MySQL servers. In this case, the Highrise master database server:
- Dell R710
- 2 x Intel Xeon E5530 Processors
- 96GB RAM
- 6×146GB 15,000 RPM SAS drives
For the disk configuration we take the first two drives and put them into a RAID1 volume that is shared between the root filesystem and MySQL binary logs. The remaining drives are placed into a RAID10 volume which is used for the InnoDB data files.
We only use RAID controllers that have a battery backup for the cache, disable read-ahead caching, and turn on write-back caching. With this setup we’re able to configure MySQL to immediately flush all writes to the disk rather than relying on the operating system to periodically write the data to the drives. In reality, the writes will be staged to the controller’s cache, but with the battery backup we are protected from unexpected power outages which could otherwise cause data loss. In addition, since the controller is caching the writes in memory, it can optimize the order and number of writes that it makes to the physical disks to dramatically improve performance.
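On an LSI-based controller (such as the PERC cards Dell ships in these servers), those cache policies can be set from the command line. This is only a sketch using the MegaCli tool; the adapter and logical-drive selectors are assumptions and will vary by machine:

```shell
# Disable read-ahead and enable write-back caching on all logical drives
MegaCli64 -LDSetProp NORA -LALL -aALL    # no read-ahead caching
MegaCli64 -LDSetProp WB -LALL -aALL      # write-back caching

# Confirm the battery backup unit is present and healthy before trusting WB
MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL
```

Without a healthy BBU, write-back caching trades durability for speed, which is exactly the failure mode described above.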
As far as MySQL configuration is concerned, ours is pretty standard. The most important tips are to maximize the InnoDB buffer pool and to make sure that you have a BBU-enabled RAID card for writes. There are other important configuration options, but if you do those two things you’re probably 75% of the way to having a performant MySQL server.
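As a rough illustration of that buffer pool sizing on a dedicated database box, here is a sketch; the 16GB headroom figure is an assumption that happens to match the 96GB/80G split on the Highrise master, not a formula from any MySQL documentation:

```python
def suggest_buffer_pool_gb(total_ram_gb, headroom_gb=16):
    """Rule-of-thumb InnoDB buffer pool sizing for a dedicated MySQL server:
    give InnoDB all the RAM except a fixed headroom for the OS, per-connection
    buffers, and the rest of MySQL."""
    return max(total_ram_gb - headroom_gb, 0)

# A 96GB server like the Highrise master gets an 80G buffer pool
print(suggest_buffer_pool_gb(96))
```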
Here are some of the most important configuration options in the Highrise MySQL config file:
sync_binlog = 1
innodb_file_per_table
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_buffer_pool_size = 80G
I’m not going to talk much about backups other than to say you should be using XtraBackup, also from our friends at Percona. It is far and away the best way to do backups of MySQL.
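A minimal XtraBackup run looks something like the following sketch, using the innobackupex wrapper that ships with it; the user, password, and paths are placeholders, and the timestamped directory name is just an example of what the tool generates:

```shell
# Take a hot backup of a running MySQL server into a timestamped directory
innobackupex --user=backup --password=secret /backups/mysql

# Apply the InnoDB log to make the backup consistent before any restore
innobackupex --apply-log /backups/mysql/2010-07-27_03-00-00
```

Because the backup is taken while MySQL is running, the apply-log step is what turns the raw copy into a consistent snapshot.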
For Basecamp, we take a somewhat different path. We are on record about our feelings on sharding: we prefer to scale our databases with hardware for as long as we can, deferring the complexity of partitioning them for as long as possible, and with any luck, indefinitely.
With that in mind, we went looking for an option to host the Basecamp database, which is becoming a monster. As of this writing, the database is 325GB and handles several thousand queries per second at peak times. At Rackspace, we ran this database on a Dell R900 server with 128GB of RAM and 15×15,000 RPM SAS drives in a Dell MD3000 storage array.
We considered building a similar configuration in the new datacenter, but were concerned that we were hitting the limits of I/O performance with this type of configuration. We could add additional storage arrays or even consider SAN options, but SSD storage seemed like a much better long term answer.
We explored a variety of options, from commodity SSD drives to PCI Express-based flash memory cards. In the end, we decided to purchase a pair of MySQL appliances from Schooner Information Technology. They make a pretty awesome appliance that is packed with a pair of Intel Nehalem processors, 64GB of RAM, 4×300GB SAS drives, and 8×Intel X25-E SSD drives. Beyond the hardware, Schooner has done considerable work optimizing the I/O path from InnoDB all the way down through the system device drivers. The appliances went into production a few weeks ago and the performance has been great.
I sat down with Jeremy Cole of Schooner a few weeks ago and recorded a couple of videos that go into considerably more detail about our evaluation process and some thoughts on MySQL scaling. You can check them out here and here.
MI
on 27 Jul 10 I was just reminded on Twitter that I didn’t mention the filesystem we use. We use XFS on all of our MySQL servers and it performs very well for us. Percona has written about XFS pretty extensively on their blog if you want to read more about it in the context of MySQL.
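A typical fstab line for this kind of setup would look roughly like the one below; the device name is just an example, and nobarrier is only a safe choice because the RAID controller’s cache is battery-backed:

```
/dev/sdb1  /var/lib/mysql  xfs  noatime,nobarrier  0  0
```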
Hank
on 27 Jul 10 @Mark
Am I reading the post correctly that you moved from a Basecamp database that had 128 GB of RAM to a database server with only 64 GB of RAM?
MI
on 27 Jul 10 You are reading correctly. It is amazing what having disks that can return data in microseconds does for performance. I talk about the testing we did to validate the performance in the videos I linked. I was skeptical too until we tested them.
RF
on 27 Jul 10 Hank: 64 GB of RAM but one-hundredth the disk seek time! That can be a trade up.
Kishor Gurtu
on 27 Jul 10 Have you considered using a NoSQL database like Mongo for scaling? Would be interesting to know your views on this.
Sean Porter
on 27 Jul 10 Love it, the brushed aluminum gets me all hot and bothered.
Do you see a future when the monolithic approach (scaling up) won’t work for Basecamp? Or will Moore’s law take care of you?
I get the feeling that you would rather shard MySQL before making a move to a NoSQL solution, such as Mongo as Kishor mentioned.
Do you have cold cache nightmares?
Morten
on 27 Jul 10 Do the Schooners come with a “fixed” MySQL build and my.cnf or are they subject to patching/tuning as you’d do any other MySQL installation?
MI
on 27 Jul 10 Sean: Great questions. First, I think we’d probably shard MySQL before making a move to a NoSQL solution. That’s not an indictment of NoSQL at all; I personally am a huge fan of Riak and quite like some of the other entrants as well. The main motivation to stick with MySQL would just be that it would be easier. If we got to the point where we needed to take that step, we’d evaluate all the options and choose what makes the most sense at the time. The answer to that question changes over time, particularly as all of these solutions mature.
As far as cold caches are concerned, we’ve talked a little bit about how we keep our slaves warm in the past. We honestly don’t have to deal with that on the Schooner appliances since they have so much I/O. When I benchmarked them, they absolutely crushed our old servers, even when the Schooners started cold and the old hardware had a completely warm cache. They’re damned fast.
Morten: They do come with a fixed version of MySQL, but it is kept pretty up to date with the mainline. Tweaking my.cnf is totally within bounds, and we have made a few small changes to ours for Basecamp. Schooner exposes tools for editing that in their management interfaces and you can drop to a shell if absolutely necessary.
yongfook
on 27 Jul 10 What is your opinion on tenant policy? With SaaS this is often debated to a standstill :)
It sounds like Basecamp runs from a monolithic database, rather than having a one-database per tenant policy. Is there a reason for this? There are some advantages and disadvantages to both sides.
Although I guess a lot of the disadvantages of a monolithic database disappear once you start talking about some kind of master-slave-style redundant setup, which is I guess what you guys are rocking.
BillP
on 27 Jul 10 Thanks Mark! You covered it all. Great info.
Tom G
on 27 Jul 10 Do you think you’ll ever need to implement an archive or nearline process for Basecamp in the future?
Irving Park
on 27 Jul 10 Are there issues with repeated writes to a given sector of an SSD drive?
Have controllers been optimized to the particular characteristics of an SSD drive versus a spinning drive?
Alejandro Moreno
on 27 Jul 10 I imagine the 37accounts DB is in one of the smaller servers, and the account ID is just passed to the other apps and DBs?
plaqq
on 27 Jul 10 Mark – awesome info!!
@yongfook I suspect that using one database per tenant would be problematic if the database tables ever needed to be changed to add or modify fields; they would need to modify thousands of individual databases. Perhaps that is why, although I am purely speculating.
Kyle West
on 27 Jul 10 Curious: who is the new datacenter? Is everything there?
Mark
on 28 Jul 10 I’m curious, was the decision to go w/Schooner based solely on performance vs. SSD/PCI-X memory, or was it TCO? E.g., you had a faster solution with straight SSD but it was too expensive and you didn’t want to rely on a homebrew solution when a viable alternative w/more support and testing was available?
Thanks for the insights, Mark @marknca
Henrique
on 28 Jul 10 Nice first post, Mark!
The I/O given by SSD RAIDs rocks on a whole new level, it’s not even funny anymore. ;)
Like yongfook, I’m also interested in the tenancy model used for the products, but from the description of how MySQL is run, I believe it’s a monolithic DB.
A cool second post could be about the release cycle (test / beta / release) if you guys have one.
David Andersen
on 29 Jul 10 Wonderful article, thanks.
FYI, ‘performant’ is not a word. (and for all you other IT folks out there, neither is ‘architecting’ or ‘architected’.)
Rob
on 31 Jul 10 Incredible article, thanks so much for shedding some light, incredibly interesting to have a peek behind the scenes!
MI
on 31 Jul 10 David: Thanks! I’m glad you enjoyed the article. Also, thanks for the tip about “performant”. It’s interesting to know that it’s not a word. I’m going to continue to use it since people seem to understand what I mean when I say it and, well, I rather like it. English is adaptable. :)
Shane
on 02 Aug 10 Great article. Thanks for sharing. I’m considering using the X25-E drives in a server of mine. I read, though, that they ship with write cache enabled (for performance and wear levelling) – yeah, I know, big no-no for enterprise drives. With write cache disabled (required to survive a power outage without data loss and corruption), I read that write performance drops off considerably, not to mention the effect on drive durability. When you talk about the performance of the Schooner servers, is that with write cache disabled on the X25-Es?