You’re reading Signal v. Noise, a publication about the web by Basecamp since 1999.

Signal v. Noise: Sysadmin

Our Most Recent Posts on Sysadmin

Nuts & Bolts: Database Servers

Mark Imbriaco wrote this | 21 comments

As a part of our ongoing Nuts & Bolts series I asked for questions from readers about the kinds of things they’d like to see covered. One of the topics that came up several times was how we manage our database servers.

All of our applications, with the exception of Basecamp, follow a pretty similar model: We take a pair of Dell R710 servers, load them up with memory and disks, and set up a master/slave pair of MySQL servers. We use the excellent Percona Server for all of our MySQL instances and couldn’t be happier with it.

Here’s an example of one of our MySQL servers. In this case, the Highrise master database server:

  • Dell R710
  • 2 x Intel Xeon E5530 Processors
  • 96GB RAM
  • 6 x 146GB 15,000 RPM SAS drives

For the disk configuration we take the first two drives and put them into a RAID1 volume that is shared between the root filesystem and MySQL binary logs. The remaining drives are placed into a RAID10 volume which is used for the InnoDB data files.

We only use RAID controllers that have a battery backup for the cache, disable read-ahead caching, and turn on write-back caching. With this setup we’re able to configure MySQL to immediately flush all writes to the disk rather than relying on the operating system to periodically write the data to the drives. In reality, the writes will be staged to the controller’s cache, but with the battery backup we are protected from unexpected power outages which could otherwise cause data loss. In addition, since the controller is caching the writes in memory, it can optimize the order and number of writes that it makes to the physical disks to dramatically improve performance.

As far as MySQL configuration is concerned, ours is pretty standard. The most important tips are to maximize the InnoDB buffer pool and to make sure you have a BBU-enabled RAID card for writes. There are other important configuration options, but if you do those two things you’re probably 75% of the way to having a performant MySQL server.

Here are some of the most important configuration options in the Highrise MySQL config file:

sync_binlog = 1
innodb_file_per_table
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_buffer_pool_size = 80G
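
If you want to double-check settings like these on a running server, a quick script will do. Here’s a minimal Ruby sketch using the mysql2 gem; the hostname and credentials are placeholders, not our actual setup:

# Minimal sketch: confirm the binlog and InnoDB settings above on a live
# server via the mysql2 gem. Host and credentials are placeholders.
require "mysql2"

client = Mysql2::Client.new(host: "db-master.example.com",
                            username: "ops", password: "secret")

%w[
  sync_binlog
  innodb_file_per_table
  innodb_flush_log_at_trx_commit
  innodb_flush_method
  innodb_buffer_pool_size
].each do |name|
  row = client.query("SHOW GLOBAL VARIABLES LIKE '#{name}'").first
  puts "#{row['Variable_name']} = #{row['Value']}" if row
end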

I’m not going to talk much about backups other than to say you should be using XtraBackup, also from our friends at Percona. It is far and away the best way to do backups of MySQL.

For Basecamp, we take a somewhat different path. We are on record about our feelings on sharding. We prefer to scale our databases with hardware for as long as we can, in order to defer the complexity involved in partitioning them as long as possible—with any luck, indefinitely.

With that in mind, we went looking for an option to host the Basecamp database, which is becoming a monster. As of this writing, the database is 325GB and handles several thousand queries per second at peak times. At Rackspace, we ran this database on a Dell R900 server with 128GB of RAM and 15 x 15,000 RPM SAS drives in a Dell MD3000 storage array.

We considered building a similar configuration in the new datacenter, but were concerned that we were hitting the limits of I/O performance with this type of setup. We could add additional storage arrays or even consider SAN options, but SSD storage seemed like a much better long-term answer.

We explored a variety of options, from commodity SSD drives to PCI Express-based flash memory cards. In the end, we decided to purchase a pair of MySQL appliances produced by Schooner Information Technology. They make a pretty awesome appliance that is packed with a pair of Intel Nehalem processors, 64GB of RAM, 4 x 300GB SAS drives, and 8 x Intel X25-E SSD drives. Beyond the hardware, Schooner has done considerable work optimizing the I/O path from InnoDB all the way down through the system device drivers. The appliances went into production a few weeks ago and the performance has been great.

I sat down with Jeremy Cole of Schooner a few weeks ago and recorded a couple of videos that go into considerably more detail about our evaluation process and some thoughts on MySQL scaling. You can check them out here and here.

Nuts & Bolts: New Datacenter!

Mark Imbriaco wrote this | 28 comments

With all the recent talk about the fabulous new office space that the Chicago crew just moved into, I wanted to share a little bit about another long-term move that is nearing completion. For the last four years, our infrastructure has been hosted with Rackspace. As of last weekend, the vast majority of our traffic is now being served out of our own colocated server cluster.


Some of our Dell R710 servers. We have a bunch of these.
Continued…

[Podcast] Episode #12: Being a Systems Administrator at 37signals

Matt Linderman wrote this | 7 comments

Time: 22:50 | 04/13/2010 | Download MP3



Mark, Joshua, and John on life as a 37signals Sys Admin
The Sys Admin team discusses hosting the 37signals apps, working with programmers, helping support, telecommuting, dealing with vendors, improving speeds in Europe, and more.

More episodes
Subscribe to the podcast via iTunes or RSS. Related links and previous episodes available at 37signals.com/podcast.


Nuts & Bolts: Configuration management with Chef

Joshua Sierles wrote this | 11 comments

Configuration management doesn’t sound sexy, but it’s the single most important thing we do as sysadmins at 37signals. It’s about documenting an entire infrastructure setup in a single code base, rather than in a set of disparate files, scripts, and commands. Doing that well has been our biggest sysadmin pain point.

Recently we hit a milestone of easing this pain across our infrastructure by adopting Chef, the latest in a long line of configuration management tools.

We struggled with a few other tools before settling on Chef. We love it. It’s open source, easy to hack on, opinionated, written in Ruby and has a great community behind it. It’s really changed the way we work. I think of it as a Rails for Sysadmins.

Here’s a snippet of all the data required to make Chef configure a bare Ubuntu Linux install as a Basecamp application server.


:basecamp => {
  :gems => ['fast_xs', ['hpricot', '0.8.1'], 'aws-s3', 'ruby-prof',
            ['net-ssh', '1.1.4'], ['net-sftp', '1.1.1'], ['tzinfo', '0.3.9']],
  :packages => ['imagemagick', 'elinks', 'zip'],
  :apache_modules => ["auth_token", "xsendfile", "rewrite"],
  :passenger => { :tune_gc => true }
}
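
For a sense of how Chef turns that data into action, here’s a rough sketch of the kind of recipe code that could consume those attributes (illustrative only, not our production cookbook):

# Illustrative recipe sketch: install the OS packages and Ruby gems
# declared in the :basecamp attributes above.
node[:basecamp][:packages].each do |pkg|
  package pkg
end

node[:basecamp][:gems].each do |gem_spec|
  gem_name, gem_version = gem_spec    # entries are "name" or ["name", "version"]
  gem_package gem_name do
    version gem_version if gem_version
  end
end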

As early adopters, we’ve helped Chef grow and opened our repository of Chef recipes. If you’re interested in using Chef, take a look there for some example uses. Please fork and provide feedback on GitHub.

Nuts & Bolts: Campfire loves Erlang.

Mark Imbriaco wrote this | 39 comments

A couple of years ago a lot of buzz started in the Ruby community about Erlang, a functional programming language originally developed by Ericsson for use in telecommunications systems. I was intrigued by the talk of fault tolerance and concurrency, two of the cornerstones that Erlang was built on, so I ordered Programming Erlang, written by Joe Armstrong and published by the Pragmatic Programmers, and spent a couple of weeks working through it.

A year later, Kevin Smith began producing his excellent Erlang in Practice screencast series in partnership with the Pragmatic Programmers. It’s amazing how much difference it made for me to be able to watch someone develop Erlang applications while talking through his thought process along the way.

As I was learning Erlang, I kept threatening to rewrite the poller service that handles updating Campfire chat rooms when someone speaks in a room. At some point my threats motivated Jamis, who was also playing with Erlang in his free time, to port our C-based polling service to Erlang. Jamis invited me to look at the code and I couldn’t help refactoring it within an inch of its life.

The code that Jamis wrote worked fine, but it was not very idiomatic Erlang. While I didn’t have much more experience developing Erlang code than Jamis, I had definitely seen more real Erlang code. I tried to pattern our work after what I had been exposed to, making improvements along the way. We ended up with 283 lines of pretty decent Erlang code.

parameter(Parameter, Parameters) ->
  case lists:keysearch(Parameter, 1, Parameters) of
    {value, {Parameter, Key}} -> Key;
    false -> undefined
  end.

For the curious, the code above is a very simple example function from the real Campfire poller service. It takes two arguments: the name of a parameter to search for and the list of parameters. If it finds a matching parameter it returns the associated value; otherwise it returns the atom undefined. Atoms are like symbols, if you’re a Ruby programmer.
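
If the Erlang syntax is unfamiliar, here’s a rough Ruby sketch of the same idea (illustrative only, not part of the Campfire code), treating the parameter list as an array of [name, value] pairs:

# Rough Ruby analogue of the parameter/2 function above (illustrative only).
def parameter(name, parameters)
  match = parameters.assoc(name)   # first pair whose name matches, or nil
  match ? match[1] : :undefined
end

parameter("room", [["room", "42"], ["user", "7"]])  # => "42"
parameter("missing", [["room", "42"]])              # => :undefined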

Last Friday we rolled out the Erlang-based poller service into production. There are three virtual instances running a total of three Erlang processes. Since Friday, those three processes have returned more than 240 million HTTP responses to Campfire users, averaging 1,200-1,500 requests per second at peak times. The average response time is hovering at around 2.8ms from the time the request gets to the Erlang process to the time we’ve performed the necessary MySQL queries and returned a response to our proxy servers. We don’t have any numbers to compare this with the C program that it replaced, but it’s safe to say the Erlang poller is pretty fast. It’s also much easier to manage three Erlang processes than it was to manage the 240 processes that our C poller required.

Erlang definitely isn’t a replacement for Rails, but it is a fantastic addition to our collective toolbox for problems that Rails wasn’t designed to address. It’s always easier to work with the grain than against it, and adding more tools makes that more likely.

Nuts & Bolts: HAproxy

Mark Imbriaco wrote this | 28 comments

A common request we get from readers is to describe in more detail how our server infrastructure is set up. That question is so incredibly broad that it’s hard to answer in any kind of comprehensive way, so I’m not going to try to. Instead, I’m keeping the general desire for more technical details in mind as I work through day-to-day issues with our configuration, and I’ll try to occasionally write about things that I think might be of interest. The topic for today is HAproxy.

Continued…

Behind the scenes at 37signals: Sysadmin and development

Matt Linderman wrote this | 24 comments

This is the third in a series of posts showing how we use Campfire as our virtual office. All screenshots shown are from real usage and were taken during one week in September.

This time we’ll take a look at how Campfire is an integral part of our sysadmin and development efforts.

Discover and fix a code failure
Whenever someone checks in a piece of code, CIA (Continuous Integration Agent) automatically runs our test suites and reports on any failed tests.

Analyze a server problem
David and Mark discuss a server issue.

Subversion shows changes to the code
Subversion tracks changes Ryan recently uploaded. Jason offers kudos on the copy edit made.

Tell everyone about a server change
Sam deploys changes to Backpack and details what was changed.

Continued…