You’re reading Signal v. Noise, a publication about the web by Basecamp since 1999.

About Joshua Sierles: Flamenco by night, Rubyist by day.

Keeping your secondary database cache warm

Joshua Sierles wrote this · 17 comments

In 2009 we ran into some problems when failing over to the Basecamp secondary database. Basecamp relies on keeping a large working set of recently accessed data in its InnoDB buffer cache for speed, but normal MySQL replication only sends writes, not reads, to the secondary. How could we ensure the data in the secondary’s cache stays up to date?

We contracted Percona to build a solution into their Maatkit toolset based on their experiments with SELECT query mirroring. It involves a clever usage of tcpdump to capture and replay SELECT queries from the primary to the secondary database.

Here’s the resulting command.

/usr/bin/mk-query-digest --statistics --iterations 4 --run-time 15m --type tcpdump \
  --filter '$event->{arg} && $event->{arg} =~ m/^SELECT/i' \
  --execute "h=db-secondary,P=3306,u=secondary,p=password,D=production" \
  --execute-throttle 70,30,5

The tcpdump utility captures MySQL traffic from the primary and feeds the data into the mk-query-digest script. This script filters only the SELECT queries and executes them on the secondary database. The throttle argument sets the percentage of time the script should execute queries on the secondary, how often to check that value, and a percentage probability that queries will be skipped when the threshold is exceeded.
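The throttle logic described above can be sketched in a few lines of Ruby. This is an illustration of the idea, not Maatkit’s actual implementation: track the fraction of wall-clock time spent replaying queries, re-check it on each interval, and once over the limit, skip incoming queries with the given probability.

```ruby
# Sketch of --execute-throttle MAX,INTERVAL,SKIP semantics (hypothetical
# implementation, for illustration only).
class ExecuteThrottle
  def initialize(max_pct:, check_interval:, skip_pct:)
    @max_pct = max_pct                # execute at most this % of wall time
    @check_interval = check_interval  # seconds between rate checks
    @skip_pct = skip_pct              # % chance of skipping a query when over
    @busy = 0.0                       # seconds spent executing queries
    @start = now
    @last_check = @start
    @over = false
  end

  # Record how long a replayed SELECT took on the secondary.
  def record(seconds)
    @busy += seconds
  end

  # Should the next query be skipped?
  def skip?(rng = Random.new)
    if now - @last_check >= @check_interval
      elapsed = now - @start
      @over = elapsed > 0 && (100.0 * @busy / elapsed) > @max_pct
      @last_check = now
    end
    @over && rng.rand(100) < @skip_pct
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

With `70,30,5`, queries replay freely while execution time stays under 70% of wall time; above that, each query has a 5% chance of being dropped, easing load on the secondary without stopping the warm-up entirely.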

Here’s some sample statistical output:

# execute_executed      124668
# throttle_checked_rate     29
# throttle_rate_avg      29.84
# throttle_rate_ok          29

According to these values, the script didn’t reach the 70% query execution threshold we set. Our queries are executing on the secondary cleanly.

Since we began using this tool, we’ve switched production database servers without any reduction in performance.

Note: This blog post was originally entitled “Keeping your slave warm”, and used the master/slave language throughout. We updated this to use the primary/secondary language in December of 2019, as the offensive nature of the original wording came to our attention.

Nuts & Bolts: Configuration management with Chef

Joshua Sierles wrote this · 11 comments

Configuration management doesn’t sound sexy, but it’s the single most important thing we do as sysadmins at 37signals. It’s about documenting an entire infrastructure setup in a single code base, rather than in a set of disparate files, scripts and commands. That sprawl has been our biggest sysadmin pain point.

Recently we hit a milestone of easing this pain across our infrastructure by adopting Chef, the latest in a long line of configuration management tools.

We struggled with a few other tools before settling on Chef. We love it. It’s open source, easy to hack on, opinionated, written in Ruby and has a great community behind it. It’s really changed the way we work. I think of it as a Rails for Sysadmins.

Here’s a snippet of all the data required to make Chef configure a bare Ubuntu Linux install as a Basecamp application server.

:basecamp => {
  :gems => ['fast_xs', ['hpricot', '0.8.1'], 'aws-s3', 'ruby-prof',
            ['net-ssh', '1.1.4'], ['net-sftp', '1.1.1'], ['tzinfo', '0.3.9']],
  :packages => ['imagemagick', 'elinks', 'zip'],
  :apache_modules => ["auth_token", "xsendfile", "rewrite"],
  :passenger => { :tune_gc => true }
}
As early adopters, we’ve helped Chef grow and opened our repository of Chef recipes. If you’re interested in using Chef, take a look there for some example uses. Please fork it and send feedback on GitHub.

Using the EC2 environment for fewer moving parts

Joshua Sierles wrote this · 6 comments

One highlight of Amazon’s EC2 is the wide range of generally available services that help reduce moving parts.

We store part of our cluster configuration in S3. The server instances pull this configuration and bootstrap from there using a simple set of rake tasks and a server provisioning tool, Sprinkle. You could use SimpleDB for a similar purpose. One could serve as a backup of the other, given their similar APIs. Either way means fewer moving parts.

Another vital EC2 feature is passing arbitrary data to an instance. Many bundled images now automatically execute a blob of text you pass to the instance on boot as a shell script, like those supplied by Alestic. We use this to sync configuration scripts and packages from S3.
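A sketch of how such a user-data bootstrap script might be assembled (the bucket name and paths are made up, and s3cmd stands in for whatever S3 client the image ships):

```ruby
# Hypothetical sketch: build the user-data shell script an instance
# executes on first boot. Bucket, paths and tool names are illustrative.
def bootstrap_user_data(bucket:, role:)
  <<~SCRIPT
    #!/bin/sh
    # Pull bootstrap scripts and packages from S3, then hand off.
    s3cmd get --recursive s3://#{bucket}/bootstrap/ /root/bootstrap/
    sh /root/bootstrap/setup.sh #{role}
  SCRIPT
end
```

The resulting string would be passed via the launch API’s user-data parameter; an Alestic-style image runs it as root on boot, so the instance configures itself with no inbound SSH required.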

Tim Dysinger’s article on using EC2 as a simple DNS describes a great way to remove the need for an internal DNS server on EC2 for smaller setups. We use a similar technique, specifying a single EC2 security group for each host as its identifier. Each server regenerates its hosts file every minute. Simple, effective, and one fewer moving part.
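The hosts-file generation can be sketched like this (simplified; the field names are hypothetical, and the real cron job would pull instance data from an EC2 DescribeInstances call before writing /etc/hosts):

```ruby
# Hypothetical sketch of the per-minute cron job: given instances described
# by their identifying security group and private IP, render an /etc/hosts.
def render_hosts(instances)
  lines = ["127.0.0.1 localhost"]
  instances.each do |i|
    # The security group name doubles as the hostname, e.g. "db-primary".
    lines << "#{i[:private_ip]} #{i[:group]}"
  end
  lines.join("\n") + "\n"
end
```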

Security groups are useful for describing roles and other identifying information about each host. We use this information to generate Nagios monitoring configuration files. For example, a security group named “role: app” will automatically enable HTTP checks and Passenger memory checks.
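That group-to-checks mapping can be sketched as a small lookup (the check names and role table here are hypothetical, for illustration only):

```ruby
# Hypothetical: derive Nagios checks for a host from its security groups.
# Group names like "role: app" encode the host's role.
ROLE_CHECKS = {
  'app' => ['check_http', 'check_passenger_memory'],
  'db'  => ['check_mysql'],
}

def checks_for(groups)
  roles = groups.map { |g| g[/\Arole: (\w+)\z/, 1] }.compact
  roles.flat_map { |r| ROLE_CHECKS.fetch(r, []) }.uniq
end
```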

All this means less dependence on a centralized configuration server or pushing large sets of commands over SSH manually. While these techniques are effective, they require more moving parts and their own care and maintenance.

As your application’s complexity increases, you’ll thank yourself for the opportunity to reduce the complexity underneath it.

Ta-da List on Rails 2.2, Passenger And EC2

Joshua Sierles wrote this · 29 comments

Ta-da List just moved to three exciting platforms: Rails 2.2, Phusion Passenger, and Amazon EC2.

This is an important milestone for 37signals. As proponents of simplicity, we really felt the difference the latter two technologies made for server provisioning and deployment. As we grow more comfortable with these services, we’ll be moving other applications this way and writing more about the results.

Phusion Passenger

After we commissioned Phusion to add global queueing to Passenger, we felt it was time to give it a try. Since we were already on Apache, the transition proved simple. We’re really impressed with Passenger’s ease of deployment and stability. The app now requires fewer than 10 lines of configuration to launch and deploy. Passenger handles its own process spawning: done. Its command-line tools for monitoring requests and memory usage complete the package for easy integration into monitoring tools.
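Passenger’s Apache integration is what keeps the configuration so small: point a virtual host at the app’s public directory and Passenger does the rest. A minimal sketch, with hypothetical hostname and paths (global queueing is the feature we commissioned):

```apache
# Illustrative only: hostname and paths are made up.
<VirtualHost *:80>
  ServerName tadalist.example.com
  # Passenger detects the Rails app from the public/ DocumentRoot.
  DocumentRoot /u/apps/tadalist/current/public
  # Route requests through one shared queue instead of per-process queues.
  PassengerUseGlobalQueue on
</VirtualHost>
```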

Amazon EC2

This has been my ongoing project at 37signals. Avoiding the extra layer of low-level setup involved with our current Xen-based virtualization system brought me closer to the core concerns of our environment: how best to automate support for the applications from the systems side. More often than not, a traditional server deployment consists of a range of organically provisioned services and environments. Ours is no exception, due to the rapidly changing requirements of each application. EC2’s lack of persistence forces you to think about automating this from the start, which turns out to be a blessing in disguise. Setting up a full environment consisting of dedicated instances for mail, Nagios/remote logging, application serving, and a primary and secondary database now takes about 5 minutes.

In future posts we’ll detail how we used a combination of image bundling and a custom EC2 deployment tool to build the Ta-da List environment.

If you haven’t documented your server deployment process in code or experimented with these technologies, now is the time.

So maybe a recession is a good time to start a startup. It’s hard to say whether advantages like lack of competition outweigh disadvantages like reluctant investors. But it doesn’t matter much either way. It’s the people that matter. And for a given set of people working on a given technology, the time to act is always now.

Joshua Sierles on Nov 20 2008 · 18 comments

Filmmaking and participation

Joshua Sierles wrote this · 6 comments

The emphasis on participation and trust has been my favorite part of working at 37signals this first month. It reminds me of my favorite director’s filmmaking process.

Traditional filmmaking, essentially: write a script, cast actors, go on set and film. British indie director Mike Leigh takes a different approach.

I start with no script. I do a brief of the film for myself, which is usually pretty fluid. Then I work with the actors for an extensive period creating the characters, through conversation, research and improvisation. Then we go out and invent the film on location, and structure it and shoot it as we go. To me, that’s what it’s all about. It’s about using film as a medium in its own right, not as a way of including the decisions of various committees (via MovieMaker).

It pays off. Leigh’s characters shine with curious originality. The sometimes strange dialogue and situations tend to provoke a response no matter how foreign they are to the viewer. I attribute this to a level playing field: the cast is encouraged to improvise and create.

An exemplary clip from ‘Naked’ was developed almost exclusively by the two actors in a run of improvised sessions; they then culled the cruft together with the director.

This is a great example of how going in unprepared yields fruit, and how encouraging people to participate brings out the best in them.