A week ago, it took about 15 minutes to run all of Basecamp’s tests.
Now, if you adhere to the test-driven development (TDD) philosophy, you’ll know that tests are meant to be run often. Like, every few minutes, preferably. The tests are what give you confidence in your code, and the ability to refactor with impunity, not fearing that your changes are breaking existing features. They are the safety net for our daily tight-rope walk across our code base.
But if your test suite takes a quarter of an hour to run, that puts a severe damper on your willingness to run it often, or even at all. Those tests definitely don’t get run every few minutes. They might get run before you commit a change. More likely, they get run around the end of a development cycle, when the new work is preparing for deployment. Although they can still be marginally useful when run like that, they are a far cry from what they might be.
At the beginning of this latest development cycle, I begged leave to spend a few days digging into our test suite to see what could be done. Just about all of the programmers here have had a turn looking into the tests, but with other responsibilities it was hard to find enough time to do more than prove the hypothesis-du-jour wrong. I hoped that with some dedicated time I might be able to make some significant improvements.
Now, I’m going to give you the full travelogue here. Feel free to skip to the end if all you want is the conclusions.
My original inspiration for this dive was found at last year’s RubyConf, during Aman Gupta’s fantastic talk about tools for debugging Ruby. I couldn’t wait to try some of the things he demonstrated. This was my chance!
Armed with perftools and perftools.rb, it was immediately apparent that the worst offender was the garbage collector. My hypothesis was that there was a memory leak somewhere, causing objects to accumulate and making the GC work overtime, fruitlessly, trying to stem the tide.
This was straightforward to check. I wrote a simple harness to run the tests, which would enter a REPL when all the tests were done and allow me to query Ruby’s object space. The first thing I noticed was that when all tests were finished running there were still more than 32,000 ActiveRecord instances floating around.
Thirty. Two. Thousand.
That’s about three orders of magnitude more records than exist in the entire test database. And since all the tests were done running at this point, there shouldn’t have been anything still in memory. Something was obviously holding on to these records.
Fast-forward a few hours’ worth of experimentation. What was happening was this: Ruby’s Test::Unit library (which we use for Basecamp’s tests) creates a new instance of the test class for each test. Not only that, but it holds onto each of those instances, so if your test suite includes thousands of tests, then the test suite will have thousands of TestCase objects.
That, by itself, is not a big deal. A few thousands of objects are not normally much to worry about. The problem comes when those objects include persistent state. Like instance variables. Every instance variable you set in a test will remain until the interpreter terminates. Bet you didn’t know that! I sure didn’t.
So, the first thing I did was wipe almost all of the instance variables from the object in a teardown method.
teardown :scrub_instance_variables @@reserved_ivars = %w(@loaded_fixtures @test_passed @fixture_cache @method_name @_assertion_wrapped @_result) def scrub_instance_variables (instance_variables - @@reserved_ivars).each do |ivar| instance_variable_set(ivar, nil) end end
The difference was immediate and significant: that single change brought the number of leaked ActiveRecord objects from over 32,000, down to a little over 11,000. And the tests ran 40% faster!
That was encouraging. But there were still 11k objects being leaked; could reducing that number improve tests more?
The answer, sadly, was “not much”. I found one other significant source of memory leaks in the fixture caching code that Rails uses. Every time you use one of the fixture helper methods (e.g. the “accounts(:foo)” method that lets you access the account fixture named “foo”), the records accessed would be saved in an instance variable. It happened to be
@fixture_cache, one of the “reserved_ivars” that I didn’t scrub because it belonged to Rails itself. It turns out that it’s actually quite safe to scrub that one, too, and doing so brought the leaked record count down to around 5,000.
The memory win was significant, but the speed win was not; it only gained us about 5% more. At this point I began thinking that the easy wins were over, and that there was no more to be gained by chasing the GC bugbear.
I was wrong. There was (at least) one more quick win. While I was investigating some of the slower tests, timing individual lines and such to try and figure out where the slowness was coming from, I discovered that between one test run and the next the timings would change. First one line would be the slowpoke, and then another. This pointed the finger right at garbage collection, since GC will stop whatever the interpreter is currently doing at seemingly random times, whenever a memory threshold is reached.
So, on a whim, I tried disabling GC for that test. The test ran twice as fast.
(Now, it should be stated that I run a vanilla Ruby 1.8.7 for local development and testing. If you’re running REE with GC tweaks, the following bit about juggling the garbage collector may not win you as much. YMMV.)
Now, it’s not a good idea to disable GC for an entire test run. The interpreter would eat up memory faster than an unsupervised three-year-old in a candy shop, and your computer’s health would degrade rapidly. But, what about disabling the GC and manually running it every second or so?
Turns out that works a treat. Memory usage increased, as you’d expect, but the tests ran an additional 30% faster! Here’s all I did:
setup :begin_gc_deferment teardown :reconsider_gc_deferment DEFERRED_GC_THRESHOLD = (ENV['DEFER_GC'] || 1.0).to_f @@last_gc_run = Time.now def begin_gc_deferment GC.disable if DEFERRED_GC_THRESHOLD > 0 end def reconsider_gc_deferment if DEFERRED_GC_THRESHOLD > 0 && Time.now - @@last_gc_run >= DEFERRED_GC_THRESHOLD GC.enable GC.start GC.disable @@last_gc_run = Time.now end end
By reducing the number of memory leaks and reducing the frequency at which GC wanted to run, I was able to halve the time Basecamp’s tests took to run. I’ll be frank: 7 minutes to run tests is still not particularly TDD-friendly, but it’s a very encouraging improvement. I’ll be working on incorporating these tweaks into Rails itself so that everyone can benefit, but in the meantime, they should be easily added by reopening ActiveSupport::TestCase in your test_helper.rb and adding the hooks quoted above.
Hurray for faster tests!