A week ago, it took about 15 minutes to run all of Basecamp’s tests.
Now, if you adhere to the test-driven development (TDD) philosophy, you’ll know that tests are meant to be run often. Like, every few minutes, preferably. The tests are what give you confidence in your code, and the ability to refactor with impunity, without fear that your changes are breaking existing features. They are the safety net for our daily tightrope walk across our code base.
But if your test suite takes a quarter of an hour to run, that puts a severe damper on your willingness to run it often, or even at all. Those tests definitely don’t get run every few minutes. They might get run before you commit a change. More likely, they get run around the end of a development cycle, when the new work is being prepared for deployment. Although they can still be marginally useful when run like that, they are a far cry from what they might be.
At the beginning of this latest development cycle, I begged leave to spend a few days digging into our test suite to see what could be done. Just about all of the programmers here have had a turn looking into the tests, but with other responsibilities it was hard to find enough time to do more than prove the hypothesis-du-jour wrong. I hoped that with some dedicated time I might be able to make some significant improvements.
Now, I’m going to give you the full travelogue here. Feel free to skip to the end if all you want is the conclusions.
Travelogue.
My original inspiration for this dive was found at last year’s RubyConf, during Aman Gupta’s fantastic talk about tools for debugging Ruby. I couldn’t wait to try some of the things he demonstrated. This was my chance!
Armed with perftools and perftools.rb, it was immediately apparent that the worst offender was the garbage collector. My hypothesis was that there was a memory leak somewhere, causing objects to accumulate and making the GC work overtime, fruitlessly, trying to stem the tide.
This was straightforward to check. I wrote a simple harness to run the tests, which would enter a REPL when all the tests were done and allow me to query Ruby’s object space. The first thing I noticed was that when all tests were finished running there were still more than 32,000 ActiveRecord instances floating around.
Thirty. Two. Thousand.
That’s about three orders of magnitude more records than exist in the entire test database. And since all the tests were done running at this point, there shouldn’t have been anything still in memory. Something was obviously holding on to these records.
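For reference, here is a minimal sketch of the kind of ObjectSpace query such a harness can run. The class name is a stand-in; the real harness counted ActiveRecord::Base instances after the suite finished:

```ruby
# Count live instances of a class via ObjectSpace. In the real harness
# the target was ActiveRecord::Base; a stand-in class is used here.
class Record; end

retained = Array.new(3) { Record.new }  # simulate leaked references

count = 0
ObjectSpace.each_object(Record) { count += 1 }
puts count  # at least the three we still hold
```

Run from a REPL after the tests finish, a count wildly out of proportion to your fixture data is the tell-tale sign of a leak.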
Fast-forward a few hours’ worth of experimentation. What was happening was this: Ruby’s Test::Unit library (which we use for Basecamp’s tests) creates a new instance of the test class for each test. Not only that, but it holds onto each of those instances, so a suite with thousands of tests winds up holding thousands of TestCase objects.
That, by itself, is not a big deal. A few thousand objects are not normally much to worry about. The problem comes when those objects include persistent state. Like instance variables. Every instance variable you set in a test will remain until the interpreter terminates. Bet you didn’t know that! I sure didn’t.
So, the first thing I did was wipe almost all of the instance variables from the object in a teardown method.
teardown :scrub_instance_variables

@@reserved_ivars = %w(@loaded_fixtures @test_passed @fixture_cache @method_name @_assertion_wrapped @_result)

def scrub_instance_variables
  (instance_variables - @@reserved_ivars).each do |ivar|
    instance_variable_set(ivar, nil)
  end
end
The difference was immediate and significant: that single change brought the number of leaked ActiveRecord objects from over 32,000, down to a little over 11,000. And the tests ran 40% faster!
That was encouraging. But there were still 11k objects being leaked; could reducing that number improve tests more?
The answer, sadly, was “not much”. I found one other significant source of memory leaks in the fixture caching code that Rails uses. Every time you use one of the fixture helper methods (e.g. the “accounts(:foo)” method that lets you access the account fixture named “foo”), the records accessed would be saved in an instance variable. It happened to be @fixture_cache, one of the “reserved_ivars” that I didn’t scrub because it belonged to Rails itself. It turns out that it’s actually quite safe to scrub that one, too, and doing so brought the leaked record count down to around 5,000.
The memory win was significant, but the speed win was not; it only gained us about 5% more. At this point I began thinking that the easy wins were over, and that there was no more to be gained by chasing the GC bugbear.
I was wrong. There was (at least) one more quick win. While I was investigating some of the slower tests, timing individual lines and such to try and figure out where the slowness was coming from, I discovered that between one test run and the next the timings would change. First one line would be the slowpoke, and then another. This pointed the finger right at garbage collection, since GC will stop whatever the interpreter is currently doing at seemingly random times, whenever a memory threshold is reached.
So, on a whim, I tried disabling GC for that test. The test ran twice as fast.
(Now, it should be stated that I run a vanilla Ruby 1.8.7 for local development and testing. If you’re running REE with GC tweaks, the following bit about juggling the garbage collector may not win you as much. YMMV.)
Now, it’s not a good idea to disable GC for an entire test run. The interpreter would eat up memory faster than an unsupervised three-year-old in a candy shop, and your computer’s health would degrade rapidly. But, what about disabling the GC and manually running it every second or so?
Turns out that works a treat. Memory usage increased, as you’d expect, but the tests ran an additional 30% faster! Here’s all I did:
setup :begin_gc_deferment
teardown :reconsider_gc_deferment

DEFERRED_GC_THRESHOLD = (ENV['DEFER_GC'] || 1.0).to_f

@@last_gc_run = Time.now

def begin_gc_deferment
  GC.disable if DEFERRED_GC_THRESHOLD > 0
end

def reconsider_gc_deferment
  if DEFERRED_GC_THRESHOLD > 0 && Time.now - @@last_gc_run >= DEFERRED_GC_THRESHOLD
    GC.enable
    GC.start
    GC.disable
    @@last_gc_run = Time.now
  end
end
Conclusions
By reducing the number of memory leaks and reducing the frequency at which GC wanted to run, I was able to halve the time Basecamp’s tests took to run. I’ll be frank: 7 minutes to run tests is still not particularly TDD-friendly, but it’s a very encouraging improvement. I’ll be working on incorporating these tweaks into Rails itself so that everyone can benefit, but in the meantime, they should be easily added by reopening ActiveSupport::TestCase in your test_helper.rb and adding the hooks quoted above.
Hurray for faster tests!
Matt B
on 19 Jan 11
There is another option in between these two as well – run your tests on a Continuous Integration server after each commit.
Brent Sordyl
on 19 Jan 11
Some other great tips for speeding up tests were presented by Nick Gauthier at WindyCityRails last year; the video is on their site at http://windycityrails.org/videos2010/#5
Grease your Suite: Tips and Tricks for Faster Testing
John Campbell
on 19 Jan 11
Are you running the tests on a machine with an SSD? Unit tests are often disk IO bound.
Joe Larson
on 19 Jan 11
This is really great! I find this kind of performance tinkering (in any context) really fun—well, about half the time; the rest of the time it is drudgery.
One thing I might recommend further though is to carve up your tests into pieces, and have some pieces running frequently and others running right before a commit (or really, all tests running before commit). The pieces running frequently might relate only to the areas of the system you are currently working on, or they might be only those pieces that can be run quickly, or some other combination. You might even have a scheme that rotates through different collections of tests so that over the course of a few hours everything has been tested—you might catch a bug from two hours ago but still have a fighting chance of remembering what on earth you were doing at that time.
Great job!
Trevor Turk
on 19 Jan 11
I saw something about the Basecamp tests running much faster (on Twitter?) and was hoping to see a write-up like this—thank you! I’m looking forward to having these changes in Rails itself, too, so that everyone will benefit. Awesome work!
Matt Hooks
on 19 Jan 11
I remember that triumphant tweet as well… Glad to hear you’ll be pulling these changes back into Rails.
whatsthebeef
on 19 Jan 11
Excellent post on performance analysis of tests. This approach is one often overlooked when reducing test time.
I have encountered this problem on numerous occasions in multiple languages. We have had suites that run for hours (especially where automated UI testing comes in, the value of which is sometimes questionable).
It can be effective to create a smoke test suite that runs in conjunction with the suite of tests associated with the current component being worked on.
If you are working in git you can hook these into a local git commit and commit frequently, then when pushing to remote execute all the tests, or a wider subset of all tests.
Lee
on 19 Jan 11
It seems to me that you only really need to run related test(s) while you are developing (doing TDD.) Running the entire test suite each time you want to check if a test passes seems silly.
But, 15 minutes would still be a long time even if you just run the full suite before committing, so I can definitely see value in speeding it up.
Jamis
on 19 Jan 11
@Lee, it is definitely preferable to just run the subset that your changes touch. However, it’s not always straightforward to see what that subset is. Consider the case where you are changing a partial view for displaying a glimpse of a milestone. Since it’s a view, you probably can forego running any of your unit tests. So it sounds like you’d just need to run the milestone controller tests, yeah? But what if the todo controller also uses that partial to display milestones related to the todo lists? And what if the posts controller uses that partial to show milestones related to the different messages? And what if the projects controller uses the partial to display milestones on the project dashboard?
Without some meticulously maintained notes, it’s going to be difficult to know, in general, exactly what subset of tests need to be run to verify a change. Running the milestones controller for this example would probably be fine for the 80% case, but you still need to be able to run the entire suite (at least a few times a day) in order to preserve your confidence level.
Stefan Kaes
on 19 Jan 11
The problem is not really Rails related. I suspect you’re running Test::Unit? It keeps instances of every test class until the very end of the test run, even though this isn’t necessary at all.
The following code fixes this problem:
class Test::Unit::TestSuite
  def run(result, &progress_block)
    yield(STARTED, name)
    test = nil
    @tests.reverse!
    while test = @tests.pop
      test.run(result, &progress_block)
    end
    yield(FINISHED, name)
  end
end
Darren
on 19 Jan 11
At Zendesk, our test suite was taking somewhere around 20 minutes to run. Since most of us use REE, we decided to tune the GC, to help speed up tests. We got it down to about 11 minutes with these settings…
RUBY_HEAP_MIN_SLOTS=500000
RUBY_HEAP_SLOTS_INCREMENT=500000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_GC_MALLOC_LIMIT=100000000
RUBY_HEAP_FREE_MIN=500000
I added your scrub_instance_variables tweak above. My first anecdotal test run didn’t show any significant difference (in fact it was worse than usual, but that was just one test run). I’ll keep playing with it and see if it adds any benefit to a tuned GC in REE.
Jamis
on 19 Jan 11
@Darren, yeah, with REE all bets are off. You should notice at least an improved memory footprint with the ivar scrubbing, though, even if you don’t get any speed improvements.
Laran Evans
on 19 Jan 11
Great story. It’s always a positive experience to shed the crud.
Joakim Kolsjö
on 19 Jan 11
Great post!
Thought I’d mention some tools that can help if the tests still are not fast enough. The first thing you should look at is parallel_tests to use more than one CPU core… it takes no time to setup and usually cuts the time in half or more.
Then there are the distributed tools like hydra and testbot. We use testbot at work to get ~30 minutes worth of rspec tests run in a few minutes :).
Matthew Higgins
on 19 Jan 11
Another easy win is mounting a RAM disk and pointing MySQL at that. This has a greater benefit for those using factories rather than fixtures.
For running individual tests faster, turn off cache_classes in environments/test.rb. A combination of this and the use of autoload will require the minimal set of files to be loaded, reducing startup time.
Jamis
on 19 Jan 11
@Matthew, interestingly, the database does not appear to be the bottleneck in our tests. Using a RAM disk buys us less than a 5% performance improvement in the unit tests, and even less in our functional tests. Then again, we’re using fixtures, and you did say that those won’t see as much benefit.
Adam
on 19 Jan 11
Hi Jamis, very interesting, I’ll surely try that out. May I ask you if you can share with us your rake stats? Just to know how big Basecamp is and have an idea on the time/code :)
Richard
on 19 Jan 11
I’ve always found Ruby’s GC to be a severe issue. It’s nowhere near as robust and finely tuned as Java. The tools to diagnose such problems on Ruby are in the stone age as well.
Eric Hodel
on 20 Jan 11
You can also switch from test/unit to minitest. MiniTest doesn’t keep around test case instances until the tests finish, which avoids the entire problem through better engineering.
Georg Ledermann
on 20 Jan 11
Thanks! I’m using parallel_tests on 4 cores with RSpec and Ruby 1.8.7 from MacPorts. With deferring GC it’s 40% faster.
Nic Benders
on 20 Jan 11
@Darren, those GC tuning changes for REE did the trick here! We got a 55% improvement in test time without having to make any other adjustments.
Joe Van Dyk
on 20 Jan 11
I believe Ola Bini posted about this back in 2007.
http://ola-bini.blogspot.com/2007/12/your-ruby-tests-are-memory-leaks.html
Joe Van Dyk
on 20 Jan 11
Ugh, linking to above mentioned blog post: http://ola-bini.blogspot.com/2007/12/your-ruby-tests-are-memory-leaks.html
Henning Koch
on 21 Jan 11
Instructions to hook up Joe’s GC deferrer to RSpec: http://makandra.com/notes/950-speed-up-rspec-by-deferring-garbage-collection
Avdi Grimm
on 21 Jan 11
Good stuff! I assume you’ve probably seen Nick Gauthier’s “Grease Your Suite” talk by now, but if not, you might find some more inspiration in there. There are recordings of the talk floating around.
Glenn Gillen
on 21 Jan 11
Is there any reason why we’d want to keep the existing Test::Unit behaviour? If not, sounds like something to push back up stream.
Glenn Gillen
on 21 Jan 11
Actually, I just took a look at the latest trunk of ruby-core and if I’m not mistaken it looks like Test::Unit is wrapping MiniTest these days, which Eric said doesn’t suffer from this same problem.
Robert Sullivan
on 23 Jan 11
Interesting to see this kind of testing. I’ve been delving into the Ruby tools for testing and they look very interesting, BDD tools like Cactus. Judging by the significant outage in Campfire recently, it would be interesting to know what type of performance testing is done. I see the list of root causes; it seems like you ran into a perfect storm of sorts, so definitely kudos on getting to the bottom of these issues, which are very complex. On the other hand, if you are reaching for dial-tone-type reliability, it looks like some changes need to be made in your processes, as obviously there are systems out there that have dial-tone reliability (this is the big selling point of platforms like Erlang). The modern Facebook and Google style of TDD and Agile development, fast cycles (perhaps they are reading Rework), are all well and good and achieve their purpose, and have their benefits, but tried and true methods, that old-line businesses use, would seem to have avoided this issue. Very simply, had 37signals done some proper performance testing (same CPUs, same Linux kernel, yada yada, same MySQL, same Rails) you should hit stuff like this, and if you don’t then you should figure out why. Best of luck, great blog and info, thanks.
Brent Royal-Gordon
on 23 Jan 11
I wonder if this might not be a code smell. Consider that Basecamp’s test suite presumably doesn’t test any of the underlying libraries; after all, they were separately tested before you installed them. Is the test suite taking too long because some parts of Basecamp should be treated like the libraries, as a separate, underlying entity? Could “my test suite is taking too long” be a sign that you’re treating a codebase as monolithic that should be split up into separately editable and testable components?
Maybe fifteen minutes isn’t the right point, but I’ve heard of test suites taking an hour or more. Maybe they’re just testing a system that’s too big to be treated as a single whole anymore.
Philippe Creux
on 24 Jan 11
Thanks for sharing this! I was wondering if such tips would be relevant to RSpec… so thanks to Henning Koch for sharing an RSpec related article.
Jonathan Hartley
on 25 Jan 11
That’s really great, thanks for posting… but!
But our tests take an hour to run. They used to take four hours, but we’ve optimised them already.
99% of that time is running system-level acceptance tests, which fire up our (Python, Django) web-app, and put it through its paces like a real user would.
We found there are problems when we drive the UI at full possible speed. Whether it’s race conditions in our code, or just browsers being slow, we found that tests would get flaky (e.g. typed characters are missed) if we allowed Selenium to type & click at full speed. So many of our common user-actions (click a button, type in a field) have ‘humanise_delays’ around them, to slow Selenium down. It still runs the tests at bionic speed, way faster than any human could.
Karle
on 25 Jan 11
Just wanted to say thanks! These changes, along with some REE GC tuning, took our tests from 15 minutes to 9 minutes.
Still some work to do, obviously, but this was a great start.