You’re reading Signal v. Noise, a publication about the web by Basecamp since 1999.

Signal v. Noise: Programming

Our Most Recent Posts on Programming

Dragons on the far side of the histogram

David wrote this · 9 comments

Performance tuning is a fun sport, but how you’re keeping score matters more than you think, if winning is to have real impact. When it comes to web applications, the first mistake is to start with what’s easiest to measure: server-side generation times.

In Rails, that’s the almighty X-Runtime header — reported to the 6th decimal of a second, for that extra punch of authority. A clear target, easily measured, and safely within the realm of your own code, which makes it all appear fully controllable and scientific. But what good is shaving off milliseconds to hit a 50ms internal target if your shoddy (or non-existent!) CDN setup is costing you seconds in New Zealand? Pounds, not pennies, is where the wealth is.

Yet that’s still the easy, level-one part of the answer: Don’t worry too much about your internal performance metrics until you’ve paid attention to the full stack of SSL termination overhead, CDN optimization, JS/CSS asset minification, and client-side computational overhead (the latter easily catching out people following the “just do a server-side API” route, since the JSON may well generate in 50ms, but the client-side rendering then takes a full second on a below-average device — doh!).

Level two, once reasonable efforts have been made to trim the fat around the X-Runtime itself, is getting some big numbers up on the board: Mean and the 90th percentile. Those really are great places to start. If your mean is an embarrassing 500ms+, well, then you have some serious, fundamental problems that need fixing, which will benefit everyone using your app. Get to it.

Keep going beyond even the 99th

Just don’t stop there. Not at the mean, nor at the 90th. Don’t even stop at the 99th! At Basecamp, we sorta fell into that trap for a while. Our means were looking pretty at around 60ms, the 90th was 200ms, and even the 99th was a respectable 700ms. Victory, right?

Well, victory for the requests that fell into the 1st to 99th percentile. But when you process about fifty million requests a day, there’s still an awful lot of requests hidden on the far side of the 99th. And there, young ones, is where the dragons lie.

A while back we started shining a light into that cave. And even though I expected there to be dragons, I was still shocked at just how large and plentiful they were at our scale. Just 0.4% of requests took 1-2 seconds to resolve, but that’s still a shocking 200,000 requests when you’re doing fifty million a day.

Yet it gets worse. Just 0.0025% of requests took 10-30 seconds, but that’s still a whopping 1,250 requests. While some of those come from API requests that users don’t see directly, a fair slice is indeed from real, impatient human beings. That’s just embarrassing! And a far, far cry from the pretty picture painted by the 60ms mean. Ugh.

Finally, there was the true elite: The 0.0001%, for a total of 50 instances. Those guys sat and waited between 30 and 60 seconds for their merry request to complete. Triple ugh.

Dragon slaying

Since shining a light into the cave, we’ve already been pointed to big, obvious holes in our setup that we weren’t looking at before. One simple example was file uploads: we’d stage files in one area, then copy them over to their final resting place as part of the record creation process. That’s no problem when it’s a couple of 10MB audio files, but try that again with twenty 400MB video files — it takes a while! So now we stage straight into the final resting place and cut out the copy step. Voila: lots of dragons dead.

There’s still much more work to do. Not just because it sucks for the people who actually hit those monster requests, but also because it’s a real drain on the rest of the system. Maybe it’s an N+1 case that really only appears under very special circumstances, but every time the request hits, it’s still an onslaught on the database, and everyone else’s fast queries might well be slowed down as a result.

But it really does also just suck for those who actually have to sit through a 30 second request. It doesn’t really help them very much to know that everyone else is having a good time. In fact, that might just piss them off.

It’s like going to the lobby of your hotel to complain about the cockroaches, and then seeing the smug smile of the desk clerk saying “oh, don’t worry about that, none of our other 499 guests have that problem… just deal with it”. You wouldn’t come back next summer.

So do have a look at the far side of your histogram. And use actual request counts, not just feel-good percentiles.
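
If you want to run the same exercise on your own app, here’s a minimal Ruby sketch of that kind of report: absolute counts per latency bucket rather than percentiles. The bucket boundaries and the sample timings are made up for illustration.

# Illustrative sketch: bucket raw response times (in seconds) and report
# absolute request counts, not just percentiles. Feed it timings pulled
# from your own logs or metrics store.
BUCKETS = [0...1, 1...2, 2...10, 10...30, 30...60]

def far_side_report(timings)
  total = timings.size
  BUCKETS.each do |bucket|
    count = timings.count { |t| bucket.cover?(t) }
    puts format("%8.4f%% of requests (%d of %d) took %d-%d seconds",
                100.0 * count / total, count, total, bucket.begin, bucket.end)
  end
end

# Toy input; in reality this would be millions of data points.
far_side_report([0.06, 0.2, 0.7, 1.5, 12.0, 45.0])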

Finding your workbench muse

David wrote this · 21 comments

Much intellectual capital is spent examining the logical advantages and disadvantages of our programming tools. Much ego is invested in becoming completely objective examiners of productivity. The exalted ideal: To have no emotional connection to the workbench.

Hogwash. There is no shame in being inspired by your tools. There is no shame in falling in love with your tools. Nobody would chastise a musician for clinging to their favorite, outdated, beat-up guitar for that impossible-to-explain “special” sound. Some authors even still write their manuscripts on actual typewriters, just for the love of it.

This highlights the tension between programmers as either engineers or craftsmen. A false dichotomy, but a prevalent one. It’s entirely possible to draw inspiration and practice from both camps.

I understand where it’s coming from, of course—strong emotions often run counter to good arguments. It’s hard to convince people otherwise once they’ve declared their admiration or love of something. Foolhardy, even. It can make other types of progress harder. If we all fell madly in love with Fortran and punch cards, would that still be the state of the art?

I find the benefits far outweigh the risks, though. We don’t have to declare our eternal fidelity to our tools for them to serve as our muse in the moment. And in that moment, we can enjoy the jolt of energy that comes from using a tool that fits your hand or mind just right. It’s exhilarating.

So much so that it’s worth accepting the limitations of your understanding. Why do I enjoy Ruby so very much? Well, there’s a laundry list of specific features and values to point to, but that still wouldn’t add up to the total sum. I’ve stopped questioning it constantly, and instead just embraced it.

Realizing that it’s not entirely rational, or explainable, also frees you from necessarily having to push your muse onto others. It’s understandable to be proud and interested in inviting others to share in your wonder, but mainly if they haven’t already found their own.

If someone is already beholden to Python, and you can sense that glow, then trying to talk them into Ruby isn’t going to get you anywhere. Just be happy that they too found their workbench muse.

At the end of the day, nobody should tell you how to feel about your tools (let alone police it out of you, under the guise of what’s proper for an engineer). There’s no medal for appearances, only great work.

Server-generated JavaScript Responses

David wrote this · 29 comments

The majority of Ajax operations in Basecamp are handled with Server-generated JavaScript Responses (SJR). It works like this:

  1. Form is submitted via an XMLHttpRequest-powered form.
  2. Server creates or updates a model object.
  3. Server generates a JavaScript response that includes the updated HTML template for the model.
  4. Client evaluates the JavaScript returned by the server, which then updates the DOM.

This simple pattern has a number of key benefits.

Benefit #1: Reuse templates without sacrificing performance
You get to reuse the template that represents the model for both first-render and subsequent updates. In Rails, you’d have a partial like messages/message that’s used for both cases.

If you only returned JSON, you’d have to implement your templates for showing that message twice (once for first-response on the server, once for subsequent-updates on the client) — unless you’re doing a single-page JavaScript app where even the first response is done with JSON/client-side generation.

That latter model can be quite slow, since you won’t be able to display anything until your entire JavaScript library has been loaded and the templates have been generated client-side. (This was the model that Twitter originally tried and then backed out of.) But it’s at least a reasonable choice for certain situations, and it doesn’t require template duplication.

Benefit #2: Less computational power needed on the client
While the JavaScript with the embedded HTML template might result in a response that’s marginally larger than the same response in JSON (although that’s usually negligible when you compress with gzip), it doesn’t require much client-side computation to update.

This means it might well be faster from an end-to-end perspective to send JavaScript+HTML than JSON with client-side templates, depending on the complexity of those templates and the computational power of the client. This is doubly so because the server-generated templates can often be cached and shared amongst many users (see Russian Doll caching).
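
To sketch that last point (using the standard Rails fragment-caching helper, not Basecamp’s actual partial, and with made-up fields), the message partial itself can be wrapped in a cache block, so the HTML that ends up embedded in the SJR response is frequently served straight from the cache:

<%# messages/_message.html.erb (illustrative sketch; subject/body are made-up fields) %>
<% cache message do %>
  <div id="<%= dom_id message %>" class="message">
    <h2><%= message.subject %></h2>
    <p><%= message.body %></p>
  </div>
<% end %>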

Benefit #3: Easy-to-follow execution flow
It’s very easy to follow the execution flow with SJR. The request mechanism is standardized with helper logic like form_for @post, remote: true. There’s no need for per-action request logic. The controller then renders the response template in exactly the same way it would render a full view; it just happens to be JavaScript instead of straight HTML.

Complete example
0) First use of the message template.

<h1>All messages:</h1>
<%# renders messages/_message.html.erb %>
<%= render @messages %>

1) Form submitting via Ajax.

<%= form_for @project.messages.new, remote: true do |form| %>
  ...
  <%= form.submit "Send message" %>
<% end %>

2) Server creates the model object.

class MessagesController < ActionController::Base
  def create
    @message = @project.messages.create!(message_params)

    respond_to do |format|
      format.html { redirect_to @message } # fallback for non-Ajax (no-JS) submissions
      format.js   # just renders messages/create.js.erb
    end
  end
end

3) Server generates a JavaScript response with the HTML embedded.

<%# renders messages/_message.html.erb %>
$('#messages').prepend('<%=j render @message %>');
$('#<%= dom_id @message %>').highlight();

The final step of evaluating the response is automatically handled by the XMLHttpRequest-powered form generated by form_for, and the view is thus updated with the new message, which is then highlighted via a JS/CSS animation.
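
One note on the controller sketch above: it leans on an @project variable and a message_params method that aren’t shown. Purely as an illustration (the filter and the permitted fields below are assumptions, not lifted from Basecamp), those missing pieces might look roughly like this:

class MessagesController < ActionController::Base
  # Hypothetical: load the parent project for every action in this controller.
  before_action { @project = Project.find(params[:project_id]) }

  private
    # Hypothetical strong-parameters whitelist for the message form.
    def message_params
      params.require(:message).permit(:subject, :body)
    end
end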

Beyond RJS
When we first started using SJR, we used it together with a transpiler called RJS, which had you write Ruby templates that were then turned into JavaScript. It was a poor man’s version of CoffeeScript (or Opalrb, if you will), and it erroneously turned many people off the SJR pattern.

These days we don’t use RJS any more (the generated responses are usually so simple that the win just wasn’t big enough for the rare cases where you actually do need something more complicated), but we’re as committed as ever to SJR.

This doesn’t mean that there’s no place for generating JSON on the server and views on the client. We do that for the minority case where UI fidelity is very high and lots of view state is maintained, like our calendar. When that route is called for, we use Sam’s excellent Eco template system (think ERB for CoffeeScript).

If your web application is all high-fidelity UI, it’s completely legit to go this route all the way. You’re paying a high price to buy yourself something fancy. No sweat. But if your application is more like Basecamp or GitHub or the majority of applications on the web that are proud of their document-based roots, then you really should embrace SJR with open arms.

The combination of Russian Doll caching, Turbolinks, and SJR is an incredibly powerful cocktail for making fast, modern, and beautifully coded web applications. Enjoy!

Tom Ward & Zach Waugh join 37signals

David wrote this · 5 comments

We’d like to welcome two new members of our programming team.

Tom Ward hails from London and has been active in the Rails community since 2005. He was responsible for the SQLServerAdapter back in the day and has a cool 37 commits(!) under his belt for Rails. He’s formerly of Go Free Range, the team behind big parts of the GOV.UK project. We’re very happy to have him here, thanks to the great recommendation of fellow UK employee Pratik, who made the connection.

Zach Waugh is from Baltimore and the creator of the awesome Flint iOS and OS X clients for Campfire. Much of 37signals has already been enjoying Campfire through Zach’s clients, so we are thrilled that he’ll be able to join.

Both guys will of course stay where they’ve chosen to live and work remotely. We have a lot of both web and mobile projects to dig into, so it’s great to have them both here. That now makes 41 of us!

Rethinking Agile in an office-less world

David wrote this · 28 comments

Much of agile software development lore exalts the virtue of in-person collaboration. From literal stand-up meetings to at-the-same-desk pair programming. It’s an optimization for the assumption that we’re all going to be in the same place at the same time. Under that assumption, it’s a great set of tactics.

But assumptions change. “Everyone in the same office” is less true now than it ever was. People are waking up to the benefits of remote working. From quality of life to quality of talent. It’s a new world, and thus a new set of assumptions.

The interesting, and tricky, part of choosing a work pattern is comparing these different worlds. What’s the value of a group of people who a) can only be picked from amongst those within a 30-mile radius of a specific office, b) who have to deal with the indignity of an hour-long daily commute, c) but who are Agile with a capital A?

Versus a team composed of a) the best talent you could find, regardless of where they live, b) who have the freedom to work their own schedules, c) but who can’t do the literal daily stand-up meeting or pair in front of the same physical computer?

Obviously we’ve made our pick, and the latter setup won by a landslide. But it also made discussions about methodology more complicated.

When you’re talking to someone who thinks that the world is already defined by that first “everyone in the same office” assumption, they’ll naturally champion the hacks that make that setup more workable. Without necessarily rethinking the overall value of the advice outside that assumption-laden context. Local maxima and all that.

It’s time for a reset. We need the same care and diligence that was put into documenting the agile practices of an office-centric world applied to an office-less world. There’s a new global maximum to be found. Let’s chart a path to it.

Beyond the default Rails environments

David wrote this · 22 comments

Rails ships with a default configuration for the three most common environments that all applications need: test, development, and production. That’s a great start, and for smaller apps, probably enough too. But for Basecamp, we have another three:

  • Beta: For testing feature branches on real production data. We often run feature branches on one of our five beta servers for weeks at a time to evaluate them while putting on the finishing touches. By running the beta environment against the production database, we get to use the feature as part of our own daily use of Basecamp. That’s the only reliable way to determine if something is genuinely useful in the long term, not just cool in the short term.
  • Staging: This environment is virtually identical to production, but it runs on a backup of the production database. This is where we test features that require database migrations and time those migrations against real data sizes, so we know how to roll them out once they’re ready to go.
  • Rollout: When a feature is ready to go live, we first launch it to 10% of all Basecamp accounts in the rollout environment. This will catch any issues with production data from accounts other than our own without subjecting the whole customer base to a bug.

These environments all get a file in config/environments/ and they’re all based off the production defaults.
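
As a minimal sketch of the idea (the application constant and the specific override are illustrative, not Basecamp’s actual settings), one of those files might look like this:

# config/environments/beta.rb (illustrative sketch)
# Load the production defaults first, then override only what differs.
require Rails.root.join("config/environments/production")

Basecamp::Application.configure do
  # Example override: chattier logging while a feature branch bakes on beta.
  config.log_level = :debug
end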

Continued…

Intention revealing methods

Nick wrote this · 5 comments

The latest Basecamp for iPhone release involved an immense amount of refactoring of how HTTP requests are made and HTML pages are rendered. In a previous release, my coworker Sam wrote several methods whose purpose wasn’t to reduce duplication but to increase clarity. Since the focus is on comprehension rather than just reusability, it’s easier to understand what is going on and jump in to solve the next problem.
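
The code in question is from the iPhone app, but the idea translates to any language. Here’s a generic Ruby sketch (not from the Basecamp codebase) of a method extracted purely to name the intention rather than to remove duplication:

# Generic illustration, not Basecamp code.
# Before: the reader has to decode the condition inline.
def handle(response)
  schedule_retry if response.code == 503 && response.headers["Retry-After"]
end

# After: the method name states the intention; the algorithm hides behind it.
def handle(response)
  schedule_retry if throttled?(response)
end

def throttled?(response)
  response.code == 503 && response.headers["Retry-After"]
end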

This pattern has been eye-opening, and I’ve been using it to hide implementation details and remove comments explaining what chunks of code are doing. Instead, the name of the method explains what is going on. The gist here is to communicate Intention, Not Algorithm:

Continued…