You’re reading Signal v. Noise, a publication about the web by Basecamp since 1999. Happy .

David

About David

Creator of Ruby on Rails, partner at 37signals, best-selling author, public speaker, race-car driver, hobbyist photographer, and family man.

One-hit wonder

David
David wrote this on 15 comments

It’s incredibly hard to trace the success of any business, product, or project down to the skill of the founders. There’s plenty of correlation, but not much causation.

That’s a scary thought to a lot of people: What if my success isn’t based solely on my talent and hard work, but rather my lucky timing or stumbling across an under-served market by happenstance? What if I’m just a one-hit wonder!?

And I say, so what? So what if you are? I, for one, am completely at peace with the idea that Basecamp might very well be the best product I’ll ever be involved with. Or that Ruby on Rails might be the peak of my contributions to technology.

What about my life would be any different if I could truly trace down the success to personal traits of wonder? Even if I somehow did have a “magic touch” — and I very much believe that I don’t — why would I want to leave those ventures, just to prove that I could do it again?

Yet that siren song calls many a founder, entrepreneur, and star employee. The need to prove they’re not a one-hit wonder. That they’re really that good because they could do it again and again.

For every Elon Musk, there are undoubtedly thousands of people who left their one great idea to try again and fell flat on their faces, unable to go back to the shine and the heyday of that original success, and worse off for it in pride, blood, and treasure.

Life is short. Move on if you don’t love what you’re doing. But don’t ever leave a great thing just because you want to prove to others or yourself that you’re not a one-hit wonder.

My brother made a video tribute to one of my favorite artists: Elwood T. Risk. Woody worked as a construction worker for many years before “giving himself permission” to become an artist. Now he’s had work featured all over the place, including the show Californication. Great guy, great story, great art.

Captured by quality

David
David wrote this on 7 comments

Making stuff good is rewarding, making stuff great is intoxicating. It’s like there’s a direct line from perfection and excellence to the dopamine release. And the reverse is true as well. When you make crappy stuff, you feel crappy. No one likes to work in a broken shop on a broken stool.

So it’s hard to fault people from being attracted to sayings like “Quality is Free”. It validates the good feelings that flow from making stuff perfect, and it makes it seem like it’s a completely free bargain. Win-win and all that.

But like anything, it’s easy to take too far. Almost everything outside of life-critical software has diminishing returns when it comes to quality. Fixing every last bug, eradicating every last edge case, and dealing with every performance dragon means not spending that time on making new things.

You can make the best, most polished, least-defective saddle out there, but if the market has moved on from horses to cars for general transportation, it’s not going to save your business. And it doesn’t even have to be as dramatic as that. Making the best drum brakes is equally folly once disc brakes are on the market.

So you have to weigh the value of continuing to hone and perfect the existing with pursuing the new and the novel. This often happens in waves. You come up with a new idea, and a crappy first implementation. You then spend a couple of rounds polishing off the crap until the new idea is no longer new and crappy, but known and solid. Then it’s time to look hard at whether another round is worth it.

The bottom-line is to make that which is not good enough, good enough, and then skate to where the puck is going to be next.

Drive development with budgets not estimates

David
David wrote this on 11 comments

Much of software development planning is done through estimates. You give me a description of the feature, I give you a best guess on how long it’s going to take. This model has been broken since the dawn of computer programming, yet we keep thinking it’s going to work. That’s one definition of insanity.

What I’ve found to be a more useful model is simply to state what something is worth. If a feature is worth 5 weeks of development, for example, that’s the budget. Such a budget might well be informed by an estimate of whether some version of that feature can be possibly built in 5 weeks, but it’s not driven by it.

Because most features have scales of implementation that are world’s apart. One version of the feature might take 2 weeks, another might take 6 months. It’s all in where you draw the line, how comprehensive you want to be, and what you’re going to do about all those inevitable edge cases.

The standard response to the estimation approach is to propose a 100% implementation that’s going to take 100% of the effort to build. Some times that’s what you need. Nothing less than having everything is going to be good enough. I find that’s a rare case.

A more common case is that you can get 80% of the feature for 20% of the effort. Which in turn means that you can get five 80% features, improvements, or fixes for the price of one 100% implementation. When you look at it like that, it’s often clear that you’d rather get more done, even if it isn’t as polished.

This is particularly true if you don’t have all the money and all the people in the world. When you’re trying to make progress on a constrained budget, you have to pinch your development pennies. If you splurge on gold-plating for every feature, there’s not going to be anything left over to actually ship the damn thing.

That’s what proposing a budget based on worth helps you with. It focuses the mind on what assumptions we can challenge or even ignore. If we only have 5 weeks to do something, it’s just not going to work to go through the swamp to get there. We have to find a well-paved road.

In the moment, though, it can be frustrating. If we just had a little more time, we could do so much better! So much better for whom? Your developer pride? Or the customer? Will the latter actually care about all the spit and grit you poured into these particular corners? Don’t be so sure.

In the end, accepting a budget is about accepting constraints. Here are the borders of scope for our wild dreams and crazy colors. Much of invention lies in the fight within those constraints. Embrace that.

Responsive design works best as a nip'n'tuck

David
David wrote this on 15 comments

It’s pretty amazing how far you can take responsive design these days. Between the latest CSS tricks and a splattering of JavaScript, you can turn an elephant into a mouse, and make it dance on a parallax-scrolling ball. It’s time to reconsider whether you should, though.

There’s a point on the trade-off curve where rearranging everything, hiding half the page, and then presenting it as “the same template, just styled differently” is simply not meaningful. Nor is it simple. Nor is it efficient. A one-size-fits-all HTML base document is not a trophy-worthy accomplishment in itself, lest we forget.

The way we think about this at Basecamp is as a nip’n’tuck. If you’re just stretching or shrinking things a bit, then responsive can definitely be easier (we do that on this blog, for example). But the further you move towards a full make-over, the less it makes sense.

This is particularly true if your framework of choice doesn’t make it needlessly complicated to use separate templates for separate purposes. Rails 4.1 has a feature called variants that makes it trivial to share the controller and model logic of your application, but return different HTML templates when the devices call for it. It’s this feature we’re using for the Basecamp mobile views (which in turn are embedded in our mobile apps) instead of the prevailing responsive paradigm.

By going with dedicated templates, we don’t have to include needless data or navigation that’s going to be hidden by the responsive rules. We have less variables to think about on the individual page. It’s just a simpler workflow where it’s easier to make changes without considering a smather of side effects.

So the next time you’re marveling at a responsive design that’s able to make the best use of a 27” iMac at full screen and a fit neatly on a 3.5” iPhone as well, you might want to ask yourself why you’re trying to make one performer do so many tricks.

Hybrid sweet spot: Native navigation, web content

David
David wrote this on 41 comments

When we launched the iPhone version of Basecamp in February of last year, it was after many rounds of experimentation on the architectural direction. The default route suggested by most when we got started back in early/mid-2012 was simple enough: Everything Be Native!

This was a sensible recommendation, given the experiences people had in years prior with anything else. Facebook famously folded their HTML5 implementation in favor of going all native to get the speed they craved with the launch of their new app in August of 2012.

Thus their decision was likely driven by what the state of the art in HTML and on mobile looked like circa 2010-2011. In early 2010, people were rocking either the iPhone 3GS or 3G. By modern 2014 standards, these phones are desperately slow. Hence, any architectural decisions based in the speed of those phones are horribly outdated.

Continued…

Dragons on the far side of the histogram

David
David wrote this on 9 comments

Performance tuning is a fun sport, but how you’re keeping score matters more than you think, if winning is to have real impact. When it comes to web applications, the first mistake is start with what’s the easiest to measure: server-side generation times.

In Rails, that’s the almighty X-Runtime header — reported to the 6th decimal of a second, for that extra punch of authority. A clear target, easily measured, and in that safe realm of your own code to make it appear fully controllable and scientific. But what good is saving off milliseconds for a 50ms internal target, if your shit (or non-existent!) CDNs are costing you seconds in New Zealand? Pounds, not pennies, is where the wealth is.

Yet that’s still the easy, level one, part of the answer: Don’t worry too much about your internal performance metrics until you’ve cared enough about the full stack of SSL termination overhead, CDN optimization, JS/CSS asset minimization, and client-side computational overhead (the latter easily catching out people following the “just do a server-side API”, since the json may well generate in 50ms, but then the client-side computation takes a full second on the below-average device — doh!).

Level two, once reasonable efforts have been made to trim the fat around the X-Runtime itself, is getting some big numbers up on the board: Mean and the 90th percentile. Those really are great places to start. If your mean is an embarrassing 500ms+, well, then you have some serious, fundamental problems that need fixing, which will benefit everyone using your app. Get to it.

Keep going beyond even the 99th

Just don’t stop there. Neither at the mean or the 90th. Don’t even stop at the 99th! At Basecamp, we sorta fell into that trap for a while. Our means were looking pretty at around 60ms, the 90th was 200ms, and even the 99th was a respectable 700ms. Victory, right?

Well, victory for the requests that fell into the 1st to 99th percentile. But when you process about fifty million requests a day, there’s still an awful lot of requests hidden on the far side of the 99th. And there, young ones, is where the dragons lie.

A while back we started shining the light into that cave. And even while I expected there to be dragons, I was still shocked at just how large and plentiful they were at our scale. Just 0.4% of requests took 1-2 seconds to resolve, but that’s still a shockingly 200,000 requests when you’re doing those fifty million requests.

Yet it gets worse. Just 0.0025% of requests took 10-30 seconds, but that’s still a whooping 1,250 requests. While some of those come from API requests that users do not see directly, a fair slice is indeed from real, impatient human beings. That’s just embarrassing! And a far, far away land from that pretty picture painted by the 60ms mean. Ugh.

Finally, there was the true elite: The 0.0001%, for a total of 50 instances. Those guys sat and waited between 30 and 60 seconds on their merry request to complete. Triple ugh.

Dragon slaying

Since lighting the cave, we’ve already been pointed to big, obvious holes in our setup that we weren’t looking at before. One simple example was file uploads: We’d stage files in one area, then copy them over to their final resting place as part of the record creation process. That’s no problem when it’s a couple of 10MB audio files, but try that again with 20 400MB video files — it takes a while! So now we stage straight in the final resting place, and cut out the copy process. Voila: Lots of dragons dead.

There’s still much more work to do. Not just because it sucks for the people who actually hit those monster requests, but also because it’s a real drain on the rest of the system. Maybe it’s a N+1 case that really only appears under very special circumstances, but every time the request hits, it’s still an onslaught on the database, and everyone else’s fast queries might well be slowed down as a result.

But it really does also just suck for those who actually have to sit through a 30 second request. It doesn’t really help them very much to know that everyone else is having a good time. In fact, that might just piss them off.

It’s like going to the lobby of your hotel to complain about the cockroaches, and then seeing the smug smile of the desk clerk saying “oh, don’t worry about that, none of our other 499 guests have that problem… just deal with it”. You wouldn’t come back next Summer.

So do have a look at the far side of your histogram. And use actual request counts, not just feel-good percentiles.

Basecamp network attack postmortem

David
David wrote this on 13 comments

As we detailed in Basecamp was under network attack, criminals assaulted our network with a DDoS attack on March 24. This is the technical postmortem that we promised.

The main attack lasted a total of an hour and 40 minutes starting at 8:32 central time and ending around 10:12. During that window, Basecamp and the other services were completely unavailable for 45 minutes, and intermittently up and down or slow for the rest. In addition to the attack itself, Basecamp got put in network quarantine by other providers, so it wasn’t until 11:08 that access was restored for everyone, everywhere.

The attack was a combination of SYN flood, DNS reflection, ICMP flooding, and NTP amplification. The combined flow was in excess of 20Gbps. Our mitigation strategy included filtering through a single provider and working with them to remove bogus traffic.

To reiterate, no data was compromised in this attack. This was solely an attack on our customers’ ability to access Basecamp and the other services.

There are two main areas we will improve upon following this event. Regarding our shield against future network attacks:

  1. We’ve formed a DDoS Survivors group to collaborate with other sites who’ve been subject to the same or similar attacks. That’s been enormously helpful already.
  2. We’re exploring all sorts of vendor shields to be able to mitigate future attacks even faster. While it’s tough to completely prevent any interruption in the face of a massive attack, there are options to minimize the disturbance.
  3. Law enforcement has been contacted, we’ve added our statement to their case file, and we’ll continue to assist them in catching the criminals behind this attack.

Regarding the communication:

  1. There was a 20-minute delay between our first learning of the attack and reporting it to our customers via Twitter and status. That’s unacceptable. We’ll make changes to ensure that it doesn’t take more than a maximum of 5 minutes to report something like this again.
  2. Although we were successful at posting information to our status site (which is hosted off site), the site received more traffic than ever in the past, and it too had availability problems. We’ve already upgraded the servers that power the site and we’ll be conducting additional load and availability testing in the coming days.

We will continue to be on high alert in case there is another attack. We have discussed plans with our providers, and we’re initiating new conversations with some of the top security vendors.

Monday was a rough day and we’re incredibly sorry we weren’t more effective at minimizing this interruption. We continue to sincerely appreciate your patience and support. Thank you.

Basecamp was under network attack this morning

David
David wrote this on 12 comments

Criminals attacked the Basecamp network with a distributed denial-of-service attack (DDoS) this morning. The attackers tried to extort us for money to make it stop. We refused to give in and worked with our network providers to mitigate the attack the best we could. Then, about two hours after the attack started, it suddenly stopped.

We’ve been in contact with multiple other victims of the same group, and unfortunately the pattern in those cases were one of on/off attacks. So while things are currently back to normal for almost everyone (a few lingering network quarantine issues remain, but should be cleared up shortly), there’s no guarantee that the attack will not resume.

So for the time being we remain on high alert. We’re collaborating with the other victims of the same group and with law enforcement. These criminals are sophisticated and well-armed.

Still, we want to apologize for such mayhem on a Monday morning. Basecamp, and our other services, are an integral part of how most of our customers get work done. While no data was compromised in this attack, not being able to get to your data when you need it is unacceptable.

During the attack we were able to keep everyone up to date using a combination of status.basecamp.com, Twitter, and an off-site Gist (thank you GitHub!). We’ll use the same channels in case we’re attacked again. If the attack does not resume, we will post a complete technical postmortem within 48 hours.

We want to thank all our customers who were affected by this outage for their patience and support. It means the world to us. Thank you.

Finding your workbench muse

David
David wrote this on 21 comments

Much intellectual capital is spent examining the logical advantages and disadvantages of our programming tools. Much ego invested in becoming completely objective examiners of productivity. The exalted ideal: To have no emotional connection to the workbench.

Hogs and wash. There is no shame in being inspired by your tools. There is no shame in falling in love with your tools. Nobody would chastise a musician for clinging to their favorite, out-dated, beat-up guitar for that impossible to explain “special” sound. Some authors even still write their manuscripts on actual type writers, just for the love of it.

This highlights the tension between programmers as either engineers or craftsmen. A false dichotomy, but a prevalent one. It’s entirely possible to dip inspiration and practice from both cans.

I understand where it’s coming from, of course—strong emotions often run counter to good arguments. It’s hard to convince people who’ve declared their admiration or love of something otherwise. Foolhardy, even. It can make other types of progress harder. If we all fell madly in love with Fortran and punch cards, would that still be the state of the art?

I find the benefits far outweigh the risks, though. We don’t have to declare our eternal fidelity to our tools for them to serve as our muse in the moment. And in that moment, we can enjoy the jolt of energy that can come from using a tool fitting your hand or mind just right. It’s exhilarating.

So much so that it’s worth accepting the limitations of your understanding. Why do I enjoy Ruby so very much? Well, there’s a laundry list of specific features and values to point to, but that still wouldn’t add up to the total sum. I’ve stopped questioning it constantly, and instead just embraced it.

Realizing that it’s not entirely rational, or explainable, also frees you from necessarily having to push your muse unto others. It’s understandable to be proud and interested in inviting others to share in your wonder, but mainly if they haven’t already found their own.

If someone is already beholden to Python, and you can sense that glow, then trying to talk them into Ruby isn’t going to get you anywhere. Just be happy that they too found their workbench muse.

At the end of the day, nobody should tell you how to feel about your tools (let alone police it out of you, under the guise of what’s proper for an engineer). There’s no medal for appearances, only great work.