Ma Bell engineered their phone system to have 99.999% reliability. Just 5 minutes of downtime per year. We’re pretty far off that for most internet services.
Sometimes that’s acceptable. Twitter was fail whaling for months on end and that hardly seem to put a dent in their growth. But if Gmail is down for even 5 minutes, I start getting sweaty palms. The same is true for many customers of our applications.
These days most savvy companies have gotten pretty good about keeping a status page updated during outages, but it’s much harder to get a sense of how they’re doing over the long run. The Amazon Web Services Health Dashboard only lets you look at a week at the time. It’s the same thing with the Google Apps Status Dashboard.
Zooming in like that is a great way to make things look peachy most of the time, but to anyone looking to make a decision about the service, it’s a lie by omission.
Since I would love to be able to evaluate other services by their long-term uptime record, I thought it only fair that we allow others to do the same with us. So starting today we’re producing uptime records going back 12 months for our four major applications:
- Basecamp: 99.93% or about six hours of downtime.
- Highrise: 99.95% or about four hours of downtime.
- Campfire: 99.95% or about four hours of downtime.
- Backpack: 99.98% or just under two hours of downtime.
Note that we’re not juking the stats here by omitting “scheduled” downtime. If you’re a customer and you need a file on Basecamp, do you really care whether we told you that we were going to be offline a couple of days in advance? No you don’t.
While we, and everyone else, strive to be online 100%, we’re still pretty proud of our uptime record. We hope that this level of transparency will force us to do even better in 2012. If we could hit just 4 nines for a start, I’d be really happy.
I hope this encourages others to present their long-term uptime record in an easily digestible format.