We spend about $3 million every year to run all the versions of Basecamp and our legacy applications. That spend is spread across several on-premise data centers and cloud operations. It does not include the budget for our 7-person-strong operations team; this is just the cost of connectivity, machines, power, and such.
There’s a lot of spend in that bucket. The biggest line item is the million dollars per year we spend storing 4.5 petabytes’ worth of files. We used to store these files ourselves, across three physical data centers for redundancy and availability, but the final math and operational hassle didn’t pan out. So now we’re just on S3 with a multi-region redundancy setup.
After that, it’s really a big mixed bag. We spend a lot of money on databases, which all run on MySQL. There are Elasticsearch clusters that power our search. A swarm of Redis servers providing caching. There’s a Kafka pipeline and a BigQuery backend for analytics. We have our own direct network connections between the data centers and the cloud.
Everything I’ve talked about so far is infrastructure we’d run and pay for regardless of our programming language or web framework. Whether we run on Python, PHP, Rust, Go, C++, or whatever, we’d still need databases, we’d still need search, we’d still need to store files.
So let’s talk about what we spend on our programming language and web framework. It’s about 15%. That’s the price for all our app and job servers. The machines that actually run Ruby on Rails. So against a $3 million budget, it’s about $450,000. That’s it.
Let’s imagine that there was some amazing technology that would let us do everything we’re doing with Ruby on Rails, but it was TWICE AS FAST! That would save us about $225,000 per year. We spend more money than that on the Xmas gift we give employees at Basecamp every year. And that’s if you could truly go twice as fast, and thus require half the machines, which is not an easy thing to do, despite what microbenchmarks might delude you into thinking.
Now imagine we found a true silver bullet. One where the compute spend could be reduced by an order of magnitude. So we’d save about $400,000/year, reducing everything we spend running our app and job servers to an unrealistically low $45,000/year. That reduction wouldn’t even pay for two developers at our average all-in cost at Basecamp!
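The arithmetic in the last three paragraphs can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the constant and function names are made up for illustration, and it assumes server spend shrinks in direct proportion to the speedup, which is the best possible case.

```python
# Figures quoted in the post; the names are illustrative only.
ANNUAL_OPS_BUDGET = 3_000_000   # dollars per year
APP_SERVER_SHARE_PCT = 15       # share spent on app and job servers

# Integer math keeps the dollar amounts exact.
APP_SERVER_SPEND = ANNUAL_OPS_BUDGET * APP_SERVER_SHARE_PCT // 100


def savings_from_speedup(speedup: float) -> float:
    """Yearly savings if a stack `speedup` times faster needs 1/speedup the machines."""
    return APP_SERVER_SPEND - APP_SERVER_SPEND / speedup


if __name__ == "__main__":
    print(f"App and job servers: ${APP_SERVER_SPEND:,}/year")
    for speedup in (2, 10):
        print(f"{speedup}x faster saves ${savings_from_speedup(speedup):,.0f}/year")
```

The order-of-magnitude case works out to $405,000, which is where the "about $400,000" above comes from.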
Now let’s consider the cost of those savings. We spend more money on the 15-strong developer team at Basecamp than our entire operations budget! If we make that team just 15% less productive, it’ll cost us more than everything we spend to run Ruby and Rails at Basecamp!
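To put the two sides next to each other, here is a rough sketch of the break-even point. The post only says the developer team costs more than the whole operations budget, so the code assumes that budget as a conservative floor for the team cost; the real figure is higher, which only makes the case stronger. All names are hypothetical.

```python
ANNUAL_OPS_BUDGET = 3_000_000                      # dollars per year, from the post
APP_SERVER_SPEND = ANNUAL_OPS_BUDGET * 15 // 100   # the 15% share, from the post

# Assumption: the ops budget is used as a lower bound on developer team cost.
DEV_TEAM_COST_FLOOR = ANNUAL_OPS_BUDGET


def productivity_loss_cost(loss_pct: float, team_cost: float = DEV_TEAM_COST_FLOOR) -> float:
    """Yearly cost of making the developer team `loss_pct` percent less productive."""
    return team_cost * loss_pct / 100


def break_even_loss_pct(speedup: float, team_cost: float = DEV_TEAM_COST_FLOOR) -> float:
    """Productivity loss (in percent) that cancels out the server savings of a speedup."""
    savings = APP_SERVER_SPEND - APP_SERVER_SPEND / speedup
    return savings * 100 / team_cost


if __name__ == "__main__":
    print(f"15% less productive costs at least ${productivity_loss_cost(15):,.0f}/year")
    for speedup in (2, 10):
        print(f"{speedup}x faster breaks even at a {break_even_loss_pct(speedup)}% productivity loss")
```

With the floor figure, a 2x faster stack is wiped out by a 7.5% productivity drop, and even a 10x faster one by 13.5%. Since the real team cost is above the floor, the real break-even percentages are lower still.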
Working with Ruby and Rails is a luxury, yes. Not every company pays its developers as well as we do at Basecamp, so maybe the rates would look a little different there. Maybe some companies are far more compute intensive to run their apps. But most SaaS companies are in exactly the same ballpark as we are. The slice of the total operations budget spent running the programming language and web framework that powers the app is a small minority of the overall cost.
For a company like Basecamp, you’d be mad to make your choice of programming language and web framework on anything but a determination of what’ll make your programmers the most motivated, happy, and productive. Whatever the cost, it’s worth it. It’s worth it on a pure cost/benefit, but, more importantly, it’s worth it in terms of human happiness and potential.
This is why we run Ruby. This is why we run Rails. It’s a complete bargain.
DHH
I don’t understand how hosting 4.5 PB in the cloud is anywhere close to being cost effective.
And when that’s your biggest line item, this seems to warrant another evaluation.
Hosting files on AWS S3 wasn’t cost effective for Dropbox, nor will it be for you.
https://www.wired.com/2016/03/epic-story-dropboxs-exodus-amazon-cloud-empire/amp
You can buy a single storage server with ~0.5 PB for ~$20k.
https://www.siliconmechanics.com/system/storform-ds4-44
That means it’d cost you ~$200k to satisfy your storage needs. Let’s even 2x that for additional redundancy.
So for $400k total, which is a ONE TIME cost, you could solve this problem.
Now, you say your $3m YEARLY spend is largely on storage.
Let’s say for argument’s sake you spend $1.25M per year on S3 storage. Over 5 years, that’s $6.25m just on S3 storage.
If you hosted yourself, that cost would only be $0.4m.
A 5-year savings of $5.85m is huge.
Thoughts / comments?
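For reference, the back-of-envelope estimate in this comment can be reproduced as follows. It is a sketch of the commenter's own numbers, nothing more: it counts hardware purchase only and leaves out data centers, power, networking, staff, and replication across sites. The variable names are illustrative.

```python
import math

# Figures taken from the comment above.
TOTAL_PB = 4.5
PB_PER_SERVER = 0.5
SERVER_PRICE = 20_000            # dollars, one-time
ASSUMED_S3_PER_YEAR = 1_250_000  # the commenter's "for argument's sake" figure
YEARS = 5

servers_needed = math.ceil(TOTAL_PB / PB_PER_SERVER)
hardware_cost = servers_needed * SERVER_PRICE   # the comment rounds this up to ~$200k
rounded_with_redundancy = 2 * 200_000           # the comment's doubled, rounded figure

s3_cost = ASSUMED_S3_PER_YEAR * YEARS
claimed_savings = s3_cost - rounded_with_redundancy

if __name__ == "__main__":
    print(f"{servers_needed} servers, ${hardware_cost:,} in hardware before rounding")
    print(f"S3 over {YEARS} years: ${s3_cost:,}")
    print(f"Claimed savings: ${claimed_savings:,}")
```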
We spend $1m/year on storing and serving files. I think it’s something like $700K for the storage and $300k for transfer. But you’re comparing what a single chassis of storage would cost to what a multi-AZ, multi-region, automated-failover setup costs. It’s not a meaningful comparison. And no, buying 2 chassis does not add anywhere remotely near the same level of redundancy or availability 😂
We used to host our own files: operating your own multi-petabyte storage solution across three different data centers, dealing with the enterprise solutions that make this actually work, dealing with the upgrade scenarios, rebalancing, and whatever. We did the per-GB price comparison, and S3 is actually the most cost-competitive option of all the clouds.
I do appreciate the youthful incredulity of thinking you can save us a million dollars a year by pointing to a hard-drive rack 😄. It’s good to question the fundamentals! And we’ve done just that. Spreadsheets up the wazoo. We don’t spend a million dollars with glee. But you can try to cut corners on redundancy and availability, and then you can see how much leniency your customers will show you when you lose their files and have to explain yourself. I’m the one who has to say sorry! So I have to be able to look at our setup and believe we did everything possible to keep our customers’ data safe, and then some. This is that.
David
Wow. You’re getting phenomenal discounts from AWS. My company hosts 1 PB on S3 and it costs us $24,000 per month.
You’re hosting 5x more storage but only paying 2.5x more than us.
I just checked my math using the S3 cost estimator
https://calculator.s3.amazonaws.com/index.html
Looks like 4.5 PB with reasonable connectivity costs has a list price of around $140,000 per month ($1.7m annually)
As mentioned in the other reply, this includes both storage and transit costs. We do get some discounts, but just the standard stuff that’s in the books for long-term commits at our storage level and spend. I wish we had a secret list to price from, but unfortunately not!
Are you using “S3 Standard – Infrequent Access“?
Yeah, I was wondering about that too. And ratio of storage between S3 and Glacier, just out of curiosity.
But thanks already for sharing the numbers, very interesting.
David
Have you considered using Backblaze for S3-like storage?
https://www.backblaze.com/b2/cloud-storage.html
It’s 1/3 the price of AWS and they are storage experts.
It would be nice to hear an opinion on that one.
Thank you for sharing.
David,
I like Ruby, but when faced with a choice of tech at work, the focus is usually not on infrastructure cost (within reasonable bounds), but on overall system performance. And so we inevitably choose Java. A usual argument is that a technology that is 2x faster, when applied to 2-5 layers of backend services, will produce an order of magnitude better latency to the end user due to the compounding effect of latency improvement within each service. The argument follows that an order of magnitude better latency brings an order of magnitude more business to the company, because customers like fast services. This reasoning implies that when we evaluate the impact of a slower technology, we must account for the lost business due to slower services.
What is your opinion on this?
That our average backend response time is sub-100ms, and that taking that from 100ms to 10ms (ha!) wouldn’t be appreciated by users, because frontend latency from assets and whatever is a larger slice of the whole request. Besides, much of the backend request is spent on DB queries that would be the same regardless of programming language.
In conclusion, choosing Java because you think it’s going to have material business value for an app like Basecamp is just bad decision making.
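A small sketch makes the shape of that argument concrete. Only the 100 ms backend figure comes from the reply above; the DB and frontend numbers are made-up assumptions for illustration, and the function name is hypothetical. The point is just that a faster language only shrinks the non-DB part of the backend time.

```python
def end_to_end_speedup(backend_ms: float, db_ms: float, frontend_ms: float,
                       language_speedup: float) -> float:
    """User-perceived speedup when only the non-DB backend time gets faster."""
    app_ms = backend_ms - db_ms                 # time actually spent in app code
    before = backend_ms + frontend_ms
    after = db_ms + app_ms / language_speedup + frontend_ms
    return before / after


if __name__ == "__main__":
    BACKEND_MS = 100    # upper bound quoted in the reply
    DB_MS = 50          # assumption: half the backend time is DB queries
    FRONTEND_MS = 400   # assumption: assets, rendering, network

    for speedup in (2, 10):
        ratio = end_to_end_speedup(BACKEND_MS, DB_MS, FRONTEND_MS, speedup)
        print(f"{speedup}x faster language -> {ratio:.2f}x faster for the user")
```

Under these assumed numbers, even a 10x faster language makes the whole request only about 1.1x faster for the user.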
DHH,
Thanks for sharing the info, and I can understand the effort. But I wonder: you’re making your teams more productive and happier, and paying the programming/dev team more than the market average to get the best output, yet there have been no real, tangible developments in BC tools/features/functions/integrations for a long time. It’s like your objective now is just to keep everything stable, and that’s all. The competition is very tough, and the features missing from BC are available in other applications.
FYI.
(The BC yellow logo is a totally useless and confusing step. The focus should be more on something beneficial, saving time and cost.)
Putting the noise in SvN
You guys are quite generous when it comes to Christmas presents:-)
Breaking down the operating budget like this is a good move, but I think the final analysis treats the intangibles unfairly: operating costs are weighed against employee benefits without any acknowledgement of possible employee costs.
For example, what if there’s an employee premium required to achieve Basecamp’s level of performance and stability using Ruby? It’s very difficult to model and doesn’t yield a pithy blog post, but deserves mention in the full equation.