Whenever I get the email from Amazon telling me that our monthly bill for web service usage is available, I take it as an excuse to spend a little while looking at our usage stats and how our storage needs have grown.
When I started working at 37signals in October of 2006, we were using less than 1.5TB of disk space for customer data for all of our applications and were starting to get to the point where redundancy and backups were becoming a headache. Shortly after I started, we decided to give S3 a trial run with Campfire and we became believers pretty quickly.
The fact that S3 is priced so reasonably (our last bill was $2,004.12) and is generally hassle-free has enabled us to drastically increase the storage limits for all of our applications. Not having to worry about managing the file servers and backups is a pretty nice bonus as well.
Here are some stats from last month, along with stats from October 2007 for comparison, to show how things have changed over the past six months.
March 2008:
- 8.8TB of data stored
- 1.5TB uploaded
- 2.9TB downloaded
- 12M requests
October 2007:
- 5.3TB of data stored
- 944GB uploaded
- 2TB downloaded
- 9.2M requests
B.Ackles
on 01 Apr 08 Thank you for posting your usage and monthly bill. This is the first time I’ve ever seen a company post such information. Knowing these rates relative to the size of 37signals gives startups a better idea of what they’re getting into, based on others’ successes.
I’d love to see more data on the Ruby on Rails site. Maybe you could get some startups to publish their rates as well.
WD
on 01 Apr 08 Full disclosure?
Haha, actually I was wondering if that $2,004.12 reflects any discounts, as my math gets a slightly higher total for October: $3,047.
MI
on 01 Apr 08 WD: We don’t get any special discounts; we’re just like everyone else, so you might want to double-check your math. Amazon has a handy little calculator that you can use to estimate pricing.
Chris Jones
on 01 Apr 08 Wow, that seems like very good value for the dollars; might have to look into this one a bit more.
Thanks for the info (and being honest about the cost!)
Grant
on 01 Apr 08 Fun with stats indeed. My bill was $4.46. Haha.
Out of curiosity, have you guys taken advantage of any other Amazon services, like EC2?
Matt Lee
on 01 Apr 08 WD: I get $2,159.83 using 9011.2 GB stored, 1536 GB uploaded, 2969.6 GB downloaded, 12,000,000 requests, and 0 for everything else.
Rick
on 01 Apr 08 $2g a month is still a crapload of money.
DHH
on 01 Apr 08 Rick, it’s nothing in comparison to what it would cost us to do this ourselves. I think the last quote we got on a setup that could handle this the way we wanted was $10-15K. And then you have more management overhead too.
WD
on 01 Apr 08 Matt Lee: I get the same now that I’ve found the calculator (my Excel spreadsheet was a quick hack job).
$2,159 is still more than $2,004, but closer.
dave
on 01 Apr 08 Look closely at the calculator: PUT/LIST requests cost significantly more than “other requests”, which in this case are just GETs. Given that the vast majority of requests are going to be GETs from the customers, their numbers make sense.
I put the 12M in the “other” and added 10,000 “PUT” requests for good measure and got a total of $2,004.10.
Maybe mine are the missing 2 cents ;)
Sean
on 01 Apr 08 Do you use another company/service besides Amazon S3 so that you have a redundant backup/recovery system, or does Amazon guarantee the safety of all data stored via the S3 service?
MI
on 02 Apr 08 All of the numbers I posted are rounded, which is why the math doesn’t work out perfectly. I also didn’t break down the number of requests between PUT/LIST vs. GET/other. For instance, in the March statement we sent 1,151,778 PUT/LIST requests and 10,924,032 GET requests to Amazon.
Sean: Yes, Amazon is built with redundancy at the forefront. If you’re interested, it might be worth looking at the design principles on the S3 homepage.
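For anyone who wants to check the arithmetic, here is a rough sketch. The per-unit rates are assumed from Amazon’s published early-2008 US pricing ($0.15 per GB-month stored, $0.10 per GB in, $0.17 per GB out, $0.01 per 1,000 PUT/LIST requests, and $0.01 per 10,000 GET requests), so treat it as an estimate rather than a reproduction of the bill:

    # Rough estimate of the March 2008 bill from the numbers in the post.
    # All rates are assumed early-2008 US prices, not authoritative.
    stored_gb = 8.8 * 1024       # 9011.2 GB stored
    uploaded_gb = 1.5 * 1024     # 1536 GB transferred in
    downloaded_gb = 2.9 * 1024   # 2969.6 GB transferred out
    put_list_requests = 1151778  # from the March statement above
    get_requests = 10924032

    request_charges = (put_list_requests / 1000.0 * 0.01
                       + get_requests / 10000.0 * 0.01)
    total = (stored_gb * 0.15 + uploaded_gb * 0.10
             + downloaded_gb * 0.17 + request_charges)
    print("requests: $%.2f  total: $%.2f" % (request_charges, total))
    # requests: $22.44  total: ~$2,032; within a couple percent of the
    # actual $2,004.12, since the TB figures above are rounded.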
HB
on 02 Apr 08 I’m curious, do you extend ActiveRecord and store everything there, or something smarter?
mother
on 02 Apr 08 Do you use EC2 instances too?
Patrick Giagnocavo
on 02 Apr 08 The real question going through everyone’s mind is, “could this be done as cheaply via a non-Amazon method?”
Assume $2K per month, 10TB of storage, and 3TB of bandwidth (most providers only care about outbound and will let you transfer as much as you want inbound), plus some amount of redundancy or backup to another system (let’s assume that 10TB can be compressed onto a backup system with 2TB of disk).
Variable costs ($1,000 per month): 3TB of bandwidth is about 15Mbps continuous. One rack including power and 15Mbps = $700, plus $300 for Internap bandwidth (as good as Amazon’s or better).
Fixed costs ($13,000):
- 10TB of storage could go into a $3K server box; adding 16×750GB drives at $100 apiece gets you to, say, $5K for the storage backend (you can run OpenSolaris and do iSCSI, for instance)
- 3 app/db servers @ $2K each = $6K
- whitebox backup server with 4×750GB for 2TB of backup storage = $1,500
- Cisco ASA 5505 firewall with 8-port switch = $500
So technically, over the course of 15 months you might come out a little ahead; over 24 months, you would save maybe $11K ($48K for Amazon vs. $37K for your own solution).
Then again, you don’t have to worry about the hardware – that’s Amazon’s problem.
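Patrick’s comparison reduces to a fixed cost plus a monthly cost; a quick break-even sketch using only the figures he gives above:

    # Break-even sketch for the DIY-vs-S3 comparison above:
    # $13K of hardware up front plus ~$1K/month, versus ~$2K/month for S3.
    s3_monthly = 2000
    diy_fixed = 13000
    diy_monthly = 1000

    for month in (13, 15, 24):
        s3_total = s3_monthly * month
        diy_total = diy_fixed + diy_monthly * month
        print("month %2d: S3 $%5d vs. DIY $%5d" % (month, s3_total, diy_total))
    # The curves cross at month 13; by month 24 DIY is $11K cheaper
    # ($48K vs. $37K), before counting anyone's time and labor.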
Tieg
on 02 Apr 08 AWS is indeed a glorious thing. Did you guys experience any problems/complaints because of that S3 downtime a couple weeks ago?
Glevik
on 02 Apr 08 I am a bit confused. From what I know, 37signals applications don’t involve uploads of images and/or movies. I am not sure I understand what is stored on S3. Is transactional database data stored there? If so, how? Well, now that I think about it, Basecamp takes attachments; is that what it is?
JF
on 02 Apr 08 “From what I know 37signals applications don’t involve upload of images and/or movies.”
Basecamp, Backpack, Highrise, and Campfire all allow people to upload files. Files can be anything—images, movies, documents, whatever.
Raphael Campardou
on 02 Apr 08 Amazon has a great service, but I just couldn’t trust them with the only copy of my data (worse, my customers’ data, if I had any). Even if they say they are fully redundant (which I’m sure they are), I would keep a backup somewhere else.
Actually, I use their service as a backup for my personal files.
Per
on 02 Apr 08 @Raphael: With Bezos as an investor (and probably on the board?) I think 37s data is pretty safe :)
But you have a point.
Marcin
on 02 Apr 08 Once you commit to S3 and use it with your applications for a while, how feasible is it to move away? How do you take your 10TB of data with you and leave? Does your agreement with Amazon cover this contingency?
Also, what can Amazon do with your data from legal standpoint? Can they “peek” at it, monitor it in some way, make business decisions based on it? Or are they prohibited from any interactions with the data?
Thanks!
Martin
on 02 Apr 08 Amazon changed their pricing to also bill for requests some time ago. I am curious, are you better or worse off with the new pricing system?
MI
on 02 Apr 08 @Marcin: If we wanted to go somewhere else with our data, we’d just have to download it from S3 and put it somewhere else. It would be a time-consuming process to download that much data, but it’s not terribly complicated. Amazon can’t legally “peek” into any files that are uploaded to S3.
@Martin: It hasn’t made any difference to us; I think their changes were more focused on people who were using S3 in ways it wasn’t designed for and making huge volumes of tiny requests. Of the $2,000 March bill I mentioned, only $22.44 was from the per-request charges.
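To make the “just download it” step concrete, here is a minimal sketch using the boto library, a common Python client for S3 (the bucket name and export path are invented):

    # Sketch: copy every object out of an S3 bucket, e.g. for a migration.
    import os
    from boto.s3.connection import S3Connection

    conn = S3Connection("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
    bucket = conn.get_bucket("example-app-production")  # hypothetical bucket

    for key in bucket.list():  # transparently pages through every key
        local_path = os.path.join("/mnt/export", key.name)
        directory = os.path.dirname(local_path)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        key.get_contents_to_filename(local_path)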
Greg
on 02 Apr 08 Total noob here regarding S3 … but how exactly do you use S3?
Do you simply back up the data on your servers to it? Or when I run a request in your programs, does the data immediately hit Amazon and respond back to me … because there’s no lag whatsoever. The last time I dabbled with AWS (years ago), it was getting XML feeds for junky AWS stores. LOL
Do your complete apps & live data all live over at S3?
Chris
on 02 Apr 08 Thanks for sharing this.
I have a similar question to Greg’s around your storage design – do you do any local caching of files coming back from S3, or do users just hit AWS servers each time they download?
Also, how do you handle file renames in S3? From what I understand keys are immutable, so do you just upload a new copy of the file if it gets renamed?
Sam Leibowitz
on 02 Apr 08 One of the guys at SmugMug posted a blog entry a year or so ago in which he talked about the savings they’ve had. Check it out here.
Rahsun McAfee
on 02 Apr 08 Thanks for posting this.
It seems companies either don’t have the time to share this kind of information or keep it a secret for some reason, so it’s hard to gauge the benefit without getting real data from real situations.
Martin Edic
on 02 Apr 08 For those of you calculating doing it yourself vs. S3: Don’t leave out labor and downtime costs, including the necessary multi-site redundancy, power backups, and 24/7 live maintenance required to stay at the required service levels. This isn’t website hosting, it’s business data hosting. I think S3 is a no-brainer, and a couple of grand is nothing in the context of the business.
Greg DeVore
on 02 Apr 08 Question for you. Do you store all customer files in a single bucket, or do you have redundant buckets that you back up customer files to, to protect against accidental deletion on your end?
Basically, do you have any protection for a case where you (37signals) might screw up and accidentally delete something? As far as I know, Amazon won’t recover files that were accidentally deleted.
Not that I think you would do this. Just wondering if you have a contingency for such a situation.
Stephen
on 02 Apr 08 I think there was a comment earlier in this thread about how you guys work with S3. I am interested in how you do this too – for instance, most if not all of your files are only accessible by logged-in users, so you cannot just allow people to access the S3 URL directly (I think!).
Do you have some local storage with your apps that stores recently accessed/uploaded files to speed things up, or does each request for a download go through your servers to S3 and back to the user via your servers again?
Just interested as it was something I was toying with for a while.
Thanks for posting the stats – interesting as always.
BradM
on 02 Apr 08 For those of you who were wondering about S3 and how it is used, there are several great sites on the web that will help you with that question.
For Ruby/Rails developers, go here: http://tinyurl.com/333hak
I admit I have no idea how 37S interfaces with the service, but this should help answer some of those lingering questions.
Helped me out anyways.
Greg DeVore
on 02 Apr 08 @Stephen: I am not sure if that is how 37signals is doing it, but with S3 you can generate an authenticated URL that expires after a certain amount of time. It uses your account keys (passed to it by your application) to generate the URL, so your web app ends up being the only app that can generate the URL for the protected object. When the customer loads a page that has a link to one of those files on S3, the link they get points directly to S3. But if they wait too long to click on the link, the link can expire (you set how long this is); all they have to do is refresh the page to get a new URL. Of course, the user doesn’t ever really see any of this or notice that it is happening.
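Mechanically, those expiring links are S3’s query-string authentication: an HMAC-SHA1 signature over the request details plus an expiry timestamp. A minimal sketch (bucket and key names invented; client libraries wrap this up for you):

    # Sketch of the expiring, authenticated S3 URLs described above.
    import base64, hmac, time, urllib
    from hashlib import sha1

    def signed_url(bucket, key, access_key, secret_key, ttl=300):
        expires = int(time.time()) + ttl  # link stops working after ttl seconds
        to_sign = "GET\n\n\n%d\n/%s/%s" % (expires, bucket, key)
        signature = base64.b64encode(hmac.new(secret_key, to_sign, sha1).digest())
        return ("https://%s.s3.amazonaws.com/%s"
                "?AWSAccessKeyId=%s&Expires=%d&Signature=%s"
                % (bucket, key, access_key, expires, urllib.quote_plus(signature)))

    # e.g. signed_url("example-files", "1234/report.pdf", KEY_ID, SECRET)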
HB
on 02 Apr 08 I think what people were looking for is how 37signals uses S3, not the API description. For a Flickr clone like SmugMug, which uses S3, it’s pretty straightforward: the image data for each of the pictures fits well as one of the S3 API objects to be stored. They’re big and not retrieved very often, so S3 is an easy and effective match for picture storage.
Maybe I’m speaking for myself, but I think people are interested in how 37signals is using it for things like Ta-da List and Basecamp, where there are many small things being manipulated and displayed all the time. These map really well to an RDBMS via ActiveRecord, but how do you map them to S3 without having tons of microtransactions running up the bill? And of course there’s the performance of all those round trips to S3 for every view you generate.
All this data (not just the big things like pictures and attached files) needs to be persisted reliably. Do they still have an RDBMS and replicate it whole to S3 periodically? Or do they do away with the RDBMS entirely and use S3 for all of their storage (perhaps via some clever caching system)? And what role do Rails and ActiveRecord have in all of it?
What it boils down to is: what are the best practices for using something like S3 in the typical web app? And specifically, what are the best practices for using S3 with Rails?
There weren’t any replies to any of these questions, though, so maybe it’s 37signals’ secret sauce. ;) Not that they’re obligated to share that information with us in any way – they’ve been more than generous to share their usage data. And of course Rails itself! ;)
RBrown
on 02 Apr 08 A new alternative to managing servers that I think is even better than S3 is Nirvanix. Plus, with them… you can store globally!
http://www.nirvanix.com/comparison.aspx
MI
on 03 Apr 08 HB, et al: You guys are reading too much into it. We don’t do any fancy database-type abstractions on top of S3; we still very much use a traditional database to store relational data. We only use S3 to store actual files that have been uploaded to our applications (i.e., Files and Attachments in Basecamp, Attachments in Highrise, and so on). Ta-da doesn’t use S3 at all because it doesn’t support user-uploaded content.
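In other words: file bytes go to S3, and the relational records stay in the database. A hypothetical sketch of that split using the boto library (all names invented; this illustrates the pattern, not 37signals’ actual code):

    # Files go to S3; only relational metadata lands in the database.
    from boto.s3.connection import S3Connection

    def store_attachment(db, account_id, filename, upload_path):
        conn = S3Connection("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
        bucket = conn.get_bucket("example-app-files")  # hypothetical bucket
        s3_key = "%d/%s" % (account_id, filename)

        # The file itself goes to S3...
        bucket.new_key(s3_key).set_contents_from_filename(upload_path)

        # ...while the database keeps the row the application queries.
        db.execute("INSERT INTO attachments (account_id, filename, s3_key) "
                   "VALUES (?, ?, ?)", (account_id, filename, s3_key))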
George
on 03 Apr 08 Last time I checked, you couldn’t upload directly to S3, so for every uploaded MB you must add another one. For March 2008 this is 3TB of upload bandwidth, and if you are paying for your bandwidth it might make a difference.
If your users upload fairly large files, there is also some file management involved. If, for example, a user uploads a 30MB file, you can’t upload it to S3 within the same request. You have to store the file locally, start the upload to S3, and when everything is OK, update your records. I think that’s what 37signals does; I’ve seen it on Basecamp :)
Other than that, S3 is a great service.
RF
on 03 Apr 08 @George: You can let users upload directly to S3 without touching your servers, but it’s a recently deployed feature and has the inherent quirks you’d expect. Search for “Browser-based uploads using POST”.
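Roughly how that feature works: the server signs a policy document describing what may be uploaded, and the browser then POSTs the file straight to the bucket along with that signature. A sketch under those assumptions (bucket name and key prefix invented):

    # Sketch of "browser-based uploads using POST": sign a policy
    # server-side; the browser form sends the file directly to S3.
    import base64, hmac
    from hashlib import sha1

    policy_doc = ('{"expiration": "2008-05-01T12:00:00Z",'
                  ' "conditions": ['
                  '   {"bucket": "example-app-files"},'
                  '   ["starts-with", "$key", "uploads/"]'
                  ' ]}')

    policy = base64.b64encode(policy_doc)
    signature = base64.b64encode(hmac.new("SECRET_ACCESS_KEY", policy, sha1).digest())

    # The HTML form then posts to http://example-app-files.s3.amazonaws.com/
    # with fields key, AWSAccessKeyId, policy, signature, and the file itself.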
Bryan
on 06 Apr 08 @Marcin: One small disclaimer to add to MI’s response to you: although Amazon can’t legally peek at the data you upload to its S3 service, the US government can petition Amazon for access to it with (possibly without?) a court order.
The only reason I add that caveat is that some countries (Canada and some European nations) have much stricter privacy laws prohibiting their citizens and corporations from storing certain data on servers located in jurisdictions (the US, in this case) that don’t comply with their own laws. As I understand it, that’s one of the reasons why Amazon created S3 instances in Europe.
Of course, read Amazon’s documentation and consult an attorney before taking anything I’ve said at face value.
@37: I really appreciate you posting this information. As someone who’s considering S3 for storage for web apps in development, I’ve been curious as to what numbers I could plug into their estimation calculator. Now that I see what kind of usage you’re experiencing, I’m not sweating it so much. Now if I could just find someone who’s using their EC2 service and is willing to publish their monthly usage bill….
Anonymous Coward
on 07 Apr 08 “the US government can petition Amazon for access to it with (possibly without?) a court order.”
And? They can petition you too. And your ISP. And anyone with anything. Amazon isn’t unique or special in that way.