Giving away the secrets of 99.3% email delivery

Noah wrote this on Jan 31 2012 60 comments

We send a lot of mail for Basecamp, Highrise, Backpack, and Campfire (and some for Sortfolio, the Jobs Board, Writeboard, and Tadalist). One of the most frequently asked questions we get is about how we handle mail delivery and ensure that emails are making it to people’s inboxes.

Some statistics

First, some numbers to give a little context to what we mean by “a lot” of email. In the last 7 days, we’ve sent just shy of 16 million emails, with approximately 99.3% of them being accepted by the remote mail server.

Email delivery rate is hard to benchmark, but by most accounts we’re doing well at that rate. For comparison, the tiny fraction of our email that goes through a third party has had delivery rates between 96.9% and 98.6% for our most recent mailings.

How we send email

We send almost all of our outgoing email from our own servers in our data center located just outside of Chicago. We use Campaign Monitor for our mailing lists, but all of the email that’s generated by our applications is sent from our own servers.

We run three Postfix mail-relay servers that take mail from our application and job servers and queue it for delivery to tens of thousands of remote mail servers, sending from about 15 unique IP addresses.

How we monitor delivery

We have developed some instrumentation so we can monitor how we are doing at getting messages to our users’ inboxes. Our applications tag each outgoing message with a unique header containing a hashed value that the application records before the message is sent.
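Here is a minimal sketch of that tagging step using Python’s standard `email` package. The header name `X-App-Delivery-Id`, the `record_delivery_id` callback, and hashing the recipient plus a timestamp and a random nonce are all illustrative assumptions; the post doesn’t specify the real header or hashing scheme.

```python
import hashlib
import secrets
import time
from email.message import EmailMessage

# Hypothetical header name; the post doesn't say which header is used.
DELIVERY_HEADER = "X-App-Delivery-Id"

def tag_message(msg, record_delivery_id):
    """Add a unique hashed ID header to msg and record it before sending."""
    recipient = str(msg["To"])
    # Recipient + nanosecond timestamp + random nonce makes collisions vanishingly unlikely.
    seed = f"{recipient}|{time.time_ns()}|{secrets.token_hex(8)}"
    delivery_id = hashlib.sha256(seed.encode("utf-8")).hexdigest()
    msg[DELIVERY_HEADER] = delivery_id
    # Store the ID before the message goes to the relay, so later log data can be matched.
    record_delivery_id(delivery_id, recipient)
    return delivery_id

# Usage: a dict stands in for the application's database.
sent_log = {}
msg = EmailMessage()
msg["From"] = "notifications@example.com"
msg["To"] = "user@example.com"
msg["Subject"] = "New comment"
msg.set_content("Someone commented on your message.")
delivery_id = tag_message(msg, sent_log.__setitem__)
```

Recording the ID before handing the message to the relay means there is never a log entry the application doesn’t know about.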

To gather delivery information, we run a script that tails the Postfix logs and extracts the delivery time and status for each piece of mail, including any error message received from the receiving mail server, and links it back to the hash the application stored. We store this information for 30 days so that our fantastic support team is able to help customers track down why they may not have received an email.
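A rough sketch of the log-reading side follows. Postfix doesn’t log arbitrary headers by default, so this assumes the application saves each message’s Message-ID next to its hash and joins on that. The regular expressions target the classic `postfix/cleanup` and `postfix/smtp` syslog lines and may need adjusting for your Postfix version and syslog format.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# cleanup logs the Message-ID once per queued message.
CLEANUP_RE = re.compile(
    r"postfix/cleanup\[\d+\]: (?P<qid>[0-9A-Za-z]+): message-id=<(?P<mid>[^>]*)>"
)
# smtp logs one result line per recipient delivery attempt.
SMTP_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ postfix/smtp\[\d+\]: (?P<qid>[0-9A-Za-z]+): "
    r"to=<(?P<rcpt>[^>]*)>.*?, delay=(?P<delay>[\d.]+),.*?"
    r"dsn=(?P<dsn>[\d.]+), status=(?P<status>\w+) \((?P<reply>.*)\)$"
)

@dataclass
class Delivery:
    message_id: Optional[str]  # None if the cleanup line wasn't seen
    recipient: str
    logged_at: str
    delay_s: float
    dsn: str     # enhanced status code, e.g. 2.0.0 or 5.1.1
    status: str  # sent, deferred, or bounced
    reply: str   # the remote server's response, useful for support

def parse_log(lines: Iterable[str]) -> Iterator[Delivery]:
    """Yield one Delivery per smtp result line, joined to its Message-ID by queue ID."""
    message_ids = {}  # queue ID -> Message-ID
    for line in lines:
        line = line.rstrip("\n")
        m = CLEANUP_RE.search(line)
        if m:
            message_ids[m["qid"]] = m["mid"]
            continue
        m = SMTP_RE.search(line)
        if m:
            yield Delivery(
                message_id=message_ids.get(m["qid"]),
                recipient=m["rcpt"],
                logged_at=m["ts"],
                delay_s=float(m["delay"]),
                dsn=m["dsn"],
                status=m["status"],
                reply=m["reply"],
            )
```

Passing `sys.stdin` as `lines` lets it run behind `tail -F` on the mail log (the log path depends on your syslog setup). The `dsn` and `reply` fields are what a support person needs to explain a rejection.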

We also send these statistics to our statsd server so they can be reported through our metrics dashboard. This “live” and historical information can then be used by our operations team to check how we’re doing on aggregate mail delivery for each application.
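For the statsd side, here is a sketch of the plain-text statsd line protocol (`name:value|type`) sent over UDP, assuming the server accepts several newline-separated metrics in one packet, as Etsy’s statsd does. The `mail.<app>` metric names and the default `127.0.0.1:8125` address are placeholders.

```python
import socket
from typing import List

def delivery_metrics(app: str, status: str, delay_s: float) -> List[str]:
    """statsd lines for one delivery result; the metric names are hypothetical."""
    prefix = f"mail.{app}"
    return [
        f"{prefix}.status.{status}:1|c",               # counter, summed per flush interval
        f"{prefix}.delay:{round(delay_s * 1000)}|ms",  # timer, in milliseconds
    ]

def queue_gauge(app: str, queued: int) -> str:
    """A gauge reports the latest value as-is instead of aggregating it."""
    return f"mail.{app}.queue_depth:{queued}|g"

def send_to_statsd(lines: List[str], host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send newline-separated metrics in one UDP datagram.

    UDP is fire-and-forget: the sender never waits for a reply, so metrics
    reporting can't stall log processing.
    """
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Counting per status per application is enough to chart an acceptance rate for each app on a dashboard.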

Why run your own mail servers?

Over the last few years, at least a dozen services that specialize in sending email have popped up, ranging from the bare-bones to the full-service. Despite all these “email as a service” startups we’ve kept our mail delivery in-house, for a couple of reasons:

We don’t know anyone who could do it better. With a 99.3% delivery rate, we haven’t found a third-party provider that actually does better in a way they’re willing to guarantee.
Setup hassle. Most of the third-party services require that you verify each address that sends email by clicking a link that gets sent to that address. We send email from thousands and thousands of email addresses for our products, and the hassle of automatically registering and confirming them is significant. Even automating the process still introduces unnecessary delivery delays.

Given all this, why should we pay someone tens of thousands of dollars to do it? We shouldn’t, and we don’t.


How we keep our mail delivery rates up

Let’s be honest from the get-go: mail delivery is more of an art than a science. We’ve found that even when you “play by the rules”, there are still times when a major provider will reject all your mail without notice. Usually it takes a couple of emails to the provider’s abuse address, and things get resolved. In spite of these “out of our control” issues, we’ve found a few things help us keep delivery rates up:

Constantly monitor spam blacklists. We have a set of Nagios alerts that regularly check if we’re listed on any delivery blacklists, and whenever they go off we take whatever corrective action we need to get back off the blacklist.
Have valid SPF records, and don’t impersonate your users. When running a web app like Basecamp, which sends email generated by other users, it can be tempting to send that email from the user (e.g., so that a comment I wrote on Basecamp would appear to come from noah at 37signals dot com), which might make people feel more comfortable. Unfortunately, this is a surefire way to end up on spam lists, since you’ll likely be sending from an IP address that no valid SPF record authorizes for that domain. And even if the user’s domain does publish an SPF record, chances are it doesn’t include your application’s IP.
Sign the mail! Use DKIM and DomainKeys. Yahoo and Gmail both score signed email higher.
Use dedicated, conditioned sending IPs. When an IP is new, throttle how much mail you send from it and “warm” it up until it builds a good reputation.
Configure reverse DNS entries. Most of the “big boys” won’t accept mail from your servers if your reverse DNS entries don’t match your forward DNS. You might need your IP provider to help with setting up these records.
Enroll in feedback loops. We haven’t automated our parsing of feedback, but a daily or weekly review of feedback loop emails helps us know when there’s an unhappy user or some other problem. Too many complaints and you’ve got trouble.
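A blacklist check like the one in the first item can be sketched with a plain DNS lookup: a DNSBL lists an IPv4 address when the reversed octets plus the list’s zone resolve. The sketch below returns a Nagios-style `(exit_code, status_line)` pair. The two zones are examples, not the set of lists referred to above, and the `resolve` parameter exists only so the check can be tested without network access.

```python
import ipaddress
import socket

# Example zones only; a real check usually covers many more lists.
DNSBL_ZONES = ["zen.spamhaus.org", "bl.spamcop.net"]

# Nagios plugin exit codes.
OK, CRITICAL = 0, 2

def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the zone: 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def listed_on(ip, zones=DNSBL_ZONES, resolve=socket.gethostbyname):
    """Return the zones that list ip; any successful lookup counts as a listing."""
    hits = []
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
        except socket.gaierror:
            continue  # NXDOMAIN (or a failed lookup): treated as not listed
        hits.append(zone)
    return hits

def nagios_check(ips, zones=DNSBL_ZONES, resolve=socket.gethostbyname):
    """Return (exit_code, status_line) the way a Nagios plugin reports them."""
    problems = [f"{ip} on {zone}" for ip in ips for zone in listed_on(ip, zones, resolve)]
    if problems:
        return CRITICAL, "DNSBL CRITICAL - " + ", ".join(problems)
    return OK, f"DNSBL OK - {len(ips)} IP(s) clean on {len(zones)} list(s)"
```

A Nagios plugin wrapper would print the status line and call `sys.exit()` with the code. Some lists answer with special `127.x.x.x` codes for query errors or refused resolvers, so a production check should inspect the returned address rather than treat every answer as a listing.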
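The reverse DNS item boils down to forward-confirmed reverse DNS: the PTR hostname for a sending IP should resolve back to that same IP. Below is a minimal IPv4 sketch using only the standard library; the `reverse` and `forward` parameters are there only so it can be tested without real DNS.

```python
import socket

def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the PTR hostname for ip if that name resolves back to ip, else None."""
    try:
        hostname, _aliases, _addrs = reverse(ip)          # PTR lookup
        _name, _aliases, forward_ips = forward(hostname)  # A lookup on the PTR name
    except (socket.herror, socket.gaierror):
        return None  # no PTR record, or the PTR name doesn't resolve
    return hostname if ip in forward_ips else None
```

Running it against each sending IP is a quick sanity check; a `None` result is worth raising with whoever controls the PTR records.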

A problem we haven’t solved

By far the biggest cause of failed email delivery we see is bad email addresses that were entered into the system: problems like ‘[email protected]’ or ‘[email protected]’. By and large, these pass a regular expression check for email addresses, but aren’t actually valid addresses. There’s no perfect solution here, but we’ve been experimenting with checking for valid DNS records or actually attempting to connect to the mail server as part of validating an email address, and with notifying people within the application when we aren’t able to deliver mail to them.
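The in-app notification idea could start from something like the sketch below. It sorts enhanced status codes, as Postfix logs them in `dsn=`, by class (RFC 3463: `2.x.x` success, `4.x.x` temporary failure, `5.x.x` permanent failure), and flags an address after repeated hard bounces. The `BounceTracker` class and its threshold of two are hypothetical.

```python
from collections import Counter

def classify_dsn(dsn):
    """Map an enhanced status code (RFC 3463) to a coarse outcome by its class digit."""
    return {"2": "delivered", "4": "soft_bounce", "5": "hard_bounce"}.get(
        dsn.split(".", 1)[0], "unknown"
    )

class BounceTracker:
    """Hypothetical helper: flag addresses whose mail keeps hard-bouncing."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.hard_bounces = Counter()

    def record(self, recipient, dsn):
        """Record one delivery result; return True if the user should see a notice."""
        key = recipient.lower()
        outcome = classify_dsn(dsn)
        if outcome == "delivered":
            self.hard_bounces.pop(key, None)  # a success resets the count
        elif outcome == "hard_bounce":
            self.hard_bounces[key] += 1
        # Soft bounces (4.x.x) are retried by the MTA, so they don't count here.
        return self.hard_bounces[key] >= self.threshold
```

Because a successful delivery resets the count, only consecutive hard bounces (with soft bounces ignored) trigger a notice.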

A few tools

MX Toolbox is a great site for doing a quick check on your mail servers and your customers’ mail servers.
Sender Score is really a marketing tool for Return Path, but it can be used to get insight about how some of the “big boys” are scoring your sending IPs.
Postmark offers a web tool and API to get the SpamAssassin score for a message, which can be helpful for identifying things you can improve to boost delivery rates.

Have questions about email delivery? Ask in the comments, and we’ll try our best to answer.

Curious about incoming email? We’ll share some info and statistics about how we handle that in a future post.


Jon Lim

on 31 Jan 12

Is there a particular way you guys are signing the emails? As in, do you generate the key for every email address that you are sending from, or is the server doing the hard work upon sending?

Christopher Lee

on 31 Jan 12

I ran into a similar problem with typo’d email addresses in my app. For the most common ones, I whitelisted the popular email services and then calculated the Levenshtein distance between an email address’s domain and each domain on the whitelist.

It’s not perfect, and I don’t know if it will work at the kind of scale you guys are at, but it certainly worked for us.

Let me know if you are curious about it and need a hand.
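A sketch of the whitelist-plus-Levenshtein approach described above. The domain list and the cutoff of two edits are assumptions. Note that plain Levenshtein counts a swapped pair like `gmial` as two edits; Damerau-Levenshtein would count it as one.

```python
# Hypothetical whitelist of popular mail domains.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "aol.com", "outlook.com"]

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def suggest_domain(email: str, max_distance: int = 2):
    """Return a 'did you mean' address, or None if the domain looks fine or unknown."""
    local, _, domain = email.rpartition("@")
    domain = domain.lower()
    if not local or domain in COMMON_DOMAINS:
        return None
    best = min(COMMON_DOMAINS, key=lambda d: levenshtein(domain, d))
    if levenshtein(domain, best) <= max_distance:
        return f"{local}@{best}"
    return None
```

Because domains like gmal.com really exist, the result is best offered as a prompt the user can accept rather than applied automatically.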

Manuel F. Lara

on 31 Jan 12

We use SendGrid (although there are other services like them) for our most important emails (like invitations to use our service), and you definitely don’t have to pre-approve that email address in any way, that’s more of a requirement for newsletter sending-services like MailChimp.

You guys should try Amazon SES, SendGrid, Mailgun, or similar, though it may be more expensive than your current setup. And let’s be honest, if it ain’t broke…

NL

on 31 Jan 12

@Christopher – We tried something similar too, and it was pretty minimal in terms of impact—less than 5% of bad email addresses could be fixed that way.

Taylor

on 31 Jan 12

@Jon,

We actually don’t sign the mail in a manner I’m satisfied with. First, we use wildcard DNS entries, which cause random lookup failures for the TXT records. Second, we don’t have the mail segmented well across the keys (for instance, we don’t sign mail with a unique key per subdomain/account). I have a to-do to work on this, but since our delivery rate remains high, it’s been on the back burner. (It’s a lot of work, and I’m not sure how much closer to 100% it will bring us; 0.1% probably isn’t worth it.)

Taylor

on 31 Jan 12

@Manuel,

Thanks for your comment! I’m not sure you are correct about which providers require pre-approval. For SES and Dynect’s email service, you do have to pre-approve the sending address. In both cases it would cost tens of thousands to send our volume of mail. It’s just not worth it.

MattBuei

on 31 Jan 12

A problem we haven’t solved By far the biggest cause of failed email delivery we see is due to bad email addresses….

Why don’t you just use an account activation step when users enter their email address?

Glen Barnes

on 31 Jan 12

When starting out, I definitely recommend something like SendGrid. We are sending around 10-15K emails a day and the implementation was really simple. If you run your own server, you will spend more time configuring it and keeping it up to date than you would on monthly SendGrid fees. They also help with all of the anti-spam issues for you and offer expert advice.

Obviously as you get bigger and are sending ~2 million emails a day having things in house probably makes a lot more sense.

PS: SendGrid doesn’t require that you validate email addresses first.

paul

on 31 Jan 12

This is a great article, many thanks for writing it. Do y’all do any kind of pixel tracking or similar, or do you rely solely on NDRs for tracking?

I’m kinda curious also how often you see a server just silently dropping messages as opposed to returning a bounceback.

Marcel

on 31 Jan 12

Noah, thanks for the excellent article. I love how you guys are sharing your experiences!

I’m curious: what metrics dashboard are you currently using? I assume statsd sends its data to Graphite, but are you using their dashboard as is? Or did you make a custom dashboard that uses these graphs? Thanks!

Joe Van Dyk

on 31 Jan 12

For tanga.com, we use Sendgrid for transactional emails and Mailchimp for bulk emails, both work great.

We’ve also certified the IP addresses we send mail from with Return Path; I think that also boosts deliverability rates.

NL

on 31 Jan 12

@Marcel – we actually aren’t using Graphite. We have our own statsd-server that’s “wireline” compatible with Etsy’s statsd (for the most part, we’ve added support for gauges, which are never aggregated, unlike counters and timers), but uses a hybrid of Redis + flat files for data storage.

We have an internally developed Rails app that then serves as our dashboard (for virtually everything—ops related, financial, customer support, etc.). We went for something we rolled ourselves because we think there’s great power in having all of your data sources together in one place, and there’s nothing out there that I’ve seen that quite does that.

Taylor

on 31 Jan 12

@Marcel

We store it in statsd. Noah invented a graphing library called “flyash” which he hopes to open source soon. We have our own metrics dashboard called “dash” that makes the graphed data easy to digest for all of our teams.

Iain Dooley

on 31 Jan 12

Isn’t measuring deliverability from the server response codes wildly inaccurate? Mail servers will generally accept mail for delivery and then decide whether it’s spam or not rather than rejecting mail.

Joshua

on 31 Jan 12

Hmm, my interest was piqued when you said you didn’t have a good way to check for bad domains when users enter their email address.

Why not make a simple AJAX call when they tab out of the email address field (or on submit) to a backend service that does a quick DNS lookup for an MX record? If one comes back, you can at least confirm there is a mail server, though that doesn’t test the user’s entry explicitly, since they could still have typed a wrong domain that just happens to be valid. Just a thought.

Taylor

on 31 Jan 12

@Iain,

For that reason we use a “fake” image to track if an html email is read / opened. It’s an imperfect system, but it’s far better than guessing.
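A sketch of that kind of open-tracking image, assuming the image URL carries the message’s ID plus an HMAC signature so nobody can forge opens for other messages. The endpoint URL, the `id`/`sig` parameter names, and the secret are placeholders.

```python
import hashlib
import hmac
import html
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-real-secret"            # hypothetical signing key
PIXEL_BASE = "https://example.com/mail/open.gif"  # hypothetical tracking endpoint

def _sign(delivery_id):
    return hmac.new(SECRET, delivery_id.encode("utf-8"), hashlib.sha256).hexdigest()

def pixel_url(delivery_id):
    """URL of a 1x1 image; a request for it records an 'open' for delivery_id."""
    return PIXEL_BASE + "?" + urlencode({"id": delivery_id, "sig": _sign(delivery_id)})

def pixel_tag(delivery_id):
    """The <img> tag to append to the HTML body (the & in the URL must be escaped)."""
    return f'<img src="{html.escape(pixel_url(delivery_id))}" width="1" height="1" alt="">'

def verify_open(url):
    """Server side: return the delivery ID if the signature is valid, else None."""
    params = parse_qs(urlparse(url).query)
    delivery_id = params.get("id", [""])[0]
    sig = params.get("sig", [""])[0]
    if delivery_id and hmac.compare_digest(sig.encode(), _sign(delivery_id).encode()):
        return delivery_id
    return None
```

Many clients block remote images or read plain text, so this undercounts opens; it works as a lower bound, not an exact count.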

Will Jessop

on 31 Jan 12

@Joshua: Noah mentioned that we were trying that, but it’s not perfect. For instance, gmal.com and yahooo.com actually have valid mail hosts (an MX record or fallback A record), so even though you’d expect them to be misspellings, a DNS check would return success.

Russ

on 31 Jan 12

Thanks! Very helpful.

Few questions:

1) Can you whitelist a domain without whitelisting the corresponding IP?

2) Will whitelisting a domain (blah.com) help ensure delivery of emails from a subdomain (email.blah.com)?

Roberto Martinez

on 31 Jan 12

@Taylor For the [email protected] problem, have you tried BriteVerify? Or is the problem beyond what BV can deliver?

Steve

on 31 Jan 12

Just be careful validating emails with fake SMTP connections (without actually sending DATA), or even using a service that does this; it can get you marked as a spammer pretty quickly. Al Iverson has a great article about this: http://www.spamresource.com/2012/01/address-validators-what-are-you.html

Also, Mailchimp has a great free ebook “Email Delivery For IT Professionals” in their resources section: http://mailchimp.com/resources/guides/email-delivery-for-it-professionals/

Alex Hillman - Postmark

on 31 Jan 12

First, thanks for mentioning our Spamcheck API in this post.

99.3% delivery is an incredible achievement, and it’s awesome that you’re sharing the techniques you use to achieve that delivery rate. At Beanstalk, our transactional emails delivered through Postmark achieve an over 99% inbox delivery rate according to ReturnPath.

One workflow that we’ve built into Beanstalk and then provide to our customers through Postmark is our bounce hooks. In Beanstalk, we use Postmark to catch bounced emails and parse the bounce reason, alerting Beanstalk’s customers to anything from a typo’d email to a full inbox. So long as the customer is still logged into Beanstalk, they’ll be notified that they should fix a typo (or full inbox) before proceeding and then reactivate sending straight from inside their bounce inbox.

When we were running Newsberry, we had a “list hygiene” tool that looked for common typos in email domains and we offered an automatic correction in the list import workflow.

We really recommend ReturnPath for extra help along the way. Congrats on such a great bounce rate, that’s something to be proud of!

Benjy

on 31 Jan 12

An interesting read, especially given that we’ve been having issues here at my job with our Basecamp system emails getting through our Barracuda spam filter. We had to have our email administrator whitelist our Basecamp domain, and now it seems to be OK.

Mal Curtis

on 31 Jan 12

For important emails (invitations, enrollments, etc.) we marshal their mailers and store them, so that if they aren’t delivered they can be sent again.

This has reduced the overhead from users who enter incorrect email addresses. The support team can review email bounces, see if there are obvious spelling mistakes in an email address, and then in one click update the address and resend the original email(s). The mailers are marshalled before the template is rendered, which prevents incorrect times, etc. This has been a lifesaver.

In an ActionMailer mailer method, we simply call “save_me” with an optional category, and it’s all handled.

Warwick Poole

on 31 Jan 12

Hey Noah

Curious if you have issues delivering successfully to Postini-managed addresses? We constantly struggle to get past their mysterious filters.

Taylor

on 31 Jan 12

@Warwick,

Yes. There’s a thing on the support page that helps you decipher how the message was scored … if you can get the original message from the user.

Alex

on 31 Jan 12

What if a person doesn’t want to give an email address during registration? What should this person do if an email address is required?

Thomas Zacchi intoto

on 31 Jan 12

Is the email delivery rate of 99.3% for US mail delivery only, or also for international delivery?

It would be interesting to hear about international delivery, as my experience tells me that this is a big problem.

Great of you to share your information.

Anonymous

on 31 Jan 12

The Nagios filter you linked to doesn’t actually include the list of blacklists you checked. Would you consider providing the list you use?

Anonymous

on 31 Jan 12

Can you elaborate on what you mean by “conditioned” mail-sending IPs? I can guess, but I still wonder how you manage that.

Anonymous

on 31 Jan 12

Regarding sending mail from user email addresses: could you dynamically check their SPF records to find out if they allow that, and only use your own “proxy” email addresses if they don’t? Yes, that seems like a lot of trouble, but in exchange for making users a lot more comfortable and having “reply” automatically do something vaguely sane, it seems worth doing.

(SPF is insane, broken, and now useless, but now we’re all stuck with it. Per http://craphound.com/spamsolutions.txt , “It will stop spam for two weeks and then we’ll be stuck with it”.)
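That suggestion can be partly sketched without a DNS library: given a domain’s SPF TXT record (fetching it needs a resolver such as dnspython, which is left out here), check whether the record’s `ip4:`/`ip6:` terms cover your sending IP. Anything that depends on `include:`, `a`, `mx`, `ptr`, `exists`, or `redirect=` returns `None` instead of guessing. Keep in mind that SPF is evaluated against the envelope sender (MAIL FROM) and HELO name, not the visible From header.

```python
import ipaddress

# Mechanisms that need further DNS lookups, which this sketch doesn't do.
NEEDS_DNS = {"include", "a", "mx", "ptr", "exists"}

def spf_allows_ip(spf_record, ip):
    """Evaluate only the ip4:/ip6:/all terms of an SPF record for one IP.

    Returns True (pass), False (fail, softfail, neutral, or no match), or None
    when the outcome depends on terms that would need more DNS lookups.
    """
    addr = ipaddress.ip_address(ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return None  # not an SPF record
    uncertain = False
    for term in terms[1:]:  # SPF evaluates left to right; the first match wins
        qualifier = term[0] if term[0] in "+-~?" else "+"
        name, _, value = term.lstrip("+-~?").partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            try:
                network = ipaddress.ip_network(value, strict=False)
            except ValueError:
                return None  # malformed; a real checker reports a permerror
            if addr in network:
                return None if uncertain else qualifier == "+"
        elif name == "all":
            return None if uncertain else qualifier == "+"
        elif name.split("/")[0] in NEEDS_DNS or name.startswith("redirect="):
            uncertain = True  # can't evaluate this term, so later results aren't certain
    return None if uncertain else False
```

Only a `True` result would make sending from the user’s own address reasonably safe; `False` or `None` means falling back to the application’s own address.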

Victoria-Tienda bebe

on 31 Jan 12

Really interesting post. Most of my questions and curiosities were addressed in the post. One question: are there more tools or ways to avoid email deliverability problems? Thank you so much.

Anonymous

on 31 Jan 12

Regarding third-party services, I definitely agree that you don’t want to use any of the services designed for “mailing lists”, because they won’t work very well for the kinds of mail you send. Someone else in the comments already mentioned SendGrid and similar services, which seem like a possibility, though it sounds like you’ve already considered and dismissed them as well. A third possibility, though: why not use one of the paid email services with dedicated mail servers, which (like any sensible mail server) allows you to send from any address you want, as long as you SMTP AUTH correctly?

Anonymous Coward

on 31 Jan 12

@37signals

Why don’t you sell this email solution as a service?

Taylor

on 31 Jan 12

@Thomas,

International. All mail.

@Anonymous,

There are tons of lists online. I think we check like the top 50 or so.

Re: Conditioned IPs. You have to throttle how much you send at first, and “warm” the IP until it has a good reputation.
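A sketch of that throttle-and-warm idea as a daily cap that doubles until it reaches a target volume. The doubling factor and the numbers are assumptions for illustration, not the actual ramp described here.

```python
def warmup_schedule(start, target, factor=2):
    """Daily send caps for a new IP: begin at start, multiply by factor, end at target."""
    if start <= 0 or factor < 2:
        raise ValueError("start must be positive and factor at least 2")
    caps = []
    cap = start
    while cap < target:
        caps.append(cap)
        cap *= factor
    caps.append(target)
    return caps

def allowed_to_send(day, sent_today, caps):
    """True while today's volume on this IP is under the cap for warm-up day `day`."""
    cap = caps[min(day, len(caps) - 1)]  # once warm-up ends, hold at the target
    return sent_today < cap

caps = warmup_schedule(500, 4000)  # [500, 1000, 2000, 4000]
```

Mail over the cap can wait in the Postfix queue or go out through an IP that is already warm.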

George

on 31 Jan 12

Hey Noah & Taylor,

Tell me if I’m mistaken, but don’t you think the 99.3% figure is a little misleading? It sounds like your messages aren’t being outright rejected by the mail servers (which is good), but that doesn’t say much about whether or not they are actually reaching users and not being flagged as spam. Or are you able to track that last part too somehow?

According to SMTP.com, my company has only had ~50 bounces and 12,000+ successful deliveries, so according to your metric, our deliverability rating would be amazing (99.5+%). But in truth, we’ve had significant problems with messages getting flagged as spam and not reaching users.

Would be curious to hear your thoughts.

George

Brian

on 31 Jan 12

If you’re sending massive amounts of email, you might try ExactTarget. They have a really nice API and good delivery services with Return Path.

Rodger

on 01 Feb 12

www.unlocktheinbox.com has a free email feedback loop submission form in the members area.

Alex

on 01 Feb 12

I would add Mailgun to that list: http://mailgun.net

They have an excellent API for sending, receiving, storing, and creating.

Sam Granieri

on 01 Feb 12

Isn’t Sortfolio still up for sale?

Cowardly Lion

on 01 Feb 12

Long-time reader, though not a product user (mainly because Basecamp’s main screen is awful), and this is what your blog should have more of. Not every post has to tackle a problem as intense as this one, which many of us face, but they should offer good, valuable information.

Tor

on 01 Feb 12

@Taylor: Amazon SES does not require you to verify emails, once you are approved for production use, which is a pretty simple process.

mahmoudimus

on 01 Feb 12

@alex I also recommend mailgun. Those guys know email.

Marcel

on 01 Feb 12

@NL, @Taylor: thanks for your reply regarding statsd and your graphing dashboard. I’m not quite happy with Graphite, but I love the concepts of Graphite + statsd. Haven’t been able to find a decent alternative, so I was considering building something myself as well. Very curious to see what you’ve done with flyash :-)

Nazzareno

on 01 Feb 12

In my opinion, delivery rate is an important metric but not the main one. It’s more of a “non-bounce” rate, which is quite different from the “Inbox Delivery Rate” or “Inbox Placement Rate” (IPR).

If you have a correct configuration, a good recipient list, and transactional content, your setup is well placed to get accepted by mail servers.

But if you need to monitor IPR, reach the inbox, troubleshoot junk folder deliveries, monitor open rate and CTR on domain/device/geo-location basis, and keep your infrastructure updated (ie new black/white lists, new FBLs, DMARC, new antispam filters…), then you probably need a dedicated team just to run your mail server.

Graeme Mathieson

on 01 Feb 12

Yup, for most things, at some points during your service’s growth it makes economic sense to outsource “context” (as opposed to “core”) services, and at some other points it makes economic sense to bring these services in house. It’s the same with email as it is with hosting (cloud or data centre?), and probably with a dozen other services we all use.

You talk about paying someone “tens of thousands of dollars” to do it for you. I wonder if you could share the flip side:

How much did it cost (in terms of engineering effort, capital expenditure on equipment, and maybe even commercial agreements with large email providers?) to build out your email system? How much (in terms of ongoing devops-stylee maintenance, customer support, and the portion of your hosting costs) does it cost to maintain the email cluster?

We’re currently in the situation where we’re relying on a third party for email delivery and, while I’m confident we’re getting good delivery rates, I’m sure there will be a point where it’s economically more sensible to bring it in house. I wonder when that’ll be…

Anonymous

on 01 Feb 12

There are tons of lists online. I think we check like the top 50 or so.

Exactly why I asked. Would you consider sharing the list? :)

Mark

on 01 Feb 12

My frustration with sending email is that while delivery rates are almost always near perfect (~99.x%), you can’t really be sure of your view rate. For example, “opens” only count emails the user has selected, and don’t account for people who only saw your email in their preview window.

The one way around is, of course, to put a link in the email so you can get a count for “clicks”, however that seems a cheesy equivalent to posting a large article on several pages in order to artificially bump page views. Especially in the cases where there is no need for a link.

NL

on 01 Feb 12

@Mark, et al—

You’re absolutely right—“delivery rates” should really be called “remote mail server acceptance rates” (some people call this the ‘hard bounce’ rate), and aren’t a perfect measure of whether an email actually gets to the user.

We do track open rates for emails that are already HTML formatted and making remote requests for images, but you’ll never get 100% accuracy with that metric because many people use plain text emails or don’t load images. Our experience is that the best you’ll ever see is between 60-70% “open” rate because of this. Some of our applications in some contexts only send plain text emails as well, so we don’t track open rate there at all.

Why is remote server acceptance rate important anyway? Some thoughts:

1) First, because hard bounces really do happen a lot, and at our scale, a 1% difference in hard bounce rate means 160k messages per week that aren’t making it to users, which means a poor experience for many and many support requests coming in to us. Based on all the information we’ve been able to find, we’re pretty sure a 0.7% hard bounce rate at our scale is pretty good.

2) Second, because it is a relative metric of overall deliverability. Our experience has been that when we do get on a blacklist, servers start hard bouncing our mail until we get off of it. As we’ve improved our SpamAssassin type scores over the last few years, we’ve seen an improvement in hard bounce rate. We also see a strong correlation between hard bounce rate and the number of email delivery related support requests we get. While it’s not the perfect measure, it’s the best measure we have available that we can reliably monitor.

Again, there’s no perfect way that I know of to reliably tell whether an email is getting to a user, since read status isn’t particularly accurate. We use whatever we can (hard bounce rate, open rate, number of support requests relating to email) to get as close to that as we can.
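The arithmetic behind point 1, using the post’s weekly volume of roughly 16 million messages (“just shy of”, so these figures are slight overestimates). Integer math in parts per thousand keeps the results exact.

```python
WEEKLY_SENDS = 16_000_000  # "just shy of 16 million" emails in 7 days, rounded up

def messages_affected(rate_per_thousand, sent=WEEKLY_SENDS):
    """Messages per week at a given bounce rate, in parts per thousand (10 = 1%)."""
    return sent * rate_per_thousand // 1000

one_point_difference = messages_affected(10)  # 160000: the 160k figure in point 1
current_bounces = messages_affected(7)        # 112000: a 0.7% hard bounce rate
```

At this volume, every tenth of a percentage point is still about 16,000 messages a week.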

Mark

on 01 Feb 12

Kinda makes you do a double take when people say that email is the killer app. Especially when you consider the amount of time and energy it takes just to make sure an email is delivered to an opted-in recipient. In that regard, that same money spent on stamps (or even a phone call) seems to be the better option—generally speaking, of course.

Kredit

on 01 Feb 12

Noah, thanks for the excellent article. We really recommend ReturnPath for extra help along the way. Congrats on such a great bounce rate, that’s something to be proud of!

Taylor

on 01 Feb 12

@Kredit,

Return Path stopped trying to sell to us when they looked into our existing delivery and reputation metrics. They went from “you really need us” to “we can’t explain how much of that last 1% we will help you capture.” Hopefully we’ll continue to have the same success, as RP is extremely expensive and, again, feels a bit like dealing with the mafia: “Pay us or else your mail won’t make it to the big boys!”

dimitrios mistriotis

on 01 Feb 12

The “[email protected]” problem is related to the so-called “doppelganger domain” problem. I have a mini-app here: doppelradar.heroku.com. I can provide my solution by email if you are interested :-)

Brandon Cordell

on 01 Feb 12

I’m not sure how Twitter verifies email on the backend (maybe something to do with checking for send failures), but when my MX records were messed up, I would log into Twitter and get a message saying they hadn’t been able to reach me at my profile’s email address and that I should change it if I no longer used that address.

Anon

on 01 Feb 12

Among other things, I am finding it difficult to learn how to stay in Postini’s good graces. Does anybody have a link to some info on this? The same goes for other commonly used filters like SpamAssassin and Barracuda.

Ari

on 01 Feb 12

Would you be able to point me to a list of the spam blacklists you check against? I’d love to implement that Nagios check, but am having a hard time finding a definitive list of blacklist servers.

Nick Nelson

on 02 Feb 12

For VPNHQ and VPS.NET we’ve always used Postmark – sure there might be a 5 minute setup per email address, but then, we don’t think about it again…

Mail servers? What are those? We’re a hosting provider, but if we could get rid of email operations..we would in a second…Postmark just makes it easy.

I do agree with the above, the problem isn’t bounces for us – it would be marking the email as spam or in gmail, “unimportant”.

That doesn’t happen to us anymore.

Yousaf

on 02 Feb 12

Will to turn this data into an infographic! TFS

JD

on 02 Feb 12

Yousaf, Here you go. http://imgur.com/TfVQM

juefeng ge

on 03 Feb 12

We use BriteVerify to validate all of our emails before sending. We ran it against some 200-300K addresses that we’ve delivered to in the last few months, and the 9% it flagged as “bad” had a delivery rate of less than 5%! All in all, BriteVerify has been the best service we’ve found for scrubbing out bad emails.

Michele

on 05 Feb 12

About the problem you haven’t solved: how about applying Levenshtein distance to domains that look incorrect, like Google’s “search instead for” feature?


About Noah

Noah Lorang is the data analyst for Basecamp. He writes about instrumentation, business intelligence, A/B testing, and more.
