You’re reading Signal v. Noise, a publication about the web by Basecamp since 1999.

About Noah

Noah Lorang is the data analyst for Basecamp. He writes about instrumentation, business intelligence, A/B testing, and more.

What being on the front page of Hacker News does for our bottom line

Noah wrote this · 32 comments

There’s been some speculation that we significantly increased the number of posts here on SvN in the run-up to the launch of the new Basecamp, and in particular that we targeted the front page of Hacker News with those articles. Some people aren’t happy about this.

I’d like to bring a little context and fact to bear on this to put these speculations to rest.

In the month before the launch of the new Basecamp, we published 25 posts here on Signal vs. Noise. For comparison, during the same period in prior years, we published (before 2007 we used a different blogging engine, so I don’t have those numbers handy):

  • 29 posts in 2011
  • 50 posts in 2010
  • 36 posts in 2009
  • 49 posts in 2008
  • 42 posts in 2007

Relatively speaking, this was actually a pretty low level of posting activity for us. In each of those earlier years, we were also maintaining a separate product blog during the same period, and its posts aren’t included in these totals.

During that period, there were 24,826 first-time visitors to any of our sites whom we could identify as having first gotten to us via Hacker News (in all, we received more like 105,000 unique visitors from Hacker News, but many of those were repeat visitors). 97 of those visitors signed up (a conversion rate of about 0.4%), with more than 85% of them electing the free plan. This conversion rate pales in comparison to our average conversion rate, particularly for non-search-engine traffic.

When all is said and done, what’s our likely financial outcome from Hacker News visitors for those 25 posts? About $300 total per month.

We typically write on SvN because we have an announcement to make, or because we have something we’re thinking about that we’d like to share.

Do we benefit from other people noticing our blog posts and linking them up from their blogs or other outlets? Absolutely – we’ve been talking about the power of word-of-mouth marketing for almost a decade.

As a writer, do I like it when more people read what I’ve written? Sure.

Is there any business value for us in getting on the front page of Hacker News? Not really.

Upvote us, downvote us, ignore us – I don’t care, but I hope you’ll make that decision based on the merits of the content of a given post, not because you think we’re trying to manipulate the front page of Hacker News for our gain.

Pssst... your Rails application has a secret to tell you

Noah wrote this · 27 comments

What would you say if I told you that you could get more precise, actionable, and useful information about how your Rails application is performing than any third party service or log parsing tool with just a few hours of work?

For years, we’ve used third party tools like New Relic in all of our apps, and while we still use some of those tools today, we found ourselves wanting more – more information about the distribution of timing, more control over what’s being measured, a more intuitive user interface, and more real-time access to data when something’s going wrong.

Fortunately, there are simple, minimally-invasive options that are available virtually for “free” in Rails. If you’ve ever looked through Rails log files, you’ve probably seen lines like:

Feb  7 11:27:49 bc-06 basecamp[16760]: [projects]   Person Load (0.5ms)  SELECT `people`.* FROM `people` WHERE `people`.`id` = ? LIMIT 1
Feb  7 11:27:49 bc-06 basecamp[16760]: [projects] Rendered events/_post.rhtml (0.4ms)
Feb  7 11:27:50 bc-06 basecamp[16760]: [projects] Rendered project/index.erb within layouts/in_global (447.2ms)
Feb  7 11:27:50 bc-06 basecamp[16760]: [projects] Completed 200 OK in 529ms (Views: 421.7ms | ActiveRecord: 58.0ms)

You could try to parse these log files, or you could tap into Rails’ internals to extract just the numbers, but both of those are somewhat difficult and open up a lot of areas for things to go wrong. Fortunately, in Rails 3, you can get all this information and more in whatever form you want with just a few lines of code.
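
To give a flavor, here’s a minimal sketch using ActiveSupport::Notifications, the Rails 3 instrumentation API. The logging destination is just an example; you could ship these numbers to statsd, a database, or anywhere else:

# Subscribe to controller action instrumentation (Rails 3+).
# The payload carries per-request view and database timings.
ActiveSupport::Notifications.subscribe("process_action.action_controller") do |name, start, finish, id, payload|
  duration_ms = (finish - start) * 1000
  Rails.logger.info(
    "#{payload[:controller]}##{payload[:action]} " \
    "total=#{duration_ms.round(1)}ms " \
    "view=#{payload[:view_runtime].to_f.round(1)}ms " \
    "db=#{payload[:db_runtime].to_f.round(1)}ms"
  )
end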

All the details you could want to know, after the jump…

Continued…

No framework needed

Noah wrote this · 21 comments

It goes without saying that we use Rails a lot here at 37signals. Oftentimes, when we look at a problem, we turn to Rails or something similar, because when you have a high-performance precision screwdriver, everything starts to look like a finely engineered screw. Sometimes, though, what you really need is a big hammer, because what you’re looking at is a nail.

Our public sites – sites like 37signals.com and basecamphq.com – are a perfect example of this.

Let me tell you about our journey with these sites over the years, and how we’ve landed on a simple solution that boosted conversion rate by about 5%.

Good enough

There’s nothing particularly dynamic about these sites; we might throw a “Happy Monday” in there, or we might make some tweaks based on a URL parameter, and we A/B test them extensively, but there’s no database or background services involved.

Stretching back to the pre-Basecamp days, the 37signals.com site was written in PHP. There was no Rails back then, Ruby wasn’t commonly used for web development, and DHH and others worked in PHP, so it was the logical choice. As we added sites, they continued to use PHP since it was fast and easy. This worked well for years and years: our public sites were relatively performant and rock-stable, and we didn’t really have many problems. The biggest pain was local development, which was quite difficult to set up on OS X in a way that behaved well with Pow, Passenger, etc.

Getting better

A few years ago, Sam Stephenson and Josh Peek wrote Brochure as a way to translate our marketing sites to Rack apps. This solved the local development challenges and let us use a language we were all generally more comfortable with. It was a little slower than PHP, and it meant dealing with Passenger on deployment, but it was a fair compromise at the time. We moved one site to Brochure, and then ran out of steam to move the rest; work on our applications took a higher priority.

A few months ago I took a serious look at our public sites’ performance. They were making a lot of requests for individual assets and page load times were pretty poor – Basecamp itself loaded much faster than the essentially static signup page for it. Local setup problems with the PHP sites also meant that it was harder to work on the sites, and so we were less productive and less inclined to work on them.

Back to the basics for fun and profit

Our solution to this (in addition to spriting images and cleaning up unused styles and Javascript) was to switch to using totally static HTML pages. We’re using the stasis gem to compile .html.erb files locally and on deploy, along with Sprockets to pre-process and concatenate stylesheets and Javascript. Our web server ends up serving plain old HTML and a single CSS and Javascript file, with no interpretation.
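
The core idea is simple enough to sketch in a few lines of plain Ruby (just the concept; this isn’t the stasis gem’s actual API, and the paths are illustrative):

require "erb"
require "fileutils"

# Render every .html.erb template under site/ into a static file under public/.
Dir.glob("site/**/*.html.erb").each do |template|
  output_path = template.sub(/\Asite\//, "public/").sub(/\.erb\z/, "")
  FileUtils.mkdir_p(File.dirname(output_path))
  File.write(output_path, ERB.new(File.read(template)).result)
end

In the real setup, stasis and Sprockets handle details like layouts and asset bundling, but the end result is the same: plain files any web server can serve directly.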

This makes local development easy, and what you see locally is always what will be deployed. This also makes it trivial to distribute the marketing site to multiple datacenters or distribution networks around the world—just upload the compiled files, rather than worrying about dependencies for running an interpreted site.

While we haven’t done that yet, just from some mild spriting and cleanup and the move to static HTML, we shaved about half a second off the total load time for basecamphq.com, and saw about a 5% improvement in conversion rate as a result (the link between page speed and conversion rate has also been studied more rigorously by the likes of Google, Amazon, etc.).

Lessons from Moneyball: don't get left behind

Noah wrote this · 15 comments

I recently read and watched “Moneyball”, and enjoyed both greatly. It’s a great story in and of itself, but I also found it to be an interesting parallel to the state of the “web software” industry today.

Moneyball starts in the week before the 2002 baseball draft, with a set of meetings that pit Oakland A’s general manager Billy Beane against his team of scouts. The scouts’ primary mechanism of evaluating players was visual – did the guy look, walk, and talk like a major league baseball player? On the other hand, Billy, with his assistant Paul DePodesta, had a largely objective system for evaluating baseball players based on things like how often they got on base.

Billy won the fight over talent selection and picked players that fit his system, even when his scouts disagreed. This pattern continued throughout the season, and the A’s went on to set a league record for consecutive wins.

When I started writing I thought if I proved X was a stupid thing to do people would stop doing X. I was wrong.
Bill James in his 1984 Baseball Abstract

In many ways, the “web software” industry is still where those scouts were. For most people, the primary way of evaluating their software is with their own eyes and emotions. Over the years, people have tried to bring some objectivity or framework to this with things like “personas”, but the process is still a largely subjective one, just like a scout looking at how a player swings and never really looking at whether he gets on base.

The reality, of course, is that this is no longer necessary. Just like baseball in the years since Bill James coined “sabermetrics”, we have the tools now as an industry to do better. We can identify the outcomes we want to see, and we can objectively evaluate a design in the context of those outcomes.

It’s never been easier to test your designs and find out what works where the rubber meets the road. You can use a tool like Optimizely for any site or something like A/Bingo in a Rails app and have a test running in a matter of minutes. Measuring and understanding behavior in other ways has also never been easier—there are new tools and startups helping to do this every week.
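
With A/Bingo, for instance, the whole loop is a couple of method calls (a rough sketch; the experiment name and alternatives are made up):

# In a Rails controller or view: ab_test picks and remembers a variation per visitor.
def show
  @cta_text = ab_test("signup_cta", ["Sign up free", "Start your 30-day trial"])
end

# When the visitor converts (e.g. completes signup), record it against the same test.
def create
  bingo!("signup_cta")
  # ...create the account...
end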

For Billy Beane and the Oakland A’s, using data was about leveling the playing field between their meager salary budget and the huge budget of teams in places like New York and Boston. For the web industry, the playing field is already fairly level – it doesn’t take much more than a web browser and a text editor to build something. What data does for web software is reduce the role that blind luck plays. You’re more likely to – on average – find success if you evaluate your work using real data about the outcomes that matter.

You can choose to keep working like those scouts did and go on gut instinct alone. It might work for a while, but I think most people would say that baseball’s moving forward now, and the people who haven’t made the switch are being left behind. Our industry will move forward too—do you want to be left behind?

Giving away the secrets of 99.3% email delivery

Noah wrote this · 60 comments

We send a lot of mail for Basecamp, Highrise, Backpack, and Campfire (and some for Sortfolio, the Jobs Board, Writeboard, and Tadalist). One of the most frequently asked questions we get is about how we handle mail delivery and ensure that emails are making it to people’s inboxes.

Some statistics

First, some numbers to give a little context to what we mean by “a lot” of email. In the last 7 days, we’ve sent just shy of 16 million emails, with approximately 99.3% of them being accepted by the remote mail server.

Email delivery rate is a little bit of a tough thing to benchmark, but by most accounts we’re doing pretty well at those rates (for comparison, the tiny fraction of email that we use a third party for has had between a 96.9% and 98.6% delivery rate for our most recent mailings).

How we send email

We send almost all of our outgoing email from our own servers in our data center located just outside of Chicago. We use Campaign Monitor for our mailing lists, but all of the email that’s generated by our applications is sent from our own servers.

We run three mail-relay servers running Postfix that take mail from our application and jobs servers and queue it for delivery to tens of thousands of remote mail servers, sending from about 15 unique IP addresses.

How we monitor delivery

We have developed some instrumentation so we can monitor how we are doing on getting messages to our users’ inbox. Our applications tag each outgoing message with a unique header with a hashed value that gets recorded by the application before the message is sent.
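
On the application side, that tagging can be done with a mail interceptor along these lines (a sketch, not our actual code; the header name and the DeliveryLog model are hypothetical):

require "digest"
require "securerandom"

class DeliveryTagger
  # Called by ActionMailer just before each message is handed off for delivery.
  def self.delivering_email(message)
    tag = Digest::SHA1.hexdigest(SecureRandom.hex(16))
    message.header["X-Delivery-Tag"] = tag
    DeliveryLog.create!(tag: tag, recipients: Array(message.to).join(","))  # hypothetical model
  end
end

ActionMailer::Base.register_interceptor(DeliveryTagger)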

To gather delivery information, we run a script that tails the Postfix logs and extracts the delivery time and status for each piece of mail, including any error message received from the receiving mail server, and links it back to the hash the application stored. We store this information for 30 days so that our fantastic support team is able to help customers track down why they may not have received an email.

We also send these statistics to our statsd server so they can be reported through our metrics dashboard. This “live” and historical information can then be used by our operations team to check how we’re doing on aggregate mail delivery for each application.
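
The reporting side is a few calls to a statsd client (metric names and host are illustrative):

require "statsd"  # the statsd-ruby gem

statsd = Statsd.new("statsd.example.internal", 8125)

# For each delivery attempt parsed out of the Postfix logs:
statsd.increment("mail.basecamp.sent")
statsd.increment("mail.basecamp.bounced")        # only when the remote server rejects
statsd.timing("mail.basecamp.delivery_ms", 250)  # time until the remote server accepted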

Why run your own mail servers?

Over the last few years, at least a dozen services that specialize in sending email have popped up, ranging from the bare-bones to the full-service. Despite all these “email as a service” startups we’ve kept our mail delivery in-house, for a couple of reasons:

  • We don’t know anyone who could do it better. With a 99.3% delivery rate, we haven’t found a third party provider that actually does better in a way they’re willing to guarantee.
  • Setup hassle. Most of the third party services require that you verify each address that sends email by clicking a link that gets sent to that address. We send email from thousands and thousands of addresses across our products, and the hassle of registering and confirming them all is significant; even automating the process introduces unnecessary delivery delays.

Given all this, why should we pay someone tens of thousands of dollars to do it? We shouldn’t, and we don’t.

Read more about how we keep delivery rates high after the jump…

Continued…

Windows to Mac to Windows to Mac to... Linux? It doesn't matter.

Noah wrote this · 76 comments

Over the last 20 years, my primary computing environment has gone from Windows 3.1, to Mac OS 6/7/8/9, to Windows for about a decade, and then back to a Mac a couple of years ago. Recently, I switched to using a Linux desktop as my primary computer. I can’t say that there’s a dramatic reason why I switched (it’s not some political statement about free and open source software); I just wanted to use some hardware that was impractical to get from Apple.

Something crazy happened when I switched: absolutely nothing changed.

I basically used three programs on the Mac: Google Chrome (web browsing), iTerm (terminal), and Adium (IM). Now, I use Google Chrome (web browsing), Terminator (terminal), and Empathy (IM). Switching was a matter of copying over a couple of directories and configuration files and connecting Chrome and Dropbox to sync. When I wanted to do some real work, getting my development environment running for our applications was just as easy as on a Mac.

Perhaps surprisingly to some people, Linux hardware support has improved to the point that everything worked perfectly out of the box, just like on a Mac. And in a shift from what David saw a few years ago, I find the stock interface in Ubuntu 11.10, despite being largely panned by critics, to be just as nice as Mac OS X Lion.

I’m just as productive on Linux as I was on OS X, and there’s no reason you couldn’t be too if you wanted or needed to switch. All you need these days to build great things is a browser, a text editor, and the programming language or tool of your choice. As long as it works for you, it really doesn’t matter whether you build your killer social-media-photo-sharing-Facebook-tweeting app on OS X, Linux, or Windows.

Automatically save your work in Basecamp, Highrise, Backpack, and Writeboard

Noah wrote this · 74 comments

Today we’re bringing autosave to Basecamp (for messages and comments), Highrise (for notes), Backpack (for messages, comments, and notes), and Writeboard (for documents and comments), as well as right here on SvN.

Autosave keeps a local copy of your work in your browser’s storage as you write, so you’re always protected against accidental refreshes, closing the wrong tab, a browser crash, or clicking a link that opens in the same window. The local copy will be kept in your browser’s local storage until you submit that message or comment.

If you accidentally close your browser or refresh the page, everything you’ve typed will be restored automatically when you return. You don’t have to do anything; it just works.

Autosave works with modern browsers: Internet Explorer 9+, Firefox 3.5+, Chrome 4.0+, and Safari 4.0+. If you’re not already using one of these versions, now’s a great time to upgrade!

We hope this gives you even more confidence in working with our products. Losing something while you’re writing it stinks – hopefully this helps cut those incidents down dramatically.

The rhythms of 37signals

Noah wrote this · 26 comments

I was thinking this morning about what I perceived to be my normal working pattern: lots in the morning, then tapering off from midday on, with an occasional bump in the evenings. I wanted to see if this was quantifiable through git logs, so I decided to look across a wide range of our repositories.
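
If you want to try this on your own repositories, the gist of it fits in a few lines of Ruby (a rough sketch, not the exact script I used):

hours = Hash.new(0)

# %ad with --date=iso prints e.g. "2012-02-07 11:27:49 -0600";
# the hour shown is already in the committer's local time.
`git log --pretty=format:%ad --date=iso`.each_line do |line|
  hours[line.split(" ")[1][0, 2].to_i] += 1
end

total = hours.values.inject(:+).to_f
(0..23).each { |h| printf("%02d:00  %4.1f%%\n", h, 100 * hours.fetch(h, 0) / total) }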

The chart below shows the portion of each person’s commits that occur within a given hour of the day in their local time.

As you can see, there’s a wide range in preferred working hours – one of the great advantages of working in slow time is that this is absolutely fine. There’s enough overlap in hours for people to be able to work together, but enough flexibility to work when you want to.

API design for humans

Noah wrote this · 28 comments

One of the things about working with data at 37signals is that I end up interacting with a lot of different APIs—I’ve used at least ten third-party APIs in the last few months, as well as all of our public APIs and a variety of internal interfaces. I’ve used wrappers in a couple different languages, and written a few of my own. It’s fair to say I’ve developed some strong opinions about API design and documentation from a data consumer’s perspective.

From my experience, there are a few things that really end up mattering from an API usability perspective (I’ll leave arguments about what is truly REST, or whether XML or JSON is actually better technically to someone else).

Tell me more: documentation is king

I have some preferences for actual API design (see below), but I will completely trade them for clear documentation. Clear documentation includes:

  • Examples that show the full request. This can be a full example using curl like we provide in our API documentation, or just a clear statement of the request like Campaign Monitor does for each of their methods.
  • Examples that show what the expected response is. One of the most frustrating things when reading API documentation is not knowing what I’m going to get back when I use the API; showing mock data goes a long way here (see the sketch after this list for the flavor). Really good API documentation like this would let you write an entire wrapper without ever making a single request to the API. Campaign Monitor and MailChimp both have good, but very different, takes on this.
  • A listing of error codes, what they mean, and what the most common cause of receiving them is. I’m generally not the biggest fan of the Adwords API in many ways, but they are a great example of exhaustively documenting every single response code they return.
  • A searchable HTML interface. Whether it’s visually appealing doesn’t really matter much, and Google indexing it is plenty of search. What doesn’t work for me is when the API documentation is in PDF, or I have to authenticate to get access to it.
  • Communication of versioning and deprecation schedules. There’s some debate about whether versioning is better than gradual evolution, but regardless, anytime you’re changing something in a way that might break someone’s existing code, fair warning is required, and it should be on your documentation site. Sometimes you have to make a change for security reasons that doesn’t allow much advance notice, but wherever possible, a couple of weeks’ notice goes a long way. The Github API clearly shows what will be removed and when, and highlights the differences between versions.
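
To make the first two bullets concrete, here’s the kind of complete example good documentation provides: the full request and the response you should expect. A sketch in Ruby (the endpoint follows the shape of the classic Basecamp API, but treat the account name and token as placeholders):

require "net/http"
require "uri"

uri = URI("https://example.basecamphq.com/projects.xml")  # placeholder account
request = Net::HTTP::Get.new(uri)
request.basic_auth("YOUR_API_TOKEN", "X")  # token-as-username convention
request["Accept"] = "application/xml"

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.request(request)
end

puts response.code  # expect "200"
puts response.body  # expect "<projects type=\"array\">...</projects>"
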
Continued…

A/B Testing Tech Note: determining sample size

Noah wrote this · 9 comments

In discussions on our posts about A/B testing the Highrise home page, a number of people asked about sample size and how long to run a test for. It’s a good question, and one that’s important to understand. Running an A/B test without thinking about statistical confidence is worse than not running a test at all—it gives you false confidence that you know what works for your site, when the truth is that you don’t know any better than if you hadn’t run the test.

There’s no simple answer or generic “rule of thumb” that you can use, but you can very easily determine the right sample size to use for your test.

What drives our needed sample size?

There are a few concerns that drive the sample size required for a meaningful A/B test:

1) We want to be reasonably sure that we don’t have a false positive—that there is no real difference, but we detect one anyway. Statisticians call this Type I error.

2) We want to be reasonably sure that we don’t miss a positive outcome (or get a false negative). This is called Type II error.

3) We want to know whether a variation is better, worse, or the same as the original. Why does the difference between worse and the same matter? I probably won’t switch from the original if the variation performs worse, but I might still switch if it performs the same, for a design or aesthetic preference, for example.

What not to do

There are a few “gotchas” that are worth watching out for when you start thinking about the statistical significance of A/B tests:

1) Don’t look at your A/B testing tool’s generic advice that “about 100 conversions are usually required for significance”. Your conversion rate and desired sensitivity will determine this, and A/B testing tools are always biased to want you to think you have significant results as quickly as possible.

2) Don’t continuously test for significance as your sample grows, or blindly keep the test running until you reach statistical significance. Evan Miller wrote a great explanation of why you shouldn’t do this, but briefly:

  • If you stop your test as soon as you see “significant” differences, you might not have actually achieved the outcome you think you have. As a simple example of this, imagine you have two coins, and you think they might be weighted. If you flip each coin 10 times, you might get heads on one all of the time, and tails on the other all of the time. If you run a statistical test comparing the portion of flips that got you heads between the two coins after these 10 flips, you’ll get what looks like a statistically significant result—if you stop now, you’ll think they’re weighted heavily in different directions. If you keep going and flip each coin another 100 times, you might now see that they are in fact balanced coins and there is no statistically significant difference in the number of heads or tails.
  • If you keep running your test forever, you’ll eventually reach a large enough sample size that a 0.00001% difference tests as significant. This isn’t particularly meaningful, however.

3) Don’t rely on a rule of thumb like “16 times your standard deviation squared divided by your sensitivity squared”, or on the charts you see on some websites that don’t make their assumptions clear. These are better than a rule of thumb like “100 conversions”, but the math isn’t so hard that it’s worth skipping, and working through it yourself teaches you what’s driving the required sample size.

How to calculate your needed sample size

Instead of continuously testing or relying on generic rules of thumb, you can calculate the needed sample size and statistical significance very easily. For simplicity, I’ve assumed you’re doing an A vs B test (two variations), but this same approach can be scaled for other things.
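
As a preview, here’s the standard two-proportion calculation in a few lines of Ruby (a sketch: the z-values assume a two-sided 5% significance level and 80% power, and the example rates are made up):

# n per variation = (z_alpha + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
def sample_size_per_variation(p1, p2, z_alpha = 1.96, z_beta = 0.84)
  variance = p1 * (1 - p1) + p2 * (1 - p2)
  (((z_alpha + z_beta)**2 * variance) / (p1 - p2)**2).ceil
end

# e.g. to reliably detect a lift from a 3% to a 4% conversion rate:
puts sample_size_per_variation(0.03, 0.04)  # => roughly 5,300 visitors per variation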

Continued…