We rely way too much on averages to run our business. We have our average response time, our average conversion rate, our average lifetime value per customer, and a thousand other averages.
The problem with averages is that they tell you nothing about the actual incidents and often give you a misleading big picture.
Our average response time for Basecamp right now is 87ms, says New Relic. That sounds fantastic, doesn’t it? And it easily leads you to believe that all is well and that we don’t need to spend any more time optimizing performance.
Wrong. That average number is completely skewed by tons of super fast responses to feed requests and other cached replies. If you have 1,000 requests that return in 5ms, then you can have 200 requests taking 2,000ms and still get a seemingly respectable average of about 338ms, even though one request in six took two full seconds. Useless.
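To see how much a single mean can hide, here is a minimal sketch that computes it directly for 1,000 requests at 5ms mixed with 200 requests at 2,000ms. The helper `mixed_latencies` is hypothetical and only builds the list of response times; the mean comes from Python's standard `statistics` module.

```python
from statistics import mean

def mixed_latencies(fast_count, fast_ms, slow_count, slow_ms):
    """Hypothetical helper: fast_count replies at fast_ms, then slow_count replies at slow_ms."""
    return [fast_ms] * fast_count + [slow_ms] * slow_count

# 1,000 cached replies at 5ms plus 200 slow requests at 2,000ms
latencies_ms = mixed_latencies(1000, 5, 200, 2000)

average = mean(latencies_ms)
# Share of requests that took a second or longer
slow_share = sum(1 for x in latencies_ms if x >= 1000) / len(latencies_ms)

print(f"average: {average:.1f}ms")          # average: 337.5ms
print(f"requests >= 1s: {slow_share:.0%}")  # requests >= 1s: 17%
```

The single average of 337.5ms never reveals that about a sixth of all requests took two seconds.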
What we need instead are histograms. With a histogram you can spot clusters much more easily and decide whether you want to deal with them or not. Outliers are given a more appropriate weight, and you’re more likely to make good decisions from the data.
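A histogram needs little more than a set of buckets and a count per bucket. The sketch below is one way to do it, assuming fixed, hand-picked bucket boundaries in milliseconds (`BUCKET_BOUNDS_MS` is an illustrative choice, not anything New Relic uses) and a made-up sample of 1,000 cached replies at 5ms plus 200 slow requests at 2,000ms. It uses `bisect_left` from Python's standard library to find each value's bucket.

```python
from bisect import bisect_left

# Illustrative bucket upper bounds in ms; values above the last bound go to an overflow bucket.
BUCKET_BOUNDS_MS = [10, 50, 100, 250, 500, 1000, 2500, 5000]

def histogram(latencies_ms, bounds=BUCKET_BOUNDS_MS):
    """Return (label, count) pairs; bucket i holds values in (bounds[i-1], bounds[i]]."""
    counts = [0] * (len(bounds) + 1)
    for x in latencies_ms:
        counts[bisect_left(bounds, x)] += 1
    rows = []
    for i, count in enumerate(counts):
        lower = 0 if i == 0 else bounds[i - 1]
        label = f"{lower}-{bounds[i]}ms" if i < len(bounds) else f">{lower}ms"
        rows.append((label, count))
    return rows

def render(rows, width=40):
    """Print a text bar chart, scaled so the largest bucket is `width` characters wide."""
    peak = max(count for _, count in rows) or 1
    for label, count in rows:
        # Give every non-empty bucket at least one '#' so small clusters stay visible
        bar = "#" * max(1, round(width * count / peak)) if count else ""
        print(f"{label:>12} {count:6d} {bar}")

# Made-up sample: 1,000 cached replies at 5ms, 200 slow requests at 2,000ms
sample = [5] * 1000 + [2000] * 200
render(histogram(sample))
```

Instead of one number, the output shows two separate clusters: 1,000 requests in the 0-10ms bucket and 200 in the 1000-2500ms bucket, which is exactly the slow group worth investigating.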
P.S.: Listing the standard deviation helps very little when there’s great variability. When some of your requests take 5ms and others take 5,000ms, the standard deviation tells you little, because a single spread number can’t describe two separate clusters.
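A small illustration with made-up numbers in the spirit of the P.S.: 1,000 requests at 5ms and 200 at 5,000ms. It uses `statistics.pstdev` (population standard deviation) from Python's standard library. The band of one standard deviation around the mean dips below zero, which is not a possible response time, and no request is anywhere near the mean itself.

```python
from statistics import mean, pstdev

# Made-up sample: mostly 5ms replies, plus a cluster at 5,000ms
latencies_ms = [5] * 1000 + [5000] * 200

avg = mean(latencies_ms)
sd = pstdev(latencies_ms)

print(f"mean  = {avg:.1f}ms")
print(f"stdev = {sd:.1f}ms")
print(f"mean - stdev = {avg - sd:.1f}ms")  # negative: not a real response time

# Requests within one standard deviation of the mean: only the fast cluster qualifies
near_mean = [x for x in latencies_ms if abs(x - avg) <= sd]
print(len(near_mean), "of", len(latencies_ms))
```

The 200 slow requests, a sixth of all traffic, sit more than two standard deviations above the mean, so a mean-and-spread summary would make them look like rare outliers rather than a cluster.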