This is the final part of a three-part series on how we use Twitter as a support channel. In the first part, I described the tools we use to manage Twitter; in the second part, we built a model to separate tweets that need an immediate response from those that don't.
In the arc of a three-part series, the final part is supposed to be either a story of triumph or an object lesson.
The triumphant story would be about how we implemented the model we built previously in production, integrated it with our Rails-based Twitter client, and saw massive quantifiable improvements resulting from it. I would look smart, competent, and impactful.
The object lesson is that sometimes practical concerns win out over a neat technological solution, and that’s the story here.
Sometimes good isn’t good enough
The model we built had a false positive rate of about 7%. That's fair, and in many applications it would be just fine. In our case, we want to be very confident we're not missing important tweets from people who need help. Practically, that means someone would have to check the classification results occasionally to find the handful of tweets that do need an immediate response but slipped through.
After talking to the team, it became pretty clear that checking for mis-classified tweets would be more work than just handling the full, unclassified feed with the manual keyword filtering we have been using. This is a great example of a case where absolute numbers are more important than percentages: while the percentage impact in terms of filtering out less urgent tweets would be significant, the actual practical impact is much more muted because we’ve optimized the tool to handle tweets quickly.
Part of the reason we're able to get away with keyword filtering rather than something more sophisticated is just how accurate it is, with essentially no false positives. There's actually a surprising amount of duplication in tweets: excluding retweets, the last 10,000 tweets we've indexed have only 7,200 unique bodies among them. That means that when a person sees the first tweet using a phrase, they can instantly spot a keyword that's going to recur and add it to the list (for example, as soon as I started this series, we added "Behind the Scenes: Twitter" to the keyword list).
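For a concrete sense of how simple this is, here's a rough Ruby sketch of that kind of filter. The keyword values, the Tweet struct, and the method names are made up for illustration; they aren't our actual code.

Tweet = Struct.new(:body)

# Hypothetical list of phrases that mark a tweet as not needing a reply.
LESS_URGENT_KEYWORDS = [
  "behind the scenes: twitter",
  "new on the blog"
].freeze

# A tweet gets filtered out when its body contains any known keyword.
def less_urgent?(tweet)
  body = tweet.body.downcase
  LESS_URGENT_KEYWORDS.any? { |keyword| body.include?(keyword) }
end

incoming = [
  Tweet.new("Loved the Behind the Scenes: Twitter post"),
  Tweet.new("Can't log in to Basecamp and I'm on a deadline, help?")
]
needs_reply, filtered = incoming.partition { |tweet| !less_urgent?(tweet) }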
Most of the benefit with little effort
Rather than putting the predictive model into production, I switched our keyword filters from a hardcoded array to a Redis set and added a form that lets anyone add a new keyword to the list in a matter of seconds. This took five minutes to add, rather than the five hours it might have taken to hook up a live predictive model.
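As a rough sketch of what that change looks like (the Redis key and method names here are illustrative, not our actual implementation), the redis gem's set commands do almost all of the work:

require "redis"

REDIS = Redis.new
KEYWORD_SET = "support:filter_keywords" # hypothetical key name

# Keywords live in a Redis set instead of a hardcoded array, so adding
# one is a single SADD from a form handler rather than a code deploy.
def keywords
  REDIS.smembers(KEYWORD_SET)
end

def add_keyword(keyword)
  REDIS.sadd(KEYWORD_SET, keyword.strip.downcase)
end

def less_urgent?(tweet_body)
  body = tweet_body.downcase
  keywords.any? { |keyword| body.include?(keyword) }
end

add_keyword("Behind the Scenes: Twitter")
less_urgent?("Loved the Behind the Scenes: Twitter post") # => true

In a Rails app, the form can simply post the new keyword to a one-line controller action that calls something like add_keyword.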
I haven’t forgotten about the model. It’s in a proverbial desk drawer, and in a few months I’ll revisit it. I might have some new ideas of things to try to improve the false positive rate, and we’ll have a bigger training set as well. Heroics and fancy technology might not have won today, but the important thing is to meet the business’s needs in as efficient a manner as possible.
Chris B.
on 16 Nov 12
Hey Noah,
Interesting application of Bayesian classification; too bad it didn't work out. Though I can't help thinking that the problem you were trying to solve was self-inflicted (not by you personally, but by 37signals). After all, you are encouraging people to mix their support requests in with all the other stuff that goes on on Twitter. I would guess that very few of your customers would expect to receive support in response to a tweet, instead of an email or web form, if you did not encourage them to do so on your help site.
I would be very interested to hear the rationale for using Twitter for support.
David
on 16 Nov 12
This was a really interesting set of articles with a surprising twist to the story. It shows a great deal of technological mastery, modesty and pragmatism. I enjoyed it.
GeeIWonder
on 16 Nov 12
This reminds me of everything Bayesian. Ever.