You’re reading Signal v. Noise, a publication about the web by Basecamp since 1999. Happy !

Signal v. Noise: Sysadmin

Our Most Recent Posts on Sysadmin

Making Database Maintenance Fun: mysql_role_swap

Taylor
Taylor wrote this on 10 comments

We’ve come a long way in the last year in the way we operate our sites. We’ve stabilized our applications, improved their response time, and increased their availability.

To accomplish these improvements we’ve done a series of database maintenances that varied from upgraded hardware, to new database servers, to configuration changes that required restart. In each of these operations we had one common goal: minimize the interruption to our customers.

Today we are releasing a small script that has made our lives, and our customer’s lives a whole lot better. We use this script to change the roles of our databases from replication masters to slaves, and vice versa. The fact that the script does all the steps previously performed by a human in a more timely and perfect manner is where we achieve all the gain.

Without this script we used to spend minutes accomplishing these maintenance tasks. With the script we’ve swapped databases under production load with no user noticeable interruption!

The script has lots of hard coded paths and users and other assumptions. But this is too good to keep to ourselves. We’re sharing it with you with the hope that it will improve your operations experience, and that you will contribute back changes that make it even better.

Nagios Monitoring Performance

Eron
Eron wrote this on 15 comments

Since I joined 37signals, I have been working to improve our monitoring infrastructure. We use Nagios for the majority of our monitoring. Nagios is like an old Volvo – it might not be the prettiest or the fastest, but it’s easy to work on and it won’t leave you stranded.

To give you some context, in January 2009 we had 350 Nagios services. By September of 2010 that had grown to 797, and currently we are up to 7,566. In the process of growing that number, we have also drastically reduced the number of alerts that have escalated to page someone in the middle of the night. There have certainly been some bumps along the road to better monitoring, and in this post I hope to provide some insight into how we use Nagios and some helpful hints for folks out there who want to expand and improve their monitoring systems.

Continued…

New Basecamp: Available 99.99% of the time since launch

David
David wrote this on 17 comments

We launched the new Basecamp on March 6. Since then we’ve deployed 891 new versions with all sorts of new features, bug fixes, and tweaks. Through all of that we’ve had just six incidents of either scheduled or unscheduled downtime for a total of 19 minutes offline.

Today, that means we’ve been available 99.99% of the time since launch. That’s worth celebrating! Our fantastic operations teams consisting of Anton, Eron, John, Matt, Will, and Taylor have worked tirelessly to eliminate interruptions and they deserve our applaud.

Since we count “scheduled” downtime the same as “unscheduled” (have you ever met a customer who cared about the difference?), that has meant making good progress on stuff like database migrations.

In the past, when we focused mainly on unscheduled downtime as a measure of success, we wouldn’t think too much of taking a 30-minute window to push a major new feature. Not so these days. Thanks to Percona’s pt-online-schema-change, we’re able to migrate the database much easier without any downtime or master-slave swappero.

So three cheers to the four 9’s! Our next target is five 9’s, but that only allows for 5 minutes of downtime in a whole year, so we have our work cut out for us.

You can follow along and see how we’re doing on basecamp.com/uptime.

Behind the Scenes: The Hardware that Powers Basecamp, Campfire and Highrise

Taylor
Taylor wrote this on 20 comments

A few of us recently attended Velocity Conference in San Jose, CA. In the “hallway track sessions” a number people asked about the hardware that powers Basecamp, Campfire and Highrise.

Application Servers
All of our Ruby/Rails application roles run on Dell C Series 5220 servers. We chose C5220 servers because they provide high density, high performance, and low cost compute sleds at a decent cost point. The C5220 sleds replaced invidual Dell R710 servers which consumed a greater amount of power and rack space in addition to offering expandability we were not utilizing.

We use an 8 sled configuration with E31270 3.40GHz processors, 32/16G of ram, an LSI raid card and 2 non Dell SSDs. (For those of you thinking of ordering these … get the LSI raid card. The built in Intel raid is unreliable.) Each chassis with 8 sleds takes up 4U of rackspace: 3 for the chassis and 1 for cabling.

Job / Utility Servers
We use a combination of C6100 and C6220 servers to power our utility/jobs and API roles. We exclusively use the 4 sled version (of each) which means we get 4 “servers” in 2U. Each sled has 2x X5650 processors, 48-96G of ram, 2-6 ssds, and 4×1G or 1×10G network interfaces. This design allows to have up to 24 disks in a single chassis while consuming the same space as a single R710 server (which holds 8 disks max).

Search
For Solr we run R710s filled with SSDs. Each instance varies, but a common configuration is 2x E5530 processors, 48G of ram, 4-8 ssds, and 4×1g network interfaces. For Elastic Search we run a mix of Poweredge 2950 servers and C 5220 sleds with 12-16G of ram and 2×400G ssds in a raid 1.

Database and Memcache/Redis Servers
For Database roles we use R710s with 2x X5670 processors, 1.2TB Fusion-IO duo cards and varying amounts of memory. (Varies based on the database size.) We also have a number of older R710s powering Memcache and Redis instances. Each of these has has 2x E5530 processors and 2-4 disks with 4×1G network interfaces.

Storage
We have around 400TB / 9 nodes of Isilon 36 and 72NL storage. We serve all of the user uploaded content off this storage with backups to S3.

OS Choice
Database servers run RHEL or Centos 6 while application and utility servers run Ubuntu LTS.

Taking the Pain Out of MySQL Schema Changes

John
John wrote this on 19 comments

A common obstacle we face when releasing new features is making production schema changes in MySQL. Many new features require additional columns or indexes. Running an “ALTER TABLE” in MySQL to add the needed columns and indexes locks the table, hanging the application. We need a better solution.

Option 1: Schema Change in Downtime
This is the simplest option. Put the application into downtime and perform the schema change. It requires us to have the application down for the duration of the “ALTER TABLE”. We’ve successfully used this option for smaller tables that can be altered in seconds or minutes. However, for large tables the alter can take hours making it less than desirable.

Continued…

One month in Operations

Matthew
Matthew wrote this on 25 comments

I’ve been with 37signals as a remote system administrator for a month now and thought I would share some observations on Operations:


User Experience
I’ve been thoroughly impressed by how much everyone genuinely cares about the user experience with the applications they maintain. Everything from page response times increasing by a few milliseconds to minimizing interruptions during deployments or even the impact of the number of http redirects on load times – it’s all constantly being discussed and debated. This is often supplemented with hard data in logs and pretty graphs (thanks Noah!) from the multitude of resources made available internally to help diagnose issues.

It’s more than just observation and discussion, though – frequently these chats morph into immediate direct action in code or configuration changes.

Communication
37signals definitely embraces remote employees and treats them like any other. In my first month I’ve used Campfire, Basecamp, Jabber, Skype, Gmail, Github, Confluence, Google Docs, and more to stay connected and learn how some of the pieces fit together. Thanks to the resources made available from day one and the accessibility of the other Operations team members, I honestly haven’t felt left out in the slightest when working from home.

Accountability
Operations definitely sets the bar high for accountability to users and the rest of the company. Every issue I’ve seen raised by support, programmers, or other admins is looked into and taken as seriously as any other.

When larger production issues are dealt with, detailed postmortems are made available so the same mistakes aren’t repeated. These have been great for learning what not to do.

Monitoring
The amount and depth of monitoring at 37signals is impressive. They have thousands of checks for everything from hardware issues, application error rates, backups, and scheduled jobs. On top of this, we have remote monitoring and systems in place for notifying users of ongoing issues.

Overall I’ve been impressed with the high standard of work in the Operations group and I’ll be doing my best to live up to it.


It’s been an interesting and enjoyable first month with 37signals, largely thanks to everyone who’s made me feel so welcome and valued, especially Taylor, John, Eron, Will and Anton. Thank you!

Benchmarking Basecamp's uptime against five other web apps

David
David wrote this on 36 comments

As we announced at the beginning of the month, we’re always on a mission to improve our uptime. Inaccessible apps are the cause of much frustration and users don’t care whether that’s because they’re scheduled or not.

While publishing our own uptimes have been a great step towards getting everyone in the company focused on improving, we also wanted to compare ourselves to others in the industry. So since December 16, we’ve been tracking five other applications through Pingdom to compare and contrast.

The goal is to have the least amount of downtime and here are the results from the period December 16 to January 31:

  1. Github, down for 6 minutes
  2. Freshbooks, down for 14 minutes
  3. Basecamp, down for 16 minutes
  4. Campaign Monitor, down for 21 minutes
  5. Shopify, down for 1 hour and 53 minutes
  6. Assistly (now Desk), down for 6 hours and 46 minutes

Congratulations to Github for the number one spot on the list. We are definitely going to be gunning for them! We’ll publish another edition of this list in a month or so.

Basecamp Next's caching hardware

David
David wrote this on 69 comments

From the very start, we wanted Basecamp Next to be fast. Really, really fast. To do so we built a russian-doll architecture of nested caching that I’ll write up in detail soon. But for now I just wanted to share where all this caching is going to live as we just installed it at the hosting center.

It kinda reminds me of what pictures of a drug raid look like when they lay out all the coke and cash on the table, but this is what 864GB of RAM looks like:

Cost of the loot was $12,000.

I heard you like numbers...

Taylor
Taylor wrote this on 61 comments

In 2011

  • Our support team responded to 100,000 cases
  • Our syslog server logged an average of 1,500,000,000+ messages each day
  • Our solr indexer processed 428,000,000 indexing requests for Highrise alone
  • We hit our 100,000,000th person/company creation in Highrise
  • And a Basecamp user uploaded the 100,000,000th file (It was a picture of a cat!)

In the first 10 business days of 2012

  • We stored 100,000,000 unique statsd measurements
  • Our syslog server logged over 20,000,000,000 messages
  • Our applications sent more than 2,000 email notifications per minute
  • And Basecamp accepted an average of 75,000 file uploads from users per day

Interested in numbers other than revenue and profit? Leave a comment and we (mostly Noah) will dig them up for a future post.

Why Attacking Application Exceptions is Important

Taylor
Taylor wrote this on 14 comments

Background

Every week, a few of our programmers focus on responding to customer problems that might indicate a bigger issue with one of our applications. In addition, we’re constantly looking at issues (Rails exceptions, performance “hotspots”, etc.) that don’t bubble up to the customer level, but that are just “good housekeeping”. Taking care of these issues can have both real and measurable impacts — for example, we reduced application exceptions by 43% in December, allowing us to proactively fix even more problems before customers noticed.

Lets enter the house of Campfire, our real time chat product. Campfire uses memcache in a number of ways, one of which is to store fragments of messages that have been “spoken” in a room. When a user requests the transcript for that room we attempt to build a list of these messages from the cache before we hit the database. To do this, we build an array of message keys which are dispatched to a memcache-client multi_get request.

For a long time Campfire has generated exceptions showing multi_get failures, but the errors aren’t bringing down the site or causing users problems. When I first came to 37signals the exceptions were described as being caused by a bug in older versions of memcached (server), but no one knew “for sure”. Lacking any open issues or similar bug reports, as a work around our developers had patched the memcache-client to rescue ENOPIPE errors, raise by these failed get requests. The patch didn’t make the problem go away, but it kept it from being customer impacting…

Identifying the Problem

While testing a new branch of Campfire in staging, one of our developers was repeatedly encountering this error to the point where an exception was being raised for almost every page. Remember how I said these things pop up at the most inconvenient times? I decided to try and figure out these exceptions once and for all. (Ultimately, it turns out we hadn’t bundled the patched gem and that’s why the exception was being raised so often.)

After reviewing the code for memcache-client, I decided there wasn’t anything obviously wrong with its implementation for multi_get requests. To start my debugging, I fired up telnet, set two keys, and issued a “get keyone keytwo”. This worked fine. I had established that multi gets in the most basic form do work.

Next I used strace to attach to the unicorn (Ruby on Rails) process and see what the client was sending:

write(15, "get enter_messages/343555238/hide_author=false&hide_date=false&last_read=343841341 timestamp_messages/343575826/hide_author=false&hide_date=false&last_read=343841341 enter_messages/343575827/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343582929/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343582930/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343589192/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343589193/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343592436/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343592437/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343606646/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343606647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343654455/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343654456/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343663288/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663294/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663301/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663459/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663763/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663833/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343664243/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664244/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664249/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664928/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665538/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665661/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665730/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343668494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343668495/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343670646/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343670647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343674727/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343674728/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675113/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675330/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675512/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675662/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675783/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675850/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675926/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675962/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676003/hide_author=true&hide_date=true&last_read=343841341 text_messages/343676419/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676627/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343683775/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343683778/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343697190/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343697191/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698503/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698601/hide_author=true&hide_date=true&last_read=343841341 text_messages/343698676/hi", 4096) = 4096
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 525125})  = 1 (out [15], left {30, 525122})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(15, "de_author=false&hide_date=true&last_read=343841341 text_messages/343698734/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343699275/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699336/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699454/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343699907/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700196/hide_author=true&hide_date=true&last_read=343841341 text_messages/343700303/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343700490/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700491/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701033/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701249/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701495/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343701499/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701536/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701811/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343705859/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705860/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705927/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343707289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707290/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707541/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707657/hide_author=false&hide_date=true&last_read=343841341 text_messages/343708101/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343711297/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711298/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711359/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343714551/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714552/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714779/hide_author=false&hide_date=true&last_read=343841341 text_messages/343715190/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715303/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715380/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715452/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715558/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716023/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716257/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716298/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716338/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716577/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716745/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716816/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716900/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716999/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343717255/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717256/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717265/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717555/hide_author=true&hide_date=true&last_read=343841341 text_messages/343717960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718088/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718097/hide_author=false&hide_date=true&last_read=343841341 text_messages/34371819", 4096) = 4096
--- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
getrusage(RUSAGE_SELF, {ru_utime={4, 270000}, ru_stime={0, 190000}, ...}) = 0
timer_settime(0x14, 0, {it_interval={0, 12712000}, it_value={0, 12712000}}, NULL) = 0
rt_sigreturn(0x4b5412a22c0ef)           = 4096
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 517198})  = 1 (out [15], left {30, 517196})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(15, "2/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718248/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718670/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718779/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718955/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343725746/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725747/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725757/hide_author=true&hide_date=true&last_read=343841341 text_messages/343725762/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343729823/hide_author=false&hide_date=true&last_read=343841341 text_messages/343729824/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343758537/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343758538/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343764428/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764429/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764474/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764528/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764620/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343773529/hide_author=false&hide_date=true&last_read=343841341 text_messages/343773530/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343783468/hide_author=false&hide_date=true&last_read=343841341 text_messages/343783469/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343783615/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783650/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783875/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784004/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784277/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784372/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343784578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784579/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343802940/hide_author=false&hide_date=true&last_read=343841341 text_messages/343802941/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343802954/hide_author=true&hide_date=true&last_read=343841341 text_messages/343803300/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343812622/hide_author=false&hide_date=true&last_read=343841341 text_messages/343812623/hide_author=false&hide_date=true&last_read=343841341 text_messages/343813469/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343813500/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813695/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813765/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814066/hide_author=true&hide_date=true&last_read=343841341 enter_messages/343814107/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343814360/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814556/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343814680/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814681/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815559/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815889/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343815893/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815915/hide_author=true&hide_date=true&last_read=343841341 text_messages/343815953/hide_author=true&hide_date=true&last_read=343841341 text_messages/343816359/hide_author=false&hide_date=true&last_read=343841341 text_messages/343816657/hide_author=true&hide_date=true&last_read=3438413", 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(9, "<13>Dec 29 20:43:11 campfire[17363]: [37s] Generic failure: Errno::EPIPE: Broken pipe", 85, MSG_NOSIGNAL, NULL, 0) = 85
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 516190})  = 1 (out [15], left {30, 516188})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 516071})  = 1 (out [15], left {30, 516069})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(15, "get enter_messages/343555238/hide_author=false&hide_date=false&last_read=343841341 timestamp_messages/343575826/hide_author=false&hide_date=false&last_read=343841341 enter_messages/343575827/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343582929/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343582930/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343589192/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343589193/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343592436/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343592437/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343606646/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343606647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343654455/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343654456/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343663288/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663294/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663301/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663459/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663763/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663833/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343664243/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664244/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664249/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664928/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665538/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665661/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665730/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343668494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343668495/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343670646/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343670647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343674727/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343674728/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675113/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675330/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675512/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675662/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675783/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675850/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675926/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675962/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676003/hide_author=true&hide_date=true&last_read=343841341 text_messages/343676419/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676627/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343683775/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343683778/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343697190/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343697191/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698503/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698601/hide_author=true&hide_date=true&last_read=343841341 text_messages/343698676/hi", 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(9, "<13>Dec 29 20:43:11 campfire[17363]: [37s] Generic failure: Errno::EPIPE: Broken pipe", 85, MSG_NOSIGNAL, NULL, 0) = 85
close(15)                               = 0
munmap(0x7f7114948000, 4096)            = 0
close(15)                               = -1 EBADF (Bad file descriptor)

Here’s a basic run down of what you see above: The unicorn process (Rails) opens a socket to our memcache server in staging, and then issues a get command which is broken up into multiple 4096 byte packets. However, instead of receiving a reply from the server, our client socket gets a Broken Pipe error and it’s closed.

“OK…” I thought. I need to reference the memcache protocol specification and make sure this request conforms to the “get” protocol command. Was this request within in the protocol description? Yes.

At this point I knew the client was sending a legitimate request, so I needed to debug the server. Since our application uses multiple instances, I downed all but one instance, and fired up strace (strace -s 64000 -fp 7133).

[pid  7136] epoll_wait(19, {{EPOLLIN, {u32=32, u64=32}}}, 32, 4294967295) = 1
[pid  7136] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7136] read(32, "get enter_messages/343555238/hide_author=false&hide_date=false&last_read=343841341 timestamp_messages/343575826/hide_author=false&hide_date=false&last_read=343841341 enter_messages/343575827/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343582929/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343582930/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343589192/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343589193/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343592436/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343592437/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343606646/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343606647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343654455/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343654456/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343663288/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663294/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663301/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663459/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663763/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663833/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343664243/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664244/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664249/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664928/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665538/hide_author=false&hide_date=true&last_read=343841341", 2048) = 2048
[pid  7136] read(32, " text_messages/343665582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665661/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665730/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343668494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343668495/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343670646/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343670647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343674727/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343674728/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675113/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675330/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675512/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675662/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675783/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675850/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675926/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675962/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676003/hide_author=true&hide_date=true&last_read=343841341 text_messages/343676419/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676627/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343683775/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343683778/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343697190/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343697191/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698503/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698601/hide_author=true&hide_date=true&last_read=343841341 text_messages/343698676/hi", 2048) = 2048
[pid  7136] read(32, "de_author=false&hide_date=true&last_read=343841341 text_messages/343698734/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343699275/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699336/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699454/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343699907/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700196/hide_author=true&hide_date=true&last_read=343841341 text_messages/343700303/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343700490/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700491/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701033/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701249/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701495/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343701499/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701536/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701811/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343705859/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705860/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705927/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343707289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707290/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707541/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707657/hide_author=false&hide_date=true&last_read=343841341 text_messages/343708101/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343711297/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711298/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711359/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343714551/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714552/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714779/hide_author=false&hide_date=true&last_read=343841341 text_messages/343715190/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715303/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715380/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715452/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715558/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716023/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716257/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716298/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716338/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716577/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716745/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716816/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716900/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716999/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343717255/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717256/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717265/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717555/hide_author=true&hide_date=true&last_read=343841341 text_messages/343717960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718088/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718097/hide_author=false&hide_date=true&last_read=343841341 text_messages/34371819", 4096) = 4096
[pid  7136] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7136] read(32, "2/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718248/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718670/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718779/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718955/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343725746/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725747/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725757/hide_author=true&hide_date=true&last_read=343841341 text_messages/343725762/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343729823/hide_author=false&hide_date=true&last_read=343841341 text_messages/343729824/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343758537/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343758538/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343764428/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764429/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764474/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764528/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764620/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343773529/hide_author=false&hide_date=true&last_read=343841341 text_messages/343773530/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343783468/hide_author=false&hide_date=true&last_read=343841341 text_messages/343783469/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343783615/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783650/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783875/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784004/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784277/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784372/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343784578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784579/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343802940/hide_author=false&hide_date=true&last_read=343841341 text_messages/343802941/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343802954/hide_author=true&hide_date=true&last_read=343841341 text_messages/343803300/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343812622/hide_author=false&hide_date=true&last_read=343841341 text_messages/343812623/hide_author=false&hide_date=true&last_read=343841341 text_messages/343813469/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343813500/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813695/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813765/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814066/hide_author=true&hide_date=true&last_read=343841341 enter_messages/343814107/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343814360/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814556/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343814680/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814681/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815559/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815889/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343815893/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815915/hide_author=true&hide_date=true&last_read=343841341 text_messages/343815953/hide_author=true&hide_date=true&last_read=343841341 text_messages/343816359/hide_author=false&hide_date=true&last_read=343841341 text_messages/343816657/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343816830/hide_author=false&hide_date=true&last_read=343841341 text_messages/343817036/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343817042/hide_author=true&hide_date=true&last_read=343841341 text_messages/343817189/hide_author=false&hide_date=true&last_read=343841341 text_messages/343817229/hide_author=true&hide_date=true&last_read=343841341 text_messages/343817955/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343818321/hide_author=false&hide_date=true&last_read=343841341 text_messages/343818322/hide_author=false&hide_date=true&last_read=343841341 text_messages/343819079/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343825796/hide_author=false&hide_date=true&last_read=343841341 text_messages/343825797/hide_author=false&hide_date=true&last_read=343841341 text_messages/343825802/hide_author=true&hide_date=true&last_read=343841341 text_messages/343825809/hide_author=true&hide_date=true&last_read=343841341 text_messages/343826383/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343826403/hide_author=true&hide_date=true&last_read=343841341 text_messages/343826708/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343826722/hide_author=true&hide_date=true&last_read=343841341 text_messages/343827244/hide_author=true&hide_date=true&last_read=343841341 text_messages/343827298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343827299/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343830049/hide_author=false&hide_date=true&last_read=343841341 text_messages/343830050/hide_author=false&hide_date=true&last_read=343841341 text_messages/343830187/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343831000/hide_author=false&hide_date=true&last_read=343841341 text_messages/343831001/hide_author=false&hide_date=true&last_read=343841341 text_messages/343831123/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831211/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831487/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343831492/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831675/hide_author=false&hide_date=true&last_read=343841341 text_messages/343831725/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831788/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831979/hide_author=false&hide_date=true&last_read=343841341 text_messages/343833552/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343834848/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343834849/hide_author=false&hide_date=true&last_read=343841341 text_messages/343834896/hide_author=true&hide_date=true&last_read=343841341 text_messages/343835031/hide_author=false&hide_date=true&last_read=343841341 text_messages/343836063/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343840446/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343840447/hide_author=false&hide_date=true&last_read=343841341 text_messages/343840829/hide_author=false&hide_date=true&last_read=343841341 text_messages/343840885/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841050/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343841076/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841236/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841341/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841351/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841428/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841603/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841692/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841855/hide_author=true&hide_date=true&last_read=343841341", 8192) = 8192
[pid  7136] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7136] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7136] read(32, " text_messages/343841924/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841969/hide_author=false&hide_date=true&last_read=343841341 text_messages/343842042/hide_author=false&hide_date=true&last_read=343841341 text_messages/343842523/hide_author=true&hide_date=true&last_read=343841341 text_messages/343842910/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843039/hide_author=true&hide_date=true&last_read=343841341 text_messages/343843171/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843284/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343843427/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843428/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343843476/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843477/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843547/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843686/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343843878/hide_author=true&hide_date=true&last_read=343841341 text_messages/343843929/hide_author=false&hide_date=true&last_read=343841341 text_messages/343844070/hide_author=false&hide_date=true&last_read=343841341 text_messages/343844547/hide_author=false&hide_date=true&last_read=343841341 text_messages/343844842/hide_author=true&hide_date=true&last_read=343841341 text_messages/343844983/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343847757/hide_author=false&hide_date=true&last_read=343841341 text_messages/343847758/hide_author=false&hide_date=true&last_read=343841341 text_messages/343848887/hide_author=true&hide_date=true&last_read=343841341 text_messages/343849266/hide_author=false&hide_date=true&last_read=343841341 text_messages/343849330/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343850692/hide_author=false&hide_date=true&last_read=343841341 text_messages/343850693/hide_author=false&hide_date=true&last_read=343841341 text_messages/343850994/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343851014/hide_author=true&hide_date=true&last_read=343841341 text_messages/343851040/hide_author=false&hide_date=true&last_read=343841341 text_messages/343851100/hide_author=true&hide_date=true&last_read=343841341 text_messages/343851171/hide_author=false&hide_date=true&last_read=343841341 text_messages/343852048/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343852052/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852107/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852129/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852168/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852282/hide_author=false&hide_date=true&last_read=343841341 text_messages/343852487/hide_author=false&hide_date=true&last_read=343841341 text_messages/343852631/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852765/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343855843/hide_author=false&hide_date=true&last_read=343841341 text_messages/343855844/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343856603/hide_author=false&hide_date=true&last_read=343841341 text_messages/343856604/hide_author=false&hide_date=true&last_read=343841341 text_messages/343856802/hide_author=false&hide_date=true&last_read=343841341 text_messages/343856970/hide_author=false&hide_date=true&last_read=343841341 text_messages/343857838/hide_author=false&hide_date=true&last_read=343841341 text_messages/343858343/hide_author=false&hide_date=true&last_read=343841341 text_messages/343858380/hide_author=false&hide_date=true&last_read=343841341 text_messages/343858474/hide_author=true&hide_date=true&last_read=343841341 text_messages/343858544/hide_author=true&hide_date=true&last_read=343841341 text_messages/343858596/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343863819/hide_author=false&hide_date=true&last_read=343841341 text_messages/343863820/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343863844/hide_author=true&hide_date=true&last_read=343841341 text_messages/343864027/hide_author=true&hide_date=true&last_read=343841341 text_messages/343864037/hide_author=false&hide_date=true&last_read=343841341 text_messages/343864071/hide_author=false&hide_date=true&last_read=343841341 text_messages/343864588/hide_author=false&hide_date=true&last_read=343841341 text_messages/343864630/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343864755/hide_author=false&hide_date=true&last_read=343841341 text_messages/343865034/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343866397/hide_author=false&hide_date=true&last_read=343841341 text_messages/343866398/hide_author=false&hide_date=true&last_read=343841341 text_messages/343866591/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343866596/hide_author=true&hide_date=true&last_read=343841341 text_messages/343867002/hide_author=true&hide_date=true&last_read=343841341 text_messages/343867646/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868189/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868443/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868603/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868688/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343868946/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343868947/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343869494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343869517/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343875073/hide_author=false&hide_date=true&last_read=343841341 text_messages/343875074/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343875398/hide_author=true&hide_date=true&last_read=343841341 text_messages/343875689/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343877689/hide_author=false&hide_date=true&last_read=343841341 text_messages/343877690/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878129/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878477/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878485/hide_author=true&hide_date=true&last_read=343841341 text_messages/343878491/hide_author=true&hide_date=true&last_read=343841341 text_messages/343878499/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878588/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343880810/hide_author=false&hide_date=true&last_read=343841341 text_messages/343880811/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343880830/hide_author=true&hide_date=true&last_read=343841341 text_messages/343880919/hide_author=false&hide_date=true&last_read=343841341 text_messages/343880994/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343885259/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885260/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885374/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885436/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885452/hide_author=true&hide_date=true&last_read=343841341 enter_messages/343885679/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885711/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885723/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885744/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885878/hide_author=true&hide_date=true&last_read=343841341 text_messages/343886065/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343886343/hide_author=false&hide_date=true&last_read=343841341 text_messages/343886344/hide_author=false&hide_date=true&last_read=343841341 text_messages/343888444/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343890769/hide_author=false&hide_date=true&last_read=343841341 text_messages/343890770/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343894891/hide_author=false&hide_date=true&last_read=343841341 text_messages/343894892/hide_author=false&hide_date=true&last_read=343841341 text_messages/343895056/hide_author=false&hide_date=true&last_read=343841341 text_messages/343895130/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895152/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895269/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895393/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895542/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343902958/hide_author=false&hide_date=true&last_read=343841341 text_messages/343902959/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903053/hide_author=true&hide_date=true&last_read=343841341 text_messages/343903267/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903512/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903628/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903725/hide_author=true&hide_date=true&last_read=343841341 text_messages/343903864/hide_author=true&hide_date=true&last_read=343841341 text_messages/343903882/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903968/hide_author=false&hide_date=true&last_read=343841341 text_messages/343904055/hide_author=false&hide_date=true&last_read=343841341 text_messages/343904090/hide_author=true&hide_date=true&last_read=343841341 text_messages/343904553/hide_author=false&hide_date=true&last_read=343841341 text_messages/343904624/hide_author=true&hide_date=true&last_read=343841341 text_messages/343904678/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343908400/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908401/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908458/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908510/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908566/hide_author=true&hide_date=true&last_read=343841341 text_messages/343908629/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343910155/hide_author=false&hide_date=true&last_read=343841341 text_messages/343910156/hide_author=false&hide_date=true&last_read=343841341 text_messages/343910180/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910261/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910355/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910536/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910565/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343912945/hide_author=false&hide_date=true&last_read=343841341 text_messages/343912946/hide_author=false&hide_date=true&last_read=343841341 text_messages/343912955/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343917638/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343917639/hide_author=false&hide_date=true&last_read=343841341 text_messages/343919040/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343928070/hide_author=false&hide_date=true&last_read=343841341 text_messages/343928071/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343944056/hide_author=false&hide_date=true&last_read=343841341 text_messages/343944057/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343946376/hide_author=false&hide_date=true&last_read=343841341 text_messages/343946377/hide_author=false&hide_date=true&last_read=343841341 text_messages/343946786/hide_author=true&hide_date=true&last_read=343841341 text_messages/343947443/hide_author=true&hide_date=true&last_read=343841341 text_messages/343948263/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343951411/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343951412/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343958275/hide_author=false&hide_date=true&last_read=343841341 text_messages/343958276/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343958310/hide_author=true&hide_date=true&last_read=343841341 text_messages/343959039/hide_author=false&hide_date=true&last_read=343841341 text_messages/343959484/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343961703/hide_author=false&hide_date=true&last_read=343841341 text_messages/343961704/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343972558/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343972559/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343976120/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343976121/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343982780/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343982781/hide_author=false&hide_date=true&last_read=343841341 text_messages/343982834/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343984059/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343984060/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343987418/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343987419/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343997513/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343997514/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343997666/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344008833/hide_author=false&hide_date=true&last_read=343841341 enter_messages/344008834/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344009870/hide_author=false&hide_date=true&last_read=343841341 text_messages/344009871/hide_author=false&hide_date=true&last_read=343841341 text_messages/344010043/hide_author=false&hide_date=true&last_read=343841341 upload_messages/344010066/hide_author=true&hide_date=true&last_read=343841341 upload_messages/344010083/hide_author=true&hide_date=true&last_read=343841341 text_messages/344010323/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344015438/hide_author=false&hide_date=true&last_read=343841341 kick_messages/344015439/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344018437/hide_author=false&hide_date=true&last_read=343841341 kick_messages/344018438/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344033626/hide_author=false&hide_date=true&last_read=343841341 enter_messages/344033627/hide_author=false&hide_date=true&last_read=343841341 text_messages/344033767/hide_author=false&hide_date=true&last_read=343841341 text_messages/344033769/hide_author=true&hide_date=true&last_read=343841341 text_messages/344033772/hide_author=true&hide_date=true&last_read=343841341 text_messages/344033774/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344041143/hide_author=false&hide_date=true&last_read=343841341 upload_messages/344041144/hide_author=false&hide_date=true&last_read=343841341 text_messages/344041191/hide_author=true&hide_date=true&last_read=343841341 text_messages/344041227/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344041414/hide_author=false&hide_date=true&last_read=343841341 text_messages/344041415/hide_author=false&hide_date=true&last_read=343841341 text_messages/344041637/hide_author=true&hide_date=true&last_read=343841341 text_messages/344041654/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344052470/hide_author=false&hide_date=true", 16384) = 16384
[pid  7136] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7136] --- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
[pid  7136] getrusage(RUSAGE_SELF, {ru_utime={12, 90000}, ru_stime={13, 600000}, ...}) = 0
[pid  7136] rt_sigreturn(0x4b54187fd1996) = 0
[pid  7136] epoll_ctl(19, EPOLL_CTL_DEL, 32, {EPOLLIN, {u32=32, u64=32}}) = 0
[pid  7136] close(32)                   = 0
[pid  7136] listen(28, 1024)            = 0
[pid  7136] listen(27, 1024)            = 0
[pid  7136] brk(0)                      = 0x1c52000
[pid  7136] brk(0)                      = 0x1c52000
[pid  7136] brk(0)                      = 0x1c52000
[pid  7136] brk(0x1c4b000)              = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0x1c49000)              = 0x1c49000
[pid  7136] brk(0)                      = 0x1c49000
[pid  7136] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7136] epoll_wait(19,  <unfinished ...>
[pid  7133] <... epoll_wait resumed> {}, 32, 85) = 0
[pid  7133] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7133] --- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
[pid  7133] getrusage(RUSAGE_SELF, {ru_utime={12, 90000}, ru_stime={13, 600000}, ...}) = 0
[pid  7133] timer_settime(0x104, 0, {it_interval={0, 12812000}, it_value={0, 12812000}}, NULL) = 0
[pid  7133] rt_sigreturn(0x4b54187fe0371) = 0
[pid  7133] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7133] epoll_wait(3, {}, 32, 1)    = 0
[pid  7133] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7133] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7133] epoll_wait(3, {}, 32, 1000) = 0
[pid  7133] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7133] --- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
[pid  7133] getrusage(RUSAGE_SELF, {ru_utime={12, 90000}, ru_stime={13, 600000}, ...}) = 0
[pid  7133] timer_settime(0x104, 0, {it_interval={0, 12787000}, it_value={0, 12787000}}, NULL) = 0
[pid  7133] rt_sigreturn(0x4b541880d6c8e) = 0

(Strace Tip: I used -f to follow the forked processes, and -s 64000 to make sure I got enough string output to see the full read/write data sent to the socket.)

You can see that the memcache server reads the first 2048 bytes, then reads another 2048 bytes, then reads another 4096 bytes, then reads 8192 bytes, and finally reads another 16384 bytes. So after the first read, it doubles the amount of data it’s reading for each iteration. However, after the fourth read, the epoll call returns 0, and the client’s connection is closed.

Identifying the Root Cause

In this case, I knew Jeremy had worked on this bug before and I thought he would understand the all the data I had gathered. So with Jeremy’s help, I started looking at the memcache code on Github. Within minutes Jeremy pinged me saying he had noticed this code that’s designed to nuke junk connections. Which makes sense, but also doesn’t help us here!

This code path does exactly what I had seen in the strace above: It reads from buffer 4 times, while doubling the amount read each time. After 4 times, it stops reading and returns.

Conclusion and Resolution

Although memcache’s behavior is by design, it’s not compatible with the expected behavior of the protocol. In the issue I opened, I recommended a configuration option to set the amount of data to read, or the number of times to attempt reading more data. I’m not sure if this is a realistic solution though.

Hypothetically we could work around this by increasing the initial buffer size and recompiling memcache so that in 4 iterations the server would receive all of the get command. We could also patch the memcache-client multi_get method to break the multi_get requests into multiple requests with fewer keys. For now we’ve left our patched client in place, with hope that the memcache team will be able to get this resolved soon.

As I said previously, the goal is to identify a root cause. And, with Jeremy’s help, I’ve done just that. It feels great to have a complete understanding of this exception, and where the break down is in our stack.

We now know, we don’t have to be afraid of upgrading memcache with a fear of causing more multi_get problems. We also know that this is a legitimate problem that’s causing a significant performance penalty in our application. (Since large multi_gets fail, that means we aren’t getting the data back from memcache and hitting the database more than we need to.) Finally, for me, it shows that I don’t have to be bothered by debugging with strace. I’m no C code expert, but even with my limited understanding, strace provided enough clues for me to follow this all the way to the source in less than one hour. And since you all have an extra hour to spare, I hope this encourages you to take a look at that problem that’s been lingering in your stack.