Options, Not Roadmaps

Since Shape Up came out, many people have asked some version of this question:

I understand you make bets six weeks at a time. But how do you plan in the longer term? Don’t you have some kind of a roadmap?

The short answer is: no. We don’t have roadmaps. We think about what to do at the timescale larger than single bets, but we do it in a different way.

Why not a roadmap?

No matter how you try to hedge it, a roadmap communicates a plan—a series of commitments—to other people.

We of course have lots of ideas about what we’re going to do next. But we don’t want to make any commitments, internally or externally. This is very important.

What’s wrong with committing to future work?

First, there’s the uncertainty. We don’t have a crystal ball. Say we have features A, B, and C we want to build. We don’t know if A is going to work out as planned, and what that would mean for B. We don’t know if we’ll have a eureka in the bathtub one day, and new idea X will suddenly feel much more important than B or C. Or we might start building out B, only to realize that it’s not what we want or it’s harder than we thought and want to bail on it.

In short, we don’t know enough to make good on any promises.

Second, there are expectations. Leadership might be ok with changing course, but what about everyone who saw the roadmap and was eagerly awaiting feature C? What about the conversations with customers where someone in the company assured them to just hold tight because C is coming? Despite our best intentions, if we say we’re going to do something, it’s going to be really hard to back out of that, both internally and externally.

Third, there’s the guilt. Yeah, guilt. Have you ever looked at a long list of things you said you were going to do but haven’t gotten around to yet? How does that list make you feel? The realities of life and uncertainty show us that 100% of the things on the roadmap are not going to happen on time the way we imagine. And meanwhile, the world is not going to stop and wait for us to finish the old list. New ideas are constantly coming up. New requests and new problems constantly arise. If we hold ourselves completely to the roadmap, we’ll have to say no to new important things we actually want to do. And if we interrupt the roadmap to do those new important things, we’ll have to push back other things we promised. And that won’t feel good.

Our solution was to stop making commitments and start collecting options.

A portfolio of options

An option is something you can do but don’t have to do. All our product ideas are exactly that: options we may exercise in some future cycle—or never.

Without a roadmap, without a stated plan, we can completely change course without paying a penalty. We don’t set any expectations internally or externally that these things are actually going to happen.

That means no explicit promises and no implicit promises either. A list on the wall or in some official strategy document is an implicit promise: “this is what we’re doing next.” There is no official list of what we’re doing next anywhere in the company.

When Jason (CEO) and David (CTO) decided the company would spend X cycles building out HEY, they didn’t have a roadmap. They had what they thought was a good set of options. There were enough good ideas for how to flesh out the app that they felt confident saying “we’ll dedicate X cycles to this.” They decided on which actual ideas to build one cycle at a time.

The overwhelming majority of our good ideas have never been built and never will be. There are things we have badly wanted to build for years that still don’t exist. Why? Something always came up. And that’s ok!

Because we aren’t committing to a roadmap, we aren’t setting expectations. And because we don’t set expectations, we don’t feel guilty when that great idea never gets any build time because we decided something else was more important.

Inside a CODE RED: Network Edition

I wanted to follow up on Jeremy’s post about our recent outages with a deeper, more personal look behind the scenes. We call our major incident response efforts “CODE REDs” to signify that it is an all-hands-on-deck event and this definitely qualified. I want to go beyond the summary and help you see how an event like this unfolds over time. This post is meant for people who want a deeper, technical understanding of the outage, as well as some insight into the human side of incident management at Basecamp.


Three Basecamp outages. One week. What happened?

Basecamp has suffered through three serious outages in the last week, on Friday, August 28th, on Tuesday, September 1, and again today. It’s embarrassing, and we’re deeply sorry.

This is more than a blip or two. Basecamp has been down during the middle of your day. We know these outages have really caused issues for you and your work. We’ve put you in the position of explaining Basecamp’s reliability to your customers and clients, too.

We’ve been leaning on your goodwill and we’re all out of it.

Here’s what has happened, what we’re doing to recover from these outages, and our plan to get Basecamp reliability back on track.

What happened

Friday, August 28

  • What you saw: Basecamp 3 Campfire chat rooms and Pings stopped loading. You couldn’t chat with each other or your teams for 40 minutes, from 12:15pm to 12:55pm Central Time (17:15–17:55 UTC). Incident timeline.
  • What we saw: We have two independent, redundant network links that connect our two redundant datacenters. The fiber optic line carrying one of the network links was cut in a construction incident. No problem, right? We have a redundant link! Not today. Due to a surprise interdependency between our network providers, we lost the redundant link as well, resulting in a brief disconnect between our datacenters. This led to a failure in our cross-datacenter Redis replication when we exceeded the maximum replication buffer size, triggering a catastrophic replication resync loop that overloaded the primary Redis server, causing very slow responses. This took Basecamp 3 Campfire chats and Pings out of commission.
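The resync loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Redis code and not a reconstruction of our actual setup: the class name, fields, and numbers are all hypothetical, and the post doesn’t say which buffer or limit was involved. It assumes the primary buffers writes for a disconnected replica up to a fixed limit, that exceeding the limit forces a full resync, and that new writes keep piling up while that full resync is in flight.

```python
from dataclasses import dataclass


@dataclass
class ReplicaLink:
    """Toy model of a primary buffering writes for one replica.

    All names and numbers are hypothetical; this is not Redis code.
    """
    buffer_limit: int      # max bytes the primary will buffer for the replica
    write_rate: int        # bytes of new writes per second on the primary
    snapshot_seconds: int  # how long a full resync (snapshot transfer) takes

    def needs_full_resync(self, outage_seconds: int) -> bool:
        # While the link is down nothing drains, so the backlog grows at
        # write_rate. Past the limit, a cheap partial catch-up is impossible.
        return self.write_rate * outage_seconds > self.buffer_limit

    def full_resync_succeeds(self) -> bool:
        # Writes keep arriving during the snapshot transfer. If they overflow
        # the buffer before the transfer finishes, the replica is dropped and
        # has to start the full resync over again.
        return self.write_rate * self.snapshot_seconds <= self.buffer_limit

    def outcome(self, outage_seconds: int) -> str:
        if not self.needs_full_resync(outage_seconds):
            return "partial resync"
        if self.full_resync_succeeds():
            return "full resync"
        return "resync loop"


if __name__ == "__main__":
    quiet = ReplicaLink(buffer_limit=100, write_rate=10, snapshot_seconds=5)
    busy = ReplicaLink(buffer_limit=100, write_rate=10, snapshot_seconds=30)
    print(quiet.outcome(outage_seconds=5))   # short blip, backlog fits
    print(quiet.outcome(outage_seconds=20))  # backlog overflows, resync fits
    print(busy.outcome(outage_seconds=20))   # resync itself overflows
```

In this model the loop never ends on its own: every retry puts the same load on the primary and fails the same way, until someone raises the limit, lowers the write rate, or speeds up the transfer.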

Tuesday, September 1

  • What you saw: You couldn’t load Basecamp at all for 17 minutes, from 9:51am to 10:08am Central Time (14:51–15:08 UTC). Nothing seemed to work. When Basecamp came back online, everything seemed back to normal. Incident timeline.
  • What we saw: Same deal, with a new twist. Our network links went offline, taking down Basecamp 3 Campfire chats and Pings again. While recovering from this, one of our load balancers (a hardware device that directs Internet traffic to Basecamp servers) crashed. A standby load balancer picked up operations immediately, but that triggered a third issue: our network routers failed to automatically synchronize with the new load balancer. That required manual intervention, extending the outage.

Wednesday, September 2

  • What you saw: You couldn’t load Basecamp for 15 minutes, from 10:50am to 11:05am Central Time (15:50–16:05 UTC). When Basecamp came back online, chat messages felt slow and sluggish for hours afterward. Incident timeline.
  • What we saw: Earlier in the morning, the primary load balancer in our Virginia datacenter crashed again. Failover to its secondary load balancer proceeded as expected. Later that morning, the secondary load balancer also crashed and failed back to the former primary. This led to the same desynchronization issue from yesterday, which again required manual intervention to fix.

All told, we’ve tickled three obscure, tricky issues in a 5-day span that led to overlapping, interrelated failure modes. These woes are what we plan for. We detect and avert these sorts of technical issues daily, so this was a stark wake-up call: why not today? We’re working to learn why.

What we’re doing to recover from these outages

We’re working multiple options in parallel to recover and manage any contingencies in case our recovery plans fall through.

  1. We’re getting to the bottom of the load balancer crash with our vendor. We have a preliminary assessment and bugfix.
  2. We’re replacing our hardware load balancers. We’ve been pushing them hard. Traffic overload is a driving factor in one outage.
  3. We’re rerouting our redundant cross-datacenter network paths to ensure proper circuit diversity, eliminating the surprise interdependency between our network providers.
  4. As a contingency, we’re evaluating moving from hardware to software load balancers to decrease provisioning time. When a hardware device has an issue, we’re days out from a replacement. New software can be deployed in minutes.
  5. As a contingency, we’re evaluating decentralizing our load balancer architecture to limit the impact of any one failure.

What we’re doing to get our reliability back on track

We engineer our systems with multiple levels of redundancy & resilience precisely to avoid disasters like this one, including practicing our response to catastrophic failures within our live systems.

We didn’t catch these specific incidents. We don’t expect to catch them all! But what catches us by surprise are cascading failures that expose unexpected fragility and difficult paths to recovery. These, we can prepare for.

We’ll be assessing our systems for resilience, fragility, and risk, and we’ll review our assessment process itself. We’ll share what we learn and the steps we take with you.

We’re sorry. We’re making it right.

We’re really sorry for the repeated disruption this week. One thing after another. There’s nothing like trying to get your own work done and your computer glitching out on you or just not cooperating. This one’s on us. We’ll make it right.

We really appreciate all the understanding and patience you’ve shown us. We’ll do our best to earn back the credibility and goodwill you’ve extended to us as we get Basecamp back to rock-solid reliability. Expect Basecamp to be up 24/7.

As always, you can follow along with live updates about Basecamp status here and follow the play-by-play on Twitter, and get in touch with our support team anytime.

How Basecamp Became a 100% Remote Company

Moving is never fun. It’s bad enough when it’s your stuff, but ten years of stuff at an office you only spent two years in can be daunting! I’m Navid, and part of my job at Basecamp the last two years has been taking care of our office in Chicago. As folks outside of Basecamp learned of our impending office closure, I began to get some questions. The most common being “what did you do with the stuff? What about mail and important documents?” Of course we had to work out some logistical puzzles to keep things running smoothly. Here’s how we used Basecamp and a new service to bid adieu to our office, to make my job remote, and to become a 100% remote company.

We didn’t close down our office because of COVID-19, though it certainly factored in the decision. Basecamp has always been remote. Remote is Basecamp. We wrote the book on it, literally. Our lease was due to expire, and it just didn’t seem worth it to keep it going at the new price. We’d outgrown it as a space for meetups, and it was always too big for the number of Basecampers that reside in Chicago. On a busy day there’d be six people working from the office.

On the other hand, having an office afforded us the standard ways of handling a lot of day-to-day business items. Mail, packages, meetings, storage. It was simple, easy, and the path that most of the world has taken. Losing the office and going 100% remote would take us further down a path less travelled.

Once we came to terms with leaving the office, I got to work on figuring out what to do to meet this goal. I won’t bore you with the minute details that are common to every move; you want to know what we are doing now and how we got to 100% remote. The biggest hurdles to jump were: 1. Primary business address (as most government agencies require a physical address), 2. How to handle the mail/packages, and 3. How to manage key document storage.

I looked at a few options for our business address and for mail/packages. When the pandemic started, we re-routed our mail to a PO Box near my home. This eliminated the need for me to take public transit or a ride-share to check the mail. The PO Box would’ve been a great long-term solution if it weren’t for two things. 1) We need a new business address, and 2) it still ties me to Chicago.

I also looked at a UPS Store Mailbox. UPS is a great service! You can use it as your business address and they receive your mail and packages, then forward it at your request to anywhere you want. The drag on this is that all the mail will be bundled and shipped, creating further delays in getting the items. So if there is any urgency, you’ll need to get to the mailbox yourself.

In the midst of all of this, someone from Earth Class Mail (ECM) reached out to David via Twitter. ECM, like UPS, offers a business address and they receive your mail and packages. The main, and biggest, difference is that they scan all of your mail for you to review online. If you need any originals, they ship them to you. They also deposit checks for you via overnight shipping to your bank.

Of course, I opted for ECM in the end. They tick all the boxes to make Basecamp 100% remote, and they meet needs we hadn’t considered, like the check deposits. In the first few weeks, I have only tested the mail scanning service, which is working great. I’m looking forward to seeing how mail/package forwarding and check deposits go.

Another question I’ve answered recently is how we handle document retention. I’m definitely not holding onto these items in my home. We use Basecamp! Not long after I started here I began saving digital copies of everything important to Basecamp. I save each document in Basecamp with a name, the amount, and any relevant notes. Keeping only digital copies of invoices, checks, and tax paperwork saves on office space, a luxury we no longer have, and more importantly the documents are secure, searchable, and accessible to anyone who needs them.

When I’m not sure, I check in with our accountants about anything we should keep hard copies of. If there is any chance we would need an original paper copy, we keep it. At the moment we don’t have a permanent solution for these instances (honestly, it isn’t much), so they are locked up in storage. The goal will be to eventually not need a storage space.

That covers how we are remote now! Did I miss anything? Feel free to leave a comment.

We’re hiring Rails programmers

We have two rare openings on our Core Product team for Rails programmers. We’ll be accepting applications for the next two weeks, aiming for a flexible start date in October.

We strongly encourage candidates of all different backgrounds and identities to apply. This is an opportunity for us to bring in a different perspective and we’re eager to further diversify our company. Basecamp is committed to building an inclusive, supportive place for you to do the best work of your career. We aren’t looking for ideological clones, but for people who share our beliefs about writing software well.


Remote work is a platform

Back in the mid-90s, just as Netscape Navigator was giving us our first look at what the visual internet could be, web design came in two flavors.

There was the ultra basic stuff. Text on a page, maybe a masthead graphic of some sort. Nothing sophisticated. It often looked like traditional letterhead, or a printed newsletter, but now on the screen. Interactions were few, if any, but perhaps a couple links tied a nascent site together.

And there was the other extreme. Highly stylized, lots of textures, 3D-style buttons, page curls, aggressive shadows, monolithic graphics cut up with image maps to allow you to click on different parts of a single graphic, etc. This style was aped from interactive CD/DVD interfaces that came before it.

Both of these styles — the masthead with text, and the heavily graphical — were ports. Not adaptations, but ports. Designs ported from one medium to another. No one knew what to make of the web at that time, so we pulled over things we were familiar with and sunk them in place. At that time, Web design wasn’t web design – it was print design, multimedia/interactive design, and graphic design. It took years for native web design to come into its own.

The web became great when designers started designing for the web, not bringing other designs to the web.

Porting things between platforms is common, especially when the new thing is truly brand new (or trying to gain traction). As the Mac gained steam in the late 80s and early 90s, and Windows 3 came out in 1990, a large number of Windows/PC developers began to port their software to the Mac. They didn’t write Mac software, they ported Windows software. And you could tell – it was pretty shit. It was nice to have at a time when the Mac wasn’t widely developed, but it was clearly ported.

When something’s ported, it’s obvious.

Stuff that’s ported lacks the native sensibilities of the receiving platform. It doesn’t celebrate the advantages, it only meets the lowest possible bar. Everyone knows it. Sometimes we’re simply glad to have it because it’s either that or nothing, but there’s rarely a ringing endorsement of something that’s so obviously moved from A to B without consideration for what makes B, B.

What we’re seeing today is history repeat itself. This time we’re not talking about porting software or technology, we’re talking about porting a way to work.

In-person office work is a platform. It has its own advantages and disadvantages. Some things are easier in person (meetings, if you’re into those), and some things are harder (getting a few hours to yourself so you can focus, if you’re into that).

Remote work is another platform. It has its own unique flavor, advantages, and disadvantages. Its own efficiencies, its own quirks, its own interface. Upsides, downsides, insides, and outsides. It’s as different from in-office work as the Mac is from Windows. Yes, they’re both operating systems, and methods of computing, but they’re miles apart where it matters. The same is true for the difference between in-office work and remote work. Yup, it’s all still the same work, but it’s a different way to work.

In-office and remote work are different platforms of work. And right now, we’re seeing a lot of companies attempt to port local work methods to working remotely. Normally have four meetings a day in person? Then let’s have those same four meetings, with those same participants, over Zoom instead. It’s a way, but it’s the wrong way.

Simulating in-person office work remotely does both approaches a disservice.

This is often what happens when change is abrupt. We bring what we know from one to the other. We apply what we’re familiar with to the unfamiliar. But, in time, we recognize that doesn’t work.

The enlightened companies coming out of this pandemic will be the ones that figured out the right way to work remotely. They’ll have stopped trying to make remote look like local. They’ll have discovered that remote work means more autonomy, more trust, more uninterrupted stretches of time, smaller teams, more independent, concurrent work (and less dependent, sequenced work).

They won’t be the ones that just have their waste-of-time meetings online, they’ll be the ones that lay waste to the meetings. They won’t be the ones that depend on checking in on people constantly throughout the day, they’ll be the ones that give their employees time and space to do their best work. They won’t be the ones that can’t wait to pull everyone back to the office, they’ll be the ones that spot the advantages of optionality, and recognize a wonderful resilience in being able to work from anywhere.

And they’ll be the ones that finally realize that there’s nothing magical about the office. It’s just a space where work can happen, but not where it must happen. Anytime a myth is busted is a good time.

Work remotely, don’t port the office.

Spy pixels are evolving like malware, so HEY’s adapting

We knew that spy-pixel pushers might go down the rabbit hole of escalation once we gave HEY users the power to defend themselves. Just like virus and malware makers are constantly trying to defeat anti-virus and other security protections. But I guess we didn’t realize just how quickly it would happen!

Enter GMass, a plugin for Gmail that adds spy-pixel tracking, amongst a grab bag of other stuff. They hadn’t been on our original list of 50+ services we name’n’shame, but thanks to a new blog post where they brag about defeating protections that recipients might take to defend themselves, they came onto our radar.

This led to an in-depth investigation into how their latest techniques work, and we spent the whole day coming up with a new process of detecting GMass’ spy pixels. It just shipped! And now HEY will name’n’shame GMass, just like we do the other fifty-odd pushers of this kind of surveillance.

Of course, like those virus and malware makers, GMass may try to defeat our protections again. And we’ll then have to adapt once more, and so we will. Internet security is a constantly moving target. But we can hope that Google will soon stop being a conduit for this kind of privacy abuse on the Gmail platform. Just like they don’t tolerate being used for spamming or phishing.

In the meantime, we’ll continue to do the work both on a general level to protect against all forms of privacy attacks against HEY users, but also specifically to identify bad actors, and to call out the users who employ their software for spying.

Have a surveillance-free Friday!

On Apple’s monopoly power to destroy HEY

This statement was delivered to the Democratic side of the House Antitrust Subcommittee upon invitation on July 17, 2020 in the committee’s preparation for the forthcoming July 27, 2020 showdown with the Big Tech CEOs.

Two years ago, we got the audacious idea to take on Google, Microsoft, and Verizon to provide a new, fresh alternative to their email services. It’s been about 16 years since people were last excited about getting a new email address – Gmail was introduced in 2004 – and frankly, it shows. These legacy services have been virtually devoid of innovation for over a decade.

And why wouldn’t they be! They’ve already captured the market. Between Gmail, owned by Google, Hotmail/Outlook, owned by Microsoft, and AOL Mail/Yahoo Mail, owned by Verizon, you have about 85% of the US email market captured by three players. And out of that, Gmail alone is about 55 percentage points.

But! Despite this near-total capture by big tech, the underlying protocol of email is still just barely free and open. You don’t have to ask anyone’s permission to make a new email service. (Or so we thought.)

Fast forward millions of dollars in investment to June 15, 2020, which was the day we opened the doors to our new email service, and were met by every entrepreneur’s dream: amazing reviews and customers tripping over themselves to sign up and pay for our product.


Basecamp’s Ops Team is Hiring

Basecamp is hiring three new System Administrators for our Operations team to help us deliver fast and reliable applications, like Basecamp and our new email service HEY. Our infrastructure exists both in colocated data centers and in the cloud, and you’ll be working alongside our existing team of Blake, Eron, John, Matt, Matthew, Nathan, and Troy.

As you might gather from the names, our operations team today is not nearly as diverse as we’d like it to be, or as the rest of the company. We therefore strongly encourage candidates of all different backgrounds to apply. Basecamp is committed to building an inclusive, supportive place for you to do the best and most rewarding work of your career. We are an equal-opportunity employer and are committed to building a company that embraces and celebrates diversity and inclusion.
