Scheduled maintenance notice

Jason 22 Jun 2006

This blog, our public sites, and our apps will be offline for roughly 5 hours on Sunday, June 25th, starting around 10am Central Time, while our server cluster is moved to a new data center.

According to our current managed hosting provider, “one of each type of server will be moved first, mounted, cabled, verified for integrity, then the next one of each server type, etc, eg, phase one is one web server, one app server, one db server, one file server. The disk arrays will be disassembled, hard drives stored in cushioned transport cases. There is a duplicate of all physical hardware available on site in case of physical accident.”

This is step one of a two-step process to provide additional expansion capabilities, higher performance, and more reliability for our hardware cluster. Thanks for your patience during the move.

64 comments so far

Jeff Kramer 22 Jun 06

Is this going to be a fieldtrip for RailsConf? :)

Don Schenck 22 Jun 06

Translation: “Our sale to Yahoo will be completed”.

That is all.

Eamon 22 Jun 06

Where are you moving to?

Ara Pehlivanian 22 Jun 06

Uhm, stupid question, but if there’s a duplicate of everything on site, why not just set it all up in the new location, mirror everything over to the new site, repoint your DNS and then move the old hardware in peace with no downtime?

Luis 22 Jun 06

Man, I use this thing daily. I don’t know if I can handle a 5 hour downtime. What will I do? Someone, please help!

matt 22 Jun 06

I’d be curious to hear why you’re moving from Tilted and where to…

Baeck 22 Jun 06

Don’t worry… they will be issuing a 7 cent credit to Plus subscribers to compensate for the down time ;-)

Edward 22 Jun 06

@ Ara Pehlivanian:
repoint your DNS

That can take 24-48 hours, depending on a whole lot of things. Then having to repoint it back, well, you’d end up having larger downtime.

JF 22 Jun 06

Translation: “Our sale to Yahoo will be completed”.

Funny!

This particular downtime isn’t about a move from Tilted. Tilted is moving their data center and we are moving with them.

Randy J. Hunt 22 Jun 06

No doggin’ here. Thanks for the notice and keeping your clients in the loop.

Brandon 22 Jun 06

I really enjoyed the comment about Yahoo! That made my day!

Honestly, if you guys sold out to some big corporation I’d probably cancel my plan. I like you guys and your simple products (and simple company) - there’s no way a big company could gobble you up and not complicate things.

Anyway, looking forward to the vacation. I think I’ll just hit the golf course and set my Servers Alive program to ping you every minute or two to see when it comes back up.

I need a break anyway! :)

Hope the move goes smoothly for you guys and everyone at Tilted.

Ara Pehlivanian 22 Jun 06

@Edward:
Wouldn’t it just make sense to have the two sites run in parallel for 24-48 hrs while only updating the new one (ppl who don’t have the latest DNS info just see the old one for a day) and then shut everything down and leave the DNS pointed that way? Why bother switching back?

Historically I’ve always been told that the reason why this seamless procedure can’t be done is due to a lack of identical equipment at the destination. But since they have the stuff already…

Matt 22 Jun 06

@Ara:

Then you have the problem of data changing. If I update something in Basecamp, and my computer still sees the old hardware, it will make the change there. Then, when my computer sees the DNS switch, and I log on to Basecamp, I’ll see the data from the time they mirrored, not the updated one.

-Matt

Chris 22 Jun 06

Ara, the reason you don’t want to leave the old one up while moving is because people would be entering data on the old server, and then it wouldn’t be there the next day.

However, you could do a temporary redirect to the new ip address from the old servers while the dns changes.

Matt 22 Jun 06

Just to add to my comment, even if they kept mirroring you’d have the problem, because different people will see different DNS settings at the same time, and so will be concurrently updating the same thing with different info. That’d make syncing a nightmare.

-Matt
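The divergence described above can be illustrated with a toy model. This is only a sketch: the record names, the `simulate_split_writes` helper, and the two in-memory dictionaries are all hypothetical and say nothing about how Basecamp actually stores data.

```python
# Toy model: two copies of the data, with clients split between them
# depending on whether their DNS cache is stale ("old") or fresh ("new").
# Every name below is hypothetical and exists only for illustration.

def simulate_split_writes(snapshot, writes):
    """Apply each write to the site the client's DNS cache points at.

    snapshot: dict of record -> value at the moment the mirror was taken
    writes:   list of (site, record, value), where site is "old" or "new"
    Returns both resulting copies and the sorted list of records that differ.
    """
    old_site = dict(snapshot)
    new_site = dict(snapshot)
    for site, record, value in writes:
        target = old_site if site == "old" else new_site
        target[record] = value
    conflicts = sorted(
        key for key in set(old_site) | set(new_site)
        if old_site.get(key) != new_site.get(key)
    )
    return old_site, new_site, conflicts


snapshot = {"todo-1": "draft proposal"}
writes = [
    ("old", "todo-1", "draft proposal (edited by Ann)"),  # Ann's DNS cache is stale
    ("new", "todo-1", "draft proposal (edited by Bob)"),  # Bob's DNS cache is fresh
    ("old", "todo-2", "call client"),                     # created on the old site only
]
old_site, new_site, conflicts = simulate_split_writes(snapshot, writes)
print(conflicts)  # records that would need manual reconciliation
```

Both records end up in the conflict list: one was edited differently on each side, and the other exists only on the old site.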

Chris 22 Jun 06

What Matt said - I guess he posted at the same time.

JF 22 Jun 06

Ara, the individual replacement hardware is there *if* it’s necessary, but it’s not all configured as a cluster so you could just flip a switch and turn it on and it would work. There’s a *lot* more to it than that.

brad 22 Jun 06

Hmpf, my site is hosted by Tilted too (I chose them because I figured you guys had done your homework and if they were good enough for you…) but I haven’t received any notification that my site would be down. Is this something that’s going to affect all of their customers or just some?

Travis 22 Jun 06

Does this mean that Tilted is doing all of this and setting up the servers for you or does 37signals have to do that?

JF 22 Jun 06

Is this something that’s going to affect all of their customers or just some?

Why don’t you ask them?

Don Wilson 22 Jun 06

Roughly how many servers of each type (web, db, file) do you have of each?

Russ 22 Jun 06

Personally, I love the idea that we were given only a 3-day notice on something that they must have known now for some time.

It’s not like Tilted just decided yesterday to move their data center.

Russ 22 Jun 06

I forgot to mention in my previous post …

I agree with those posting above asking 37signals to please look into seeing if it’s possible to use the back-up systems during the move to minimize or even eliminate any down time.

brad 22 Jun 06

One of the reasons I’ve stayed with Tilted despite cheaper options elsewhere was just demonstrated to me a minute ago: they CALLED me after reading my comment above to explain about the server move (I’m on a shared hosting plan rather than a dedicated server, plus the move is happening in the middle of the night, so it’s not a big deal). Now really, how many hosting services would do that?

Ben Rometsch 22 Jun 06

Why do you need to wait for the DNS to propagate? Surely you just follow these steps:

Create new hardware setup at new site
Copy apps from old site to new site
Copy DB from old site to new site to check it all works
Put up a “We are down for maintenance” on the old site.
Copy the DB across to the new site - should take a few minutes
Update routers to redirect IP addresses to new site
Update DNS if required to new IP address block

That gets you down to about 5 minutes of downtime without any extra cost. Do I get a cookie? ;)
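A rough way to see where the "about 5 minutes" comes from is to mark which of the steps above happen behind the maintenance page and add up only those. The sketch below assumes made-up durations chosen to match that guess; they are placeholders, not measurements, and the `Step` class and `downtime_minutes` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    minutes: float      # placeholder guess, not a measurement
    site_offline: bool  # True while users would see the maintenance page


def downtime_minutes(steps):
    """Sum only the steps during which the site is unavailable to users."""
    return sum(step.minutes for step in steps if step.site_offline)


# The seven steps from the comment, in order. Preparation happens while the
# old site is still serving traffic, so it does not count as downtime.
cutover_plan = [
    Step("create hardware setup at new site", 480, False),
    Step("copy apps to new site", 60, False),
    Step("trial copy of DB to check it all works", 120, False),
    Step("put up maintenance page on old site", 0.5, True),
    Step("final DB copy to new site", 4, True),
    Step("update routers to redirect IPs to new site", 0.5, True),
    Step("update DNS if required", 5, False),
]

print(downtime_minutes(cutover_plan))
```

The estimate depends entirely on the final database copy being short, and on duplicate hardware already running at the destination.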

Russ 22 Jun 06

@Ben Rometsch

I’m a sysadmin for a small-medium size IT shop and we moved our data center about 10 months ago.

What you described is exactly what I did and our down time was just like you said. About 5 minutes.

For those of you who are unaware, DNS even updates within a few minutes now … no longer the 24 hours it used to be.

Ben Rometsch 22 Jun 06

@Russ

Indeed. And if you had a funky setup with MySQL or similar you could have the old site DB replicating to the new. Then it would just be a case of putting up a maintenance page, waiting a few seconds for the replication to sync, bringing down the old site DB and flicking the router.

I’m not sure why people are even talking about DNS…?

It’s odd that people are talking about transporting data with cushioned transport cases. I always found it easier to transport data over ethernet! What if the car transporting the disks has an accident or is stolen?

JF 22 Jun 06

Is this an example of your “don’t scale” philosophy

1. No.
2. We say don’t scale ahead of time. Scaling when you need to scale makes sense.
3. Moving an entire hardware cluster is something that may happen once every 5 years. It’s not something that needs to be planned for otherwise.
4. The backup “system” is the cluster itself. It’s fully redundant on multiple levels (from the load balancers to the DB servers to the app servers to the web servers). But if it has to be moved from one physical location to another, which as stated above is exceedingly rare, then there will be some downtime.
5. The backup hardware we have is not a mirrored cluster. It’s backup hardware in case hardware fails. Multiple machines in the active cluster can fail at any one time and everything is fine because of the multiple points of redundancy. There’s no need to have a mirrored active cluster since the cluster itself is its own mirror. The only time this ever matters is now when we’re moving physical locations. Then we’re down for a bit on a Sunday morning once every few years or so. Life goes on.

Ben Rometsch 22 Jun 06

Hi JF,

Understood. It’s not a risk I’d take with my hardware, but there you go!

I’m assuming most of your customers are in the US; why not start the move at 10PM and not 10AM?

Ben Askins 22 Jun 06

Thanks for the advance warning. It’s good to be able to plan around this (not that it’s really necessary given that it’ll be 1:00am here in Oz).

To the DNS experts leaving comments here: perhaps you could go and lend your expertise to Dreamhost whose unplanned outage has left 2 of my sites and 4 of my clients’ sites down for the past 24 hours.

JF 22 Jun 06

We have customers all over the world so we’re making the move on a day and time that affects the least number of people.

Re: moving hardware. It’s not our first choice either, but it’s what needs to be done at this point. We can talk about the perfect world all day long, but we have to deal with the way the world is. And right now this requires an ultra rare hardware move on an early Sunday morning. Flickr, Bloglines, and other companies have gone through this recently too. It happens.

Our host is moving data centers to provide better service, more space for expansion, and better power facilities. Everyone is doing their best to reduce risk to the smallest possible manageable levels.

This is phase one of the move. Phase two is in the works as well, but that’s just a data move, not a hardware move. We’ll have fully functional hardware in place in two locations and we just need to shuttle data between them.

Thanks for everyone’s concerns. The experts assure us they have this under control.

Nick 22 Jun 06

@Ben Askins
I hear ya, after dealing with that crap from Dreamhost one too many times, we dropped them. It was getting so that there was an Emergency Maintenance email about once a week. We ended up going with Site5.

I’m not sure how Dreamhost still manages to have a good reputation. Some people swear by them. Plus, I thought their control panel was ugly and confusing.

J 22 Jun 06

I love how there’s always a good batch of “dude, it’s simple…just do this and that and flip this switch and everything’s fine…5 minutes max” commenters on blogs. It sure is easy to look in from the outside and say it’s all easy and it should all work like this or that and in the perfect world you’d have this and that set up.

The perfect world (you know the one where nothing ever goes wrong because *everything* is accounted for no matter what the chance of it happening) is infinitely costly to create.

It sounds like 37s and their host have this under control. I’ll use my energy to be positive and wish the transition well instead of pointing out the flaws in this or that or the possible potholes along the way.

Joel M 22 Jun 06

Could someone point me to a good tutorial or primer on how clusters, load balancers, DB servers, app servers, file servers, web servers, etc work in a “commercial/enterprise” hosting environment?

Thanks.

Long Time Listener - Repeat Caller 22 Jun 06

“5 hour downtime”… Don’t be waiting with a stopwatch going. Jason said “roughly”, which means you can expect (if past experience is any indicator) that it will take longer. And at least it is happening on Sunday morning in the US (though it will be Sunday night for me in Tokyo), so it’s not like you’ll have as great a potential need for it as you might on, say, Monday morning.

My only wish is that there would be the added surprise of a calendar waiting for us when the site goes live again.

I can dream, can’t I?

Long Time Listener - Repeat Caller 22 Jun 06

Eh… Sorry about that 2x posting, there. That’s not what I mean by “repeat caller”.

Ara Pehlivanian 22 Jun 06

@Matt / Chris / JF:

I see your point about new data being entered. Not to beat a dead horse deader though (especially since JF said the hardware is there only in case something breaks) but in theory you could just do a little mod_rewrite and point all incoming traffic on the old server to the new server’s IP. No?

But, seeing as how there’s no hardware at the destination, this remains purely academic.

Ara Pehlivanian 22 Jun 06

Hmmm, seems I should have read a little further before writing…

@Ben:

I think ppl are talking about DNS because of me. I’m not a sysadmin, so my knowledge of such things is relatively limited.

robbm 22 Jun 06

hmm. I think that the words are ‘lease to move’

I don’t really mind, but I don’t think that it is right to imply that there was no way around it. I’m currently planning the move of roughly 350 24/7 production servers to a new facility and none of them will have downtime. You just have to lease or buy new machines for the other end, implement db synchronization of some flavor and make a cutover.

Just because 37sigs is opting not to do that doesn’t mean they couldn’t have and it’s odd to imply that there were no options.

DHH 22 Jun 06

We won’t be getting new IPs, so there’s no downtime related to that. And you don’t need to change any openings in your firewall, if you’re doing that for Basecamp already.

forrest 23 Jun 06

Quote: “My only wish is that there would be the added surprise of a calendar waiting for us when the site goes live again.”

That would be perfectly awesome!

Anyway, thanks Jason and George for all the upgrades.

@Brandon
Enjoy your swing, I’ll be watching FIFA World Cup!

Yeah.. 23 Jun 06

“Flickr, Bloglines, and other companies have gone through this recently too.”

One big difference, no one pays for those services. This downtime is exactly why “web apps” won’t be replacing desktop apps anytime in the future. Microsoft Project has never been missing for 5 hours before.

I understand George from Tilted’s reasoning for the downtime, obviously they are not going to foot the bill for 37sig redundancy, but I don’t understand 37signals’ reasoning. How much money do you make a month from your web products? And all your eggs are in one single basket? If Tilted goes bankrupt and there is a lengthy asset reclaiming process, or their physical site burns down I guess you’re out of business? Now that you are at “the next level” you might want to consider having some serious redundancy that would include multiple sites and internet connections.

Andrew 23 Jun 06

DNS transfers aren’t really a problem if you plan ahead. A week in advance, simply set the “TTL” on your domain to something small like 3 minutes instead of the default.

Then, when you update the DNS and switch the database from the old to the new server, everyone with a TTL-obeying DNS client (almost everyone on the net) will see the changed IP within 3 minutes, so the maximum downtime is limited to that.

I’ve done this many times and it works fine. It just requires planning and delicacy, and doesn’t make as much sense when moving a whole data centre full of customer equipment.
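The reasoning behind lowering the TTL in advance can be sketched as a small model of a TTL-obeying resolver. Everything here is an assumption for illustration: the `sees_new_address` function is hypothetical, times are plain seconds on an arbitrary clock, and the one-day "default" TTL is only a stand-in for whatever the domain actually used.

```python
def sees_new_address(last_fetch_before_change, ttl, changed_at, now):
    """Return True if a TTL-obeying resolver would hand out the new IP at `now`.

    last_fetch_before_change: the last time the resolver fetched the record
                              before the change was made (seconds)
    ttl:                      TTL attached to that cached answer (seconds)
    changed_at:               when the authoritative record was changed (seconds)
    The resolver keeps serving its cached answer until it expires, then
    re-queries and picks up whatever the authoritative server says.
    """
    return now >= changed_at and now >= last_fetch_before_change + ttl


SHORT_TTL = 3 * 60      # the "3 minutes" suggested in the comment
LONG_TTL = 24 * 3600    # hypothetical default of one day

changed_at = 1000
worst_case_fetch = changed_at - 1   # cached one second before the change

# With the short TTL, even the unluckiest resolver catches up within 3 minutes.
print(sees_new_address(worst_case_fetch, SHORT_TTL, changed_at, changed_at + SHORT_TTL))

# With the long TTL still in caches, the same resolver is stale an hour later.
print(sees_new_address(worst_case_fetch, LONG_TTL, changed_at, changed_at + 3600))
```

This is also why the TTL has to be lowered well before the switch: answers cached under the old, longer TTL must be given time to expire first.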

Wifflemaster 23 Jun 06

Thanks for the heads up, sounds like you’ve got it all under control. Have a nice day :-)

John Topley 23 Jun 06

“One big difference, no one pays for those services [Flickr, Bloglines]”.

Excuse me, I have a Flickr Pro account that I pay for, as do many, many other people!

Greg 23 Jun 06

Any chance you guys (37signals or Tilted) can take pics of the data center move or make a post about it?

I remember last time the Something Awful cluster moved, from New Orleans to, I think, Kansas, their coding/network monkey had a really awesome blog post or two explaining the hardware and what went into the move, with photos to explain it.

I don’t know about anyone else, but I’d be interested in some hardware/network porn.

George 23 Jun 06

@Yeah…

Big things are in the works. This is just phase one of two.

George 23 Jun 06

@Greg

Sorry, no geekpr0n4u. A) Priorities, B) No cameras allowed at Equinix. =)

J 23 Jun 06

One big difference, no one pays for those services

See Flickr Pro.

- 23 Jun 06

OK, but there is a difference in expected availability between a product that costs $25 a year that targets primarily personal use and one that can cost up to $1800 a year and targets business use.

JF 23 Jun 06

As stated before, in the perfect world we’d have months of notice to make all the appropriate arrangements to make this move as smooth and as short as possible, but we were just alerted about this move a few days ago. So we’re doing what we can to make it as low impact as possible.

As George stated, having exact duplicates of a fully functional cluster isn’t practical for them or for us. Since the existing cluster is fully redundant at every level the only time an exact duplicate would be necessary is in the ultra rare occasion of a complete hardware move.

So we can spend endless money and time preparing for all possible points of failure, or we can cover 99% of them. We’ve covered 99% of them. We’re covered for multiple hardware failure in the active cluster and power loss. The 1% that isn’t covered is a physical hardware move on one week’s notice. So we have to deal with the world the way it is, not the way we all want it to be.

So hang in there. Everything should be just fine.

Russ 23 Jun 06

It’s been interesting to hear everyone’s comments with regard to the outage, particularly when the example of Flickr came up.

Flickr is a moot point since Yahoo publicizes the fact the product is in Beta … whoops, I mean Gamma. People paying understand that there might be outages since it’s not a final product yet.

No more comments from me on this post.

Don Schenck 23 Jun 06

Don’t worry folks, there will be a surprise in store.

“We’ve Been Purchased By Yahoo!” But we promise blah blah …

I’ve seen Jason shopping for a new Porsche Cayenne Turbo in the Chicago area.

Trust me on this one.

Edgardo 23 Jun 06

Please guys, is just 5 ours on a Sunday for most. My suggestions on what to do during that time:

*Sleep
*Spend time with your family
*If you don’t have a family go out to a park or something and try to find someone that wants to start one with you
*Read a book
*Go to the World Cup site and watch all the highlights from the first stage
*Rent a movie
*Cook something fancy
*Exercise
*Write a letter to someone you haven’t heard from in a while (a letter, not an email)
*Purchase a new computer and migrate all your files to the new one
*Get a freakin life!

Edgardo 23 Jun 06

I mean hours!

Don Wilson 23 Jun 06

If your business can’t handle 5 hours on a Sunday without a webapp then you need to rethink your business strategy. How could businesses have managed without Basecamp way, way long ago?

JF 23 Jun 06

Don, I’m an Audi guy. Although that Cayman is sweeeeet (I don’t like the Cayenne — no trucks for me).

Don Schenck 23 Jun 06

OOOOHHHH … so you are NOT denying my story!!

You heard it from ME first, folks!!

Hmm? 23 Jun 06

But Jason, Audi is like the Microsoft of cars…. overloaded with useless features, expensive, and mostly unreliable. I think a Lotus would be more the 37sig type.

JF 23 Jun 06

I’ve had 3 Audis and never had a single problem with them.

I’d actually consider Audi the Apple of car companies. They care about design inside and out. Their interiors are the best in the business. They pay attention to styling details like few others. Their electronics are clean and simple. The exterior designs are understated and elegant.

Neesh 21 Jul 06

The whole stuffs in this notice will never be the same.