Dear Nic, are Netflix ruining cloud computing?

Have you been following the recent interweb kurfuffle accusing Netflix of ruining cloud computing?

Frankly, closely coupling open source tools to a single commercial vendor (Amazon in this case) is a slight of hand. But I'm baffled by the Cloud Computing 1.0 vs Cloud Computing 2.0 debate.

Back in the bad old days of Cloud Computing 0.0 I remember merrily shovelling existing snowflake servers, architectures and management stacks onto hypervisor platforms, freeing up racks worth of space in expensive datacenters and saving the CTO a few quid. It was terrifying.

As tools have matured to allow us to more effectively manage the configuration of our servers, I'm thinking of things like Chef and Puppet here, the terror has abated and green field well managed Cloud Infrastructure has become commonplace. I would call these kind of architectures Cloud Computing 1.0 because, candidly, I think we have a long way to go - this stuff is still pretty immature.

Now here is where my bewilderment arises. I've been following Netflix and some of the talks and blog posts Adrian Cockcroft has been publishing over the past year or so. As far as I was concerned they were the ones moving towards 2.0 and Puppet and Chef were 1.0. Now this guy from InfoWorld comes along and tells us that we've got it the wrong way round! Have I got my money on the wrong horse here?

Cheers, Jim.

Dear Jim, maybe he was drunk?

He's a journalist after all.

Controlling the configuration of your servers with Puppet or Chef (or indeed cfengine or power shell scripts or whatever) is great. It's the right thing to do. Don't stop.

But as you point out, the problem of configuration management does not end with a deployed server. It just ends up as a beautiful, unique, Snowflake because none of those config management tools track changes that happen outside of them.

Netflix have already sorted this out. You need a chaos monkey not only to tell you where you have single points of failure but also to keep your config management honest. Have the mad primate randomly trash your servers and then you can rebuild them. Then they become phoenix servers. They are wholly and completly in configuration management. No snowflake change would last long. The snowflakes are crushed under the trampling ape's massive feet.

Now, to their massive credit, Netflix have taken this even further. Instead of making their servers orthogonally to their release builds, they actually make their servers in the release build pipeline:

netflix-release-build-pipeline

That, it seems to me, is a pretty interesting thing to do. It means that whenever you deploy your server configurations are being refreshed. That's a really good thing. Of course, it doesn't mean you can't run Puppet or Chef on the servers while they're running. But it seems that, as long as you have a chaos monkey you could switch all that out to just static deployments:

  1. build your code
  2. build your VM
  3. deploy your VM
  4. run
  5. when the chaos monkey has destroyed the VM detect it's destruction
  6. deploy a new one from step 1

I think this is an entirely different pattern to the persistent server approach. I call that the orthogonal control model. Why? because the control of the server, the deployment target, is orthogonal to the deployment of your applications.

I think it's interesting that both approaches need a chaos monkey to keep them honest and prevent servers from becomming snowflakes. But it's really cool that the Netflix model only needs 1 deployment pipeline and the chaos monkey. That seems like a valuable reduction in process.

It's true that Netflix's model is now making a whole load of virtual machine images. Joe Masters Emison thinks this is a bad idea. Errr. ok. Why exactly?

I stood in a hall recently and listened to a CEO tell his IT people about how they should emulate the car industry in achieving simplicity. He said the car industry had made great strides in achieving simplicity, for example in one particular factory, cutting the number of assembly lines down from 40 to 3.

Wow. The trouble is, the software industry is nothing like the car industry because we don't make physical things. It doesn't matter how many virtual assembly lines there are if we can contol them with software. We don't care if there are 40 if we can abstract the build of them. In fact, we don't care if there are 40,000 or 40,000,000 of 40,000,000,000,000,000 if we can abstract the build of them.

And that's exactly what Netflix are doing, they're moving the game forward by abstracting the definition of what it means to deploy something.

So, in short Jim, I don't really care for what a cloud journalist tells me, I'll stick with the ideas coming out of the Netflix people because they are smart people and they're doing it.