Amazon has finally published its write-up of the outage that began on 4/21/2011: http://aws.amazon.com/message/65648
Honestly, I have to hand it to Amazon for single-handedly creating the cloud-computing race. Remember, they were an online store, not a datacenter hosting company. They have to be the healthiest company around in terms of constantly improving, expanding, pushing into new markets, and inventing new products.
But I want to challenge them here. They’ve made something where there was nothing, but I don’t think they have gone far enough yet in offering server computing as a service. I think they owe it to their customers to make some more of this complexity disappear. I’m talking about geographic fault tolerance. I can’t believe that in this day and age companies still think they can bet their business on single-site deployments, no matter what kind of clustering and zoning is in place. Amazon needs to take over another layer of the stack here and make global load-balancing automatic and standard: just build it into everything.
Sure, it’s hard. But right now Amazon is asking each customer to do that hard thing as a one-off, baked into their own application infrastructure design. Imagine the worldwide cost savings if Amazon does it once and bakes it into its services. Sure, the little guy doesn’t want to pay for it. But Amazon, ask yourself if you want to have to write another letter like this. And maybe once you add it to the sales literature, enough big guys will come sniffing around to offset the admittedly significant costs of development and of the ongoing increase in resource utilization.