Last week, Amazon’s EC2 business suffered a major outage that took down a significant number of hot start-ups and established businesses alike. The technical causes of the outage are interesting and have been covered elsewhere, at least as far as the available information allows, including at http://blog.rightscale.com/2011/04/25/amazon-ec2-outage-summary-and-lessons-learned/ (thanks to Josh Mahowald for the link). What I find most interesting, though, is the crisis management.
Amazon’s communication during this incident earned an F, and that is being generous. It took the company over 40 minutes to acknowledge a problem, and from that point on, its updates were vague and infrequent. Further, it still has not disclosed the root cause (at least as of this writing).
Amazon is a fantastic company. A legendary startup, it went into retail – always a difficult business – excelled, survived two market meltdowns, and continues to prosper. But Amazon is a retail company built on technology. It excelled at building the infrastructure to run the retail business, and that infrastructure may even have provided a critical competitive advantage, in the same way that Google’s and Facebook’s investments in infrastructure provided, and continue to provide, significant advantage. In my days in Wall Street IT, for example, when other shops had 10 servers to every systems administrator, we had a 100:1 ratio thanks to massive infrastructure engineering. That not only reduced our costs but also made us massively more nimble in infrastructure and, by extension, in applications and in the business. It was a serious competitive advantage.
But Amazon is still, at heart, a retail company; its culture is one of selling books (and everything else) to the end-user. When Amazon Retail is down, end-users don’t want details, which actually detract from the message. Amazon Web Services, on the other hand, is a technology services business. To succeed in the cloud, and not be a short-lived wonder, it needs a culture of selling services to IT people. These people, in turn, provide services to others and need to handle difficult situations themselves. Amazon is, in some ways, a “deputy CIO for infrastructure” to the CIOs at its customers. Any CIO who has an outage and doesn’t keep the business properly informed quickly becomes an ex-CIO.
Culture matters. Amazon needs to fix this, and it needs to fix it fast. My recommendation? Spin AWS off. Let it run as its own company, with a CEO handpicked from another Internet company that really understands how to sell to and manage IT customers.