Amazon Web Services celebrated its fifth birthday last month, but a late guest has arrived and crashed the party. The April 21st crash felt ’round the world (or “Crash 2.0,” as the Daily Mail snarkily dubbed it) has caused many to question the safety of cloud computing.
The problem originated in the Northern Virginia data center for Amazon’s EC2 (Elastic Compute Cloud) Web service when a network connection failure on Thursday morning triggered an automatic recovery mechanism, which also failed. The thousands of websites affected by the crash, which lasted a couple of days, included reddit, foursquare, HootSuite, BigDoor, and Quora. Some of the larger sites (such as foursquare) came back up as engineers frantically worked to restore services. Two of Amazon Web Services’ biggest customers, Netflix and Zynga, seemed to be unharmed.
Cloud computing allows companies to store their data and files in “server farms” provided by Amazon or cloud hosting competitors such as Rackspace, GoGrid, Joyent, IBM, and Microsoft’s Azure platform. Amazon Web Services, which is projected to make more than $500 million this year, is a key player in providing cloud computing storage, as evidenced by the number of websites affected last week.
Many industry experts see the cloud as the future of computing, and it is unlikely the Amazon outage will substantially change that. But it will definitely lead to an industry conversation, according to a New York Times interview with IDC analyst Matthew Eastwood, who called the outage a “wake-up call for cloud computing.”
In addition to the obvious security concerns, several other issues have been raised by the disruption. What operations should be kept in-house? What types of backup and recovery services are available, and how much are those services worth? What are the strategies for working around cloud computing outages? What level of communication should be expected when outages occur? (There has been some criticism of Amazon’s communication, or lack thereof, including from BigDoor CEO Keith Smith, who described the Amazon updates as sounding like something written by “attorneys and accountants.”)
Amazon continues to post the status of its service on a website it calls the Service Health Dashboard, which provides customers with a link to report issues. The company also offers RSS feeds to alert customers to service interruptions. Most operations appeared to be normal by the following Monday, but the discussion and debate will certainly linger.