Amazon cloud computing outage raises questions

Amazon Web Services celebrated its fifth birthday last month, but a late guest has arrived and crashed the party. The crash felt ’round the world (or Crash 2.0, as the Daily Mail snarkily dubbed it) on April 21st has caused many to question the safety of the technology.

The problem originated in the Northern Virginia data center for Amazon’s EC2 (Elastic Compute Cloud) Web service when a network connection failure on Thursday morning triggered an automatic recovery mechanism, which also failed. The thousands of websites affected by the crash, which lasted a couple of days, included reddit, foursquare, HootSuite, BigDoor, and Quora. Some of the larger sites (such as foursquare) came back up as engineers frantically worked to restore services. Two of Amazon Web Services’ biggest customers, Netflix and Zynga, seemed to be unharmed.

Cloud computing allows companies to store their data and files in “server farms” provided by Amazon or cloud hosting competitors such as Rackspace, GoGrid, Joyent, IBM, and Microsoft’s Azure platform. Amazon Web Services, which is projected to make more than $500 million this year, is a key player in providing cloud computing storage, as evidenced by the number of websites affected last week.

Many industry experts see the cloud as the future of computing, and it is unlikely the Amazon outage will substantially change that. But it will definitely lead to an industry conversation, according to a New York Times interview with IDC analyst Matthew Eastwood, who called the outage a “wake-up call for cloud computing.”

In addition to the obvious security concerns, several other issues have been raised by the disruption. What operations should be kept in-house? What types of backup and recovery services are available, and how much are those services worth? What are the strategies for working around cloud computing outages? What level of communication should be expected when outages occur? (There has been some criticism of Amazon’s communication, or lack thereof, including that by BigDoor CEO Keith Smith, who described the Amazon updates as sounding like something written by “attorneys and accountants.”)

Amazon continues to post the status of its service on a website it calls the Service Health Dashboard, which provides customers with a link to report issues. The company also offers RSS feeds to alert customers to service interruptions. Most operations appeared to be normal by the following Monday, but the discussion and debate will certainly linger.

Bobby Duncan

Bobby Duncan is a Hoover’s text editor. In addition to covering environmental, mining, and chemicals companies, she likes to monitor the role of people in business.



  1. Online Tech says:

    One of the major issues is the advent of public clouds being hosted on hardware that can easily fail and cause major outages. These companies should look to private cloud computing for security, and offsite backup and virtualization for faster recovery times.

  2. Bobby Duncan says:

    Since the outage in April, Amazon Web Services has strengthened its security capabilities. However, AWS’s brief outage Monday night (August 8) affected some of the same websites. This week Amazon extended its Virtual Private Cloud (VPC) service to all AWS regions, giving customers the ability to establish private clouds in a faster, more secure environment. No doubt many will consider that option.

  3. Cheryl Lucia says:

    I would like to comment on this in retrospect, now that it has passed. Yes, it is still referred to, but largely it was a major force behind CSPs using comprehensive SLAs and consumers being aware NOT to do business with a company without one. Companies in the cloud computing industry now have a more definitive grasp on disaster and recovery plans and have them outlined in their SLAs. Those that do not will miss out on clients.
