Update on Scheduled Maintenance/Interruption

Rackspace experienced a service interruption during tonight’s scheduled maintenance on UPS Cluster G. We were testing phase rotation on a Power Distribution Unit (PDU), a check that lets us verify synchronization of power between primary and secondary sources, when a short occurred and took down the PDUs behind this cluster.

All power has been restored and devices are being brought back online. The PDUs were down for a total of about 5 minutes. We have aborted the maintenance for the remainder of the evening and will reschedule this for another date.

Service to Cloud Sites has been restored and we are continuing to work with Cloud Sites customers to bring them online. We will continue to update the Cloud Server and Slicehost status on their respective websites, which you can access through these links: Cloud Status and Slicehost Status.

If you have any questions, please feel free to contact a member of your support team.

Sincerely,

Rackspace

Debbie Talley is a senior internal communications manager for Rackspace.

24 COMMENTS

  1. […] 3:40 AM CST [UPDATE] Thank you for your patience as we work to restore service to our CloudFiles system this morning. The service interruption was related to tonight’s scheduled data center maintenance on UPS Cluster G in our DFW data center. We have posted an official Rackspace incident report here: http://www.rackspace.com/blog/?p=690 […]

  2. I’m at a loss for words. Only word that keeps popping up – “unacceptable” – after what we went through in June/July. Get your act together, please. We have invested a lot of time switching over to RackspaceCloud.

  3. The PDUs may have been down for 5 minutes, but customer servers were unavailable for well over an hour. Let’s don’t whitewash this, please.

  4. I agree that this blog post misrepresents the scope of the outage. In my case, my slice was unreachable for about an hour and a half, and I was never proactively notified of it (my own monitoring running elsewhere caught it). The blog post instead makes it sound like the outage was only five minutes, which is not correct.

  5. If you are a Rackspace Cloud customer, you will be receiving a Post Mortem via email outlining the details of the service interruption that occurred last night/early morning. We appreciate your patience.

    Best,

    Angela Bartels
    @rackcloud

  6. Yeah, all these constant issues with Cloud Sites are a very big concern for my company and me. We have spent a lot of time migrating over to Cloud Sites and we have discovered that it is not as reliable as the 99.9% uptime Rackspace had promised.

  7. Was this to do with the UPS’s internal bypass failing, and thus an out-of-phase moment causing a complete loss of power?

    -n

    curious – is MGE your PDU vendor? 🙂

  8. Doesn’t the customer gear have redundant power supplies that operate against separate UPS Clusters? Or does each side share a single point of failure?

  9. […] today there was an outage at the DFW Rackspace data center, which unfortunately meant the BanditDeals web site was knocked offline as well. Things look to be […]

  10. I was not greatly affected by 90 minutes of downtime, but was significantly bothered by the lack of information. I received notification of the outage from my own monitor about 2 hours after it was over and went to the website to find out what happened (primarily if it was fixed and if there was anything I needed to do) and clicked on “cloud status.” It had something about network connectivity, but nothing about servers rebooting or maintenance misfortunes.

    I truly appreciate the full explanation by email 20 hours later, but some hint in real time would have been better.

    Also, as I’ve seen pointed out elsewhere, Rackspace should report times in UTC, not CST. CST is probably inapplicable to the majority of customers. It’s a pain converting.

  11. We had a downtime of 3 hours in our case (not 5 min). This is not the first time this has happened. In this instance we were not even notified that there was going to be a maintenance window.

    We actively encourage organizations to migrate to the cloud. These types of incidents are deal breakers.

  12. I too was down for 1 hour, 12 minutes. I was never made aware there was going to be maintenance nor was I aware that my slice went down (besides my external Nagios monitoring).

    I should have been warned…all of us should have been. A followup email should have been sent to the “affected” customers with an RFO. This is disappointing.
