Preparing for “This never happens”

Recently, we’ve seen a number of instances in which a service or service provider has an issue, and suddenly the media is flooded with headlines like “XYZ issue has broken the Internet” and “This is why the world was offline.”

No organization wants its IT operations to be affected by service outages, but many take comfort in knowing that they’re not the only ones affected. While these situations may seem to be outside an organization’s control, their impact can be mitigated. A website or application that is built with resiliency, redundancy, and availability in mind can withstand a wide variety of service outages.

Let’s take a moment to highlight the differences between resiliency, redundancy, and availability as they apply to service outages.

  • Availability: a website or application can be accessed and is running as expected.
  • Resiliency: a website or application has the ability to repair itself. It does not mean it will be available.
  • Redundancy: a website’s or application’s data exists in multiple places. It does not mean it will be available.

To build the most “service outage-proof” application possible, organizations need to think globally and consider what kinds of outages their application needs to withstand and how automated the recovery process should be.

  • Storage services such as Amazon S3 and Microsoft Azure Blob Storage offer features like global and multi-regional replication. Replication not only makes these storage options resilient and redundant, it also gives you a way to increase availability.
  • Highly available Domain Name System (DNS) services such as Amazon Route 53, Azure DNS Services, and Imperva Incapsula can provide low time-to-live (TTL) values and even automated DNS failover. This is an easy way to increase the availability of your other services, such as a cloud-based storage service.
  • Content Delivery Network (CDN) services such as Amazon CloudFront, Azure Content Delivery Network, and others can increase both the availability and the performance of your site. When non-dynamic data is cached on a CDN, static content can continue to be served in the event of a failure. This helps provide greater redundancy and availability.
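
The failover and caching ideas in the list above can also be sketched at the application level. The Python below is a minimal illustration, not how Route 53, CloudFront, or any other named service works internally: the function fetch_with_failover, the exception AllEndpointsFailed, the caller-supplied fetch callable, and the in-memory cache are all hypothetical names introduced for this example. It tries each redundant endpoint in priority order and, if every one of them is down, serves the last known good copy of the content.

```python
from typing import Callable, Dict, Optional, Sequence


class AllEndpointsFailed(Exception):
    """Raised when no endpoint answers and no cached copy exists."""


def fetch_with_failover(
    path: str,
    endpoints: Sequence[str],
    fetch: Callable[[str, str], bytes],
    cache: Optional[Dict[str, bytes]] = None,
) -> bytes:
    """Try each endpoint in priority order; fall back to a cached copy.

    `fetch(endpoint, path)` is a caller-supplied function (an assumption of
    this sketch) that returns the response body or raises OSError on failure.
    """
    for endpoint in endpoints:
        try:
            body = fetch(endpoint, path)
        except OSError:
            # Connection-level errors (refused, timed out, unreachable) are
            # treated as "this endpoint is down"; move on to the next replica.
            continue
        if cache is not None:
            cache[path] = body  # remember the last good response
        return body

    # Every endpoint failed: serve the last known good copy if one exists.
    if cache is not None and path in cache:
        return cache[path]
    raise AllEndpointsFailed(path)
```

In this sketch the ordered endpoint list plays the role of redundancy, the loop plays the role of automated failover, and the cache stands in for a CDN that keeps serving static content while the origin is unavailable. A production setup would normally leave these jobs to the managed DNS and CDN services themselves.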

A managed service provider can help a company understand these services and architectural choices and take them into account when designing a solution that meets its risk tolerance and the overall needs of its business. While large-scale outages may be beyond your control, there are steps you can take to ensure your application is built to survive the next headline-grabbing service disruption. By accounting for resiliency, redundancy, and availability using the methods and suggested tools above, organizations can build outage-resistant websites and applications that will turn “This never happens!” into “Why is everyone else freaking out?”