Root Cause Analysis in Rackspace Data Centers: From Fixing to Solving

A few years ago, I planted a beautiful little golden thryallis bush. I placed it in an area with rich soil to provide the proper nutrients. I ensured it received ample sun and water, and it began to grow very well. As it seemed to have settled in nicely, my attention waned.

One day, however, I saw the thryallis had been covered by a vine, wrapped so tightly it was difficult to remove. I pulled out the unwanted vine from the ground, but a month later, I found the vine once again strangling my thryallis. This time, I dug down deep into the soil to remove the full root of the vine. Problem solved.

Getting to the root 

I find the issues we face in Rackspace Data Center operations to be very similar.

As we implement new processes and policies and roll them out in an environment conducive to their adherence, we see great results. However, when we lose focus, issues can crop up quickly and latch on to the process unnoticed. The issue begins choking the process, competing for nutrients and causing the activity or environment to produce unintended results. If we simply address the symptom (the vine) and not the cause (the root), the issues surface again and again, creating undesired outcomes.

When I first came to Rackspace to lead data center operations, I noticed these issues… a lot. We were good at “fixing” symptoms and diving in to save the day, but not solving issues at their root causes. The work was frustrating. Rackers worked hard to deliver results, but achieved inconsistent outcomes.

In response, we implemented a new root cause analysis (RCA) methodology to shift our approach from fixing to solving. The RCA methodology at Rackspace combines fishbone (Ishikawa) diagrams with the Lean Six Sigma “five whys” approach.

[Read More: Poka Yoke, Fishbone Diagrams and 5S: How Lean Manufacturing Principles Guide Rackspace Data Center Operations]

Five nines uptime

The methodology depends on bringing together the people directly involved in the process or event that produced the undesired outcome and the relevant subject matter experts. This small team documents what happened (a detailed summary), maps the potential causes (a fishbone diagram) and then narrows them down to the actual causes (the five whys). The team then agrees on actions to address the causes it identified.
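To make the two steps concrete, here is a minimal sketch of how a team might record them. The fishbone groups candidate causes by category, and the five whys follow one chain of "why?" answers from the symptom down to a root cause. The category names, the `Fishbone` class, the `five_whys` function and the late-delivery example are all hypothetical illustrations, not Rackspace's actual tooling.

```python
from dataclasses import dataclass, field

# Hypothetical fishbone categories (a common generic set); the article
# does not say which categories Rackspace uses.
CATEGORIES = ("People", "Process", "Equipment", "Materials", "Environment", "Measurement")


@dataclass
class Fishbone:
    """Potential causes of one problem, grouped by category."""
    problem: str
    causes: dict[str, list[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )

    def add_cause(self, category: str, cause: str) -> None:
        if category not in self.causes:
            raise ValueError(f"unknown category: {category}")
        self.causes[category].append(cause)

    def candidates(self) -> list[tuple[str, str]]:
        # Flatten every (category, cause) pair for the team to investigate.
        return [(cat, cause) for cat, items in self.causes.items() for cause in items]


def five_whys(symptom: str, explain: dict[str, str], max_whys: int = 5) -> list[str]:
    """Ask 'why?' repeatedly, stopping when no deeper answer is known
    or after max_whys answers. The last element is the presumed root cause."""
    chain = [symptom]
    while len(chain) <= max_whys and chain[-1] in explain:
        chain.append(explain[chain[-1]])
    return chain


# Hypothetical example: a late delivery, one of the issue types the article mentions.
fishbone = Fishbone("Server rack delivered late")
fishbone.add_cause("Process", "Parts ordered after install date was set")
fishbone.add_cause("People", "Install crew double-booked")

answers = {
    "Server rack delivered late": "Install crew was waiting on parts",
    "Install crew was waiting on parts": "Parts were ordered after the install date was set",
    "Parts were ordered after the install date was set": "No checklist step ties part orders to scheduling",
}
chain = five_whys("Server rack delivered late", answers)
root_cause = chain[-1]
```

Here the chain stops after three answers because no deeper "why" is recorded, which reflects that "five" is a rule of thumb rather than a fixed count. The resulting root cause, a missing checklist step, is the kind of finding that turns into a process or policy action rather than another one-off fix.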

These actions show up in adjustments to process and policy standardization, changes to automation, and large or small projects to improve architectural designs, upgrade code versions or replace unreliable parts.

At Rackspace, we’ve applied this RCA methodology hundreds of times, for issues ranging from a late delivery or a safety near miss to a customer-impacting disruption, and seemingly everything in between. Thousands of actions have been implemented to keep narrowing the opportunity for error.

The results? Five 9s uptime (yup, that’s 99.999 percent) year after year, and an army of Rackers who are always thinking beyond the obvious symptoms to root causes. This is how we create fanatical outcomes for our customers. This is the Rackspace difference.
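For a sense of scale, the allowed downtime at a given availability is simply the unavailable fraction multiplied by the length of the period. The short sketch below does that arithmetic; the function name is a hypothetical helper, and it assumes an average year of 365.25 days (525,960 minutes).

```python
# Assumption: an average year of 365.25 days, which accounts for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime (in minutes) permitted over a period at the given availability."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_minutes


five_nines = downtime_budget_minutes(99.999)  # about 5.26 minutes per year
four_nines = downtime_budget_minutes(99.99)   # about 52.6 minutes per year
```

Each additional nine cuts the downtime budget by a factor of ten, so five 9s leaves only about five minutes of unplanned downtime per year.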

Jim Hawkins is the vice president of global data center operations and engineering at Rackspace, where he oversees the company’s worldwide network of data centers and other critical infrastructure and operations. Jim joined Rackspace in 2008, initially serving as director of operational excellence. Since then, he has held several positions, including director of U.S. data centers and senior director of global data center operations. Jim has brought a number of strengths to these roles, including specialized knowledge of Lean Six Sigma methodology and operational discipline. In each of his roles at Rackspace, he incorporated these principles into the design and daily operation of Rackspace’s critical infrastructure and teams, transforming their performance. As Jim approaches his 10th year at Rackspace, he continues to make a significant impact on the performance of the company. Recently, he enhanced the operational rigor of Rackspace network operations through fleet management principles. When he’s not busy working toward his professional goals, he enjoys daily exercise, working on his landscaping projects, spending time with his wife and coaching his three sons on the soccer field. Before he joined Rackspace, Jim was a plant manager and North American fiberglass fabrics manager at Owens Corning, where he turned a struggling $40 million division into a very profitable business by leveraging his knowledge of Lean Six Sigma methodology. He earned his BS in marketing at Westminster College, graduating summa cum laude, and received his MBA, with high honors, at Purdue University.
