A few years ago, I planted a beautiful little golden thryallis bush. I placed it in an area with rich soil to provide the proper nutrients. I ensured it received ample sun and water, and it began to grow very well. As it seemed to have settled in nicely, my attention waned.
One day, however, I saw the thryallis had been covered by a vine, wrapped so tightly it was difficult to remove. I pulled out the unwanted vine from the ground, but a month later, I found the vine once again strangling my thryallis. This time, I dug down deep into the soil to remove the full root of the vine. Problem solved.
Getting to the root
I find the issues we face in Rackspace Data Center operations to be very similar.
As we implement new processes and policies and roll them out in an environment that supports adherence, we see great results. However, when we lose focus, issues can crop up quickly, latching on to the process unnoticed. The issue begins choking the process, competing for nutrients, and causing the activity or environment to produce unintended results. If we simply address the symptom (the vine) and not the cause (the root), the issues surface again and again, creating undesired outcomes.
When I first came to Rackspace to lead data center operations, I noticed these issues… a lot. We were good at “fixing” symptoms and diving in to save the day, but not at solving issues at their root. The work was frustrating. Rackers worked hard to deliver results, but achieved inconsistent outcomes.
In response, we implemented a new root cause analysis (RCA) methodology to change the approach from fixing to solving. The RCA methodology at Rackspace leverages fishbone (Ishikawa) diagrams combined with the Lean Six Sigma “five whys” approach.
Five nines uptime
Integral to the methodology is bringing together the people directly involved in the process or event that produced the undesired outcome and the relevant subject matter experts. This small team documents what happened (detailed summary), maps the potential causes (fishbone diagram), and then determines the actual causes (five whys). Actions are then agreed on to address those identified causes.
These actions show up in adjustments to process and policy standardization, changes to automation, and large or small projects to improve architectural designs, upgrade code versions or replace unreliable parts.
At Rackspace, we’ve leveraged this RCA methodology hundreds of times for issues ranging from a late delivery to a safety near miss to a customer-impacting disruption, and seemingly everything in between. Thousands of actions have been implemented to continue to narrow the opportunity for error.
The results? Five nines uptime (yup, that’s 99.999 percent), year after year, and an army of Rackers who are always thinking beyond the obvious symptoms to root causes. This is how we create fanatical outcomes for our customers. This is the Rackspace difference.
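For a sense of scale, here is a quick back-of-the-envelope sketch of what an uptime percentage leaves as a downtime budget. It assumes a 365-day year and continuous, around-the-clock measurement; the function name and those assumptions are mine for illustration, not how Rackspace measures or reports uptime.

```python
# Downtime budget implied by an uptime target.
# Assumption: a 365-day year, measured around the clock.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the maximum minutes of downtime per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


# Compare three, four, and five nines.
for target in (99.9, 99.99, 99.999):
    budget = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

Under those assumptions, five nines works out to roughly 5.26 minutes of downtime in an entire year, which is why recurring issues have to be solved at the root rather than fixed over and over.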