Building a cloud operations team isn’t a new concept. But what if you need to build an operations team from the ground up for your applications that are running on Google Cloud Platform?
Over the course of eight years at Rackspace, I’ve had the opportunity to build operations teams that have supported thousands of customers. When given the opportunity to build out a new operations team focused on managed services for Google Cloud Platform, I knew we needed to approach things a little differently.
Google currently has eight applications with more than a billion users each. Out of necessity, they created Site Reliability Engineering to resolve performance and availability issues. Rackspace has thousands of customers who rely on our Fanatical Support teams to ensure their applications are available to their end users. While the objective is the same for each organization, the approaches are quite different.
Naturally, I wanted to borrow elements from both sides to create a new breed of operations team. Here are the three considerations I relied on most to build a team that creates scalable processes, leverages modern tools and responds efficiently to incidents.
Hire for the bigger picture
Building an effective operations team starts with the people you hire. These engineers need to possess a balanced set of skills and experience. Without the right balance, your operations team will struggle to be effective in ongoing and emergency situations.
A modern cloud engineer or operations engineer for Google Cloud Platform should possess strong problem-solving skills, broad technical abilities (above and beyond expertise with GCP) and soft skills such as conflict resolution and critical listening. It’s also important to understand your team’s weaknesses and focus on development in those areas. Tools and tactics change frequently, and investing time in individual development will keep your operations team ahead of the curve.
Build and leverage tools that enable scale
Google Cloud Platform provides a robust infrastructure to make it easier to rapidly innovate and scale your applications. Scaling your operational capabilities at the same pace is impractical when accounting for the human factor (work-life balance, burnout, performance and growth). Investing in and/or building the right tooling will enable your operations team to minimize toil and maximize efficiency.
Some examples of this include using:
Application/infrastructure performance monitoring to provide insight into the health of your application. Tools like Stackdriver can be tuned to raise alarms and trigger automation to scale proactively or address issues with your application.
Configuration management to enforce standards across your environment. Ansible strikes a good balance between scalability and complexity.
Infrastructure as code to streamline deployments and ensure adherence to policy. Google Cloud Deployment Manager provides a native interface to the GCP resource APIs through YAML configurations and Python (or Jinja2) templates.
CI/CD pipelines to help tie operations activities together from commit to deployment. Jenkins and Spinnaker are tools that can provide visibility into and greater control of your application’s development and deployment life cycle.
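To make the infrastructure-as-code item concrete, here is a minimal sketch of a Deployment Manager template written in Python. It is an illustration rather than a production template: the property names (zone, machineType, team, sourceImage), the "-vm" name suffix, the e2-small default and the use of the default network are all assumptions made for this example. The local_context helper is a hypothetical stand-in for the context object that Deployment Manager supplies, included only so the template can be previewed locally.

```python
from types import SimpleNamespace

COMPUTE_API = "https://www.googleapis.com/compute/v1"


def generate_config(context):
    """Entry point that Deployment Manager calls to expand the template."""
    project = context.env["project"]   # project the deployment runs in
    name = context.env["name"]         # name given to this template instance
    props = context.properties         # values passed in from the YAML config
    zone = props["zone"]
    machine_type = props.get("machineType", "e2-small")  # assumed default

    instance = {
        "name": "{}-vm".format(name),
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}/projects/{}/zones/{}/machineTypes/{}".format(
                COMPUTE_API, project, zone, machine_type),
            # Labels give the operations team a consistent handle on ownership.
            "labels": {"team": props["team"]},
            "disks": [{
                "boot": True,
                "autoDelete": True,
                "initializeParams": {"sourceImage": props["sourceImage"]},
            }],
            "networkInterfaces": [{
                "network": "{}/projects/{}/global/networks/default".format(
                    COMPUTE_API, project),
            }],
        },
    }
    return {"resources": [instance]}


def local_context(project, name, **properties):
    """Hypothetical helper: mimics the template context for a local preview."""
    return SimpleNamespace(env={"project": project, "name": name},
                           properties=properties)
```

Calling generate_config(local_context("demo-project", "web", zone="us-central1-a", team="ops", sourceImage="projects/debian-cloud/global/images/family/debian-12")) returns the dictionary of resources that would be submitted, which makes the template easy to review in a pull request or exercise in a CI job before anything is deployed.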
Define and enforce standards
While powerful, Google Cloud Platform services can be complex to piece together. Your operations team can provide crucial building blocks for your product and development teams to utilize by defining standards and enforcing their usage. These standards can account for complex networking configurations, ensure proper versioning in dynamic application deployment workflows, and outline identity and access management policies. This foundation will ensure that your operations team can respond effectively within your defined service level objectives.
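One way to enforce standards like these is to check resource definitions automatically before they are deployed. The sketch below is a hypothetical example, not a Rackspace or Google tool: the naming pattern, the required labels (team and environment) and the function names are assumptions chosen for illustration. It expects a dictionary with a "resources" list, where each resource has a name and optional labels under its properties.

```python
import re

# Assumed house standard: lowercase letters, digits and hyphens,
# starting with a letter and ending with a letter or digit.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{0,61}[a-z0-9]$")

# Assumed policy: every resource must say who owns it and where it runs.
REQUIRED_LABELS = ("team", "environment")


def check_resource(resource):
    """Return a list of human-readable policy violations for one resource."""
    problems = []
    name = resource.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(
            "{!r}: name must use lowercase letters, digits and hyphens".format(name))
    labels = resource.get("properties", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            problems.append("{!r}: missing required label {!r}".format(name, label))
    return problems


def check_config(config):
    """Collect violations across every resource in a configuration."""
    problems = []
    for resource in config.get("resources", []):
        problems.extend(check_resource(resource))
    return problems
```

Run as a step in a CI/CD pipeline, a check like this can fail a build whenever check_config returns a non-empty list, so that standards are enforced at commit time rather than discovered during an incident.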
Highly skilled people leveraging the right tools on a solid foundation of policy and standards is a recipe for success.
Does this seem like a daunting task? Do you need help augmenting your operations team? Visit Rackspace to find out about the support we provide for the world’s leading clouds, including Google Cloud Platform.