When Rackspace and Red Hat came together to create a managed private cloud, we knew this offering needed to support enterprises looking to transform themselves using the latest technological innovations. Rackspace Private Cloud Powered by Red Hat, RPC-R for short, was architected to meet stringent enterprise requirements such as availability, stability and manageability, while helping these enterprises enter the cloud computing age.

The process for creating RPC-R began with taking the industry’s best OpenStack distribution, Red Hat OpenStack Platform, and designing a reference architecture Rackspace could deploy and operate on behalf of our joint customers. Think of it as building the best racing car in the sport, then surrounding it with the best pit crew and equipment.

Let’s peek under the hood and see how we’ve architected RPC-R for availability, stability and manageability.

[Diagram: RPC-R reference architecture, showing the external network with redundant load balancers and firewalls]

Availability

When we talk about availability, we are specifically talking about the availability of the OpenStack services and APIs users interact with as they provision resources. In a cloud environment, instances are ephemeral, but the underlying cloud services are expected to be always on and available. A component failure may cause a VM to crash, but users expect the services in their cloud to always be available for provisioning a replacement VM.

To create a highly available OpenStack cloud that can withstand component outages in the control plane, we use an active/active/active configuration with three clustered controller nodes. Each controller node runs the identical services and software needed to manage cloud resources. This control plane design can survive the failure of up to two nodes, and when fully operational, every node is active and can service API requests from users. Look for more details on how we cluster the controller nodes in a future blog post.

[Diagram: RPC-R control plane with three clustered controller nodes]
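
As a rough illustration (not our actual tooling), the sketch below uses the openstacksdk Python library to show how a client reaches this clustered control plane: every API call targets a single load-balanced control plane endpoint, and any of the three active controllers may answer it. The endpoint URL, credentials and project values are placeholders, not real RPC-R settings.

```python
# A minimal sketch of a client talking to the load-balanced control plane.
# All values below are placeholders.
import openstack

conn = openstack.connect(
    auth_url="https://cloud.example.com:5000/v3",  # load-balanced control plane endpoint (placeholder)
    username="demo-user",
    password="demo-password",
    project_name="demo-project",
    user_domain_name="Default",
    project_domain_name="Default",
)

# The client neither knows nor cares which controller services this request.
for server in conn.compute.servers():
    print(server.name)
```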

To provide highly available compute and storage services, RPC-R leverages multiple compute/hypervisor nodes and multiple Ceph storage nodes. Because the compute and storage planes scale out simply by adding more resources, users can grow their cloud as needed to keep pace with their applications. RPC-R also supports enterprise storage arrays for customers who want to leverage existing storage technology investments.

Observant readers may have noticed, in the reference architecture diagram above, that RPC-R utilizes redundant external load balancers and firewalls. Focusing on the load balancers specifically, this differs from the default configuration of Red Hat OpenStack Platform, which uses HAProxy software to load balance API requests across the three controller nodes.

To support RPC-R, we upstreamed changes so that Red Hat OpenStack Platform can deploy external load balancers instead. In the case of RPC-R, we chose a pair of F5 hardware appliances to front-end our control plane and load balance OpenStack API requests. The use of F5 appliances provides several benefits for our users:

  • A redundant pair of appliances provides high availability of the load balancing layer itself.
  • We are able to decouple the virtual IPs of the control plane from the controller nodes which allows Rackspace to fence off a controller node for maintenance or upgrade with minimal to no impact on cloud services.
  • We achieve higher throughput of API requests by offloading SSL traffic to the SSL Accelerator Modules resident in each F5 appliance.

Stability

The key to a stable cloud is sound architecture and rigorous testing. The RPC-R reference architecture is designed to scale without stressing a customer’s cloud to the breaking point. Design choices such as the hardware load balancers and active/active/active controllers keep the infrastructure stable even during unexpected growth or unplanned spikes in application usage.

But no matter how stable the architecture, software breaks. The best way to mitigate that is to design around the possibility of failure and conduct rigorous testing before going into production. Red Hat’s commitment to testing is well known, and RPC-R leverages that by using Red Hat OpenStack Platform as the software core of our offering.

Rackspace also goes above and beyond in testing the RPC-R software and architecture:

  • We work with the OpenStack Innovation Center to perform large-scale performance testing of RPC-R. This ensures that our reference architecture can scale even beyond the needs of our customers.
  • Rackspace tests every RPC-R deployment using a 1,400-use-case test suite that leverages the Rally and Tempest projects (a simplified sketch of the kind of provisioning scenario these tools exercise follows below).
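
For illustration only, here is a simplified Python sketch, written with the openstacksdk library, of the kind of provisioning scenario Rally and Tempest exercise: boot an instance, wait for it to become ACTIVE, then clean up. The cloud name, image, flavor and network names are placeholders, and the real RPC-R suite covers far more than this single case.

```python
# A simplified provisioning smoke test, illustrative only.
# Cloud, image, flavor and network names are placeholders.
import openstack

conn = openstack.connect(cloud="rpc-r")  # entry in clouds.yaml (placeholder name)

server = conn.compute.create_server(
    name="smoke-test-vm",
    image_id=conn.image.find_image("cirros").id,
    flavor_id=conn.compute.find_flavor("m1.tiny").id,
    networks=[{"uuid": conn.network.find_network("private").id}],
)

try:
    # Fail the check if the instance does not reach ACTIVE within five minutes.
    conn.compute.wait_for_server(server, status="ACTIVE", wait=300)
    print("Provisioning check passed")
finally:
    conn.compute.delete_server(server, ignore_missing=True)
```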

Manageability

When I talk about manageability for RPC-R, I am primarily talking about our ability to provide a managed service that handles day-to-day operations on behalf of our customers. The starting point for creating a managed RPC-R cloud is the Red Hat OpenStack Platform Director node, also known as the installer node. The Director/installer node leverages the TripleO project to function as the OpenStack Undercloud, which deploys and configures the OpenStack Overcloud. The Overcloud is what users interact with to provision resources for deploying their applications.

[Diagram: the Director/Undercloud deploying and configuring the OpenStack Overcloud]

The Director/Undercloud node enables Rackspace to automate management and configuration of the OpenStack cloud for our customers. It’s also what allows us to perform upgrades for our customers with minimal disruption to the control plane. More details on the RPC-R upgrade process will be coming in a future blog post.
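
To make the Director’s role a little more concrete, here is a hedged sketch that wraps the TripleO command-line client in a small Python script. The environment files named below are illustrative placeholders, not the actual RPC-R templates or settings.

```python
# A hedged sketch of how the Director (undercloud) drives an overcloud
# deployment using the TripleO command-line client. Environment file paths
# are placeholders, not the actual RPC-R configuration.
import subprocess

deploy_cmd = [
    "openstack", "overcloud", "deploy",
    "--templates",                                # use the standard tripleo-heat-templates
    "-e", "environments/network-isolation.yaml",  # example environment file (placeholder path)
    "-e", "rpc-r-overrides.yaml",                 # hypothetical site-specific settings
]

# Run on the Director node: the undercloud then provisions and configures the
# overcloud controller, compute and storage nodes.
subprocess.run(deploy_cmd, check=True)
```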

Cloud management and operations, of course, involves more than deployment and configuration. We must be able to react to issues, or potential issues, and resolve them as quickly as possible. That means having a monitoring solution that can detect issues and scale to meet the demands of our RPC-R customers’ clouds.

For monitoring, we leverage the Rackspace Cloud Monitoring service — a home-grown service we’ve used for years to monitor our public cloud. By extending this monitoring capability to Rackspace Private Clouds, users can take advantage of all the accumulated expertise that comes from monitoring the largest OpenStack public cloud in existence.

[Diagram: Rackspace Cloud Monitoring for RPC-R]

The Rackspace Cloud Monitoring service uses both local agent-based monitoring (agent software deployed on each physical node) and global (remote HTTP) checks. The global checks are executed from designated polling systems in different Rackspace data centers and test the availability of core OpenStack services by running against a publicly accessible endpoint.
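
Conceptually, a remote check boils down to polling a public endpoint and recording availability and latency. The Python sketch below shows that idea with a placeholder endpoint URL; the real checks are defined and executed by Rackspace Cloud Monitoring, not by a script like this.

```python
# A conceptual sketch of a remote (global) check against a public endpoint.
# The URL is a placeholder.
import time
import requests

ENDPOINT = "https://cloud.example.com:5000/v3"  # public Identity (Keystone) endpoint, placeholder

start = time.monotonic()
try:
    response = requests.get(ENDPOINT, timeout=10)
    latency_ms = (time.monotonic() - start) * 1000
    print(f"status={response.status_code} latency={latency_ms:.0f}ms ok={response.ok}")
except requests.RequestException as exc:
    print(f"check failed: {exc}")
```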

Local monitoring is performed by the Cloud Monitoring agent, which is installed as part of the RPC-R deployment. The local agents connect to Cloud Monitoring instances in Rackspace data centers using Transport Layer Security to push up collected metrics.

The agents are packaged with (and trust only) a single root CA certificate directly under Rackspace control. Metric data is analyzed by the Cloud Monitoring service to trigger any defined alerts/alarm notifications and workflow (including automatic generation of tickets for the RPC Support team).
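
To illustrate the CA-pinning idea only (this is not the Cloud Monitoring agent or its actual protocol), the sketch below pushes a metric over TLS while trusting a single root CA bundle; the endpoint URL, CA path and payload format are all hypothetical.

```python
# A purely hypothetical illustration of CA pinning during a metric push.
# The URL, CA bundle path and payload format are made up for this example.
import requests

METRICS_ENDPOINT = "https://monitoring.example.com/push"     # placeholder
PINNED_ROOT_CA = "/etc/monitoring-agent/pinned-root-ca.pem"  # placeholder

payload = {"check": "keystone-api", "metric": "response_time_ms", "value": 42}

# verify= points at the single trusted root CA, so the connection fails if the
# server presents a certificate signed by any other authority.
requests.post(METRICS_ENDPOINT, json=payload, verify=PINNED_ROOT_CA, timeout=10)
```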

As you can see, Rackspace Private Cloud Powered by Red Hat is more than a simple packaging of upstream OpenStack code or a Rackspace branding of Red Hat OpenStack Platform. Significant work has gone into creating a reference architecture that can support enterprises of any scale. Going forward, you’ll be hearing and reading more about how our two companies are collaborating to build this world-class offering.

 
