It seems fitting, after celebrating the U.S.’s Independence Day, to take a detailed look at OpenStack’s Liberty release, which powers the 12th edition of Rackspace Private Cloud powered by OpenStack.
While versions 12.0 and 12.1 reached general availability for our customers in April, we are eager to share details about the additional functionality now available in v12.2, as well as the reasoning behind its reference architecture.
This series will be broken up into three parts. Today’s post will cover the Rackspace Private Cloud powered by OpenStack reference architecture. Part 2 will describe consuming Load Balancing as a Service v2 with Liberty, while Part 3 will cover security hardening for OpenStack.
Rackspace created the reference architecture for Rackspace Private Cloud powered by OpenStack to satisfy the top five major concerns of customers and prospects: cost, security, reliability, scalability and flexibility.
As the leader in managed OpenStack clouds, we have learned from both our successes and our failures. Using those lessons, we’ve drafted a very prescriptive design for deploying OpenStack clouds that are not just highly available, but production ready.
Our reference architecture first hit the streets in September 2014 with the RPC v9 release. Here are the key components that make this architecture unique and allow us to provide our customers with our industry-leading 99.99 percent OpenStack API uptime guarantee.
This feature is near and dear to my heart; as many of you may know, I am a big fan of Ansible. Users not only get a super easy way to deploy OpenStack, they also gain familiarity with a very popular open source configuration management tool. We spearheaded, and use, the openstack-ansible repository to build our OpenStack clouds for customers.
I am all for anything that will remove complexity from deploying OpenStack. Using OpenStack-Ansible allows for consistent deployments and the ease of customizing the install to meet a customer’s needs. With OpenStack-Ansible, you can deploy as an all-in-one single node type setup or a fully distributed production grade multi-node cloud model, all using the same base code, with adjustments needed to just two configuration files.
If you have not had a chance to try this method, I highly encourage you to give it a go.
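To give a feel for the "two configuration files" mentioned above, here is a minimal sketch of a multi-node layout. The file names and section names follow the upstream openstack-ansible documentation; the host names and IP addresses are hypothetical:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (illustrative excerpt;
# hosts and addresses are hypothetical)
cidr_networks:
  container: 172.29.236.0/22
  tunnel: 172.29.240.0/22

# Three control plane hosts, balanced active-active-active
shared-infra_hosts:
  infra1:
    ip: 172.29.236.11
  infra2:
    ip: 172.29.236.12
  infra3:
    ip: 172.29.236.13

compute_hosts:
  compute1:
    ip: 172.29.236.21
```

Site-specific overrides (passwords, tunables, feature flags) go in the companion file, /etc/openstack_deploy/user_variables.yml, and the same base playbooks then drive either an all-in-one or a fully distributed deployment.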
Containers are the newest and hottest approach to simplifying application deployments, so it’s worth understanding how Rackspace incorporates containers into the OpenStack deployment process.
OpenStack can be seen as the definition of always-changing technology, with the community releasing a new version every six months like clockwork. That makes upgrading from the previous release a critical challenge, one we are looking to solve by using containers to host the individual OpenStack services. This gives Rackspace the ability to perform in-place OpenStack release upgrades, with no more need to “lift and shift.”
Currently, we upgrade the packages and code running inside the containers that host OpenStack services. We’re also exploring ways to make future upgrades even more seamless: drop new containers with the new release bits into your control plane, make the required database upgrades, then turn off the old containers, completing the upgrade with minimal downtime!
Some additional points of clarity:
- LXC containers are used to host the OpenStack services (not Docker!).
- Container communication occurs over a non-routable network segment.
- Additional containers can be added to any node in the region as one way to scale out the OpenStack services.
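The “drop in new containers, migrate the database, retire the old containers” flow described above can be sketched conceptually. This is a hypothetical simulation for illustration only; real deployments drive this through openstack-ansible playbooks, not code like this:

```python
# Conceptual sketch of a blue-green container upgrade for one OpenStack
# service. All names and the data structure are hypothetical.

def upgrade_service(control_plane, service, new_release):
    """Upgrade one service by swapping in containers built from new_release."""
    old = control_plane[service]["containers"]

    # 1. Start containers carrying the new release bits alongside the old ones.
    new = [f"{service}-{new_release}-{i}" for i in range(len(old))]

    # 2. Apply the database schema migrations required by the new release.
    control_plane[service]["db_schema"] = new_release

    # 3. Cut traffic over to the new containers, then retire the old ones.
    control_plane[service]["containers"] = new
    return old  # the old containers, now stopped


plane = {"neutron-server": {"containers": ["neutron-server-liberty-0"],
                            "db_schema": "liberty"}}
retired = upgrade_service(plane, "neutron-server", "mitaka")
print(plane["neutron-server"]["containers"])  # ['neutron-server-mitaka-0']
print(retired)                                # ['neutron-server-liberty-0']
```

The key property, as the post notes, is that the old containers stay up and serving until the new ones are ready, so downtime shrinks to the traffic cutover.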
Linux Bridge vs. Open vSwitch
As with all things OpenStack, the details matter, including how you set up your cloud’s bare metal network. To take advantage of all the great features in Neutron, you must start with a foundation that supports them.
As mentioned earlier, we have experienced failures along our OpenStack journey, and many of them centered on Open vSwitch, particularly as a customer’s cloud footprint began to expand. The very smart Neutron experts at Rackspace sought out viable alternatives and ended up swapping Open vSwitch for Linux Bridge as Neutron’s underlying operating system component. Linux Bridge has had a proven reputation for years as a stable, production-ready native Linux component, and the switch delivered a more stable Neutron implementation and improved the cloud’s overall stability.
With the ML2 plugin and the Linux Bridge agent, Neutron supports the following network capabilities:
- Flat and VLAN provider networks
- Flat, VLAN and VXLAN overlay (tenant) networks
- L3 agents for routing, NAT and floating IP addresses
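As a rough sketch of what this looks like in Neutron’s configuration (the option names come from the upstream Neutron documentation; the interface name, VNI range, and address are hypothetical):

```ini
# /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative excerpt)
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = linuxbridge

[ml2_type_vxlan]
vni_ranges = 1:1000

# /etc/neutron/plugins/ml2/linuxbridge_agent.ini (illustrative excerpt)
[linux_bridge]
# Hypothetical mapping of a provider network label to a physical interface
physical_interface_mappings = physnet1:eth1

[vxlan]
enable_vxlan = true
local_ip = 172.29.240.11
l2_population = true
```

The flat/VLAN provider networks map through physical_interface_mappings, while VXLAN overlay networks ride the non-routable tunnel segment via local_ip.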
As a disclaimer to all you Open vSwitch fans: that technology has come a long way in maturity and stability under production-like load. Because of that, we plan to re-evaluate it and may pivot back to including it in the reference architecture in the future.
Active-Active-Active Control Plane
Many of you have heard of active-active, but how about active to the third?
If you can accept triple-o, you will definitely love this. Per all the OpenStack documentation out in the world, to achieve high availability for your OpenStack services you need at least two control plane servers as part of the region. Some believe having one as a warm standby is good enough (and I encourage those folks to step into the twenty-first century). But to deliver the reliability customers seek, and to stand confidently behind our API uptime guarantee, we knew we had to take things to the next level.
This is where running a three-node control plane comes into play. Instead of playing the failover game, we place all three control plane nodes behind a load balancer and actively balance the OpenStack API traffic across all three. Thus was born the “active to the third” strategy. A core tenet of our private cloud reference architecture is requiring that a physical or dedicated software load-balancing component be part of your cloud design. This load balancer handles not only the OpenStack API traffic; it can also carry application traffic as needed, giving customers another option for meeting their application load-balancing needs.
All the core infrastructure services, such as MariaDB and RabbitMQ, are set up as clusters, synchronizing data across the control plane nodes. Losing a control plane node is no longer a panic moment. With this design, even losing two still allows for a fully functioning OpenStack cloud.
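To make the “active to the third” idea concrete, here is what balancing a single API endpoint across the three nodes might look like in a software load balancer such as HAProxy (one common choice; the stanza below is an illustrative sketch with hypothetical addresses, not our production configuration):

```
# Illustrative HAProxy stanza: Nova API balanced across all three
# control plane nodes, with health checks removing failed nodes
frontend nova-api
    bind 203.0.113.10:8774
    default_backend nova-api-back

backend nova-api-back
    balance roundrobin
    option httpchk GET /
    server infra1 172.29.236.11:8774 check
    server infra2 172.29.236.12:8774 check
    server infra3 172.29.236.13:8774 check
```

Because every node is an active backend, a failed node is simply dropped by its health check while the other two keep serving API traffic; there is no failover event to orchestrate.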
Five words: Highly Available Production Ready Cloud.
Please stay tuned for part two of this blog series, where we will cover in more detail the Load Balancing as a Service feature just added to RPC v12.2. Come back to learn the how and why around this fantastic capability.