Caring For Your Pets In OpenStack

I am very fortunate to visit lots of clients in the UK and Europe that have a huge appetite for running their workloads on OpenStack®. These organizations range from bleeding-edge online e-commerce platforms to global enterprises that have mainframes at the heart of their operations. For each of these organizations, there are the easy-fit applications that are a perfect fit for a cloud platform: the applications that can be orchestrated and require little maintenance; the applications that handle failure. These include web services, scalable NoSQL stacks like MongoDB and stateless applications that scale behind a load balancer. In many circles these applications are referred to as “cattle”*: large groups of applications, identified by the service or its instance ID number, where the first instance is indistinguishable from the numerous others spawned.

Then there’s the other end of the spectrum: legacy heavyweight applications that cause headaches at 2 a.m. when the server fails, such as SQL database servers and tightly coupled services that require human intervention on service failure. These are commonly known as “pets”*: applications that require care and attention, are unique and, if they become ill, take time and money to nurse back to health. For any true cloud environment, these legacy applications require more than a lovingly crafted orchestration recipe to provide a seamless service in the event of a disaster.

OpenStack can be deployed in a variety of ways to overcome a wide range of disasters at the infrastructure level and, when doing so, we’re aiming to achieve one thing: remove any single point of failure (SPOF).

With Rackspace® Private Cloud, we deploy a typical out-of-the-box experience with High Availability (HA) in mind when the hardware is available. This includes replicating databases between two MySQL nodes, running services across multiple nodes and either load balancing between these services or using a virtual IP that can be moved between physical nodes. In this world, we make it very easy to gracefully handle the failure of a controller service such as the OpenStack Identity service, Keystone, or the OpenStack Image service, Glance.

For the Compute nodes, the nodes that run the hypervisor and virtual machine instances, a number of capabilities are now expected that a purist view of cloud considered unnecessary up to six months ago, such as live migration of running instances and placing nodes into maintenance mode. These capabilities are commonplace in pre-cloud environments such as VMware estates. Live migration is a necessity regardless of whether you have zero-to-cloud orchestration capabilities complete with a fully automated continuous integration/continuous deployment (CI/CD) environment, or a large integrated application where each component carries an element of risk and considerable overhead should an underlying hypervisor suddenly disappear. Live migration brings an element of control to the chaos often associated with a cloud environment. It is also an expectation when traditional workloads are migrated away from VMware to OpenStack: it is no longer practical, or a good enough excuse, to suggest that migrating to OpenStack involves rewriting your application to work around the lack of this capability within an OpenStack cloud.

KVM, the open source hypervisor shipped with Rackspace Private Cloud, supports both block-level migration and live migration when used with a shared storage platform attached to the Compute nodes. We also benefit from the ability to migrate to other nodes by booting from cinder volumes. Because these cinder volumes are naturally exposed to all our Compute nodes, and because instances can be restarted on failure, we’re providing features expected by enterprises, but managed in a cloud way.
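
As a rough illustration of how an operator might choose between these migration styles, here is a minimal Python sketch that builds, but does not run, a `nova live-migration` command line. The `Instance` record, its fields and the host and server names are hypothetical and exist only for this example. The rule that instances with purely local disks need `--block-migrate` is my reading of the paragraph above, not a statement of how Rackspace Private Cloud is configured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """Hypothetical record describing where an instance's root disk lives."""
    server_id: str
    on_shared_storage: bool  # disk sits on storage shared by all Compute nodes
    boot_from_volume: bool   # root disk is a cinder volume


def live_migration_command(instance: Instance, target_host: str) -> List[str]:
    """Build (but do not execute) a nova CLI call moving the instance to target_host."""
    cmd = ["nova", "live-migration"]
    # Assumption: with neither shared storage nor a volume-backed root disk,
    # the disk contents must be copied as well, so block migration is requested.
    if not (instance.on_shared_storage or instance.boot_from_volume):
        cmd.append("--block-migrate")
    cmd += [instance.server_id, target_host]
    return cmd


if __name__ == "__main__":
    # A "pet" database server booted from a cinder volume (names are made up).
    pet = Instance("sql-01", on_shared_storage=False, boot_from_volume=True)
    print(" ".join(live_migration_command(pet, "compute-02")))
```

Keeping the command as a list of arguments makes it easy to review or log before handing it to whatever tooling actually drains a Compute node for maintenance.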

OpenStack Grizzly, which Rackspace Private Cloud uses, allows an enterprise to consume these features today.

There is still some work to do to improve this area: maturing the hypervisor’s ability to do block-level migrations; improving live migration through OpenStack to provide an uninterrupted service; making boot-from-volume processes easy to consume; adding the ability to perform maintenance and migration operations from OpenStack’s web dashboard; and improving the upgrade process to remove planned downtime. But work is already under way through the current Havana development cycle to improve the areas of High Availability within OpenStack.

* If you’ve been to any IT conference or if you follow anyone who’s anyone on Twitter, you may have heard of the “Pets vs. Cattle” analogy, coined by Randy Bias and expanded upon by Tim Bell of CERN. Pets versus Cattle is used to illustrate the difference between the mindset of managing virtual machines (in which virtual machines are named, nurtured and cared for) and that of managing cloud instances, in which, if your cloud instance or “cattle” gets sick, you shoot it and move on. (No offense, PETA.)

