As enterprise organisations ramp up their use of on-demand, scalable computing resources (aka ‘the cloud’) to solve IT problems, there has been some confusion about what the cloud can really do.
The idea that public cloud can solve all IT problems is simply not true. Putting full trust in infrastructure with no true service level agreement, then praying you properly followed the knowledge base article on how to create a highly available setup, is not worth the risk for most companies that aren’t start-ups.
Another enticing idea is that the cloud will reduce your infrastructure and IT costs. While this can be true, it is not automatically so.
So how can enterprise organisations use the cloud while keeping the capability to provide their customers and business units with service level agreements? Can that be done while still reducing overall cost and gaining more control over how their infrastructure is partitioned?
The answer is hybrid cloud. The term is being used in a variety of ways, but in general, hybrid cloud is the strategy of running your own dedicated compute while being able to connect with shared compute to fill various needs, such as handling increased traffic volume and providing high availability.
Imagine having limitless on-demand scalability alongside highly available dedicated resources that handle steady state workloads. Imagine never telling a business unit “no” to a new cutting-edge project. Imagine never waiting for infrastructure to be available to test out a new project. Lastly, imagine never having to display your “under maintenance” web page when you are really attempting to recover from an outage caused by user load.
Those are the promises of hybrid cloud.
Now let’s cover the keys to creating a successful production hybrid cloud. Spotting a true production hybrid cloud can be like spotting Bigfoot on the family camping trip. Where are they? What is keeping this beautiful IT strategy from becoming common practice?
The truth is, many have tried and failed. Here are some suggestions for deploying a successful hybrid cloud, one that is fit to confidently handle your production customer loads.
What are the keys to a successful production hybrid cloud?
- DAI – Determine your average infrastructure needs
Determining your average steady state volumes allows you to estimate how many instances you actually need. Environments are too often mis-sized in both directions. Having too many instances is a waste of supported and powered-on hardware that may never be consumed. Application-dedicated hardware that runs at less than 30 percent utilisation over the long term costs more to support, cool and power in one quarter than the hardware itself. Watch your OPEX cash go right out the window.
But too few instances means you are continuously scrambling to find resources to throw at an ever-increasing user load. That game will eventually cause you to make rushed hardware decisions, costing you more. Bye bye Mr. CAPEX. Let’s not talk about unacceptable SLAs, outages and bad NPS scores.
So gather up your stats and do the necessary analysis to find your average steady state volume. Key term here is average; using the average gains you the security of being able to handle your normal volume spikes with a defined controllable architecture.
I follow the old-school Microsoft 60 percent rule: no instance should run at more than 60 percent utilisation. Find the average volume, cross-reference that with instance utilisation (do not forget to average out any resources running at over 60 percent) and at the end you will have an average you can use to build the steady state infrastructure for your private cloud.
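As a rough illustration of the 60 percent rule, the sketch below flags instances running above the ceiling and estimates a steady state instance count from per-instance averages. The function names and the sample fleet are hypothetical, and the calculation assumes load can be spread evenly across instances, which real workloads may not allow.

```python
import math
from typing import List

TARGET_PERCENT = 60  # the 60 percent ceiling from the rule above


def over_target(avg_cpu_percent: List[int], target: int = TARGET_PERCENT) -> List[int]:
    """Return the positions of instances whose average exceeds the target."""
    return [i for i, pct in enumerate(avg_cpu_percent) if pct > target]


def instances_needed(avg_cpu_percent: List[int], target: int = TARGET_PERCENT) -> int:
    """Estimate how many instances keep the fleet average at or under target.

    Assumption (not from the article): measured load can be spread evenly
    across instances, so total load is simply the sum of the averages.
    """
    if not avg_cpu_percent:
        raise ValueError("need at least one utilisation sample")
    total_load = sum(avg_cpu_percent)
    return max(1, math.ceil(total_load / target))


fleet = [35] * 10  # hypothetical: ten web servers averaging 35 percent CPU
print(over_target(fleet))       # []
print(instances_needed(fleet))  # 6
```

Treat the result as a starting estimate to check against peak-day data, not as a capacity guarantee; CPU is only one of the metrics that decide how an instance copes with load.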
Consider an example. You have a 10-server web farm with daily user page views of 500,000. The average CPU utilisation on each web server instance is 35 percent. If your user volume went up to 3 million page views, would you have enough resources to still serve up your website?
Okay, stop counting. The answer is yes. Increased user load does not equal increased utilisation and vice versa. Many metrics play a role in how your instance handles user load, almost too many to consider. The best way to gather metrics is during your peak volume day(s) or even during an outage. Yes, I did say outage: while that is a very unpleasant time for your system admins, it is a great time to collect valuable data. During that time you know your infrastructure has reached its limits and what, numbers-wise, caused it.
Remember: private cloud = steady state volume with peaks; public cloud = volume overflow
Monitoring
Monitoring is near and dear to my heart, because I was a production support engineer for 15 years. Now, as a solution architect, the question I ask at every meeting is: are you monitoring that system? I’m still amazed at how often monitoring is an afterthought.
When using the cloud it needs to be your first, middle and last thought. Monitoring is a double-edged, winning sword. The obvious win is determining your system availability. The other win is instance baselines (notice how stats are like gold to architects). In order to track and distribute workloads you need baselines. Those baselines are best generated by monitoring reports.
Assuming you set up the correct types of monitoring, your monitoring tools are used to set hard thresholds (CPU, memory, disk, worker processes, threads, etc.) and to trigger actions when those thresholds are hit. In old IT thinking, that action was to send an email alert, then pray someone responded to it.
With the enablement of cloud computing, you can simply send a request to your public/private cloud to spin up another instance. Of course, this new thinking does come with additional complexity. For system admins, shifting workloads between public and private clouds can be thought of as managed complexity.
Knowing how many instances are running which app in which cloud at what volume is not for the faint of heart, but monitoring will provide that single pane of glass for your environment/application. The combination of monitoring and provisioning automation is your Michael Jordan and Scottie Pippen.
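The threshold-then-action pattern described in this section can be sketched in a few lines. This is a minimal illustration, not any monitoring product's API: `Threshold`, `breached` and `react` are hypothetical names, and `scale_out` stands in for whatever call your provisioning tooling exposes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str   # e.g. "cpu", "memory", "disk"
    limit: float  # percent; breached when the reading is at or above it


def breached(readings: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return the metrics whose current reading has hit its hard threshold."""
    return [t.metric for t in thresholds if readings.get(t.metric, 0.0) >= t.limit]


def react(
    readings: Dict[str, float],
    thresholds: List[Threshold],
    scale_out: Callable[[str], None],  # hypothetical hook into provisioning
    private_has_capacity: bool,
) -> str:
    """Request one more instance on a breach instead of just sending an email.

    Private cloud is preferred; public cloud takes the overflow.
    Returns the cloud that was asked to scale, or "none".
    """
    if not breached(readings, thresholds):
        return "none"
    target = "private" if private_has_capacity else "public"
    scale_out(target)
    return target


requests: List[str] = []
rules = [Threshold("cpu", 60.0), Threshold("memory", 80.0)]
print(react({"cpu": 85.0, "memory": 40.0}, rules, requests.append, False))  # public
```

A real setup would also need cool-down periods and scale-in rules so that a brief spike does not leave idle instances running.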
Provisioning automation (orchestration)
Based on everything I have seen thus far, it’s clear to me that hybrid cloud cannot be accomplished without some level of automation to fully utilise the power of the public and private cloud.
That automation can be as simple as low-level infrastructure orchestration, creating and deploying instances via an API, or it can go up the stack to deploying applications hosted on those new instances. Depending on how mature your IT organisation is, much can be accomplished by turning infrastructure into code.
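One way to picture infrastructure as code is a desired state kept as data and a small function that works out what has to change. The sketch below is an assumption-laden illustration: the role names and the `plan` function are hypothetical, and a real tool would turn the resulting deltas into API calls against your cloud.

```python
from typing import Dict


def plan(desired: Dict[str, int], running: Dict[str, int]) -> Dict[str, int]:
    """Return the per-role instance delta needed to reach the desired state.

    Positive numbers mean instances to create, negative numbers mean
    instances to remove. Roles that already match are left out.
    """
    roles = set(desired) | set(running)
    deltas = {role: desired.get(role, 0) - running.get(role, 0) for role in roles}
    return {role: delta for role, delta in sorted(deltas.items()) if delta != 0}


# Hypothetical environment definition that would live in version control.
desired_state = {"web": 6, "db": 2}
current_state = {"web": 4, "db": 2, "cache": 1}
print(plan(desired_state, current_state))  # {'cache': -1, 'web': 2}
```

Keeping the desired state in version control is what makes the environment reviewable and repeatable; the comparison step stays the same whichever provisioning tool carries out the changes.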
Added benefits of provisioning automation are:
- the ability to standardize and optimize operational processes, and
- the ability to deploy shared resource pools and handle movement of workloads.
It’s also necessary to connect your public and private cloud footprints so they can communicate. Fortunately we are in the midst of a DevOps revolution, meaning there are many ways to accomplish this. Some of the popular cloud provisioning tools include Ansible, Chef, Puppet, Salt, OpenStack Heat and Vagrant, to name a few. Finding the right tool(s) for your organization will require some research, depending on your use cases.
All in all, if you keep in mind these three key factors (DAI, monitoring and provisioning automation) you should find great success deploying a production hybrid cloud.
So how can Rackspace help? There are many commodity product options Rackspace can provide (commodity because the product is not custom), saving money and also instilling a higher level of assurance in functionality and supportability.
Some hybrid cloud possibilities:
- My favorite combo is Rackspace Private Cloud powered by OpenStack hosted at Rackspace using RackConnect to burst out to Cloud Servers (Rackspace Public Cloud).
- Rackspace Private Cloud powered by OpenStack hosted at Rackspace using RackConnect to burst out to OnMetal server(s) using CoreOS to create Docker containers (Rackspace public cloud).
- Rackspace Private Cloud powered by OpenStack hosted at Rackspace over a dedicated pipe bursting out to AWS (public cloud).
- Any of the above, but instead of being hosted at Rackspace, the private cloud footprint would be hosted in your datacenter.