Whether you use the Cloud or dedicated servers, you should always make sure you have a plan for your configuration in the event that something goes wrong. This is a series of posts based on a discussion I had with Aaron Scheel, a solutions engineer here at Rackspace.
We all understand that an ounce of prevention is worth a pound of cure, yet this is sometimes overlooked when architecting a configuration. Technology sometimes breaks, but having a High Availability (HA) configuration is a way to ensure that a bump in the road won’t take down your environment.
“The cloud can help mitigate disasters with the ability to replicate servers (and thereby the data on them) at an inexpensive price. That’s one of the powers of the cloud: it is an inexpensive means of disaster mitigation, when appropriately leveraged,” Scheel says.
The distinguishing feature of an HA configuration is that not only is there a level of redundancy amongst your servers and network gear, but also the ability for the configuration to continue serving requests if a node is taken out of rotation. This ultimately means that your configuration doesn’t go down even if a server goes offline.
To achieve an HA configuration, you should focus on several things, including file system and database replication with automatic failover, and load balancing the appropriate parts of your configuration.
To make your configuration highly available, it is imperative to replicate both the file system and the database. Replication ensures that all of the data in the file system and database is copied over to another server. The main difference between replication with automatic failover and replication with manual failover (as discussed in the previous Disaster Recovery post) is that with automatic failover, the servers to which data is being replicated must be the same size as the primary server.
For example, if you have an 8GB RAM server and you want to set up automatic failover, you would want to replicate the data to another 8GB RAM server. This means that if the primary server goes out of rotation, the 8GB RAM replica can step up immediately without having to be resized.
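The sizing rule above can be sketched as a simple check. This is a hypothetical illustration (the `failover_pair_ok` function and the RAM figures are assumptions for the example, not part of any real provisioning API):

```python
# Hypothetical sketch: an automatic-failover replica must match the
# primary server's size so it can take over immediately without
# being resized first.

def failover_pair_ok(primary_ram_gb: int, replica_ram_gb: int) -> bool:
    """Return True if the replica is sized to step up automatically."""
    return replica_ram_gb == primary_ram_gb

# An 8GB primary needs an 8GB replica for automatic failover:
print(failover_pair_ok(8, 8))  # True
print(failover_pair_ok(8, 4))  # False: a 4GB replica can't step up as-is
```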
Having an HA configuration was previously quite an expense; however, the cost savings associated with the cloud have made this luxury affordable to both small and large businesses. For customers serving crucial data who cannot tolerate the downtime associated with manual failover, an HA configuration is a solution they should strongly consider.
For file systems, customers can set up a variety of nodes that sync all the information together. They can all serve traffic in conjunction (more on this in a moment), so if one gets taken out with a problem, the other servers can continue serving requests for the customer.
Setting up replication and automatic failover is slightly different for database servers. One possible solution is for customers to have a primary DB1 server, a DB2 server (to where data is being replicated) and a Witness/sqlMMM Server (for Windows/Linux respectively).
“If the primary MS DB server goes down then the Witness Server and DB2 both say ‘I can’t see DB1 any more, can you?’ If they both agree that they can’t see DB1 then they will decide to promote DB2 to the primary,” Scheel explains. “Once this happens, the customer’s application needs to be aware that if it can’t connect to DB1 then it should connect to DB2.” This is one example of how to set up database replication with automatic failover.
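The application-side awareness Scheel describes can be sketched roughly as follows. The hostnames and the `connect` callable are hypothetical placeholders, not a specific database driver’s API:

```python
# Minimal sketch of "try DB1, fall back to DB2" connection logic.
# Hostnames are illustrative assumptions.

DB_HOSTS = ["db1.example.internal", "db2.example.internal"]

def connect_with_failover(hosts, connect):
    """Return a connection to the first reachable host.

    `connect` is any callable that opens a connection to a host and
    raises ConnectionError when that host is unreachable.
    """
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            last_error = err  # remember the failure, try the next host
    raise ConnectionError(f"no database reachable: {last_error}")

# Example: simulate DB1 being down after DB2 has been promoted.
def fake_connect(host):
    if host == "db1.example.internal":
        raise ConnectionError("db1 is down")
    return f"connection to {host}"

print(connect_with_failover(DB_HOSTS, fake_connect))
# → connection to db2.example.internal
```

In practice this logic often lives in the database driver or connection-pool configuration rather than hand-rolled application code, but the principle is the same: the application must know about both hosts.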
“Alternatively, if one of the Masters in a sqlMMM implementation goes down, the sqlMMM node will be a bit more invasive by actually moving a shared IP from the failed primary Master to the secondary Master. This reduces the amount of awareness that the front-end app/web servers have to have during a MySQL node failure.”
If you have replicated your file system on your Web/App servers to different servers, you should consider having a load balancer in front of all these machines. This load balancer will essentially distribute all the requests to all the servers that are located behind it.
This does two things: (1) it prevents any one server from being overtaxed and (2) if a server does fall out of rotation, the load balancer simply routes traffic to the servers that don’t have a problem.
Imagine that you have four servers in your configuration. With a load balancer, traffic is distributed to each of these servers. If the second server suddenly drops out of rotation with an issue, traffic is then routed to the other three.
While each remaining server will see an increased amount of traffic, that increase is distributed across the configuration. This minimizes the additional load each server has to handle, allowing your configuration to keep serving traffic while the failed node is being repaired.
Often, these types of implementations will have one node beyond what is actually required to handle the load (known as an n+1 configuration), so that when a single node fails, the client doesn’t experience increased request latency, and the remaining systems are at a lower risk of being overtaxed.
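The n+1 arithmetic can be sketched with some back-of-the-envelope numbers. The request rates here are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope sketch of the n+1 idea: with one spare node
# beyond what the load requires, per-server load after a single
# failure drops back to what n servers would normally carry.

def per_server_load(total_requests: float, servers: int) -> float:
    """Load each server carries when a balancer splits traffic evenly."""
    return total_requests / servers

total = 1200.0   # requests/sec the configuration must handle (hypothetical)
needed = 3       # servers actually required for that load
n_plus_1 = needed + 1

print(per_server_load(total, n_plus_1))      # 300.0 req/s in normal operation
print(per_server_load(total, n_plus_1 - 1))  # 400.0 req/s after one node fails
```

Even after a failure, each surviving server carries 400 req/s, exactly what the three required servers would carry anyway, so no node is pushed beyond its planned capacity.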
Different businesses have different needs, and our solutions engineering Rackers have the expertise to guide you to the best option. To understand what solution works for you, it would be best to give us a call and chat with one of these Rackers in more detail.
Are you looking for more information? Be sure to read our previous posts, where we discussed some of the high-level differences between Disaster Recovery and High Availability in the cloud. Want more specific information on Disaster Recovery? Check out last week’s post for more insight.
Aaron Scheel is a solutions engineer Racker who has advised a number of our customers with strategies to ensure that their businesses have the ability to withstand an adverse event.