Scale Up And Down Based On Load With Rackspace Auto Scale

As a Product Manager for Rackspace Auto Scale, I frequently talk to customers to determine what problems they are trying to solve. Many customers are trying to find solutions to similar problems. I will write a series of articles to help explain these problems and provide solutions. I will also provide insights into some of the nuances and caveats to consider when you use these solutions.

Here, I will address a very common problem customers run into. I am frequently asked how to set up Autoscaling so that the number of servers grows and shrinks based on load. Rackspace Auto Scale already lets you schedule scaling up or down based on time, which addresses a large number of use cases where you know how much additional capacity you will need and when. There are cases when you do not or cannot know how much additional capacity you may need and when; for example, if your website or product is mentioned in a prominent publication.

It is quite easy to address this by setting up the following configuration:


  • Your base level load is handled by two servers – Server 1 and Server 2
  • You run a script on Server 1 that measures CPU or memory usage and can call a URL

How this works:

The script is set up to run every few minutes, let’s say every five minutes. It checks a condition that correlates to the load on the system, for example CPU usage as a percentage. When the value exceeds a certain threshold, the script scales up by calling a webhook, which invokes the scale-up policy you previously set up. The process is similar for scaling down: when the value falls below a lower threshold, the script calls the webhook for the scale-down policy.
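
The check described above can be sketched in Python. This is a minimal illustration, not the script from any particular deployment. The webhook URLs, the thresholds, and all function names are hypothetical placeholders. The sketch also assumes that the webhook is triggered by a plain HTTP POST, and it approximates CPU usage from the five-minute load average, which is only a rough stand-in for a real CPU measurement.

```python
import os
import urllib.request

# Hypothetical placeholders: substitute the webhook URLs of your own policies.
SCALE_UP_WEBHOOK = "https://example.invalid/scale-up-webhook"
SCALE_DOWN_WEBHOOK = "https://example.invalid/scale-down-webhook"

# Illustrative thresholds; leave a wide gap between them.
SCALE_UP_THRESHOLD = 85.0
SCALE_DOWN_THRESHOLD = 40.0


def approximate_cpu_percent():
    """Rough CPU estimate: five-minute load average per core (Unix only)."""
    _, five_minute_load, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return min(100.0, 100.0 * five_minute_load / cores)


def choose_action(cpu_percent, up=SCALE_UP_THRESHOLD, down=SCALE_DOWN_THRESHOLD):
    """Return 'scale_up', 'scale_down', or None for the measured CPU value."""
    if cpu_percent > up:
        return "scale_up"
    if cpu_percent < down:
        return "scale_down"
    return None


def call_webhook(url):
    """Send an empty POST to the webhook URL (assumed trigger mechanism)."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_once():
    """One check cycle: measure load, then call the matching webhook, if any."""
    action = choose_action(approximate_cpu_percent())
    if action == "scale_up":
        call_webhook(SCALE_UP_WEBHOOK)
    elif action == "scale_down":
        call_webhook(SCALE_DOWN_WEBHOOK)
    return action
```

To complete the setup, you would call `run_once()` at the bottom of the file and schedule the file with cron or a similar scheduler every five minutes. Keeping the decision in `choose_action` separate from the network call makes the thresholds easy to test without touching a real webhook.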

Insights and Nuances:

  1. How many servers to add when scaling up will need some experimentation. You can certainly start by adding one server; because the script keeps running and checking CPU, it will add further servers one at a time. This works well if your load increases gradually or if one additional server can handle enough load for a while. You can also add more than one server at a time, say two or three, when you scale up. You will have more capacity available sooner, but you may be overprovisioning. Just keep the tradeoffs in mind.
  2. Make sure to leave enough of a gap between the scale-up condition and the scale-down condition; otherwise you can introduce a yo-yo effect. For example, suppose you add a server when CPU becomes greater than 90 percent, and when the script checks again five minutes later, the load is 75 percent. If your scale-down condition is CPU of less than 80 percent, this immediately causes a scale down, the load climbs back above 90 percent, and so on. It also helps to do a simple calculation to figure out how to set the parameters for scaling up or down. The following example illustrates this:
    • You have two servers that handle the base level load. You set a condition that says when CPU is greater than 85 percent, add one server. Let’s say that load is at 90 percent. One server gets added, so the load is now divided equally across three servers and each server handles 90 percent times 2/3, equaling 60 percent load. If your scale-down condition is set to 60 percent, this immediately triggers a scale down and the load goes back to 90 percent for the two remaining servers. In this case, you may be better off with four smaller servers that can handle the load. Then, when you add one server, the average load will be 90 percent times 4/5, which equals 72 percent. You can scale down when the average load is less than 60 percent: removing one server raises the load on the remaining four servers to 60 percent times 5/4, equaling 75 percent, which is still below the 85 percent scale-up threshold.
    • You are dependent on the script running all the time. If Server 1 goes down or the script fails, this solution will not work. You can mitigate this by running the same script on both Server 1 and Server 2. Autoscaling is smart enough to ignore one of the requests if both requests come in within the cooldown period.
    • You are responsible for making sure that Server 1 and Server 2 are up all the time and are receiving representative traffic. Server 3, Server 4 and the rest need to use the same image used by Server 1 and Server 2, which means that you need to set up a process to keep server images in sync. This is a byproduct of an Autoscaling Group not managing all the servers.
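
The threshold arithmetic from the example above can be checked with a few lines of Python. This is a rough sketch that assumes, as the example does, that load is spread evenly across all servers; the function names are made up for illustration.

```python
def load_after_resize(per_server_load, current_servers, new_servers):
    """Per-server load (percent) after the same total load is spread evenly
    over a different number of servers."""
    return per_server_load * current_servers / new_servers


def would_yo_yo(per_server_load, servers, scale_down_at, step=1):
    """True if adding `step` servers drops the load to or below the
    scale-down threshold, which would immediately undo the scale up."""
    new_load = load_after_resize(per_server_load, servers, servers + step)
    return new_load <= scale_down_at


# Two servers at 90 percent: adding one server gives 60 percent each.
print(load_after_resize(90, 2, 3))   # 60.0
print(would_yo_yo(90, 2, 60))        # True: scale down fires right away

# Four smaller servers at 90 percent: adding one server gives 72 percent each.
print(load_after_resize(90, 4, 5))   # 72.0
print(would_yo_yo(90, 4, 60))        # False: 72 percent stays above 60

# Scaling down from five servers at 60 percent gives 75 percent on four.
print(load_after_resize(60, 5, 4))   # 75.0
```

Running the same check with your own server counts and thresholds is a quick way to spot a configuration that would oscillate before you deploy it.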

In summary, it is relatively easy to set up Autoscaling based on simple load conditions. You have to do a minimal amount of prep work like setting up an Autoscaling Group and some rough calculations for setting up scale up and scale down parameters.

In future articles, I will talk about a variety of topics including a different approach to measuring load, challenges faced by continuous deployment and interaction of monitoring solutions with Autoscaling.

Rack Blogger is our catchall blog byline, subbed in when a Racker author moves on, or used when we publish a guest post. You can email Rack Blogger at

