Time-series analysis on cloud infrastructure metrics: Exploring how to “right-size” your infrastructure with Amazon Web Services

This post is co-authored with Arti Garg.

Many businesses are choosing to migrate their infrastructure to the cloud, or to build it there natively; doing so helps them realize a myriad of benefits. Among these benefits is the ability to lower costs by “right-sizing” infrastructure to adequately meet demand without under- or over-provisioning. For businesses with time-varying resource needs, the ability to “spin up” and “spin down” resources based on real-time demand can lead to significant cost savings.

Major cloud-hosting providers like Amazon Web Services (AWS) offer management tools that enable customers to scale their infrastructure to current demand. However, fully embracing capabilities such as AWS Auto Scaling typically requires:

  1. An Auto Scaling configuration optimized to match the customer’s application resource demands
  2. An understanding of the potential cost savings and business ROI

Attempting to understand potential savings from the use of dynamic infrastructure sizing is not a trivial task. AWS’s Auto Scaling capability offers a wide range of options, including resource scheduling and usage-based changes in infrastructure. Businesses must undertake detailed analyses of their applications to understand how best to utilize Auto Scaling, and further analysis to estimate cost savings.
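As one concrete illustration of the resource-scheduling option, a scheduled Auto Scaling action can add capacity ahead of a known demand peak and remove it afterward. The sketch below builds the request parameters for boto3's `put_scheduled_update_group_action`; the group name, action names, sizes, and cron schedules are our own illustrative assumptions, not the client's configuration.

```python
# Hypothetical sketch: scheduled Auto Scaling actions around a known
# evening demand peak. All names and numbers here are assumptions.

def scheduled_action_params(group_name, action_name, recurrence,
                            min_size, max_size, desired):
    """Build the request for autoscaling.put_scheduled_update_group_action."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": action_name,
        "Recurrence": recurrence,   # cron-style, evaluated in UTC
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }

# Scale up to 8 instances every evening at 18:00 UTC...
evening_peak = scheduled_action_params(
    "web-app-asg", "evening-scale-up", "0 18 * * *", 4, 12, 8)
# ...and back down to 2 instances at 06:00 UTC.
overnight = scheduled_action_params(
    "web-app-asg", "morning-scale-down", "0 6 * * *", 2, 12, 2)

# With AWS credentials configured, these would be applied as:
# import boto3
# client = boto3.client("autoscaling")
# client.put_scheduled_update_group_action(**evening_peak)
# client.put_scheduled_update_group_action(**overnight)
```

Whether a fixed schedule like this beats usage-based scaling policies depends on how predictable the demand pattern is, which is exactly what the analysis below is meant to establish.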

In this article, we discuss the approach we use at Datapipe to help customers customize Auto Scaling and estimate potential savings, including the analyses we’ve done. We also aim to demonstrate the benefits of applying data science skills to infrastructure operational metrics. We believe the approach demonstrated here applies to other operational metrics as well, and we hope our readers can apply it to their own infrastructure data.

Infrastructure Usage Data

We approach Auto Scaling configuration optimization by considering a recent client project, in which we helped the client realize potential cost savings by finding the most cost-effective configuration. When we initially engaged with the client, their existing web-application infrastructure consisted of a fixed number of AWS instances running at all times. However, after analyzing their historical resource usage patterns, we observed that the application had time-varying CPU usage, and at times the AWS instances were barely utilized. In this article, we will analyze simulated data that closely matches the customer’s usage patterns while preserving their privacy.
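Historical usage like this is typically pulled from CloudWatch. As a minimal sketch, the helper below assembles the request for boto3's `get_metric_statistics` to fetch average CPU utilization in five-minute periods; the instance ID and the two-week window are illustrative assumptions.

```python
# Hedged sketch: request parameters for pulling historical EC2 CPU
# utilization from CloudWatch via boto3. Instance ID is a placeholder.
from datetime import datetime, timedelta, timezone

def cpu_stats_request(instance_id, days=14, period=300):
    """Parameters for cloudwatch.get_metric_statistics: average
    CPUUtilization in 5-minute (300 s) periods over the last `days` days."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average"],
    }

params = cpu_stats_request("i-0123456789abcdef0")

# With AWS credentials configured, the datapoints would be fetched as:
# import boto3
# cw = boto3.client("cloudwatch")
# datapoints = cw.get_metric_statistics(**params)["Datapoints"]
```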

In Figure 1, we show two weeks’ worth of usage data, similar to that available from Amazon’s CloudWatch reporting/monitoring service, which allows you to collect infrastructure-related metrics:

Figure 1 (Credit: Arti Garg)
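To give a feel for data of this shape, here is a minimal sketch of how a simulated series like the one in Figure 1 might be generated: a flat daytime baseline, a late-evening/night bump, and a weekend-wide increase. All magnitudes, windows, and noise levels here are our own illustrative assumptions, not the client's actual values.

```python
# Illustrative simulation of two weeks of CPU usage at 5-minute cadence,
# with higher demand late at night and on weekends (assumed magnitudes).
import numpy as np

rng = np.random.default_rng(0)
minutes = np.arange(0, 14 * 24 * 60, 5)      # two weeks, every 5 minutes
hour = (minutes / 60) % 24                   # hour of day, 0-24
day = minutes // (24 * 60)                   # day index; 0 = Monday
weekend = (day % 7) >= 5                     # Saturday and Sunday

base = 20.0                                  # flat daytime CPU %
night = (hour >= 21) | (hour < 2)            # late-evening/night window
cpu = base + 25.0 * night + 15.0 * weekend   # additive demand bumps
cpu += rng.normal(0, 2.0, size=cpu.shape)    # measurement noise
```

Both visual findings below fall out of this construction: nights sit well above the daytime baseline, and weekend days sit above weekdays across the whole day.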

A quick visual inspection reveals two key findings:

  • Demand for the application is significantly higher during late evenings and nights. During other parts of the day, it remains constant.
  • There is a substantial increase in demand over the weekend.

A bit more analysis will allow us to better understand these findings. Let’s look at the weekend usage (Saturday–Sunday) and the weekday usage (Monday–Friday) independently. To get a better sense of the uniformity of the daily cycle within each of these two groups, we can aggregate the data to compare the pattern on each day. To do so, we binned the data into regular five-minute intervals throughout the 24-hour day (e.g., 0:00, 0:05, etc.) and determined the minimum, maximum, and average for each of these intervals.
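The binning step above can be sketched in a few lines of pandas, assuming the usage series lives in a DataFrame with a datetime index and a `cpu` column; the column names and the flat placeholder data are our own, not the original analysis.

```python
# Sketch of the weekday/weekend split and per-interval aggregation,
# using flat placeholder data in place of the real usage series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2017-01-02", periods=14 * 24 * 12, freq="5min")
df = pd.DataFrame({"cpu": 20 + rng.normal(0, 2, size=len(idx))}, index=idx)

# Split weekdays (Mon-Fri) from weekends (Sat-Sun).
is_weekend = df.index.dayofweek >= 5

def daily_profile(frame):
    """Aggregate by 5-minute slot within the 24-hour day (0:00, 0:05, ...)
    and compute the min, mean, and max for each slot."""
    return frame.groupby(frame.index.time)["cpu"].agg(["min", "mean", "max"])

weekday_profile = daily_profile(df[~is_weekend])
weekend_profile = daily_profile(df[is_weekend])
# Each profile has 288 rows: one per 5-minute interval of the day.
```

Plotting the min/mean/max bands from each profile against time of day is what lets us compare how uniform the daily cycle is within each group.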

To read the rest of this post, please visit the O’Reilly site.

Arti Garg is a Principal Consultant at Datapipe, leading Analytics efforts for the Data & Analytics Practice. She has extensive experience working with time-series data, applying techniques she has developed to applications ranging from studying the composition of the Galaxy to supporting incident investigations for jet engines. Her professional experience has taken her to many places including a major industrial products company as a data scientist, an energy services company as the Director of Data & Analytics for its innovation team, a mountain top observatory in Chile as a researcher, and the White House as an energy and R&D budget policy analyst.