You Spoke, We Listened: Better Rackspace Metrics Services

It has been an amazing year at Rackspace Metrics.

Since the launch of our Early Access Program last year, we’ve worked closely with customers to understand their business needs, then brainstormed, debated and designed internally to improve Rackspace Metrics based on that feedback.

Rackspace Metrics is a multi-tenant software-as-a-service (SaaS) product that offers a flexible and affordable platform for storing and serving time-series metrics. It provides a REST API for metrics ingestion and retrieval, along with out-of-the-box integration with popular open source tools. The software that powers this service is an open source project named ‘Blueflood’, which is built on top of Apache Cassandra. Rackspace Metrics is designed to meet the functional and performance requirements of enterprise-scale metrics.

It’s been a great journey, and we’re thrilled to announce that Rackspace Metrics is now “feature complete.” Below are three important insights that led to changes we want to share.

What we wanted to verify: Is our scaling technology a differentiator in the market?

What we learned: Absolutely.

We worked with several Rackspace teams, plus a handful of our Early Access Program (EAP) customers, who were struggling with their metrics storage solutions and were thrilled to switch to Rackspace Metrics. The same service allows Rackspace Monitoring to store time-series data collected from hundreds of thousands of servers.

Key insights we gained:

  • Don’t limit the metrics collection. The overhead of deciding which metrics not to collect adds cost to a project.
  • Don’t split the metrics into different clusters. Doing so makes it almost impossible to build a consolidated dashboard. The overhead of data consolidation adds cost to the project.
  • The storage unit must be production ready. In one case, a full-time engineer was needed just to take care of the storage unit, a thankless job no one on the team wanted.

With years of experience addressing the challenge of scaling, we built a system with affordability, reliability and performance in mind. These attributes became the key differentiators for users, many of whom were overjoyed to be able to get rid of the infrastructure for metrics storage and focus more on their core business.

A perfect example comes from our own Engineering Director Alex Scammon, who found costs went down and reliability went up with a move to OnMetal:

What we wanted to verify: Is an API-only product enough to drive adoption in the market? Can we focus on making the metrics API faster and better, and on growing it with the user base?

What we learned: Yes, but only with good integration support.

Our users want something that just works out of the box. Programming against an API continues to be the biggest adoption barrier. At the same time, users readily accept an end-to-end solution that is assembled by integrating microservices.

We focused on three key integrations: StatsD, Carbon Relay and Grafana. These allow our users to replace the metrics storage component in their infrastructure without changing their daily workflow, which greatly reduces the cost of switching and solves the biggest problem with metrics storage.
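To make the StatsD path a little more concrete, here is a minimal Python sketch of what an application already instrumented for StatsD emits. It only illustrates the widely used StatsD line format (`name:value|type`) sent over UDP. The metric names, host and helper functions are hypothetical, and the step that forwards data from StatsD to Rackspace Metrics is not shown because it depends on your StatsD configuration.

```python
import socket

# Metric types in the common StatsD line format:
# c = counter, g = gauge, ms = timer, s = set.
STATSD_TYPES = {"c", "g", "ms", "s"}


def format_statsd(name, value, metric_type):
    """Render one metric as a StatsD line: <name>:<value>|<type>."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Send one line to a StatsD daemon over UDP (8125 is StatsD's default port).

    The host is an assumption; point it at wherever your StatsD daemon runs.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_statsd(format_statsd("web.requests", 1, "c"))
    send_statsd(format_statsd("api.latency", 320, "ms"))
```

The point of the sketch is that the application side stays exactly as it was: only the storage behind StatsD changes.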

The following diagram shows how it works with StatsD:

For more details on these integration points, please see the following references:

We’re always open to further integrations. Join the Early Access Program and share your ideas.

What we wanted to verify: Is it sufficient for Rackspace Metrics to only support numeric data?

What we learned: No. Annotation and enum values are also critical for time-series data.

Based on what we saw and heard, we identified two new data types to replace the use of strings in Rackspace Metrics: annotation and enum.

Annotation has been a must-have feature for all dashboarding tools since Etsy’s Mike Brittain wrote “Tracking Every Release” in 2010. It allows special events, such as deployments, to be recorded and shown along with the graph.
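As a rough illustration of the idea, the Python sketch below models an annotation as a timestamped event and selects the events that fall inside a graph’s time window, which is what a dashboard needs in order to draw markers over a chart. The `Annotation` class and its field names (`when`, `what`, `tags`) are assumptions made for this example, not the Rackspace Metrics schema.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A hypothetical annotation record: something happened at a point in time."""
    when: int                     # event time, seconds since the epoch
    what: str                     # short description, e.g. "deploy v2"
    tags: list = field(default_factory=list)


def annotations_in_range(annotations, start, end):
    """Return annotations with start <= when < end, oldest first."""
    selected = [a for a in annotations if start <= a.when < end]
    return sorted(selected, key=lambda a: a.when)


if __name__ == "__main__":
    events = [
        Annotation(100, "deploy v1", ["release"]),
        Annotation(500, "deploy v2", ["release"]),
        Annotation(50, "config change"),
    ]
    # Events that a graph covering seconds 0 to 200 would overlay.
    for event in annotations_in_range(events, 0, 200):
        print(event.when, event.what)
```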

The documentation for annotation is here:

Discrete values such as HTTP response codes are often collected for histogram analysis. You can now send them as enum types and have Rackspace Metrics count the occurrences of these values during the roll-up process. No more scripting!
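For readers who have been writing that script themselves, the Python sketch below shows the kind of counting an enum roll-up performs: group discrete values into fixed time windows and count how often each value appears. It is an illustration only, not the Blueflood implementation, and the five-minute window and the function name are assumptions.

```python
from collections import Counter, defaultdict


def rollup_enum_counts(points, window_seconds=300):
    """Count occurrences of each enum value per time window.

    points: iterable of (timestamp_seconds, enum_value) pairs.
    Returns {window_start: {enum_value: count}}, ordered by window start.
    """
    buckets = defaultdict(Counter)
    for timestamp, value in points:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start][value] += 1
    return {start: dict(counts) for start, counts in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical HTTP response codes sampled over a few minutes.
    samples = [(0, "200"), (10, "200"), (20, "500"), (310, "404")]
    print(rollup_enum_counts(samples))
```

Each window’s counts can feed a histogram directly, with no per-metric scripting on the client side.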

The documentation for enum is here:

What’s next?

Several customers have reported they plan to increase their data volume by 10x or even 100x. From our experience over the past year, we have full confidence we can support that kind of growth. From the same experience, we also know it will take some heavy lifting, and this will be our primary focus before the UA launch.

In the meantime, our Early Access user base is growing, and we’d love to help even more companies make metrics collection easier, more reliable and affordable. Sign up today, and get an invitation to the private feedback forum. We value your ideas, and want to continue refining Rackspace Metrics. Your ideas will be heard!

Shane Duan was a Senior Product Manager at Rackspace enabling monitoring capability in all Rackspace products. His mission was to provide a monitoring experience that just works and is part of the Rackspace Fanatical experience. Shane has been in the software industry since 1998 and had the good fortune to work with some of the best in software development tools (Borland) and software development process (ThoughtWorks). Before Rackspace, Shane worked as a Development Manager for the Guidewire DevOps team, delivering the best tools for developers, including two private cloud deployments for testing. Shane has degrees in Physics and Computer Science and, most recently, an MBA from Wharton San Francisco. When not typing on a Dvorak keyboard and getting geeky with software and business analysis, Shane loves spending time with the family on outdoor activities like hiking, geocaching and snowboarding.
