Cloud Metrics: Working Toward A Public Launch

An important part of Rackspace’s monitoring pipeline is the metrics that we gather in the process. We have a small team called Cloud Metrics that is dedicated to these metrics. We are also known as the Blueflood team, since we authored Blueflood, the technology at the heart of our Cloud Metrics service. We’ve been hard at work improving this part of our business, and some changes are underway that I think are worth sharing.

What We’re Up To

In short, these are the two primary things we’re working on:

  • We’re upgrading our hardware to have even more capacity
  • We’re making this product an HTTP-based service and making it public

What This Means


The upgrade to newer machines has a couple of implications, the most obvious being increased capacity and performance. We ingest close to 2 million metrics per minute right now, but we want more. To scale for various Rackspace-wide projects, we expect our ingestion rate to grow to between 40 million and 50 million metrics per minute over the next year, so we are preparing ourselves for the onslaught.
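To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: the rates are the round figures quoted above, and the helper names are made up for illustration.

```python
# Back-of-the-envelope conversion of the ingestion figures quoted above.
# The constants are the round numbers from this post, not measured values.
SECONDS_PER_MINUTE = 60

CURRENT_RATE = 2_000_000   # metrics per minute today (approximate)
TARGET_LOW = 40_000_000    # lower end of the expected rate
TARGET_HIGH = 50_000_000   # upper end of the expected rate


def per_second(metrics_per_minute: int) -> float:
    """Convert a per-minute ingestion rate to a per-second rate."""
    return metrics_per_minute / SECONDS_PER_MINUTE


def growth_factor(current: int, target: int) -> float:
    """How many times larger the target rate is than the current one."""
    return target / current


if __name__ == "__main__":
    print(f"today:  ~{per_second(CURRENT_RATE):,.0f} metrics/second")
    print(f"target: ~{per_second(TARGET_LOW):,.0f} to "
          f"~{per_second(TARGET_HIGH):,.0f} metrics/second")
    print(f"growth: {growth_factor(CURRENT_RATE, TARGET_LOW):.0f}x to "
          f"{growth_factor(CURRENT_RATE, TARGET_HIGH):.0f}x")
```

In other words, we are planning for roughly 20 to 25 times today's load.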


The more visible change will be that this service is publicly accessible. Cloud Metrics used to be merely a stepchild of the Cloud Monitoring product. As such, it had a Thrift API that the Cloud Monitoring team had developed for its own internal purposes.

We have removed the Thrift API and made Cloud Metrics available as an HTTP API. We have set up all the necessary wiring to make this just another standard-issue Rackspace service. While the only “customer” right now is Cloud Monitoring, these changes pave the way for any customer, big or small, to send metrics our way and retrieve them over plain HTTP.
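To give a feel for what sending a metric over HTTP could look like, here is a minimal sketch of a client building an ingest request. Everything specific in it is an assumption for illustration: the base URL is a placeholder, and the path, JSON field names and auth header are loosely modeled on the open-source Blueflood ingestion API rather than taken from a published Cloud Metrics endpoint. The request is only constructed, never sent.

```python
import json
import time
import urllib.request

# Hypothetical endpoint; the real public URL is not given in this post.
BASE_URL = "https://metrics.example.com"


def build_ingest_request(tenant_id, token, name, value,
                         ttl_seconds=172800, now=None):
    """Build (but do not send) a POST request carrying one metric.

    The path and field names are assumptions modeled on Blueflood's
    ingestion format; check the actual API docs before relying on them.
    """
    # Collection time in milliseconds since the epoch.
    collection_ms = int((time.time() if now is None else now) * 1000)
    payload = [{
        "collectionTime": collection_ms,
        "ttlInSeconds": ttl_seconds,
        "metricValue": value,
        "metricName": name,
    }]
    return urllib.request.Request(
        url=f"{BASE_URL}/v2.0/{tenant_id}/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed token-based auth header
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_ingest_request("123456", "my-token", "example.cpu.load", 0.42)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the finished request to `urllib.request.urlopen` would perform the call; it is left out here because the endpoint above is only a placeholder.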

Where We’re At

We’re about halfway through the changeover. Here’s a breakdown of the things we’ve done and what we’re working on right now:

Progress So Far

  • Our new production hardware is set up and ingesting production data as we speak
  • We’ve migrated all the old rollups to the new production hardware
  • We have all the wiring for the public HTTP API set up

Still In Progress

  • Point all queries to the new production cluster
  • Work out a new method of metric indexing
  • Deprecate the old production hardware

So, some big, big milestones accomplished; some big milestones yet to reach.

What Happens Next

Finishing all the work in progress would be a huge relief for the team and allow us to work on a world of problems and questions we have been eager to address. Things like:

  • Using Blueflood as a backend to Graphite and Grafana
  • Annotation support in Blueflood
  • Integrations with other teams in Rackspace
  • A better data persistence layer with Kafka
  • Using Cloud Metrics for more than monitoring data
  • Courting the open-source community with the Blueflood project

We have a lot that we want to accomplish, and the work that we’re doing right now will set us up to achieve all of it. I’ll post another update in a couple of months to let you know how far along we are. In addition, the team is planning to write a few articles that go into more technical depth on how we have done what we’ve done.

For the past 15 or so years, Alex has been gently guiding companies towards better testing and better development practices. What normally begins as an innocent desire to write a few tests usually ends up as a huge, skyscraper-sized robot that automates the testing, deployment and monitoring of all the code everywhere, and also cooks a mean three-egg omelet.

