An important part of Rackspace’s monitoring pipeline is the metrics that we gather in the process. We have a small team called Cloud Metrics that is dedicated to these metrics. We are otherwise known as the Blueflood team since we authored the blueflood.io project, which is the technology at the heart of our Cloud Metrics service. We’ve been hard at work improving this part of our business and some changes are underway that I think are worth sharing.
In short, these are the two primary things we’re working on:
The upgrade to newer machines has a couple of implications, but the obvious one is our desire for increased capacity and performance. We ingest close to 2 million metrics per minute right now, but we want more. To scale for various Rackspace-wide projects, we are expecting to increase our ingestion rate to 40 million to 50 million metrics per minute over the next year, so we are preparing ourselves for the onslaught.
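To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above; the function and constant names are made up for illustration and are not part of Blueflood.

```python
# Figures quoted in the post, in metrics per minute.
CURRENT_PER_MINUTE = 2_000_000
TARGET_LOW_PER_MINUTE = 40_000_000
TARGET_HIGH_PER_MINUTE = 50_000_000


def per_second(per_minute: float) -> float:
    """Convert a per-minute ingestion rate to a per-second rate."""
    return per_minute / 60


def growth_factor(target: float, current: float) -> float:
    """How many times larger the target rate is than the current rate."""
    return target / current


if __name__ == "__main__":
    for label, rate in [
        ("current", CURRENT_PER_MINUTE),
        ("target (low)", TARGET_LOW_PER_MINUTE),
        ("target (high)", TARGET_HIGH_PER_MINUTE),
    ]:
        factor = growth_factor(rate, CURRENT_PER_MINUTE)
        print(f"{label}: {per_second(rate):,.0f} metrics/sec ({factor:.0f}x current)")
```

In other words, the target is roughly a 20x to 25x increase over what we ingest today.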
The more visible change will be that this service is publicly accessible. Cloud Metrics used to be merely a stepchild of the Cloud Monitoring product. As such, it had a Thrift API that the Cloud Monitoring team had developed for its own internal purposes.
We have removed the Thrift API and made Cloud Metrics available as an HTTP API. We have set up all the necessary wiring to make this just another standard-issue Rackspace service. While the only “customer” right now is Cloud Monitoring, these changes pave the way for any customer, big or small, to send metrics our way and retrieve them through the same HTTP API.
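As a rough idea of what sending metrics over HTTP could look like, here is a minimal sketch that builds an ingestion request body without sending it. The field names (`metricName`, `metricValue`, `collectionTime`, `ttlInSeconds`) and the `/v2.0/{tenantId}/ingest` path are assumptions modeled on the open-source Blueflood ingestion format; the host, tenant ID, and helper names are hypothetical, and the hosted Cloud Metrics API may differ.

```python
import json
import time


def build_ingest_payload(samples, ttl_seconds=172800, now_ms=None):
    """Build a list of metric objects from a {name: value} mapping.

    Field names are assumed from the open-source Blueflood ingestion
    format; they are not confirmed by this post.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # collection time in epoch milliseconds
    return [
        {
            "metricName": name,
            "metricValue": value,
            "collectionTime": now_ms,
            "ttlInSeconds": ttl_seconds,
        }
        for name, value in samples.items()
    ]


def ingest_url(base_url, tenant_id):
    """Assemble the (assumed) ingestion URL for a tenant."""
    return f"{base_url.rstrip('/')}/v2.0/{tenant_id}/ingest"


if __name__ == "__main__":
    # Hypothetical host and tenant ID, for illustration only.
    url = ingest_url("https://metrics.example.com", "123456")
    body = json.dumps(build_ingest_payload({"example.cpu.idle": 97.5}))
    print("POST", url)
    print(body)
```

The sketch stops short of making the request so it can be run anywhere; posting `body` to `url` with a JSON content type and whatever authentication the service requires would be the next step.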
We’re about halfway through the changeover. Here’s a breakdown of the things we’ve done and what we’re working on right now:
So, some big, big milestones accomplished; some big milestones yet to reach.
Finishing all the work in progress would be a huge relief for the team and allow us to work on a world of problems and questions we have been eager to address. Things like:
We have a lot that we want to accomplish, and the work that we’re doing right now will set us up to achieve all of it. I’ll post another update in a couple of months to let you know how far along we are. In addition, the team is planning to write a few articles that go into more technical depth on how we have done what we’ve done.