I am pleased to announce that today we begin accepting applications for the Rackspace Cloud Monitoring Private Beta. This is especially exciting because we will be offering technology created and integrated since Cloudkick joined the Rackspace family.
From the inception of Cloudkick, we’ve followed one paramount goal: make the lives of system administrators easier. We feel our Cloud Monitoring product meets that goal and are seeking beta testers to help us improve it all the way to general availability.
Please note: this beta is API only; control panel integration is planned for 2012.
What is Monitoring? Monitoring helps a user keep track of the overall health of an application. There are a number of different techniques, ranging from making sure a web address is responding to HTTP requests to ensuring a port is sending the correct banner as defined by a specific protocol. Internet service monitoring in general is a must-have to ensure business continuity.
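As a rough illustration of the banner technique mentioned above, here is a minimal Python sketch. It is not part of Cloud Monitoring; the function names `read_banner` and `banner_ok` are hypothetical and chosen only for this example.

```python
import socket


def read_banner(host: str, port: int, timeout: float = 5.0, max_bytes: int = 256) -> bytes:
    """Connect to host:port and return the first bytes the service sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)
        return conn.recv(max_bytes)


def banner_ok(banner: bytes, expected_prefix: bytes) -> bool:
    """Return True if the banner starts with the prefix the protocol calls for."""
    return banner.startswith(expected_prefix)
```

For example, an SMTP server greets new connections with a line starting with `220`, so a check against a hypothetical host might read `banner_ok(read_banner("mail.example.com", 25), b"220")`.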
What is Rackspace Cloud Monitoring? It is an API-driven cloud service built for infrastructure monitoring. It offers a simple yet powerful feature set, allowing extreme flexibility in configuration and execution. Starting with the private beta, we are offering five datacenters (monitoring zones) from which polling services will be available.
How does it work? If you are familiar with traditional monitoring tools, this system is similar in a number of ways. This product checks the health of a variety of internet-based services. Very simply, we help answer the question, “Is my service up and running?” To do that, we have two simple objectives:
1. Alert you (the site or application owner) before your customers notice a problem.
2. Take measures against allowing the system to go down in the first place.
We’ve taken careful measures to ensure we process events as fast as possible, and we do our best to deliver alerts promptly when a failure occurs.
We offer unique features and configurable controls:
● Monitor various components – Monitor websites and URLs (for broken links, missing content, 404 pages, maximum response time, etc.), as well as ports and protocols, to ensure services are behaving the way you need them to.
● Multiple Monitoring Zones Perspective – Monitor a single resource from many different monitoring zones. With a mixed perspective you control the consistency policy: SINGLE, QUORUM, or ALL. This allows you to optimize for time-to-alert or accuracy of alerts. These controls determine whether to alert at the first sign of a failure, when a majority of zones agree, or only when all datacenters “agree” the service is down. Monitoring from multiple zones helps reduce false alarms (e.g., if one of the datacenters is down but end customers can still reach the client website, an alert would be a false alarm).
● Auditing – Use an API to retrieve time series data from the system. Use this to go back in time and verify which notifications went out and why. Also, track all changes to an entity, check, or alarm. Get the before and after of every changed object in the system. You can even add a “who” and “why” parameter to each POST/PUT request to keep a log of why the changes were made!
● Remote IPv4/v6 checks – Monitor IPv4/IPv6 services via IP addresses or using DNS resolution. Using DNS resolution, specify which type of address to resolve, A or AAAA.
● Data Model – We’ve decoupled alerting from data collection. This is important for creating robust and rich monitors. Check many aspects of a single resource; for instance, make sure the site is responding with a 200 status code and features the copyright as a body match at the bottom of the page. Ensure all these conditions are met before the status of an alarm is “OK.”
● Alarm Language – Create a unique query with our language, purpose-built to express thresholds and alert conditions. Compare metrics as strings, use a regular expression operator, check the rate of change, and much more. With each of these primitives you can chain together multiple conditionals to set the state of an alarm. Our documentation highlights best practices for building robust monitors and alarms.
● On-demand Simulation – Run a check on demand to verify its status before adding it. This allows you to get a feel for a monitor before being alerted by it. On top of that, you can take the results and pass them to another endpoint to simulate whether they match your alarm language criteria! This is extremely powerful, as it takes some of the guesswork out of configuration and lets you simulate responses as they happen in normal operation.
● Change Alert Destination – Depending upon which event is generated, you can execute different notifications; for instance, on a warning alert, fire a webhook, but on an error alert, send firstname.lastname@example.org an urgent email.
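To make the SINGLE, QUORUM, and ALL consistency policies from the list above more concrete, here is a minimal Python sketch of how per-zone results might be combined. It is an illustration only, not the service's implementation: the `Consistency` enum, the `should_alert` function, and the zone names are hypothetical, and treating QUORUM as a strict majority is an assumption.

```python
from enum import Enum
from typing import Mapping


class Consistency(Enum):
    SINGLE = "SINGLE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def should_alert(zone_failed: Mapping[str, bool], policy: Consistency) -> bool:
    """Decide whether to alert, given which monitoring zones saw a failure.

    zone_failed maps a zone name to True if that zone observed a failure.
    """
    total = len(zone_failed)
    failures = sum(1 for failed in zone_failed.values() if failed)
    if total == 0:
        return False  # no observations, nothing to alert on
    if policy is Consistency.SINGLE:
        return failures >= 1  # fastest time-to-alert
    if policy is Consistency.QUORUM:
        return failures > total // 2  # assumption: strict majority of zones
    return failures == total  # ALL: every zone agrees the service is down


# Hypothetical zone names; only one of three zones reports a failure.
observations = {"zone-a": True, "zone-b": False, "zone-c": False}
print(should_alert(observations, Consistency.SINGLE))  # True
print(should_alert(observations, Consistency.QUORUM))  # False
print(should_alert(observations, Consistency.ALL))     # False
```

With a single failing zone, only SINGLE fires, which reflects the trade-off described above: SINGLE favors time-to-alert, while QUORUM and ALL favor accuracy by waiting for more zones to agree.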
I am extremely excited about getting this product in front of the right customers. The Cloud Monitoring Private Beta is intended for those familiar with monitoring systems and REST-like APIs. Access to the API is managed through an existing Rackspace Cloud account, but you do not have to be a current Rackspace customer to be considered.
To apply for the beta program, please fill out this short survey with your contact information and use case, and we will determine your eligibility within 24 hours.
We’re looking forward to what this product will bring in the future. If you’d like to get in on the ground floor, let us know; we’d love to hear from you.