In our discussions with customers about their monitoring experiences, we often hear the expression “Single Pane of Glass.”
It describes the need for consistency in monitoring tools regardless of where the monitored infrastructure is located. As teams make sophisticated decisions between deployment in a dedicated environment and the cloud, they need one place where all their monitoring information is at their fingertips.
The Rackspace Monitoring and Intelligence teams have been working hard on our customers’ feedback about the need for a “Single Pane of Glass” for IT Operations. With a series of feature releases over the past six months, Rackspace Intelligence has stepped up to be the “Single Pane of Glass” for cloud monitoring.
Below is a high-level overview of basic Cloud Monitoring concepts, illustrated with pages in Rackspace Intelligence. This post is aimed at those new to the Rackspace Cloud Monitoring product. Advanced concepts will be detailed in separate posts, like the recent one published by Kapil Kansal on the suppression feature.
Below is a diagram of the concepts we’ll cover:
To get to the Rackspace Intelligence landing page, from your browser go to http://intelligence.rackspace.com and log in using your Rackspace Cloud account credentials. Upon logging into Rackspace Intelligence, you will land on the “Entities” page. This is where you can quickly review your infrastructure status.
Cloud Monitoring uses Entities to represent any object or resource you want to monitor. Using the “Create Entity” button, you can create an entity for any server or website you want to monitor. An entity can be a set of servers or a non-server object, but most often we see entities that refer to individual servers. As you create new cloud servers or cloud databases, entities are created automatically.
Rackspace customers currently using dedicated managed hosting will soon be able to see the entities on Rackspace Intelligence for their devices as well.
Besides sorting on the columns to find what you are looking for, you can also use the “Open Alerts” page to see a list of open alerts by clicking on the Open Alerts link at the top of the page. During a major outage, you can get paged on many things at the same time. The “Open Alerts” page provides you an overview so you can check to see if there is any correlation among the alerts.
Monitoring Details Page
All the entity labels in Rackspace Intelligence are linked to the “Monitoring Details” page. Get to this page by clicking on an entity name. Besides the basic entity information, you can use this page to see the host information reported by the agent. This saves you from having to log into the actual server and type a command to obtain the host information.
If you don’t have the monitoring agent installed, you can click on the Install Agent link and launch the Monitoring Agent Installation wizard to guide you through the steps.
The lower part of the “Monitoring Details” page shows the monitoring checks that have been configured for the current entity and the status of the alarms that are associated with each check.
Check and Alarm Details Page
From the “Monitoring Details” page, you can click on any check label to go to the “Check and Alarm Details” page.
A Check in Cloud Monitoring specifies the parts or pieces of the entity that you want to collect metrics on and how you want to do it. In other words, a check returns a group of related metrics. Metrics groups help you figure out how to collect any data you want.
For example, you can use an “agent.cpu” check to collect the CPU utilization metrics from your server. The next screenshot shows the list of metrics collected by an “agent.cpu” check.
Since a check does not trigger any alerts by itself, collecting more data is almost always a good option. Our recommendation is to set up as many checks as relevant to your server even before you know what you want to be alerted on.
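To make the idea of a check more concrete, here is a minimal Python sketch of a check definition like the “agent.cpu” one described above. It is only an illustration: the class, the field names (label, period, timeout, details), and the default values are assumptions chosen for readability, not a confirmed copy of the Cloud Monitoring API schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Check:
    """Illustrative model of a check: what to measure and how often.

    Field names and defaults are assumptions for this sketch.
    """
    label: str
    type: str                 # check type, e.g. "agent.cpu" as named in the post
    period: int = 60          # assumed: seconds between collections
    timeout: int = 30         # assumed: seconds before a collection gives up
    details: dict = field(default_factory=dict)  # type-specific settings


# A check only collects metrics; it never alerts on its own.
cpu_check = Check(label="CPU usage", type="agent.cpu")

# The dictionary form is what you might serialize when creating the check.
payload = asdict(cpu_check)
print(payload)
```

Because a definition like this carries no alerting logic, adding more checks only adds more data, which is why setting them up early is cheap.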
The following screenshot shows an agent plugin check I have configured to make sure that the “hubot” process is running on my server.
Agent Plugin Check
The agent plugin check is the most advanced check type. It allows you to extend the agent to collect any data you want. The rest of the check types are available in the drop-down menu when you are creating the check, as shown in the screenshot below. The documentation for all the check types can be found here.
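As a rough idea of what a plugin for the “hubot” check mentioned above could look like, here is a minimal Python sketch. It assumes that a plugin is an executable that prints one “status” line followed by “metric <name> <type> <value>” lines; that output format is written from memory and should be verified against the agent plugin documentation. The metric name hubot_running and the use of pgrep are choices made for this example only.

```python
#!/usr/bin/env python3
import subprocess
import sys


def is_running(process_name: str) -> bool:
    """Return True if pgrep finds a process with exactly this name."""
    try:
        result = subprocess.run(
            ["pgrep", "-x", process_name],  # -x: match the process name exactly
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # pgrep is not installed; treat the process as not found.
        return False
    return result.returncode == 0


def render_output(process_name: str, running: bool) -> str:
    """Build the plugin output (assumed format: a status line, then metric lines)."""
    status = "ok" if running else "err"
    description = "running" if running else "not running"
    lines = [
        f"status {status} {process_name} is {description}",
        f"metric {process_name}_running int32 {int(running)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "hubot"
    print(render_output(name, is_running(name)))
```

An alarm on the check could then watch the hubot_running metric. Note that the exact-name match may need adjusting if the process shows up under a different name on your server.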
Cloud Monitoring uses Alarms to analyze the data that is collected by a check. The alarm criteria contain the logic to process this data and place the alarm in one of three states: OK, WARNING, or CRITICAL.
You can specify alarm criteria through the alarm detail popup. The syntax to configure Cloud Monitoring alarms is powerful and flexible. The goal is to enable you to set the condition to be exactly what suits your situation. At the same time, many new users have found it intimidating to get started. In addition to providing the alarm criteria examples in the “Examples” tab, we are working on a more UI-rich and interactive feature for alarm editing.
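To illustrate the kind of three-state logic that alarm criteria express, here is a minimal Python sketch. Real criteria are written in Cloud Monitoring's own alarm syntax, not Python; the metric name usage_average and the 80 and 90 percent thresholds are hypothetical values picked for the example.

```python
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def evaluate_cpu_alarm(metrics, warn_at=80.0, crit_at=90.0):
    """Map one group of check metrics to an alarm state.

    `metrics` is a dict such as a CPU check might return; the key
    "usage_average" and both thresholds are assumptions for this sketch.
    """
    usage = metrics["usage_average"]
    if usage > crit_at:      # test the most severe condition first
        return CRITICAL
    if usage > warn_at:
        return WARNING
    return OK                # nothing matched, so the alarm is healthy


print(evaluate_cpu_alarm({"usage_average": 42.0}))
print(evaluate_cpu_alarm({"usage_average": 85.5}))
print(evaluate_cpu_alarm({"usage_average": 97.0}))
```

Testing the most severe condition first means each set of metrics produces exactly one state, which is the property the notification step relies on.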
Notification and Notification Plan
Alarm criteria determine the alarm state based on the metrics collected from the check. When the alarm state changes, Cloud Monitoring will send an alert to you. This is done through constructs in Cloud Monitoring that are called Notifications and Notification Plans.
Each Notification specifies two things: to whom the alert should be sent (Notification Target) and how it should be delivered (Notification Type). The notifications are grouped into notification plans that are used by alarms.
You can start managing your notifications and notification plans by clicking on the “Notify” link in the navigation bar.
From the “Notification Plans” details page, you can select the notifications associated with the current plan. The three check boxes tell you which alarm states should trigger the notification. For example, you can add a new notification configuration to receive an SMS on your phone when and only when the alarm enters a critical state. This way you don’t have to filter your email to locate the critical alerts. Instead, your phone will let you know right away when and only when a critical event happens.
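The relationship between notifications, notification plans, and alarm states can be sketched in Python as follows. This is an illustration of the concept only: the class names, field names, email address, and phone number are hypothetical and do not reflect the actual Cloud Monitoring data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Notification:
    type: str    # how the alert is delivered, e.g. "email" or "sms"
    target: str  # to whom it is delivered


@dataclass
class NotificationPlan:
    label: str
    # One list per alarm state, mirroring the three check boxes in the UI.
    ok: list = field(default_factory=list)
    warning: list = field(default_factory=list)
    critical: list = field(default_factory=list)

    def notifications_for(self, state: str) -> list:
        by_state = {"OK": self.ok, "WARNING": self.warning, "CRITICAL": self.critical}
        return by_state[state]


def on_state_change(plan: NotificationPlan, old: str, new: str) -> list:
    """Return the notifications to send; nothing is sent if the state is unchanged."""
    if old == new:
        return []
    return plan.notifications_for(new)


email = Notification(type="email", target="ops@example.com")  # placeholder address
sms = Notification(type="sms", target="+15555550100")         # placeholder number

# Email for every state change, SMS only when the alarm becomes critical.
plan = NotificationPlan(
    label="On-call plan",
    ok=[email],
    warning=[email],
    critical=[email, sms],
)

print(on_state_change(plan, "OK", "CRITICAL"))
```

With this plan, a change to WARNING produces only an email, while a change to CRITICAL also sends the SMS.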
The “Notifications” list is the central place to manage all the notification configurations.
For more Cloud Monitoring concepts, please see the documentation here. In future blog posts, we’ll share more in-depth discussions. Rackspace Intelligence is building a smooth experience with thoughtful user experience design. It has been wonderful for us to see all these features showing up in a user-friendly UI. We have confidence that our monitoring product will become a tool that you trust and love.
Cloud Metrics – Elastic Metrics Storage
All the metrics data collected by Cloud Monitoring is stored in another product called Cloud Metrics. In addition to ingesting Cloud Monitoring data and supporting Rackspace Intelligence, Cloud Metrics can now integrate with other metrics collection systems, such as CollectD and StatsD, as well as the dashboard tools Graphite and Grafana. To learn more about Cloud Metrics, please see details here: http://bit.ly/rax-metrics-overview .
Send Your Feedback
We have only scratched the surface of the Cloud Monitoring capabilities. As always, we love to hear from you! Please provide your feedback through our feedback portal. You can also provide feedback directly through Rackspace Intelligence.