Upon joining the Rackspace Azure Team as a product architect, one of my first tasks was to help develop a new monitoring platform for our Azure customers.
A large part of offering a managed service backed by Fanatical Support is being able to provide important insights into complex workloads and quickly respond to critical events. Some of the primary requirements we established for a monitoring platform that would accomplish these goals were:
- Reduction of unnecessary communication dependencies — Because we were designing a system to monitor assets outside of our data centers, we wanted to be cognizant of any unnecessary communication channels and reduce potential security risks.
- Hyperscale — We wanted a system that would be able to scale along with our customer growth.
- Flexibility and customization — We wanted a flexible platform that could adapt to different customer environments (custom reports, custom monitors).
- Proactive self-service — We wanted data that was as close to live as possible to be readily available to customers.
- Automation and control — We wanted a system that would allow for automated configuration and onboarding.
- Extensible — We wanted the platform to be natively extensible so we could integrate with existing business tools and processes.
- Comprehensive monitoring coverage — We wanted a system that would be able to monitor Linux and Windows OS, as well as the Azure platform itself.
While assessing vendors and technologies, Microsoft Operations Management Suite Log Analytics immediately bubbled to the top of the list. Not only did it fulfill the key requirements above, it provided a platform for much more.
About OMS Log Analytics
At its core, OMS Log Analytics is a cloud-based data aggregation service that’s able to collect information from a variety of different sources. All the data is then indexed and exposed using a robust search engine, which allows for powerful analysis. The centralized data can be sliced and diced (queried) to provide comprehensive environment reporting, auditing, event correlation and alerting.
Data sources and collections are defined at a global level, and host-based agents (Windows/Linux) stream the requested events over SSL. Much like System Center Operations Manager (SCOM), OMS ships with a number of default Microsoft reports, and users can also create their own custom reports.
Sample OMS data sources include:
- Windows event logs
- Windows/Linux performance counters
- Microsoft anti-malware events
- IIS logs
- Linux syslog facilities (logs)
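To make the "slice and dice" idea concrete, here is a minimal Python sketch of the kind of filter-and-group aggregation a search over centralized records performs. It does not call the OMS service or use its query language. The field names (`Type`, `Computer`, `EventLevelName`) are modeled on typical event records, while the sample data and the `count_by` helper are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical records, shaped like rows a central log store might hold.
RECORDS = [
    {"Type": "Event", "Computer": "web01", "EventLevelName": "error"},
    {"Type": "Event", "Computer": "web01", "EventLevelName": "warning"},
    {"Type": "Event", "Computer": "web02", "EventLevelName": "error"},
    {"Type": "Event", "Computer": "web01", "EventLevelName": "error"},
    {"Type": "Perf", "Computer": "web02", "CounterName": "% Processor Time"},
]


def count_by(records, group_field, **filters):
    """Count the records that match every filter, grouped by group_field."""
    matching = (
        record for record in records
        if all(record.get(key) == value for key, value in filters.items())
    )
    return Counter(record[group_field] for record in matching)


# Roughly: "error events, counted per computer".
errors_per_computer = count_by(
    RECORDS, "Computer", Type="Event", EventLevelName="error"
)
print(errors_per_computer.most_common())
```

The same pattern (filter on a record type, then measure by a field) is what drives the reports and alerts described in the rest of this post.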
The Rackspace approach
Rackspace leverages Azure’s OMS Log Analytics service as the primary monitoring and reporting platform for our Fanatical Azure support offering.
While Log Analytics is technically OS and platform agnostic, we currently target the solution only at our Azure-specific environments.
For every onboarded subscription, we create a corresponding Log Analytics workspace that provides self-service reporting as well as customized alerting capabilities. All OMS monitoring is integrated into the Rackspace incident management system to create support tickets, which are handled 24x7x365 by trained Fanatical Support engineers.
Author Dugan Sheehan spoke about the Rackspace approach to hybrid environments at Microsoft Ignite 2016.
In addition, Rackspace automation ensures all customer VM instances are properly registered to the Log Analytics workspace, so that we have complete coverage. As part of the formal onboarding process, all workspaces are pre-populated with Rackspace-defined dashboards (solutions), data sources and alerts/thresholds. We leverage an ARM template to streamline the workspace deployment process and a proprietary automation tool to manage the threshold alerting.
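As a rough illustration of the ARM template step, the Python sketch below builds a minimal template for a single Log Analytics workspace and prints it as JSON. This is not the Rackspace template: the `build_workspace_template` helper, the parameter names, the `PerNode` SKU and the API version shown are all assumptions, and a real onboarding template would also carry the solutions, data sources and alerts mentioned above.

```python
import json


def build_workspace_template(sku="PerNode"):
    """Return a minimal ARM template (as a dict) for one workspace.

    The SKU default and apiVersion are illustrative assumptions; check the
    values that are valid for your subscription before deploying.
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/"
                   "2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "workspaceName": {"type": "string"},
            "location": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.OperationalInsights/workspaces",
                "apiVersion": "2015-11-01-preview",  # assumed version
                "name": "[parameters('workspaceName')]",
                "location": "[parameters('location')]",
                "properties": {"sku": {"name": sku}},
            }
        ],
    }


template = build_workspace_template()
print(json.dumps(template, indent=2))
```

Generating the template from code, rather than editing JSON by hand, is one way to keep a per-subscription workspace deployment repeatable; the printed JSON could then be saved to a file and submitted as a resource group deployment.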
The OMS dashboard is used heavily by our Fanatical Support staff and is also available on demand for customers. At any point in time, you can log in and check for things such as:
- Pending security updates
- MSSQL security and performance recommendations
- Active Directory health
- Host performance
- Malware status
- Security assessment
- Change tracking
The Malware Assessment solution allows you to view the status of antimalware across all of your registered Windows servers, including definition dates, agent versions and scan settings.
Security and audit
The OMS Security and Audit pane provides high-level insight into the security state of your computers. The solution also contains built-in reports and queries to help identify potential security breaches, including communication with known malicious endpoints.
The OMS Change Tracking solution helps organizations monitor modifications to their server environment. The dashboard allows you to audit service state changes as well as software installation dates.
View Designer can be used to create dynamic resource views for your environment. These custom views can be configured to drive unique insights, such as identifying over- or under-utilized assets.
IaaS SQL assessment
Every seven days, OMS performs a best-practice assessment of your IaaS MSSQL servers and provides a report containing weighted remediation suggestions and resolution steps.
The OMS Update Management solution allows you to identify and orchestrate the installation of missing system updates.
Harnessing OMS for enhanced customer experience
There were numerous other benefits that came with the implementation of OMS Log Analytics, including the ability to extend our support offering to the following additional services:
- Patching — Ability to report on patch status for Linux and Windows OS and perform orchestrated updates.
- Automated alert remediation — OMS exposes the ability to call an Azure automation runbook whenever an alert fires. This allows us to write custom runbooks to address specific problem scenarios and streamline the support experience.
- A full hybrid solution — OMS offers a single pane of glass that allows for monitoring and management of full hybrid environments. We’re looking to extend the service to include additional private cloud workloads.
- Access from anywhere – Log Analytics is also available via a mobile app, which gives customers access to their critical data from anywhere at any time.
- Platform and PaaS monitoring – OMS also allows for integration with Azure Activity Logging and PaaS Metric reporting. As a result, we are able to offer unique customer reports and monitors beyond just the compute instances. We can alert on service health and availability issues as well as a variety of critical PaaS events.
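The automated alert remediation item above can be pictured as a small dispatch step: an alert arrives, a matching remediation runs, and anything unrecognized falls back to a support ticket. The Python sketch below shows that pattern only. The alert names, the payload fields (`name`, `service`, `computer`) and the handler functions are hypothetical; they do not reflect the actual payload OMS sends to a runbook or any Rackspace tooling.

```python
def restart_service(alert):
    # Placeholder: a real runbook would act on the affected host here.
    return f"restart {alert['service']} on {alert['computer']}"


def clear_temp_files(alert):
    # Placeholder remediation for a disk space alert.
    return f"clear temp files on {alert['computer']}"


# Hypothetical alert names mapped to remediation steps.
REMEDIATIONS = {
    "ServiceStopped": restart_service,
    "LowDiskSpace": clear_temp_files,
}


def handle_alert(alert):
    """Run the remediation for a known alert; otherwise escalate to a ticket."""
    action = REMEDIATIONS.get(alert.get("name"))
    if action is None:
        return f"open ticket for {alert.get('name', 'unknown alert')}"
    return action(alert)


print(handle_alert({"name": "ServiceStopped", "service": "W3SVC", "computer": "web01"}))
print(handle_alert({"name": "HighCpu", "computer": "db01"}))
```

Keeping the mapping in one table makes it easy to see which problem scenarios are handled automatically and which still go to an engineer.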
Want to find out more about Fanatical Azure Support and OMS Log Analytics? Visit Rackspace to learn about our managed support offering and the opportunity to receive a $4,000 credit towards your Rackspace Azure infrastructure and a free strategy session with an Azure specialist.