Evaluate Your Monitoring Strategy

A growing number of open source and SaaS DevOps monitoring tools have come to market over the last couple of years. To help with your evaluation in this rapidly changing environment, we’ve devised a generic reference architecture and strategy framework for comparing each tool’s implementation against the complete set of available functionality.

This framework provides a good starting point for evaluating your monitoring needs and taking inventory of what your tools already provide, where you should augment functionality, and where you can consolidate.

Determine Your Monitoring Goals

The end goal for your monitoring is to consolidate tools, reduce the total cost of ownership, and automate the configuration via machine learning.

Monitoring Goals	Benefits
Consolidate monitoring tools when possible	Streamline and speed up troubleshooting
Choose tools with preset configuration	Shorten migration and setup time
Use open source and open license agents	Remain vendor-independent and extend technology
Adopt machine learning technology	Automate manual configuration tasks
Use hosted services when cost effective	Eliminate administration cost and distraction
Integrate public cloud monitoring	Manage cloud performance and cost
Integrate with instant messaging and paging services like PagerDuty or Slack	Specialized reporting for measuring capacity and analyzing cloud costs

Once you’ve set your goals, take a look at your monitoring tools.

Identify Monitoring Functionalities

The expectations from monitoring tools for DevOps have evolved in recent years and now include the collection of performance time-series data from open source agents, the persistence of the data in scalable time series databases, and the application of machine learning for alerting and reporting.

Here’s a generic set of functionalities that one or more of your tools might provide:

Dashboards: Preset dashboards that are easy to customize and share with peers.

Reports: Out-of-the-box reports that help with capacity planning and identify performance hotspots.

Diagnostics: Troubleshoot across your full application stack in the same user interface.

Notifications: Alerts that can be integrated with instant messaging and escalation services.

REST API: Ingest custom data, access any data, and update configuration via documented API.

Data Retention: Scalable Big Data storage for log data and for time series performance data.

Data Collectors: Open source/license agents for every middleware and programming language.

Machine Learning: Real-time anomaly detection and non-real-time analysis of capacity and cost.
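
To make the anomaly detection item more concrete, here is a minimal Python sketch that flags a data point when it falls far outside the recent history of a metric. It uses a trailing-window z-score, which is only one simple technique chosen for illustration and not the method of any particular tool. The function name detect_anomalies, the window size, and the threshold are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate from the trailing
    window's mean by more than `threshold` standard deviations.

    Illustrative only: `window` and `threshold` are assumed defaults.
    """
    history = deque(maxlen=window)  # the most recent `window` samples
    anomalies = []
    for index, value in enumerate(values):
        # Only score a point once a full window of history exists.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat history has sigma == 0; skip scoring then.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        history.append(value)
    return anomalies
```

Called on a latency series such as [10, 11, 10, 12, 11, 10, 50, 11, 10, 12], the sketch flags only the spike at index 6. Real products typically account for seasonality and trend as well, which this sketch does not.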

These features are what might be offered – but where in your stack should each feature fall? Let’s explore that in the next section.

Monitor Your Full Application Stack

Specific functionality is available for each tier of your application stack. This list is not meant to be comprehensive but rather intended to capture the largest feature sets.

End-User

Synthetic Monitoring: Exercise a web page or API at regular intervals with a test script.

Page Load Performance: Measure the time that it takes for a standard web page to load.

Browser Performance: Measure execution latency inside of the browser.

Ajax Monitoring: Measure the latency of each Ajax call in a single-page application.
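
As a rough illustration of synthetic monitoring, the Python sketch below performs one scripted request and records its status and latency. A scheduler (not shown) would call it at regular intervals. The function name run_synthetic_check, the result fields, and the injectable fetch parameter are assumptions made for this example.

```python
import time
from urllib.request import urlopen


def run_synthetic_check(url, fetch=None, timeout=5.0):
    """Request `url` once and return its status, success flag, and latency.

    `fetch` is a hypothetical hook: any callable that takes a URL and
    returns an HTTP status code. It defaults to a plain urllib request.
    """
    if fetch is None:
        def fetch(target):
            with urlopen(target, timeout=timeout) as response:
                return response.status

    started = time.perf_counter()
    try:
        status = fetch(url)
        ok = 200 <= status < 400
    except Exception:
        # Timeouts, DNS failures, and HTTP error responses all count as failed.
        status = None
        ok = False
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {"url": url, "status": status, "ok": ok, "latency_ms": latency_ms}
```

Passing a custom fetch makes it possible to script multi-step checks or to exercise the function without network access.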

Application

Performance Metrics: Measure call counts, errors, and latency down to individual method calls.

SQL Query Analysis: Identify the slowest running queries.

Transaction Tracing: Associate inter-dependent application calls from ingress to egress.

Custom Metrics: Ingest, store, display and analyze any custom data.
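
The performance metrics item can be sketched in a few lines of Python: a decorator that records call count, error count, and cumulative latency for each wrapped function. This is a simplified stand-in for what an application monitoring agent does automatically. The names METRICS, instrumented, and divide are hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: one record per instrumented function name.
METRICS = defaultdict(lambda: {"count": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record call count, error count, and total latency for `func`."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        started = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # re-raise so callers still see the failure
        finally:
            # Runs on both success and failure.
            stats["count"] += 1
            stats["total_seconds"] += time.perf_counter() - started
    return wrapper


@instrumented
def divide(a, b):
    return a / b
```

After one successful call and one call that raises ZeroDivisionError, METRICS["divide"] holds a count of 2 and an error count of 1. A real agent would ship these values to a time series store instead of keeping them in process memory.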

Infrastructure

Availability Checks: Standard and custom checks for HTTP, TCP port, process, etc.

Metrics Collection: Open source/license agents to collect performance data from each tier.

Time Series Database: Scalable storage and retrieval of performance data.

Log Indexing: Aggregation, indexing and searching of text-based log data.
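
For the availability checks item, a TCP port check can be as small as the Python sketch below, which reports whether a connection to a host and port succeeds within a timeout. The function name tcp_port_is_open and the default timeout are assumptions. HTTP and process checks would follow the same pattern with a different probe.

```python
import socket


def tcp_port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A monitoring loop would call this on a schedule and raise an alert after a configurable number of consecutive failures, so that a single dropped packet does not page anyone.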

Public Cloud

Cloud Service Monitoring: Monitor services such as load balancer, messaging, storage, etc.

Capacity Utilization: Measure different dimensions such as memory, I/O, storage space.

Cost Analysis: Analysis, aggregation and recommendations of historic and projected costs.

Auto-Scaling Analytics: Simulation and real-time optimization of nodes in a cluster.
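
To illustrate the cost analysis item, the Python sketch below aggregates daily cost records per service and projects a monthly figure from the observed daily average. The projection is deliberately naive and is not how any specific product forecasts spend. The function name summarize_costs, the record layout of (day, service, cost), and the 30-day month are assumptions.

```python
from collections import defaultdict


def summarize_costs(daily_costs, days_in_month=30):
    """Aggregate (day, service, cost) records per service and project
    a monthly cost from the average daily spend observed so far.
    """
    totals = defaultdict(float)
    days = set()
    for day, service, cost in daily_costs:
        totals[service] += cost
        days.add(day)

    observed_days = len(days)
    summary = {}
    for service, total in totals.items():
        daily_average = total / observed_days
        summary[service] = {
            "total": total,
            # Naive projection: assumes spend continues at the same daily rate.
            "projected_month": daily_average * days_in_month,
        }
    return summary
```

With two days of records totaling 24.0 for a compute service, the sketch reports a projected month of 360.0. Billing data from a cloud provider would need to be mapped into this record layout first.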

This list is by no means exhaustive, but should give you an idea of what your existing monitoring tools offer – and where there are holes in your monitoring strategy.

Evaluating Monitoring Tools for DevOps Workflows

While each environment is unique, the framework outlined here can serve as a starting point for any DevOps team’s evaluation process. Outlining goals that apply to your monitoring strategy as a whole lets you narrow your focus during evaluation to one question: “Does it meet my needs and goals?” Understanding the generic set of monitoring functionalities that your tools should provide in aggregate allows you to dive deep into specific features during the trial process. Finally, knowing the functionality associated with each monitoring domain (such as infrastructure or application monitoring) helps you choose between a comprehensive and a specialized monitoring solution.

To learn more about how Metricly can help you take your monitoring to the next level with machine learning technology and integrations to open source agents, check out our feature page or sign up for a free trial.

About Metricly

Metricly coaches users throughout their cloud journey to organize, plan, analyze, and optimize their public cloud resources.

About the Author

Bob Farzami

Bob started his career as a hardware developer before taking on business development for early stage software companies. In recent years, his interest has been focused on the application of analytics to infrastructure management which inspired Metricly. He lives in NYC and spends any free time that he can find running in Central Park and spending time with his wife and daughter.