O11y Guide: Keeping Your Cloud-Native Observability Options Open

Take a look at the architecture-level choices being made and survey the open standards and the open-source landscape.

As a developer since my early days in IT, I've found it very interesting to explore the complexities of cloud-native o11y. Monitoring applications goes well beyond writing and deploying code, especially in the cloud-native world. One thing remains the same, though: maintaining your organization's architecture always requires both a vigilant outlook and an understanding of the available open standards.

In this fourth article, I'm going to look at the architecture-level choices being made and survey the open standards and the open-source landscape.

As any architect will tell you, open standards are always preferred when adding to your existing infrastructure. Does the candidate component adhere to a defined open standard? If not, does it at least use open standards?

The Open Choice

When an open standard exists (or, in some early cases, an open consensus where everyone centers on a technology or protocol), it gives an architect peace of mind. You often have a choice of the final component you want to use, and as long as it's based on a standard, you can be confident that you'll be able to swap it out in the future.

An example of one such standard is the Open Container Initiative (OCI) for container tooling in a cloud-native environment. When your organization's architecture uses such a standard, any component or system interacting with your containers can be replaced by a future choice, as long as the replacement follows the same standard. This creates choice, and choice is a good thing!

Open O11y Projects

In cloud-native observability (o11y), there are many open-source projects to help you tackle the initial tasks of o11y. Many of them are Cloud Native Computing Foundation (CNCF) projects and promote open standards where possible. Some have even become unofficial open standards through sheer widespread use in the o11y domain.

Let's explore a few of the most commonly encountered cloud-native o11y projects.

Prometheus

Prometheus is a graduated project under the CNCF umbrella, a status the CNCF defines as "...considered stable and used in production." It's listed as a monitoring system and time series database, while the project site itself promises to "power your metrics and alerting with the leading open-source monitoring solution."

What Does Prometheus Do for You?

It provides a flexible, multi-dimensional data model. Each time series (a sequence of data points indexed in time order) is identified by a metric name and a set of key/value pairs called labels. Time series are stored in memory and on local disk in an efficient format. Scaling is achieved through functional sharding, which splits data across servers, and through federation.
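Series identity is easy to misread, so here is a minimal sketch. It shows how a metric name plus a label set picks out one series, and how changing a single label value creates a different series. The class, metric, and label names below are hypothetical illustrations. The real Prometheus storage engine compresses samples into on-disk blocks and works nothing like this in-memory dictionary.

```python
from collections import defaultdict


class TinySeriesStore:
    """Hypothetical, minimal illustration of Prometheus-style series identity.

    A series is keyed by its metric name plus its full label set; each series
    holds an ordered list of (timestamp_ms, value) samples. This is NOT how the
    real Prometheus TSDB stores data.
    """

    def __init__(self):
        self._series = defaultdict(list)  # series key -> [(timestamp_ms, value)]

    @staticmethod
    def series_key(name, labels):
        # Sorting the labels makes {method="GET",code="200"} and
        # {code="200",method="GET"} map to the same series.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, timestamp_ms, value):
        self._series[self.series_key(name, labels)].append((timestamp_ms, float(value)))

    def select(self, name, **matchers):
        # Equality-only matching, loosely like the PromQL selector name{k="v"}.
        return {
            key: samples
            for key, samples in self._series.items()
            if key[0] == name
            and all(dict(key[1]).get(k) == v for k, v in matchers.items())
        }


store = TinySeriesStore()
store.append("http_requests_total", {"method": "GET", "code": "200"}, 1000, 5)
store.append("http_requests_total", {"code": "200", "method": "GET"}, 2000, 9)
store.append("http_requests_total", {"method": "POST", "code": "500"}, 2000, 1)
```

The first two appends land in the same series because their label sets are equal. The POST request becomes a second, separate series under the same metric name.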

The metrics data is queried with a powerful query language called PromQL, which we will cover in the next section. Alerts for your systems are defined using this same query language, and a separate component, Alertmanager, handles notifications.
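An alert expression often boils down to "is this counter growing too fast?", for example `rate(http_requests_total[5m]) > 1` in PromQL. The sketch below shows roughly what such an expression evaluates. It is a deliberately simplified stand-in for PromQL's `rate()`. Like PromQL, it treats a drop in a counter as a reset. Unlike PromQL, it does not extrapolate to the edges of the range window, so its numbers will not match Prometheus exactly. The sample data and threshold are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_s, value) samples.

    Simplified sketch: like PromQL, a drop in value is treated as a counter
    reset (e.g. a process restart), but unlike PromQL's rate() it does not
    extrapolate to the edges of the range window.
    """
    if len(samples) < 2:
        return None  # Not enough data to compute a rate.
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so `cur` is the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def should_alert(samples, threshold):
    # Rough analogue of an alert expression such as `rate(...) > threshold`.
    rate = simple_rate(samples)
    return rate is not None and rate > threshold


# Hypothetical scrapes of a request counter every 15 s, with a restart between
# the second and third scrape.
samples = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(simple_rate(samples))  # 70 increments over 45 s, about 1.56 per second
print(should_alert(samples, threshold=1.0))
```

In a real setup, Prometheus evaluates the rule on a schedule and sends firing alerts to Alertmanager. Alertmanager then deals with routing and notification.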

There are multiple ways to visualize the collected data, from a built-in expression browser to Grafana dashboard integration and a console templating language. There are also many client libraries to help you instrument existing services in your architecture. If you want to import existing third-party data into Prometheus, there are many integrations available for you to leverage.
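Client libraries (such as the official Python `prometheus_client`) handle exposition for you. Still, it helps to see what Prometheus actually scrapes: a plain HTTP endpoint that serves metrics in the text exposition format. This standard-library sketch serves hypothetical request counters that way. The port, metric name, and counter values are assumptions for illustration, and a real service would use a client library rather than hand-rolling this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters an instrumented service might keep,
# keyed by (method, status code).
REQUESTS = {("GET", "200"): 1027, ("POST", "500"): 3}


def escape_label(value):
    # The text format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(requests):
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, code), count in sorted(requests.items()):
        m, c = escape_label(method), escape_label(code)
        lines.append(f'http_requests_total{{method="{m}",code="{c}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the exact path /metrics is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    # Call .serve_forever() on the returned server to start answering scrapes.
    return HTTPServer(("", port), MetricsHandler)
```

Running `make_server(8000).serve_forever()` and pointing a Prometheus scrape job at `localhost:8000` would be enough for Prometheus to start collecting both series.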

Each server runs independently, which makes it an easy starting point: it's reliable out of the box and needs only local storage to get started. It's written in Go, and all binaries are statically linked for easy deployment and performance.

The Prometheus organization on GitHub hosts the code bases for all of its projects.

PromQL

PromQL is officially part of the Prometheus project, but it's well worth mentioning on its own as an unofficial standard widely used to query ingested time series data. As the Prometheus documentation states:

Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
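The HTTP API mentioned in that quote is how external systems consume PromQL results. Below is a minimal standard-library sketch of an instant query, assuming a Prometheus server at `http://localhost:9090` (the default port). The helper function names are hypothetical. The sample response follows the shape the HTTP API documents for an instant-vector result, so the parsing can be tried without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    # Instant queries go to /api/v1/query with the expression in `query`.
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_vector(payload):
    """Turn an instant-vector response into [(labels, value)] pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "string-encoded number"].
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


def instant_query(base_url, promql, timeout=5.0):
    # Network call; raises urllib.error.HTTPError for non-2xx responses.
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))


# Example response body in the documented instant-vector shape.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus", "instance": "localhost:9090"},
             "value": [1435781451.781, "1"]},
        ],
    },
}
print(parse_vector(sample_response))
```

Against a live server, `instant_query("http://localhost:9090", "up")` would return one `(labels, value)` pair per scraped target.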

There are various ways to learn how to write PromQL queries, but a fun little project called PromLens provides an online demo that helps you accelerate your use, understanding, and troubleshooting of PromQL. You can also easily spin up a Docker image with the tool set up for exploration on your local machine. Building queries of your time series data visually is a big productivity boost.

There is a good background story on the origins of PromQL in an interview with its creator, Julius Volz.

OpenTelemetry

Another up-and-coming project, listed in the incubating section of the CNCF site, is OpenTelemetry (OTEL). It's a very fast-growing project focused on "high-quality, ubiquitous, and portable telemetry to enable effective observability."

This project helps you generate telemetry data from your applications and services. It then forwards that data to a variety of monitoring tools in what is now considered a standard format, the OpenTelemetry Protocol (OTLP). To generate telemetry data, you first have to instrument your code. OTEL makes this easy by offering automatic instrumentation for many popular languages.
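In practice, the OTEL SDKs or automatic instrumentation build and export telemetry for you. To give a feel for the "standard format" part, here is a hedged sketch of roughly what a single span looks like when exported as OTLP over HTTP with JSON encoding. Several details are assumptions for illustration: the collector address (`localhost:4318`, the usual OTLP/HTTP port), the service and span names, and the instrumentation scope name. Consult the OTLP specification before relying on any field.

```python
import json
import os
import time
from urllib.request import Request, urlopen


def build_otlp_trace_payload(service_name, span_name, start_ns, end_ns,
                             trace_id=None, span_id=None):
    """Build an OTLP/JSON trace export containing one span (sketch only)."""
    trace_id = trace_id or os.urandom(16).hex()  # 16 bytes -> 32 hex chars
    span_id = span_id or os.urandom(8).hex()     # 8 bytes -> 16 hex chars
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "hypothetical-manual-instrumentation"},
                "spans": [{
                    "traceId": trace_id,
                    "spanId": span_id,
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    # 64-bit integers are commonly sent as strings in JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def send_otlp_json(payload, endpoint="http://localhost:4318/v1/traces", timeout=5.0):
    # Assumes an OTLP/HTTP receiver (e.g. a collector) is listening at `endpoint`.
    req = Request(endpoint, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


start = time.time_ns()
payload = build_otlp_trace_payload("checkout", "GET /cart", start, start + 5_000_000)
```

Because the wire format is standardized, the same payload could go to any OTLP-capable backend. That is the kind of swappability the opening sections argue for.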

You can find the community and its code in the Open-Telemetry organization.

Jaeger

Before OTEL was on the scene, the CNCF project Jaeger provided a distributed tracing platform that targeted the cloud-native microservice industry.

Jaeger is open-source, end-to-end distributed tracing. Monitor and troubleshoot transactions in complex distributed systems.

While this project is fully mature, it targeted older protocols. It has recently retired its classic client libraries, advising users to migrate to OpenTelemetry and use Jaeger's native support for the OTEL Protocol (OTLP) standard.
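"End-to-end distributed tracing" means that each service in a request path reports its own spans independently. A backend such as Jaeger then stitches them back together into one trace, using each span's parent reference. The sketch below reproduces that stitching on a few hypothetical spans that are assumed to share a trace ID. It illustrates the idea only and is not Jaeger's actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    service: str
    operation: str
    duration_ms: float


def render_trace(spans):
    """Assemble spans from one trace into an indented tree, one line per span."""
    known = {s.span_id for s in spans}
    children = defaultdict(list)
    roots = []
    for s in spans:
        # Spans whose parent was never received are shown as extra roots.
        if s.parent_id in known:
            children[s.parent_id].append(s)
        else:
            roots.append(s)

    lines = []

    def walk(span, depth):
        indent = "  " * depth
        lines.append(f"{indent}{span.service}: {span.operation} ({span.duration_ms:.0f} ms)")
        for child in children[span.span_id]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical spans, reported independently by four services for one request.
trace = [
    Span("a1", None, "frontend", "GET /checkout", 120),
    Span("b2", "a1", "cart", "LoadCart", 35),
    Span("c3", "a1", "payment", "Charge", 70),
    Span("d4", "c3", "payment-db", "INSERT payment", 12),
]
print("\n".join(render_trace(trace)))
```

The resulting tree shows where time went in the request; here, most of it was spent in the payment call. This is the view a tracing UI gives you when troubleshooting a slow transaction.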

Start Your Monitoring Engines

This concludes the short overview of the open-source projects and (un)official standards you will encounter when getting started with cloud-native o11y. It also brings me to the first hands-on step: exploring these open-source projects, with the understanding that we are starting out without having to worry about scale yet.

Next up, I plan to take a look at how traditional or older monitoring for monolithic solutions and infrastructure integrates into cloud-native o11y.
