Canonical Observability Stack

Welcome to the home of all things related to the Canonical Observability Stack (COS): the observability and monitoring stack for Juju, powered by Juju on Kubernetes and monitoring all the things.

COS Components

COS is made of the following Juju charmed operators:

The charmed operators that make up COS are available as the pre-configured COS Lite bundle.

Additionally, there are charmed operators designed to work with COS to provide additional functionality:

  • the Prometheus Scrape Config charm allows you to tweak the scrape jobs resulting from relating a charmed operator exposing the /metrics endpoint with the Prometheus charmed operator.

  • the Prometheus Scrape Target charmed operator allows you to represent, in a Juju model, /metrics endpoints provided by software not managed by Juju, e.g., LXD or MAAS, so that the Prometheus charmed operator can scrape metrics from them.

  • the COS Proxy charmed operator is a machine charm designed to “translate” the relations supported by the previous iteration, LMA, to COS.

  • the Karma charmed operator runs the Karma UI on Kubernetes for you. Karma lets you visualize alerts from multiple Alertmanager clusters, e.g., when you deploy many COS instances at the Edge or in different production environments and want to keep a centralized overview.
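To make the scrape-target idea above more concrete, here is a minimal Python sketch of the kind of Prometheus scrape job that an externally managed /metrics endpoint ends up as. It only builds the standard Prometheus `scrape_configs` structure with static targets; it is not the code of the Prometheus Scrape Target or Prometheus Scrape Config charms. The helper name `static_scrape_job`, the job name, the address, and the label are all assumptions made for illustration.

```python
import json


def static_scrape_job(job_name, targets, metrics_path="/metrics", labels=None):
    """Build one Prometheus scrape job that uses static targets.

    Hypothetical helper for illustration only; the charms generate
    their scrape jobs from relation data and charm configuration.
    """
    static_config = {"targets": list(targets)}
    if labels:
        # Labels set here are attached to every series scraped from these targets.
        static_config["labels"] = dict(labels)
    return {
        "job_name": job_name,
        "metrics_path": metrics_path,
        "static_configs": [static_config],
    }


# Assumed example endpoint: 192.0.2.10 is a documentation-only address.
job = static_scrape_job(
    "external-host",
    ["192.0.2.10:9100"],
    labels={"site": "edge-1"},
)

# Print the fragment in the shape Prometheus expects under `scrape_configs`.
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

With the charms, you would not write this structure by hand: the target address and path would be supplied as charm configuration, and the resulting job would reach Prometheus through a relation.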

Design goals

There are several design goals we want to accomplish with COS:

  • Provide a set of high-quality observability charmed operators that are designed to work well on their own, and better together.

  • Make COS run on Kubernetes, with specific focus on MicroK8s, to achieve a very “appliance-like” user experience.

  • Ensure a consistent, cohesive experience: all alerts go through Alertmanager, Grafana can plot all telemetry, etc.

  • Provide a highly-integrated observability stack with the simplest possible deployment experience.

  • Take the toil out of setting up monitoring of your Juju workloads: monitoring your Juju applications should be as simple as establishing a couple of relations with the COS charms.

  • Showcase the declarative power of the Juju model: for example, if something can be modelled as a relation rather than as configuration, it should be. Also, relations must be semantically meaningful: by looking at juju status --relations, you should intuitively understand what comes out of two charms relating with one another.

COS Lite and HA

We foresee two “flavors” of COS:

  • Lite, currently being worked on, is designed for the Edge, and is capable of running reliably with limited computing resources (around 4 GB of overall memory, including MicroK8s and the Juju controller, and limited CPU power).

  • HA, for “high availability”, will be worked on in the future (and not before Lite becomes generally available). It will be designed for large sites, with a high-availability setup achieved through redundancy, a careful design of the architecture, and potentially a slightly different set of operators (for example, likely swapping out Prometheus for Cortex).

Status of COS

COS Lite is currently under heavy development. We are pretty confident in the quality and design of most of the integrations in and around Prometheus and Grafana.

In terms of use-cases, there is some more work to do for Alertmanager configurability, Loki integrations and some aspects of customizing monitoring logic for Juju administrators.

We are also working on the first iteration of monitoring charms that integrated with the previous iteration of the Prometheus charm, as well as charmed operators that allow you to represent in Juju workloads that are not run by Juju, so that COS can still monitor them.

Moreover, we need to harden COS Lite in terms of resilience and furnish it with out-of-the-box self-monitoring capabilities (“Quis custodiet ipsos custodes?”, or “Who watches the watchers?” and all that).

COS HA will feel very, very similar to COS Lite in terms of end-user experience, but it will require significant changes under the hood. We foresee work on COS HA to start in the first half of 2022.

Why a new stack?

At Canonical, LMA is the name we have been using for a system of machine charms currently in use to monitor Canonical and customer systems.

COS draws a lot of learning from years of operational experience with LMA, but it is also different enough that we felt we needed to make a distinction from the previous iteration.

Further reading

The “What is observability?” page provides an overview of the relation between observability and monitoring. An overview of the Canonical offerings related to observability and monitoring can be found on the Observability page. COS will join those pages when it goes GA.

We have been keeping a sort of “development blog” as a series of blog posts about model-driven observability. Keep in mind that not everything that COS can do is already showcased in one of those blog entries, and we will provide proper documentation for COS both here and on the pages of the charms on
