Operator lifecycle manager services

The Juju operator lifecycle manager (OLM) enables carrier-grade, highly available operators. It provides services that reduce the cost and complexity of operators for mission-critical apps and infra.

It is easy to write a simplistic operator. Launch a container, fetch a new container, and you’re done. It is challenging to write a highly available operator that handles distributed systems gremlins smoothly and scales efficiently with automated integration to other operators.

The Juju OLM provides enterprise-class operator lifecycle management and it offers services to operators which make it easier for them, in turn, to be enterprise-class components. Operators for critical services may well need to be highly available, and so they become distributed systems in their own right.

Event delivery and serialization

Any distributed system is in a state of constant simultaneous change. Applications and operators on multiple machines are working at the same time. A single conceptual administrative change might turn into multiple actual changes that cascade around the application graph, and these changes might easily arrive at a particular operator simultaneously.

For example, in a large deployment of Apache Cassandra, there will be many separate database instances clustered together. An administrative decision to upgrade the cluster turns into many separate upgrade events across the cluster. Any pieces of software integrated with those database instances need to know about these changes. The consequences may be more changes which they need to propagate to other applications.

So a single administrative command, “upgrade that database cluster”, turns into many separate upgrade events for each of the instances of the database, and further, many notifications of these separate upgrade events to all of the integrated application instances which are working together with that database cluster in the model. The entire model is alive, it is a dynamic system in which ripples of change move through the system until it settles.

The Juju OLM provides an event or message distribution mechanism that serves to deliver these notifications to operators. Abstracting the concepts of application configuration, status, units and integration provides a consistent model for message types and addressing across all applications.

Often a single operator will receive multiple events from different sources at the same time. To complicate matters further, there may well be multiple operators for different applications installed in the same place (machine or container). So it is likely that multiple events relate to a single machine or container at the same time.

A naive implementation of an operator is at risk of deadlock, when two separate processes are attempting to use a common resource at the same time. Operators interact with the underlying system and the application unit they control; if multiple operators in the same place try to use common resources they may well collide. In a simple machine operator example, if one operator tries to update the package list at the same time as another operator is installing a package, it will fail. In a container context, if one operator tries to upgrade a database at the same time as another tries to vacuum it, results may be unpredictable.

The Juju OLM carefully serializes event delivery, not just to a single operator, but to the set of operators which might collide in this way. The design simplifies the operator developer’s job. By reducing the risk of collisions between operators, we enable operator developers to think in more linear terms about what they want to do in any given situation, with fewer concerns about locking or other defensive measures to deal with simultaneous events being handled in operators of which they are unaware or not in control.

In much the same way that Golang rethinks concurrent execution in applications, the Juju OLM rethinks concurrent command and control of distributed applications, in a way which does not depend on underlying substrate semantics. The benefits of this approach accrue to operators everywhere, not just those on Kubernetes.

Leader election

In a distributed system where individual nodes may fail in unpredictable ways, it is a well known challenge to provide consistency of data and services. One of the tricky problems in this situation is leader election - out of several units, which one should we trust to answer questions for the whole group? There are several algorithms for this, all of which are considered subtle, and implementing leader election correctly is non-trivial.

Instead of asking each operator to implement leader election separately, the Juju operator lifecycle manager provides leader-election as a service to all operators. This greatly simplifies the problem of knowing which operator can determine the aggregate state of a scale-out service, or handle events which should only be considered once in the context of the entire application as opposed to being considered for every unit individually.

Persistent state

Every operator needs to keep track of state. For fine-grained control, that state may need to be specific to a particular unit of the application, such as a Kubernetes pod, or an instance of the app on one of several machines in a cluster.

Juju provides a consistent and reliable storage mechanism that avoids the use of storage volumes or elastic block devices unless the operator has specialist storage requirements. If Juju is itself running in a highly available configuration then this persistent state is highly available, so a cluster without reliable storage can still provide highly available services.

The Python Operator Framework abstracts this persistent state capability to provide a local data store to the operator which is entirely managed by the framework and the operator lifecycle manager.

Application and unit status

In an integrated system, applications may depend on each other for services. A change in the status of an application is material for any applications integrated with it. For example, if a database is made read-only for an upgrade, then applications associated with that database should be made aware.

The Juju OLM provides lifecycle management to individual operators. Any individual operator may be going through changes at any given time - an update or upgrade, a change in scale, or a change in configuration. Those changes may affect the availability or status of the application as a whole.

In a distributed or scale-out application there will be several units, or instances, of the application in a cluster. Changes to the application propagate to individual units, which may process them at slightly different times. It is important to remember that the status of the application as a whole is an aggregation of the statuses of the individual units that comprise it.

Juju tracks a status flag for each distinct unit in the application. An operator (speaking for a particular unit) is able to update the flag to describe its current status. Juju will aggregate these unit status flags into an overall status for the application, or the leader unit can set the application status directly.

Here is an example of a Hadoop model with applications coming up:

App                   Version  Status   Scale  Charm                   Store       Rev  OS      Notes
client                         waiting      1  hadoop-client           jujucharms   12  ubuntu
ganglia               3.6.0    active       1  ganglia                 jujucharms   12  ubuntu
ganglia-node          3.6.0    active       5  ganglia-node            jujucharms    7  ubuntu
namenode                       blocked      1  hadoop-namenode         jujucharms   46  ubuntu
plugin                         waiting      1  hadoop-plugin           jujucharms   46  ubuntu
resourcemanager                waiting      1  hadoop-resourcemanager  jujucharms   48  ubuntu
rsyslog                        unknown      1  rsyslog                 jujucharms    7  ubuntu
rsyslog-forwarder-ha           unknown      5  rsyslog-forwarder-ha    jujucharms    7  ubuntu
slave                          waiting      3  hadoop-slave            jujucharms   47  ubuntu

Juju treats integration as a first class concept, so it propagates such changes in application status to all the integrated application operators. This allows an application to become aware of changes in the status of nearby applications in order to react appropriately.

There are only these allowed application status values:

  • Active
  • Blocked
  • Waiting
  • Maintenance
  • Unknown

Maintenance implies some activity in the operator which might impact on service levels. Blocked and Waiting imply a dependency on another application which is not fulfilled.

In the case of Blocked, the application has determined that the current application graph is not sufficient for it to become Active under any circumstances. For example, an application that needs a database would go Blocked if it saw in the application graph that there was no database integrated with it. An application that is Blocked is signalling that an administrator will need to update the application graph to address the issue.

An application that is in the Waiting state has determined that its requirements can be fulfilled in the application graph, but that the necessary applications are themselves not yet in the Active state. In the prior example, if the application saw in the application graph that it did have a database to integrate with, but that database was not yet Active, then the application would put itself in the Waiting state. An application in Waiting state is expected to become Active once its own dependencies become Active.

Application messages

The operator pattern is a new level of sophistication in automation, greatly reducing the need for application-specific domain knowledge. Nevertheless, human administrators are responsible for the end result, and it is helpful to keep them informed about application status.

The Juju OLM allows operators to provide human-friendly messages that reflect the top level status or most actionable issue for the application. These messages are typically surfaced in the Juju CLI output and web dashboard, but they might also be consumed by third-party automation and surfaced through their own interfaces. Such application messages are generally designed to reassure, update or direct administrator responses to problems in application configuration, performance, or topology. Again, the goal is to avoid any direct human interaction with the underlying systems hosting applications, so messages are generally written to provide the most helpful guidance.

Only a single message per application is presented at any time, but updates to the application message are logged and may be reviewed as part of an audit or incident analysis.

Integration data

Integration requires that operators have the necessary information about one another to adapt their own configuration to reflect the integration. In a simple example, Wordpress needs to know where to connect to the database, along with things like a username, password, database name, and server certificate, and MySQL needs to know which IP addresses to allow to connect, with the matching details.

The Juju OLM enables operators to provide this information to other operators with which they are integrated in the application graph.

Consider the line of integration between the endpoints on the two applications. In graph theory it would be called an ‘edge’ between the ‘nodes’ which are the endpoints on each application. We call that line a ‘relation’, and we call the set of integration data from both sides of the application the ‘relation data’.

Each application can write to its own relation data, and read from the other application’s relation data. Changes in relation data are notified to the other application with an event. So the ‘integration protocol’ of such a line in the application graph is really the sequence of updates to relation data on both sides. Each update triggers an event on the other side, and they might respond by updating their own relation data, which returns an event to the original side. This pattern should be determined by the type of the endpoints, and be consistent for all operators that declare the same type of endpoint.

To restate, each endpoint on an application has a name, a type (the ‘interface’) and a direction (‘provides’ or ‘requires’). Integration lines can only be drawn between two endpoints which have the same type (interface) and opposite directions. It is expected that the sequence of updates to relation data for such integrations is always the same if the type, or ‘interface’ of the endpoints is the same. So it is always possible to integrate apps that offer the same interfaces for the same purposes. The abstraction of interfaces allows for substitution of operators that can provide the same service - such as Amazon RDS substituting for MySQL.

Integration data is published both at the application level and at the unit level. The leader unit for an application is the only operator that is allowed to update application-level relation data, any unit can update its own unit-level relation data.

Application and unit level relation data is just for that relation. For each integration, there will be a separate relation data. If an application has two relations to the same endpoint, then there will still be two separate sets of relation data, exposed to the respective counterparties on each of the relations.