What is an Operator Lifecycle Manager?

Juju manages the install, update, configuration and integration of operators on Kubernetes or traditional machines

An operator is software which drives an application. The operator lifecycle manager (OLM) is a service which installs operators, updates them, configures them, and in the case of the Juju OLM, helps it to integrate with other operators.

Model-driven operations

One key differentiator of the Juju OLM is the way it organises a large number of operators, potentially across many different clouds or clusters. Operators are in a sense extra pieces of software to manage, so having a structure and catalog of that software immediately provides a level of enterprise awareness and control of the operator estate.

The Juju OLM provides model-driven operations. Sets of operators are organised into models, each model having the same set of access permissions and living on the same Kubernetes cluster or machine compute substrate such as a public cloud in the case of traditional applications. The ability to organise multi-cloud operations in these models, and to manage models globally, across many regions of many clouds and private infrastructure, through a single scale-out OLM, is a strength of Juju.

Model-driven operations simplify integration between diverse operators, and reduce repetition and redundancy because they describe both sides of every integration with a single element. They also capture business intention, the decision to allocate resources to particular sets of applications. Business decisions become explicit with model-driven operations.

The concept of ‘desired state’ is popular in Kubernetes. Model-driven operations elevate the notion of desired state, allowing administrators to describe their integration and resource allocation policies, and using shared operations code to achieve that intention rather than application-specific expertise.

Integration

While the focus of an operator lifecycle manager is usually the simple life cycle of a particular application, the model-driven nature of the Juju OLM provides an opportunity to manage the application graph in its entirety, reflecting patterns of integration between multiple applications.

With the Juju OLM, applications can be integrated declaratively in the model. A line in the graph represents a line of integration, connecting specific functions of the applications. Operators ensure that their application configuration reflects the desired state of integration in the model.

Integration can be made, and undone, dynamically. The Juju OLM delivers updates in the application graph to operators as a stream of events, and operators can use this mechanism to inform one another of changes in their status which might affect related applications in the graph.

Storage for operators

Many applications require storage to be allocated and managed carefully. The operator lifecycle manager coordinates this allocation based on the model, matching resources allocated to the model with application-specific requirements and configuration.

The Juju OLM is able to manage storage for applications in bare metal, virtualised, cloud instances, and Kubernetes environments. Storage categories are described in the model, and applications can have specific classes of storage designated for particular functions. For example, it is possible with Juju to deploy a database with persistent data volumes on high-speed high-availability disks and application logging to slower, less reliable media.

The OLM adapts to and coordinates with the substrate. In the case of Kubernetes it uses CSI capabilities. In the case of clouds, VMware and bare metal it uses the relevant cloud-specific APIs and storage architecture to enable administrators to allocate storage to the model as a whole and specific application functions in particular.

The ability to provide consistent model-driven storage for both legacy applications and Kubernetes applications greatly simplifies administration across the enterprise.

Networking for operators

Applications provide services to each other and to end-users on the network. Security practices usually dictate that network services are separated on different networks for different applications, and in some cases, for different functions of the same application.

In a typical enterprise environment there will be multiple segregated networks, and strict policies about which services should be accessible or provided to which network. It is important to be able to place particular application services on specific networks.

The OLM is in a position to define and enforce network security policies.

The Juju OLM models networks in traditional machine environments. Application endpoints can be bound to particular networks in the model, enabling the operators to comply with enterprise network policy in any particular deployment. Networks can be provided at both L2 and L3, enabling low-level telecommunications services to be provided through operators just as cleanly as typical IT services.

Installing operators

Since an operator is just a package, it can be provided locally. For example, you may be developing the operator yourself, and iterating locally, so you can provide the operator to the OLM manually. You may also have an operator with confidential code or data, and prefer not to store that in a public service.

It is much more convenient, however, to share operators in a trusted, public repository. We consume software from such repositories all the time. In Linux we have popular package distribution systems like APT, in the MacOS world we have brew, and in Windows we have Chocolatey and Winget. For Python code it is common to use PyPI, and Ruby has gems.

An operator is software, and it is useful to have a package manager to install, update and remove it.

You can think of the operator lifecycle manager as the ‘package manager’ for operators. The OLM handles finding, fetching, updating and removing operators from the system. The Juju OLM can retrieve operators from the Open Operator Collection at charmhub.io, and will also look for updates there.

If the operator is being used on Kubernetes then it will be run in a container in its own pod, or as a sidecar of its workload application container. If it is being run on a machine (bare metal or VMware or cloud) then it is installed in a directory of that machine.

The operator lifecycle manager will then invoke the operator to handle events associated with the lifecycle of the operator.

Updating operators

Since operators take responsibility for mission-critical systems, they are themselves critical systems, and need to be kept up to date to ensure that bugs and security issues are fully addressed.

The OLM will check periodically to see if there is an update for installed operators. Updates are not applied automatically, but can be applied on demand by the administrator.

Note that updating the operator does not necessarily update the application. Think of updating the operator as updating your configuration management scripts; they don’t do anything until you run them. In general, an update of the underlying application should be an explicit action by the administrator and not tightly coupled to an operator update.

Updating an operator might well, however, update the configuration of the application. For example, imagine a security problem was discovered in an operator such that it omitted a firewall rule for its application. An update to the operator should immediately bring the system configuration into line with expectations, even if the application itself is not updated.

With the Juju OLM it is possible to publish operators in channels, and consume them from channels, which allows particular models to use riskier or faster-moving versions of the operator. It is good practice to have some deployments using pre-stable versions of operators in order to assess incoming changes.

Removing operators

The process of decoupling integrated software for removal must be done carefully. In development and test environments it is straightforward to tear down and rebuild an entire environment, but in a long-lived production environment it will occasionally be necessary to retire one component while adding new components to the running services.

The OLM can orchestrate the clean removal of operators from a system. In Juju this is achieved by removing the application from the model, which triggers a series of events in application units and in related applications to ensure an orderly shutdown and removal of the software.

Policy, settings and the command-line options will govern the consequences for storage associated with the removed application — it may be archived or it may be destroyed as well.

Operator configuration

Another benefit of an OLM is consistent configuration of operators from a wide range of vendors and communities.

The Juju OLM offers configuration schemas and structured YAML config, defined by the operator publisher. The OLM is able to ensure that configuration changes conform to requirements, acting as a consistent guardrail against misconfiguration. Of course, the operator itself should sanity-check configuration to prevent superficially correct config from creating a problem in the service.

The OLM also offers mechanisms to roll configuration changes out across a distributed system. For large, scale-out environments, it is valuable to avoid shotgun changes which cause ripples to turn into waves. Where a single change will expand across many systems, it is possible to steer the change deliberately and manage the impact precisely at all times.

Scaling

Scale is a critical ingredient in enterprise software, both for resilience and for capacity. Coordinating multiple integrated applications at dynamic scale is challenging. An OLM that provides a desired-state view of scale, as well as tracking current scale, enables more sophisticated operator design and implementation.

Dynamic scale requires careful coordination between applications that integrate at a low level. For example, where the scale of one application should drive the scale of a connected application, or where changes in scale in an application require configuration changes in adjacent applications, a central model of application scale removes the need for custom operator-to-operator communications channels associated with scale adjustments.

The Juju OLM explicitly models application scale, and provides mechanisms for integrated applications to be aware of changes in their own and related application scale.

Everyday operations actions

Any long-lived workload requires maintenance or administrative activities. Backup, restore, reset, checkpoint, benchmarking, and application-specific operations occur both on a schedule and as interrupts for the administrator.

Well-crafted operators include these activities as pre-packaged code, just as they include code for lifecycle management.

In the Juju OLM, everyday activities are called Actions and they are declared in operator metadata as parameterised methods which can be invoked through the Juju CLI or API. Actions run asynchronously by default, and can produce output files as well as output metadata.

A well-designed and implemented operator ensures that administrators never need actual access to the containers or machines on which applications are deployed, because the full range of actions they may need to perform are expressed in this way.