Universal Operator Pattern

Generalise the Kubernetes operator pattern to drive traditional applications on bare metal, virtualisation and cloud instances

The Juju Operator Lifecycle Manager (OLM) brings the Kubernetes operator pattern to legacy estate on Linux and Windows, enabling operators to drive apps in traditional machine environments. The operator pattern has proven benefits in reuse, security, compliance and efficiency, which the Juju OLM extends to legacy applications as well.

What is the Kubernetes operator pattern?

The operator pattern is a form of automation that is gaining awareness in the Kubernetes domain. A special container, the operator, is allowed to drive other containers. Shifting the burden of lifecycle management from configuration files and manual steps to a separate container simplifies some container operations. In Kubernetes, traditional configuration management does not work, so the operator pattern has grown in popularity as a way to address lifecycle and configuration management on K8s.

The idea itself, using one piece of software to drive other pieces of software, is simple enough. Telecommunication equipment providers have sold ‘element managers’ for decades. Control theory has long provided a framework for the analysis of such automation. The operator pattern has simply been popularized now because it has not been possible to forward port legacy automation to containers.

What is a universal operator lifecycle manager?

An operator lifecycle manager (OLM) provides services to operators. For example, it can deploy operators, upgrade them, deliver configuration updates and other services.

Many operator lifecycle management implementations are tied to specific substrates. For example, lifecycle management may be limited to Kubernetes, or a particular PAAS.

The Juju OLM is designed to be universal and substrate-neutral. It works on Kubernetes as well as traditional machines. A universal operator lifecycle manager supports operators for legacy applications that run on servers and virtualisation platforms, as well as first-generation public cloud estate, and the newer containerized applications too.

Traditional or legacy applications on both Windows and Linux can be driven with operators, and integrated with newer applications on Kubernetes. The ability to evolve an application from the machine world to the Kubernetes containerized world presents a smooth path to containerization of legacy estate, without “flag day” cutovers.

Juju provides OLM on bare metal, VMware, OpenStack, and public cloud instances, as well as K8s clusters.

Bringing operators to infrastructure-as-code

The Juju OLM can be used to bring the operator pattern to infrastructure-as-code. Unlike configuration management systems such as Chef, Puppet, Ansible and Salt, a full operator lifecycle manager with bare metal support enables large-scale physical infrastructure to be operator-driven, with significant benefits in operating costs, security and agility.

Complex bare metal solutions like big data lakes, OpenStack, and software-defined storage arrays become easier to own and operate with this automation. The costs of integration, automation and security are largely shared in open source communities like the Open Operator Collection. The agility of bare metal infrastructure operators is transformational for those responsible for significant physical estates.

Bringing operators to legacy applications

The legacy application estate is an enormous cost burden for large organisations. Much of the annual IT budget is consumed with maintenance, largely driven by old operations and integration code accumulated over decades. Companies have explored newer technologies such as PAAS to eliminate this maintenance burden, but have never successfully managed to migrate any meaningful portion of their legacy estate to the newer software architectures mandated by OpenShift or similar PAAS offerings.

Universal operators represent a compelling new approach to legacy estate application management that do not involve replatforming the workload onto Kubernetes, nor a new set of constraints such as PAAS. Instead, we develop operators for those legacy workloads, which continue to instantiate them, update, upgrade and integrate them in their traditional machine environment. A machine-based operator is driving the workload exactly the way the workload was always driven, except with an operator rather than configuration management system, rather than rearchitecting it for new substrates such as containers.

Operator services

A typical operator lifecycle manager provides services that simplify the development and use of operators in production: deployment, upgrade, configuration and removal.

The enterprise-focused Juju OLM adds services for operator scale, leader election and persistent storage, reflecting its use for sophisticated high-availability and mission-critical applications and infrastructure.

Juju also provides integration services — enabling the declarative integration of operators from different vendors, both within the same model on the same substrate, and across models, potentially across clouds. Integrating Kubernetes and legacy applications becomes straightforward thanks to cross-model relations and the ability to drive all classes of software with the Juju OLM.

Kubernetes operators

The Juju OLM implements the Kubernetes operator pattern for containerised applications.

Pod placement of operators

Kubernetes operators can run in their own pod, independent of the workload. This is the common framing of the Kubernetes operator pattern, with the operator as an entirely distinct container driving workload containers from a distance via the Kubernetes API.

With pod placement, there is a single operator for all the ‘units’ of the application. This is easier to understand but presents some challenges for complex distributed systems, especially those ported to Kubernetes rather than designed from the ground up for cloud-native operations. It is not possible for an operator in its own pod to interact directly with workload processes, since the operator pod and the workload pod may in fact be running on different machines in the cluster.

Operator pods represent a first-generation approach to the operator pattern on Kubernetes. They work for certain workloads, and are a good way to start any new operator development, but it will present limitations if the goal is to achieve complete control of the application at a low level in the operator.

Sidecar placement of operators

With the Juju OLM, operators can have much more fine-grained control of their workload, by placing the operator code in a sidecar of the workload pods.

Sidecar operators can be thought of as running in ‘an adjacent chroot’ with the workload processes. They are guaranteed to be on the same machine as the workload container, and can share volumes directly with a directory mounted into both the application container and the sidecar container. Sidecar placement enables richer interaction between the operator and the workload. For example, the operator might monitor workload status files, logs, or transaction records directly.

Inter-process communication between the operator and the workload becomes possible with sidecar operators, too, although the fact that the operator is in a separate container requires some adjusting. Since the operator is ‘running in a chroot’ on the same machine as the application; they do share a kernel, but file or inter-process communication between workload and operator require careful design.

Operators for traditional machine-based applications

The Juju OLM supports traditional and legacy applications on Ubuntu, CentOS, RHEL and Windows.

Metadata in the charm — the operator package — declares which versions of the OS are supported by that operator, for example, CentOS 6 and 7, or Ubuntu 18.04 LTS, or Windows 10. When an application is placed on a machine in the Juju OLM model, the agent on that machine will fetch the correct operator from the controller and install it on that machine.

Lifecycle events for this unit of the application are delivered to the operator by the agent. Initial installation and configuration are driven by lifecycle events, followed by integration with related applications in the model. Upgrades are also driven through operator lifecycle events.

Operators for Linux applications

Linux machines have the OLM agent in /var/lib/juju/tools/ and charms in /var/lib/juju/agents/<unit-app-number>/charm/ . There can be multiple charms because multiple applications can have units co-located on the same machine, just as you can install as many apps on a machine as you want with your configuration management system. Subordinate application units will also follow their primary applications to every machine.

And example directory tree for a machine in a Hadoop model looks like this:

            ├── agents
                ├── machine-4
                ├── unit-client-0      <--- this is the Hadoop client
                │   ├── charm          <--- the operator code is here
                │   └── state          <--- persistent state for this operator
                ├── unit-ganglia-0     <--- a second operator
                │   ├── charm
                │   └── state
                ├── unit-plugin-0       <--- a third operator
                │   ├── charm
                │   ├── resources       <--- resources for this operator
                │   └── state
                └── unit-rsyslog-0
                    ├── charm
                    └── state

The different applications are clearly visible. For each unit located on the machine, there is a directory (“unit-app-number”) containing the charm itself, with the operator code, and any persistent state that the operator uses.

Note the resources directory for the unit of the “plugin“ application, which is where the OLM will deliver any requested resources for that unit.

Operating system upgrades

In long-lived machine estates it is eventually necessary to upgrade the operating system underneath an application. The life of an application is extended by carrying it forward to newer versions of the OS.

OS upgrades are handled explicitly by the Juju OLM. Events signal the imminent OS upgrade to the operator which prepares for any disruption during OS upgrade and reboots. Then, operator lifecycle event delivery is paused while the OS upgrade takes place. Finally, the operator will reconfigure the application for the new OS version, before the application unit is reinstated and receives operator lifecycle events.

Operators for SAAS

Many clouds offer on-demand SAAS for common application services. MySQL, Kafka, AMQP and other services can be activated, configured and consumed on these clouds without ever installing any software. API calls on the cloud instantiate, configure, or terminate the SAAS.

Applications need to integrate with this SAAS. If we are deploying a content management system that will use Amazon RDS, we need to configure RDS to allow our content management system to connect to it, and we need to configure the content management system to connect to RDS.

The Juju OLM uses a cloud credential to provision new machines for scale-out applications, or additional applications in the model. That same credential can be used to activate SAAS. An operator can be designated as trusted, at which point it will be given the credential and can make API calls using it.

So, an Amazon RDS operator can be trusted with an AWS credential. Deploying the RDS operator into a model on AWS provides an ‘RDS’ application in the graph. Integrating with RDS is exactly the same as integrating with MySQL, in fact, the other operators in the system have no idea that they are talking to a stand-in, they think they are talking to MySQL.

We call such operators ‘proxy charms’. They do not proxy the network traffic of the connection. They simply act as placeholders in the graph, and exchange information with the other operators in the graph. Behind the scenes, instead of launching containers or installing applications like a normal operator, they make the API calls that instantiate, configure or terminate the SAAS.

Operators for physical appliances

Many organisations have physical appliances which can be configured and monitored remotely. This is particularly true of the telco sector, where network-managed appliances were standard equipment for decades.

Any operator can interact with networked appliances as long as it has the address and credential needed to do so. So it is possible to create an operator which represents the network appliance in the application graph, offers integration points for each of the ways the appliance can be integrated, and enables other applications to integrate with the appliance as if it were pure software.

The appliance may be static, providing a single service, or it may be able to provide services to multiple applications. It is important to deal with capacity limitations in a physical appliance, because unlike software-defined services it is not possible to deploy more of the appliance on demand.

Operators for “uncharmed” software

In some cases organisations have software that is deployed, cannot be redeployed, but would be useful to integrate using charms. In this case, a proxy charm can be created which represents the pre-existing software. The charm really only serves to provide access details and credentials to other applications related to it, but it serves nonetheless as a mechanism for integration of charmed components with the static software element.