Writing a Kubernetes (v1) charm

Scope: This post covers the format and content of the YAML file generated by the charm and passed to Juju via the pod-spec-set hook command, i.e. what k8s resources a charm can ask the controller to provision. Other aspects of charming, like using the operator framework, are covered elsewhere.

Juju charms v1: The documentation below applies to v1 charms, where the charm operator runs in a separate pod from that of the workload being managed by the charm. From Juju 2.9 onwards (a limited preview is available in 2.9), we are beginning to support a new deployment mode where the Juju agent runs in a sidecar container in the same pod as the workload.

Introduction

Kubernetes charms are similar to traditional cloud charms. The same model is used. An application has units, there’s a leader unit, and each relation has a data bag on the application which is writable by the leader unit. The same hooks are invoked.

The only mandatory task for the charm is to tell Juju some key pod / container configuration. Moving forward, it is recommended that Kubernetes charms be written using the operator framework.

Charms can also be written using the reactive framework, in which case there’s a base layer similar to the one used for traditional charms:

https://github.com/juju-solutions/layer-caas-base

The basic flow for how a reactive charm operates is:

  1. charm calls config-get to retrieve the charm settings from Juju

  2. charm translates settings to create pod configuration

  3. charm calls pod-spec-set to tell Juju how to create pods/units

  4. charm can use status-set or juju-log or any other hook command the same as for traditional charms

  5. charm can implement hooks the same as for traditional charms

There’s no need for the charm to apt install anything - the operator docker image has all the necessary reactive and charm helper libraries baked in.

The charm can call pod-spec-set at any time and Juju will update any running pods with the new pod spec. This may be done in response to the config-changed hook due to the user changing charm settings, or when relations are joined etc. Juju will check for actual changes before restarting pods so the call is idempotent.
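
For example, a reactive charm might simply re-send the pod spec whenever its configuration changes. Here’s a minimal sketch, assuming the config.changed flag that the reactive framework manages automatically, and a make_pod_spec() helper like the one used in the mariadb snippet further below:

from charms.reactive import when
from charms import layer

@when('config.changed')
def update_pod_spec():
    # config.changed is set automatically whenever a charm setting changes.
    # Re-render the spec and hand it to Juju; Juju compares it with the
    # currently applied spec and only restarts pods if something changed.
    layer.caas_base.pod_spec_set(make_pod_spec())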

Note: the pod spec applies to the application as a whole. All pods are homogeneous.

Kubernetes charm store

A number of Kubernetes charms already written are available on the charm store.

Container images

Charms specify that they need a container image by including a resource definition.

resources:
  mysql_image:
    type: oci-image
    description: Image used for mysql pod.

oci-image is a new type of charm resource (we already have file).

The image is attached to a charm and hosted by the charm store’s inbuilt docker repo. Standard Juju resource semantics apply. A charm is released (published) as a tuple of (charm revision, resource version). This allows the charm and associated image to be published as a known working configuration.
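
At deploy time, the charm itself looks up where the attached image is hosted, along with the credentials needed to pull it, and passes that to Juju via the imageDetails block of the container spec. Here’s a rough sketch for a reactive charm, assuming the docker-resource layer’s get_info() helper and the mysql_image resource defined above:

from charms import layer

def image_details():
    # Return the registry path and pull credentials for the attached
    # oci-image resource, in the shape expected by a container's
    # imageDetails field in the pod spec.
    image_info = layer.docker_resource.get_info('mysql_image')
    return {
        'imagePath': image_info.registry_path,
        'username': image_info.username,
        'password': image_info.password,
    }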

Example workflow

To build and push a charm to the charm store, ensure you have the charm snap installed.
After hacking on the charm and running charm build to fully generate it, you push, attach, release:

cd <build dir>
charm push . mariadb-k8s
docker pull mariadb
charm attach cs:~me/mariadb-k8s-8 mysql_image=mariadb
charm release cs:~me/mariadb-k8s-8 --resource mysql_image-0

See
charm help push
charm help attach
charm help release

Charms in more detail

Use the information below in addition to looking at the charms already written to see how this all hangs together.

To illustrate how a charm tells Juju how to configure a unit’s pod, here’s the template YAML snippet used by the Kubernetes mariadb charm. Note the placeholders which are filled in from the charm config obtained via config-get.

version: 3
containers:
  - name: mariadb
    imagePullPolicy: Always
    ports:
      - containerPort: %(port)s
        protocol: TCP
    envConfig:
      MYSQL_ROOT_PASSWORD: %(rootpassword)s
      MYSQL_USER: %(user)s
      MYSQL_PASSWORD: %(password)s
      MYSQL_DATABASE: %(database)s
    volumeConfig:
      - name: configurations
        mountPath: /etc/mysql/conf.d
        files:
          - path: custom_mysql.cnf
            content: |
              [mysqld]
              skip-host-cache
              skip-name-resolve
              query_cache_limit = 1M
              query_cache_size = %(query-cache-size)s
              query_cache_type = %(query-cache-type)s

The charm simply sends this YAML snippet to Juju using the pod_spec_set() charm helper.
Here’s a code snippet from the mariadb charm.

from charms.reactive import when, when_not
from charms.reactive.flags import set_flag, get_state, clear_flag
from charmhelpers.core.hookenv import (
    log,
    metadata,
    status_set,
    config,
    network_get,
    relation_id,
)

from charms import layer

@when_not('layer.docker-resource.mysql_image.fetched')
def fetch_image():
    layer.docker_resource.fetch('mysql_image')

@when('mysql.configured')
def mariadb_active():
    status_set('active', '')

@when('layer.docker-resource.mysql_image.available')
@when_not('mysql.configured')
def config_mariadb():
    status_set('maintenance', 'Configuring mysql container')

    spec = make_pod_spec()
    log('set pod spec:\n{}'.format(spec))
    layer.caas_base.pod_spec_set(spec)

    set_flag('mysql.configured')
....
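
The make_pod_spec() helper isn’t shown above; in essence it just renders the template from earlier with the values returned by config-get. Here’s a rough sketch, assuming the template ships with the charm as reactive/spec_template.yaml and that the charm config keys match the placeholder names (the real charm may differ):

from charmhelpers.core.hookenv import config

def make_pod_spec():
    # Fill the %(...)s placeholders in the pod spec template with the
    # current charm settings and return the YAML text for pod_spec_set().
    with open('reactive/spec_template.yaml') as f:
        template = f.read()
    return template % dict(config())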

Important Difference With Cloud Charms

Charms such as databases, which have a provides endpoint, often need to set in relation data the IP address to which related charms can connect. The IP address is obtained using network-get, typically something like this:

@when('mysql.configured')
@when('server.database.requested')
def provide_database(mysql):
    info = network_get('server', relation_id())
    log('network info {0}'.format(info))
    host = info.get('ingress-addresses', [""])[0]
    if not host:
        log("no service address yet")
        return

    for request, application in mysql.database_requests().items():
        database_name = get_state('database')
        user = get_state('user')
        password = get_state('password')

        mysql.provide_database(
            request_id=request,
            host=host,
            port=3306,
            database_name=database_name,
            user=user,
            password=password,
        )
        clear_flag('server.database.requested')

Workload Status

Currently, there’s no well defined way for a Kubernetes charm to query the status of the workload it is managing. So although the charm can reasonably set status as say blocked when it’s waiting for a required relation to be created, or maintenance when the pod spec is being set up, there’s no real way for the charm to know when to set active.

Juju helps solve this problem by looking at the pod status and using it in conjunction with the status reported by the charm to determine what to display to the user. Workload status values of waiting, blocked, maintenance, or any error conditions, are always reported directly. However, if the charm sets status to active, this is not shown as such until the pod is reported as Running. So all the charm has to do is set status to active when all of its initial setup is complete and the pod spec has been sent to Juju, and Juju will “Do The Right Thing” from then on. Both the gitlab and mariadb sample charms illustrate how workload status can be set correctly.

A future enhancement will allow the charm to directly query the workload status, at which point the above workaround will become unnecessary.

Workload pod in more detail

It’s possible to specify Kubernetes specific pod configuration in the pod spec YAML created by the charm. The supported container attributes are:

  • livenessProbe
  • readinessProbe
  • imagePullPolicy

The syntax used is standard k8s pod spec syntax.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

It’s also possible to set the following pod attributes:

  • activeDeadlineSeconds
  • dnsPolicy
  • restartPolicy
  • terminationGracePeriodSeconds
  • automountServiceAccountToken
  • securityContext
  • priorityClassName
  • priority
  • readinessGates

Again, standard k8s syntax is used for the above attributes.

Pod annotations and labels can be set.

You can also specify the command to run when starting the container.

command: ["sh", "-c"]
args: ["doIt", "--debug"]
workingDir: "/path/to/here"

These pod specific attributes are defined in YAML blocks as shown below:

version: 3
containers:
  - name: gitlab
    imagePullPolicy: Always
    ports:
    - containerPort: 80
      protocol: TCP
    command:
      - sh
      - -c
      - |
        set -ex
        echo "do some stuff here for gitlab container"
    args: ["doIt", "--debug"]
    workingDir: "/path/to/here"
    kubernetes:
      securityContext:
        runAsNonRoot: true
        privileged: true
      livenessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /ping
          port: 8080
      readinessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /pingReady
          port: www
      startupProbe:
        httpGet:
          path: /healthz
          port: liveness-port
        failureThreshold: 30
        periodSeconds: 10
kubernetesResources:
  pod:
    annotations:
      foo: baz
    labels:
      foo: bax
    activeDeadlineSeconds: 10
    restartPolicy: OnFailure
    terminationGracePeriodSeconds: 20
    automountServiceAccountToken: true
    hostNetwork: true
    hostPID: true
    dnsPolicy: ClusterFirstWithHostNet
    securityContext:
      runAsNonRoot: true
      fsGroup: 14
    priorityClassName: top
    priority: 30
    readinessGates:
      - conditionType: PodScheduled
   

Workload permissions and capabilities

We allow a set of rules to be associated with the application to confer capabilities on the workload; a set of rules constitutes a role. If a role is required for an application, Juju will create a service account for the application with the same name as the application. Juju takes care of the internal k8s details, like creating a role binding, automatically.

Some applications may require cluster scoped roles. Use global: true on a role if cluster scoped rules are required.

serviceAccount:
  automountServiceAccountToken: true
  roles:
    # roles are usually scoped to the model namespace, but
    # some workloads like istio require binding to cluster wide roles
    # use global = true for cluster scoped roles
    - global: true
      # these rules are based directly on role rules supported by k8s
      rules:
        - apiGroups: [""] # "" indicates the core API group
          resources: ["pods"]
          verbs: ["get", "watch", "list"]
        - nonResourceURLs: ["*"]
          verbs: ["*"]

Config maps

These are essentially named databags.

configMaps:
  mydata:
    foo: bar
    hello: world

Service scale policy, update strategy and annotations

As well as setting annotations, it’s possible to set the scale policy for services, i.e. whether the workload pods should be started serially, one at a time, or in parallel. The default is parallel. The update strategy, i.e. how pod updates should be managed, is also configured here.
For reference:

service:
  scalePolicy: serial
  annotations:
    foo: bar
  updateStrategy:
    type: Recreate
    rollingUpdate:
      maxUnavailable: 10%
      maxSurge: 25%

Mounting volumes into workloads

As well as creating a directory tree with simple text files (covered earlier), it’s also possible to configure volumes backed by:

  • config map
  • secret
  • host path
  • empty dir

With secret and config map, these must be defined elsewhere in the YAML handed to Juju - you can’t reference existing resources not created by the charm. If you leave out the files block, the entire secret or config map will be mounted. path is optional - the file will be created with the same name as the key if not specified.

The path for each file is created relative to the overall mount point.

Here’s an example of what’s possible:

version: 3
...
    volumeConfig:
      # This is what was covered earlier (simple text files)
      - name: configurations
        mountPath: /etc/mysql/conf.d
        files:
          - path: custom_mysql.cnf
            content: |
              [mysqld]
              skip-host-cache
              skip-name-resolve
              query_cache_limit = 1M
              query_cache_size = %(query-cache-size)s
              query_cache_type = %(query-cache-type)s
      # Additional volume types follow...
      # host path
      - name: myhostpath1
        mountPath: /var/log1
        hostPath:
          path: /var/log
          type: Directory
      - name: myhostpath2
        mountPath: /var/log2
        hostPath:
          path: /var/log
          # see https://kubernetes.io/docs/concepts/storage/volumes/#hostpath for other types
          type: Directory
      # empty dir
      - name: cache-volume
        mountPath: /empty-dir
        emptyDir:
          medium: Memory # defaults to disk
      - name: cache-volume222
        mountPath: /empty-dir222
        emptyDir:
          medium: Memory
      - name: cache-volume1
        mountPath: /empty-dir1
        emptyDir:
          medium: Memory
      # secret
      - name: another-build-robot-secret
        mountPath: /opt/another-build-robot-secret
        secret:
          name: another-build-robot-secret
          defaultMode: 511
          files:
            - key: username
              path: my-group/username
              mode: 511
            - key: password
              path: my-group/password
              mode: 511
      # config map
      - name: log-config
        mountPath: /etc/log-config
        configMap:
          name: log-config
          defaultMode: 511
          files:
            - key: log_level
              path: log_level
              mode: 511

The story so far…

Extending the sample YAML to add in the above features, we get the example YAML below:

version: 3
containers:
  - name: gitlab
    imagePullPolicy: Always
    ports:
    - containerPort: 80
      protocol: TCP
    command:
      - sh
      - -c
      - |
        set -ex
        echo "do some stuff here for gitlab container"
    args: ["doIt", "--debug"]
    workingDir: "/path/to/here"
    envConfig:
      MYSQL_ROOT_PASSWORD: %(rootpassword)s
      MYSQL_USER: %(user)s
      MYSQL_PASSWORD: %(password)s
      MYSQL_DATABASE: %(database)s
    volumeConfig:
      - name: configurations
        mountPath: /etc/mysql/conf.d
        files:
          - path: custom_mysql.cnf
            content: |
              [mysqld]
              skip-host-cache
              skip-name-resolve
              query_cache_limit = 1M
              query_cache_size = %(query-cache-size)s
              query_cache_type = %(query-cache-type)s
      # host path
      - name: myhostpath1
        mountPath: /var/log1
        hostPath:
          path: /var/log
          type: Directory
      - name: myhostpath2
        mountPath: /var/log2
        hostPath:
          path: /var/log
          # see https://kubernetes.io/docs/concepts/storage/volumes/#hostpath for other types
          type: Directory
      # empty dir
      - name: cache-volume
        mountPath: /empty-dir
        emptyDir:
          medium: Memory # defaults to disk
      - name: cache-volume222
        mountPath: /empty-dir222
        emptyDir:
          medium: Memory
      - name: cache-volume1
        mountPath: /empty-dir1
        emptyDir:
          medium: Memory
      # secret
      - name: another-build-robot-secret
        mountPath: /opt/another-build-robot-secret
        secret:
          name: another-build-robot-secret
          defaultMode: 511
          files:
            - key: username
              path: my-group/username
              mode: 511
            - key: password
              path: my-group/password
              mode: 511
      # config map
      - name: log-config
        mountPath: /etc/log-config
        configMap:
          name: log-config
          defaultMode: 511
          files:
            - key: log_level
              path: log_level
              mode: 511
    kubernetes:
      securityContext:
        runAsNonRoot: true
        privileged: true
      livenessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /ping
          port: 8080
      readinessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /pingReady
          port: www
      startupProbe:
        httpGet:
          path: /healthz
          port: liveness-port
        failureThreshold: 30
        periodSeconds: 10
configMaps:
  mydata:
    foo: bar
    hello: world
service:
  annotations:
    foo: bar
  scalePolicy: serial
  updateStrategy:
    type: Recreate
    rollingUpdate:
      maxUnavailable: 10%
      maxSurge: 25%
serviceAccount:
  automountServiceAccountToken: true
  roles:
    - global: true
      rules:
        - apiGroups: [""]
          resources: ["pods"]
          verbs: ["get", "watch", "list"]
kubernetesResources:
  pod:
    annotations:
      foo: baz
    labels:
      foo: bax
    activeDeadlineSeconds: 10
    restartPolicy: OnFailure
    terminationGracePeriodSeconds: 20
    automountServiceAccountToken: true
    hostNetwork: true
    hostPID: true
    dnsPolicy: ClusterFirstWithHostNet
    securityContext:
      runAsNonRoot: true
      fsGroup: 14
    priorityClassName: top
    priority: 30
    readinessGates:
      - conditionType: PodScheduled

Next, we’ll cover things like custom resources and their associated custom resource definitions, as well as secrets. All of the resources described in subsequent sections belong under the kubernetesResources: block.

Custom resources

The YAML syntax is curated from the native k8s YAML to remove the boilerplate and other unnecessary cruft, leaving the business attributes. Here’s an example of defining a custom resource definition and a custom resource. These could well be done by different charms, but are shown together here for brevity.

kubernetesResources:
  customResourceDefinitions:
    tfjobs.kubeflow.org:
      group: kubeflow.org
      scope: Namespaced
      names:
        kind: TFJob
        singular: tfjob
        plural: tfjobs
      versions:
        - name: v1
          served: true
          storage: true
      subresources:
        status: {}
      validation:
        openAPIV3Schema:
          properties:
            spec:
              properties:
                tfReplicaSpecs:
                  properties:
                    # The validation works when the configuration contains
                    # `Worker`, `PS` or `Chief`. Otherwise it will not be validated.
                    Worker:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    PS:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    Chief:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                          maximum: 1
    tfjob1s.kubeflow.org1:
      group: kubeflow.org1
      scope: Namespaced
      names:
        kind: TFJob1
        singular: tfjob1
        plural: tfjob1s
      versions:
        - name: v1
          served: true
          storage: true
      subresources:
        status: {}
      validation:
        openAPIV3Schema:
          properties:
            spec:
              properties:
                tfReplicaSpecs:
                  properties:
                    # The validation works when the configuration contains
                    # `Worker`, `PS` or `Chief`. Otherwise it will not be validated.
                    Worker:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    PS:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    Chief:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                          maximum: 1
  customResources:
    tfjobs.kubeflow.org:
      - apiVersion: "kubeflow.org/v1"
        kind: "TFJob"
        metadata:
          name: "dist-mnist-for-e2e-test"
        spec:
          tfReplicaSpecs:
            PS:
              replicas: 2
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
            Worker:
              replicas: 8
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
    tfjob1s.kubeflow.org1:
      - apiVersion: "kubeflow.org1/v1"
        kind: "TFJob1"
        metadata:
          name: "dist-mnist-for-e2e-test11"
        spec:
          tfReplicaSpecs:
            PS:
              replicas: 2
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
            Worker:
              replicas: 8
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
      - apiVersion: "kubeflow.org1/v1"
        kind: "TFJob1"
        metadata:
          name: "dist-mnist-for-e2e-test12"
        spec:
          tfReplicaSpecs:
            PS:
              replicas: 2
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
            Worker:
              replicas: 8
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0

Lifecycle of custom resources

Charms can decide when custom resources get deleted by specifying proper labels.

...
  customResourceDefinitions:
    - name: tfjobs.kubeflow.org
      labels:
        foo: bar
        juju-resource-lifecycle: model | persistent
...
  customResources:
    tfjobs.kubeflow.org:
      - apiVersion: "kubeflow.org/v1"
        kind: "TFJob"
        metadata:
          name: "dist-mnist-for-e2e-test"
        labels:
          foo: bar
          juju-global-resource-lifecycle: model | persistent

  1. If no juju-resource-lifecycle label is set, the custom resource is deleted together with the application.
  2. If juju-resource-lifecycle is set to model, the custom resource is not deleted when the application is removed; it remains until the model is destroyed.
  3. If juju-resource-lifecycle is set to persistent, the custom resource is never deleted by Juju, even when the model is destroyed.

Secrets

Secrets will ultimately be modelled by Juju. We’re not there yet, so (initially) secret definitions are added to the k8s specific YAML. The syntax and supported attributes are tied directly to the k8s spec. Both string and base64 encoded data are supported.

kubernetesResources:
  secrets:
    - name: build-robot-secret
      type: Opaque
      stringData:
          config.yaml: |-
              apiUrl: "https://my.api.com/api/v1"
              username: fred
              password: shhhh
    - name: another-build-robot-secret
      type: Opaque
      data:
          username: YWRtaW4=
          password: MWYyZDFlMmU2N2Rm

Webhooks

Charms can create mutating and validating webhook resources.

Juju will prefix any global resources with the model name to ensure that applications deployed multiple times into different namespaces do not conflict. However, some workloads which Juju has no control over (yet) expect webhooks (in particular) to have fixed names. Charms can now define an annotation on mutating and validating webhooks to disable this name qualification:

annotations:
  model.juju.is/disable-prefix: "true"

Example webhooks:

kubernetesResources:
  mutatingWebhookConfigurations:
    - name: example-mutatingwebhookconfiguration
      labels:
        foo: bar
      annotations:
        model.juju.is/disable-prefix: "true"
      webhooks:
        - name: "example.mutatingwebhookconfiguration.com"
          failurePolicy: Ignore
          clientConfig:
            service:
              name: apple-service
              namespace: apples
              path: /apple
            caBundle: "YXBwbGVz"
          namespaceSelector:
            matchExpressions:
              - key: production
                operator: DoesNotExist
          rules:
            - apiGroups:
                - ""
              apiVersions:
                - v1
              operations:
                - CREATE
                - UPDATE
              resources:
                - pods
  validatingWebhookConfigurations:
    - name: pod-policy.example.com
      labels:
        foo: bar
      annotations:
        model.juju.is/disable-prefix: "true"
      webhooks:
        - name: "pod-policy.example.com"
          rules:
            - apiGroups: [""]
              apiVersions: ["v1"]
              operations: ["CREATE"]
              resources: ["pods"]
              scope: "Namespaced"
          clientConfig:
            service:
              namespace: "example-namespace"
              name: "example-service"
            caBundle: "YXBwbGVz"
          admissionReviewVersions: ["v1", "v1beta1"]
          sideEffects: None
          timeoutSeconds: 5

Ingress resources

Charms can create ingress resources. Example:

kubernetesResources:
  ingressResources:
    - name: test-ingress
      labels:
        foo: bar
      annotations:
        nginx.ingress.kubernetes.io/rewrite-target: /
      spec:
        rules:
          - http:
              paths:
                - path: /testpath
                  backend:
                    serviceName: test
                    servicePort: 80

Additional service accounts

Sometimes it’s necessary for a charm to create additional service accounts needed by the upstream OCI image it is deploying.

kubernetesResources:
  serviceAccounts:
    - name: k8sServiceAccount1
      automountServiceAccountToken: true
      roles:
        - name: k8sRole
          rules:
            - apiGroups: [""]
              resources: ["pods"]
              verbs: ["get", "watch", "list"]
            - nonResourceURLs: ["/healthz", "/healthz/*"] # '*' in a nonResourceURL is a suffix glob match
              verbs: ["get", "post"]
            - apiGroups: ["rbac.authorization.k8s.io"]
              resources: ["clusterroles"]
              verbs: ["bind"]
              resourceNames: ["admin", "edit", "view"]
        - name: k8sClusterRole
          global: true
          rules:
            - apiGroups: [""]
              resources: ["pods"]
              verbs: ["get", "watch", "list"]

Additional services

It may also be necessary to create extra services.

kubernetesResources:
  services:
    - name: my-service1
      labels:
        foo: bar
      spec:
        selector:
          app: MyApp
        ports:
          - protocol: TCP
            port: 80
            targetPort: 9376
    - name: my-service2
      labels:
        app: test
      annotations:
        cloud.google.com/load-balancer-type: "Internal"
      spec:
        selector:
          app: MyApp
        ports:
          - protocol: TCP
            port: 80
            targetPort: 9376
        type: LoadBalancer

Charm deployment info in metadata.yaml

The charm can require that it only be deployed on a k8s cluster with a certain minimum API version.
It can also specify what type of service to create to sit in front of the workload pods, or ask for the service not to be created at all by using omit.

deployment:
    min-version: x.y
    type: stateless | stateful
    service: loadbalancer | cluster | omit
