Charm development best practices

This document describes the best practices for charm development.

This is likely to change as the ecosystem and practices mature, so make sure to come back from time to time to stay up to date.

Conventions

Naming

As it is common to develop multiple charms for the same underlying technology or software, clear and consistent naming is a vital part of creating a new charm. If we take Prometheus as an example, multiple charms exist that cover different scenarios:

  • Running on bare metal as a machine charm
  • Running in Kubernetes as a k8s charm

To make it obvious to the user which one to use, we’ve put together a separate document detailing the do’s and don’ts of charm naming, which is available here.

When naming configuration items or actions, the convention is to use lowercase alphanumeric names, separated with underscores if required. For example, timeout or enable_feature.
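A convention like this is easy to check automatically, for instance in a test. The following is a minimal sketch: the helper name is_valid_option_name is hypothetical and not part of any charm tooling, and the regular expression is only one reading of the convention described above.

```python
import re

# Assumption: "lowercase alphanumeric, separated with underscores" is read as
# one or more lowercase alphanumeric words joined by single underscores.
_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_option_name(name: str) -> bool:
    """Return True if name follows the lowercase_with_underscores convention."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

With this reading, timeout and enable_feature pass, while EnableFeature and enable-feature are rejected.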

Generality

When building your charm or library, pay attention to wider usability; try not to encapsulate logic that is highly specific to your use-case. There is always a balance between configurability and simplicity, but consider how your implementation would meet the use-cases of others trying to consume your charm or library.

State

Charms should generally aim to be stateless. In the event that a charm needs to track state between invocations, it should create an instance of StoredState in a class attribute called _stored. It is worth noting that any StoredState is specific to a single unit rather than shared across an application, and will be lost if the unit is restarted (for example, if a Kubernetes pod is rescheduled).

For sharing state between units of the same application, we instead recommend using peer relation data bags.

Do not use stored state to track the emission of events or other elements of the charm’s lifecycle. Where possible, derive this information from the model (self.model) and the charm’s relations, whether peer relations or otherwise.

Resources

Resources can either be of the type oci-image or file. When providing binary files as resources, you need to provide binaries for all CPU architectures your binary might end up being run on. An example of this can be found here.

At the time of writing, we try to include amd64 and arm64 versions of such resources, but you should also make sure that you implement the usage of these resources in such a way that the user may build a binary for their architecture of choice and supply it themselves. An example of this can be found here.

Relations

A relation is formed between a “providing” and a “requiring” charm sharing the same interface. We recommend using Charm Libraries to distribute code that simplifies implementing any relation for people who wish to integrate with your application. Generally, we recommend naming a charm library after the relation interface it manages (example).

To keep this neat and easy to reason about, we implement a separate class for each side of the relation in the same library, for instance:

class MetricsEndpointProvider(Object):
    # …

class MetricsEndpointRequirer(Object):
    # …

These classes should do whatever is necessary to handle any relation events specific to the relation interface you are implementing, throughout the lifecycle of the application. By passing the charm object into the constructor of either the Provider or the Requirer, you gain access to the on attribute of the charm and can register event handlers on behalf of the charm, as required.

Application and unit statuses

In general, you should aim to only make changes to the charm’s application or unit status directly within an event handler. This makes it easier for other developers to understand which states you intend the charm to be in, and when. If status is set across multiple event handlers, helper methods and other functions, this can be more difficult.

An example:


from ops.charm import CharmBase
from ops.model import ActiveStatus


class MyCharm(CharmBase):

    # This is an event handler, and can therefore set status
    def _on_config_changed(self, event):
        if self._some_helper():
            self.unit.status = ActiveStatus()

    # This is a helper method, not an event handler, so don't set status here
    def _some_helper(self):
        # do stuff
        return True

When authoring libraries, consider that charm authors should always be given the option to decide whether something going wrong in a library function should be considered status-altering for their charm. To facilitate this, libraries should never mutate the status of a unit or application. Instead, raise exceptions and let them bubble back up to the charm for the charm author to handle as they see fit.

In cases where the library has a suggested default status to be raised, use a custom exception with a .status property containing the suggested charm status as shown here or here. The calling charm can then choose to accept the default by setting self.unit.status to raised_exception.status or do something else.
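The pattern of an exception that carries a suggested status can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked examples: BlockedStatus is a small stand-in for ops.model.BlockedStatus so that the sketch runs without the ops library installed, and DatabaseNotReadyError, fetch_credentials, FakeUnit and on_config_changed are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockedStatus:
    """Stand-in for ops.model.BlockedStatus (assumption, for illustration)."""

    message: str = ""


class DatabaseNotReadyError(Exception):
    """Hypothetical library exception carrying a suggested charm status."""

    def __init__(self, message: str):
        super().__init__(message)
        # The library only suggests a status; it never sets it on the unit.
        self.status = BlockedStatus(message)


def fetch_credentials(relation_data: dict) -> str:
    """Hypothetical library function that raises instead of mutating status."""
    if "password" not in relation_data:
        raise DatabaseNotReadyError("waiting for database credentials")
    return relation_data["password"]


class FakeUnit:
    """Stand-in for the charm's unit object (assumption, for illustration)."""

    status = None  # type: Optional[BlockedStatus]


def on_config_changed(unit: FakeUnit, relation_data: dict) -> None:
    """Charm-side handler: the charm author decides what to do with the error."""
    try:
        fetch_credentials(relation_data)
    except DatabaseNotReadyError as error:
        # Accept the library's suggested default status.
        unit.status = error.status
```

The charm could equally ignore error.status and set a different status, or none at all, which is the point of leaving the decision to the caller.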

The only exception to this guidance is when authoring a lifecycle event handler in a library, where the library function subscribing to the hook is the entrypoint. In this case, it is considered valid to change the status, as long as this behavior is documented.

Logging

Templating

Use the default Python logging module. The default charmcraft init template will set this up for you. Do not build strings for the logger.

Prefer

logger.info("something %s", var)

over

logger.info("something {}".format(var))
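One reason for this preference is that the logging module defers formatting until a record is actually emitted, so arguments passed separately are never rendered for disabled levels. The sketch below demonstrates this with a hypothetical Expensive class that counts how often it is converted to a string; the logger name is arbitrary.

```python
import logging

logger = logging.getLogger("lazy_formatting_demo")
logger.setLevel(logging.INFO)  # DEBUG records are dropped by this logger


class Expensive:
    """Hypothetical value that counts how often it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive value"


lazy = Expensive()
# Arguments are passed separately: DEBUG is disabled, so nothing is formatted.
logger.debug("something %s", lazy)

eager = Expensive()
# The string is built before logger.debug is even called.
logger.debug("something {}".format(eager))
```

After running this, lazy.renders is still 0 while eager.renders is 1, even though neither message was logged.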

Frequency

Try to avoid spurious logging. Ensure that log messages are clear and meaningful, and that they provide the information a user would need to rectify any issues.

Avoid excess punctuation or capital letters.

logger.error("SOMETHING WRONG!!!")

is significantly less useful than

logger.error("configuration failed: '8' is not valid for field 'enable_debug'.")

Sensitive information

Never log credentials or other sensitive information. If you really have to log something that could be considered sensitive, use the trace log level.

Documentation

Documentation should be end-user oriented. Document what your charm does and how it’s used. Copy-pasting the application documentation is not helpful; include a link to it instead.

A good rule of thumb when testing your documentation is to ask yourself whether it provides a means for “guaranteed getting started”. You only get one chance at a first impression, so your quick start should be rock solid.

The front page of your documentation should not carry information about how to build, test or deploy the charm from the local filesystem: put this information in separate docs specific to the development of and contribution to your charm. This information can live as part of your Charm Documentation, or in the version control repository for your charm (example).

Custom events

Charms should never define custom events themselves. They have no need for emitting events (custom or otherwise) for their own consumption, and as they lack consumers, they don’t need to emit any for others to consume either. Instead, this should be done in a library.

Backward compatibility

When authoring your charm, consider the target Python runtime. Kubernetes charms will have access to Python 3.8 by default (the charm container is based on Ubuntu 20.04), but other charms consuming your libraries may not.

To stay compatible with existing installations, libraries should be compatible with Python 3.5. This requirement is likely to change over time as older versions of Ubuntu reach end of life. Python 3.5 is the version of Python shipped with Ubuntu 16.04, which is still in extended support.

Compatibility checks for Python 3.5 can be automated in your CI or using mypy.
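As an illustration of what such compatibility means in practice: f-strings and variable annotations were added in Python 3.6, and the := operator in Python 3.8, so library code targeting 3.5 has to avoid them. The function below is a hypothetical example (build_url is not part of any library) written using only syntax that Python 3.5 accepts.

```python
from typing import Optional


def build_url(host: str, port: Optional[int] = None) -> str:
    """Build an HTTP URL using only syntax available in Python 3.5."""
    # Function annotations are fine on 3.5; variable annotations are not,
    # so no "url: str = ..." style declarations are used here.
    if port is None:
        # str.format instead of an f-string, which requires Python 3.6+.
        return "http://{}".format(host)
    return "http://{}:{}".format(host, port)
```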

Dependency management

External dependencies must be specified in a requirements.txt file. If your charm depends on other libraries, you should vendor and version the library you depend on (see the prometheus-k8s-operator). This is the default behaviour when using charmcraft fetch-lib. For more information see the docs on Charm Libraries.

Code style

Clarity

Charm authors should choose clarity over cleverness when writing code. A lot more time is spent reading code than writing it, so when possible, we opt for clear code that is easily maintained by anyone.

User experience / UX

Charms should aim to keep the user experience of the operator as simple and obvious as possible. If it is harder to use your charm than to set up the application from scratch, why should the user even bother with your charm?

Where possible, try to ensure that the application can be deployed without providing any further configuration options, e.g.

juju deploy foo

is preferable over

juju deploy foo --config something=value

This will not always be possible, but will provide a nicer user experience where applicable. Also consider if any of your configuration items could instead be automatically derived from a relation.

A key consideration here is which of your application’s configuration options you should initially expose. If your chosen application has many config options, it may be prudent to provide access to a select few, and add support for more as the need arises.

For very complex applications, consider providing “configuration profiles” which can group values for large configs together.
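One possible shape for such profiles is sketched below. The profile names, setting keys and the resolve_settings helper are all hypothetical; the sketch only shows the idea of expanding one user-facing option into a group of values while still letting explicit settings take precedence.

```python
from typing import Optional

# Hypothetical profiles: each name expands to a group of related settings.
PROFILES = {
    "testing": {"workers": 1, "cache_size_mb": 64, "log_level": "debug"},
    "production": {"workers": 8, "cache_size_mb": 1024, "log_level": "info"},
}


def resolve_settings(profile: str, overrides: Optional[dict] = None) -> dict:
    """Expand a profile into concrete settings, letting explicit values win."""
    if profile not in PROFILES:
        raise ValueError("unknown profile: {!r}".format(profile))
    settings = dict(PROFILES[profile])  # copy, so the profile is not mutated
    settings.update(overrides or {})
    return settings
```

A user would then only need to choose a profile, and could override individual values where the profile does not fit.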

Event handler visibility

Charms should make event handlers private: _on_install, not on_install. There is no need for any other code to directly access the event handlers of a charm.

Linting

To keep down the cognitive overhead and maintenance effort needed, we use linters to make sure the code we write has a consistent style regardless of the author. An example configuration can be found in the pyproject.toml of canonical/operator-template.

This config makes some decisions about code style on your behalf. At the time of writing, it configures code formatting using black, with a line length of 99 characters. It also configures isort to keep imports tidy, and flake8 to watch for common coding errors.

In general, we run these tools inside a tox environment. We generally configure an environment named lint, and one called fmt alongside any testing environments required. See the Recommended Tooling section for more details.

Docstrings

Charms should have docstrings. At Canonical, we use the Google docstring format when writing docstrings for charms. To enforce this, we then use flake8-docstrings as part of our linter suite.
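For reference, a Google-style docstring has a one-line summary followed by Args, Returns and Raises sections as needed. The function below is a hypothetical helper, chosen only to show the layout.

```python
from typing import Tuple


def parse_listen_address(value: str, default_port: int = 9090) -> Tuple[str, int]:
    """Split a "host[:port]" string into its host and port.

    Args:
        value: Address in the form "host" or "host:port".
        default_port: Port to use when value does not include one.

    Returns:
        A (host, port) tuple, with the port as an integer.

    Raises:
        ValueError: If the port part is not a valid integer.
    """
    host, separator, port = value.partition(":")
    if not separator:
        return host, default_port
    return host, int(port)
```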

Class layout

The class layout of a charm should be organized in the following order:

  • Constructor (inside which events are subscribed to, roughly in the order they would be activated)
  • Factory methods (classmethods), if any
  • Event handlers, placed in order that they’re subscribed to
  • Public methods
  • Private methods

Further, we discourage the use of nested functions (i.e. functions defined inside the scope of another function), and instead suggest using either private methods or free/unscoped functions. Likewise, we discourage the use of static methods.

Patterns

Fetching Network Information

Some charms need to share their unit’s address with other charms over relation data. Getting the unit IP is error-prone, because bind_address may be None depending on the state of the charm. For instance:

    @property
    def _address(self) -> Optional[str]:
        """Get the unit's ip address.

        Technically, receiving a "joined" event guarantees an IP address is available. If this is
        called beforehand, a None would be returned.
        When operating a single unit, no "joined" events are visible so obtaining an address is a
        matter of timing in that case.

        This function is still needed in Juju 2.9.5 because the "private-address" field in the
        data bag is being populated by the app IP instead of the unit IP.
        Also in Juju 2.9.5, ip address may be None even after RelationJoinedEvent, for which
        "ops.model.RelationDataError: relation data values must be strings" would be emitted.

        Returns:
          None if no IP is available (called before unit "joined"); unit's ip address otherwise
        """
        # if bind_address := check_output(["unit-get", "private-address"]).decode().strip()
        if bind_address := self.model.get_binding(self._peer_relation_name).network.bind_address:
            bind_address = str(bind_address)
        return bind_address

A better approach is to instead rely on the FQDN returned by the socket module:

import socket

...

    @property
    def address(self) -> str:
        """Unit's hostname."""
        return socket.getfqdn()

Warning: External Accessibility

Keep in mind that the FQDN won’t be resolvable outside of the model (or rather, outside of the Kubernetes cluster). To make the charm reachable by external workloads, first check whether an ingress address is available and, if so, return that address instead.
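The fallback logic can be sketched as below. How the ingress address is obtained depends on the charm and is deliberately left out: it is passed in as a parameter, and the function name external_address is hypothetical.

```python
import socket
from typing import Optional


def external_address(ingress_address: Optional[str]) -> str:
    """Prefer the ingress address when one is known, else the unit's FQDN."""
    if ingress_address:
        # Reachable from outside the model, so it takes precedence.
        return ingress_address
    # Only resolvable inside the model (or Kubernetes cluster).
    return socket.getfqdn()
```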

Testing

Charms should have tests to verify that they are functioning correctly. These tests should cover the behavior of the charm both in isolation (unit tests) and when used with other charms (integration tests). Charm authors should use tox to run these automated tests.

Unit tests

Unit tests are written using the unittest library shipped with Python. To facilitate unit testing of charms, we use a testing harness specifically designed for charmed operators available in the Charmed Operator SDK. An example of charm unit tests can be found here.

Functional tests

Functional tests in charms often take the form of integration-, performance- and/or end-to-end tests.

For integration and end-to-end tests, we use the pytest library. We also provide a testing library for interacting with Juju and your charm called pytest-operator. Examples of integration tests for a charm can be found in the prometheus-k8s-operator repo.

Continuous integration

The quality assurance pipeline of a charm should be automated using a continuous integration (CI) system.

For repositories on GitHub, the easiest way to go about this is to use the actions-operator, which will take care of setting up all dependencies needed to be able to run charms in a CI workflow. You can see an example configuration for linting and testing a charm using GitHub Actions here.

The automation should also allow the maintainers to easily see whether the tests failed or passed for any available commit.

Additionally, it is important to provide enough data for the reader to be able to take action, i.e. dumps from juju status, juju debug-log, kubectl describe and similar. To have this done for you, you may integrate charm-logdump-action into your CI workflow.

Linters

At the time of writing, linting modules commonly used by charm authors include black, flake8, flake8-docstrings, flake8-copyright, flake8-builtins, pyproject-flake8, pep8-naming, isort, and codespell.

Common integrations

Observability

Charms should provide a metrics endpoint for Prometheus, as well as reasonable (read: conservative but useful) alert rules and default dashboards.

Example repositories

A good starting point is the Operator Framework Template repository. This template will provide you with everything that you’ll need to get started writing your own charms.

Kubernetes charms

Machine charms

