Werk #5958: Introduce docker monitoring with Check_MK

Component Checks & agents
Title Introduce docker monitoring with Check_MK
Date Apr 10, 2018
Checkmk Edition Checkmk Raw (CRE)
Checkmk Version 1.5.0b1 1.6.0b1
Level Prominent Change
Class New Feature
Compatibility Compatible - no manual interaction needed

With this change we prepare Check_MK for monitoring docker environments out of the box. These checks work in different layers (node, container).

The docker monitoring is currently available through the linux agent. To get a docker node monitored it should be enough to simply deploy the agent as usual on the node. Check_MK will find all relevant checks automatically.

The agent on the node will iterate over all containers and execute the Check_MK agent in the context of the container. In case there is a agent already installed in the container, the agent of the container will be used. Otherwise the node will execute the nodes agent in the context of the container.

In case you need specific agent plugins executed in the container, you can add them to the container image together with the agent just like you would do it for regular hosts.

By default the docker container specific parts are transported via piggyback from the node to the Check_MK server. This means that you will have to create hosts in your Check_MK use the short container ID as name.

For the docker container hosts please use the following configuration:

  • Set the "Check_MK Agent" option to "No agent".
  • Set the "IP address family" to "No IP" for only processing the piggyback data.
  • Set the docker node as parent.
  • Enable HW/SW inventory for the node and the containers

The manual (or scripted) configuration of these hosts will be necessary with the 1.5. Check_MK 1.6 will solve this problem automatically in a more elegant way.

There are other use cases, for example if you have not access to the node, then you can also install the agent (including optional config and plugins) into the image and make the container open a dedicated network port for agent communication.

We'll add a dedicated docker monitoring page to the documentation in the near future to describe this in detail.

The following changes have been made for now:

New check plugins

  • docker_node_info: Check the status of docker daemon
    Whether or not the docker daemon is running and functional on the docker node.
  • docker_node_info.containers: Count number of containers
    Counts the number of containers in the different states. Creates metrics out of these information. Thresholds can be configured on the number of containers in the different states.
  • docker_node_disk_usage: Disk usage of docker files
    This check summarizes the disk usage of docker files (images, ...) on the disks. It tells you whether or not you can safe disk space by cleaning up things.
  • docker_container_cpu: Check the CPU utilization of a docker container
    This check reports the percentage CPU utilization of a docker container. Unlike the Linux CPU utilization check (kernel.util) it does only report user and system time. More detailed values, like iowait, are not available.
  • docker_container_mem: Docker container specific memory checking
    Instead of using the default linux memory check (mem), Check_MK is now using the container specific memory check. The main reason is that the memory information in the container is not available through /proc/meminfo as usual. The memory data is available through the kernels cgroup interface which is available in the containers context below /sys/fs/cgroup/memory/memory.stat The features of both checks are exactly the same.
  • docker_container_status: Checks running state of container
    The check docker_container_status checks whether a container is running or not.
  • docker_container_status.health: Check healthcheck API of containers
    Check the status of containers as reported by Docker's healthcheck API.

New HW / SW inventory plugins

  • docker_node_images: Inventorize docker node information
    Inventorizes information about repository, tag, ID, creation time, size, labels and the amount of docker images. It also collect information about how many containers currently use this image.
  • docker_node_info: Inventory plugin displaying docker version
    Adds the docker version and node labels to the inventory tree.
  • docker_container_labels: Inventorize the labels of container
  • docker_container_node_name: Inventorize node name of containers

Preparing linux agent for docker monitoring

  • The agent now detects whether or not it is being executed in a docker container context.
  • Find docker containers and execute agent in context
    In case the agent is running on a docker node, it iterates all running containers and executes the Check_MK agent in to context of the container to gather container specific information. In case a check_mk_agent is already installed in the container, then this agent is executed. In case there is no check_mk_agent installed, the agent of the docker node is executed in the container.

Changed checks

  • lnx_if: Exclude veth* network interfaces on docker nodes
    The veth* network interfaces created for docker containers are now excluded by the linux agent in all cases. The interface names have no direct match with the docker container name or ID. They seem to have some kind of random nature. These container specific interfaces are not relevant to be monitored on the node. We are monitoring the docker network interfaces in the container.
  • df: Exclude docker local storage mounts on docker nodes
    The df check is now excluding all filesystems found below /var/lib/docker, which is the default location for the docker container local storage. Depending on the used storage engine docker creates overlay filesystems and mounts below this hierarchy for the started containers. The filesystems are not interesting for our monitoring. They will be monitored from the container context.
  • df mounts: Skip docker mounts for name resolution in container
    When docker containers are configured to perform name resolution there are mounts at /etc/resolv.conf, /etc/hostname and /etc/hosts which are not relevant to be monitored. These checks are now always skipped.
  • uptime: Is now reported correctly for docker containers
    In previous versions of the linux agent the uptime of the docker node was reported by the agent when it is being executed in a docker container context.
  • Checks disabled in docker container contexts
    These checks do not make sense in the context of a docker container. The agent is now skipping this section when executed in a container. For some of the checks docker specific ones have been added (see above).
    • kernel
    • cpu.threads
    • cpu.load
    • drbd
    • lnx_thermal

To the list of all Werks