Checkmk
to checkmk.com

1. Why own checks?

Checkmk already monitors many types of relevant data using a large number of its own standard check plug-ins. Nevertheless, every IT environment is unique, so that often very specialized requirements can arise. With local checks you have a facility to extend the agent on the target host for quickly and easily creating your own services.

These local plug-ins differ in one significant aspect from other checks: the calculation of a status occurs directly in the host on which the data is also retrieved. In this way the complex creation of checks in Python is not needed and there is thus a completely free choice of coding language for scripts.

2. Writing a simple local check

2.1. Creating the script

A local check can be written in any programming language supported by the target host. The script must be constructed so that each check produces a status line consisting of four parts. Here is an example:

0 "My service" myvalue=73 My output text which may contain spaces

The four parts are separated by blanks and have the following meanings:

Example value Meaning Description

0

Status

The status of the service is given as a number: 0 for OK, 1 for WARN, 2 for CRIT and 3 for UNKNOWN. Alternatively, the status can also be calculated dynamically: then the number is replaced by a P.

My service

Service name

The service name as shown in Checkmk, in the output of the check in double quotes. If the service name does not contain blanks, you can save the quotes.

myvalue=73;65;75

Value and metrics

Metric values for the data. More information about the construction can be found in the chapter on metrics. Alternatively a minus sign can be coded if the check produces no metrics.

My output text which may contain spaces

Status detail

Details for the status as they will be shown in Checkmk. This part can also contain blanks.

There must always be a blank character between the individual parts of the output and the first text of the detailed status. Everything following will then count as status detail, which is why blank characters are allowed.

If there is uncertainty about a possible output, it can be simply tested by writing a small script with the echo command. Insert the output to be tested into the echo command. Make sure to mask the quotes for the service name with \ so that these characters are not interpreted by the echo command:

mylocalcheck
#!/bin/bash
echo "0 \"My 1st service\" - This static service is always OK"

For Windows hosts, such a script will look very similar to this:

mylocalcheck.bat
@echo off
echo 0 "My 1st service" - This static service is always OK

Both scripts lead to the same result in the output:

0 "My 1st service" - This static service is always OK

For Checkmk only this output is relevant, not how you created this output.

By the way — you can write any number of outputs in a script. Each output line will have its own service created in Checkmk. Therefore, no newline characters are allowed in the output — unless they are masked, for example for a multi-line output in Checkmk.

How it can be checked whether the local script will be correctly invoked by the agent can be seen in the Error analysis.

2.2. Distributing the script

Once the script has been written it can be distributed to the appropriate hosts. The path used will depend on the operating system. A list of path names can be found in Files and directories below.

Don’t forget to make the script executable on unix-type systems. The path shown in this example is for Linux:

root@linux# chmod +x /usr/lib/check_mk_agent/local/mylocalcheck

If you use the Agent Bakery, the script can be distributed with a rules-based procedure. More on rule-creation can be found in the chapter Distribution via the Agent Bakery.

2.3. Adding the service to the monitoring

At every invocation of the Checkmk agent the local check contained in the script will also be executed and appended to the agent’s output. The service discovery also functions automatically like with other services:

localchecks services

Once the service has been added to the monitoring and the changes have been activated, the implementation of the self-created service with the aid of a local check will be complete. Should a problem arise during the service discovery, the Error analysis can be of help.

3. Extended functions

3.1. Metrics

With a local script metrics can also be set. To enable them the value of the check must always return P. Then the state will be calculated by Checkmk.

The general syntax for this data is as follows:

metricname=value;warn;crit;min;max

where value is the current value, warn and crit set the (upper) thresholds, and min and max fix the range of values — for example like this:

count=73;80;90;0;100

The values are separated with a semicolon. All values except value are optional. If a value is not required, the field remains empty or is omitted at the end, as in the following for warn, crit and max:

count=42;;;0

Note: In the commercial editions the values for min and max can indeed be set — but only for compatibility reasons. Limiting the associated graph to a certain range of values has no effect in the commercial editions.

3.2. Metric name

You should take special care when choosing the identifier of this metric - called metricname in the example here. We recommend prefixing the identifiers to prevent overlap with metrics already present in Checkmk.

So, for example, instead of simply calling a metric that represents the number of currently waiting requests in a queue you are monitoring, 'current', we recommend a clearer identifier with a prefix - such as: mycompany_current_requests.

If you were to choose an identifier here that already exists in Checkmk, the representation of your metrics in graphs would be overwritten with the definitions that already exist.

Of course, you can also reuse an existing metric from Checkmk intentionally. So, for a metric for an electrical current you could simply use the identifier current in your local check. In case of doubt, however, you have to look up the definition of this metric in ~/lib/python3/cmk/gui/plugins/metric by yourself.

OMD[mysite]:~$ grep -r -A 4 'metric_info\["current"\]' ./lib/python3/cmk/gui/plugins/metrics/

3.3. Multiple metrics

You can also have several metrics output. These are then separated by the 'pipe' character |, for example like this:

count1=42|count2=23

Attention: On Windows hosts you have to prepend a caret (^) to the pipe in the script, so that it does not get interpreted.

mylocalcheck.bat
@echo off
echo 0 "My 2nd service" count1=42^|count2=23 A service with 2 graphs

A complete output with two metrics will look like this:

root@linux# /usr/lib/check_mk_agent/local/mylocalcheck
0 "My 2nd service" count1=42|count2=23 A service with 2 graphs

After you have also included the new service in the monitoring, you will see the text for the status detail in the Summary field in the service list. After clicking on the service, the page with the service details is displayed. The metrics are shown in the Details field and below this you will see the service graphs automatically generated by Checkmk:

localchecks graphs2

3.4. Calculating status dynamically

In the previous chapters, you learned how to set threshold values for metrics and how to display them in the graphs. The next obvious step is to use these thresholds for a dynamic calculation of the service state. Checkmk provides exactly these options for extending a local check.

If you pass the letter P instead of a number in the first field of the output that determines the state, the service’s status will be calculated on the basis of the threshold as provided.

An output will then look like this:

root@linux# /usr/lib/check_mk_agent/local/mylocalcheck
P "My 1st dynamic service" count=40;30;50 Result is computed from two threshold values
P "My 2nd dynamic service" - Result is computed with no values

… and the display in a service view like this:

localchecks dynsrv

The display differs in two points from the one that we saw earlier:

  • For services in the WARN or CRIT state, the Summary of the service shows all important information about the metrics (name, value, thresholds). This means you can always see how this status was calculated from a value. For all other states, the metric information is only displayed in the Details field.

  • If no metrics have been passed the service’s status will always be OK.

3.5. Upper and lower thresholds

Some parameters have not only an upper threshold but also a lower threshold. An example is humidity. For such cases the local check has the option of passing two threshold values each for the states WARN and CRIT. They are separated by a colon and represent the lower and the upper threshold value respectively.

In the general syntax, it looks like this:

metricname=value;warn_lower:warn_upper;crit_lower:crit_upper

… and in the example like this:

root@linux# /usr/lib/check_mk_agent/local/mylocalcheck
P "My 3rd service" humidity=37;40:60;30:70 A service with lower and upper thresholds

… and in the display of a service view like this:

localchecks lower

If you are only concerned with lower thresholds, leave out the upper threshold fields:

root@linux# /usr/lib/check_mk_agent/local/mylocalcheck
P "My 4th dynamic service" count_lower=37;40:;30: A service with lower thresholds only

With this output, you specify that the service should become WARN if the value is less than 40 and CRIT if it is less than 30: thus, at the specified value of 37, the service will get the WARN state.

3.6. Multi-line outputs

The option to spread an output over multiple lines is also available. Because Checkmk runs under Linux you can work with the Escape sequence '\n' in order to force a line-break. Even if due to the scripting language the backslash itself needs to be escaped, it will be correctly interpreted by Checkmk:

root@linux# /usr/lib/check_mk_agent/local/mylocalcheck
P "My service" humidity=37;40:60;30:70 My service output\nA line with details\nAnother line with details

In the service’s details these additional lines will be visible under the Summary:

localchecks srv details

3.7. Executing asynchronously and caching output

The output of local checks, like that of agent plug-ins, can be cached. This can be necessary if a script has a longer processing time. Such a script is then executed asynchronously and only in a defined time interval and the last output is cached. If the agent is queried again before the time expires, it uses this cache for the local check and returns it in the agent output.

Note: Caching is only available for AIX, FreeBSD, Linux, OpenWrt and Windows.

Configuring Linux

Under Linux or another unix-type operating system, any plug-in can be executed asynchronously. For a local check, the necessary configuration is very similar to that of a plug-in. To do this, create a subdirectory called the number of seconds you want the output to be cached and put your script in that subdirectory.

In the following example, the local check will be executed only every 10 minutes (600 seconds):

root@linux# /usr/lib/check_mk_agent/local/600/mylocalcheck
2 "My cached service" count=4 Some output of a long running script

The cached data is written to a cache directory.

For a service that provides cached data, the cache-specific information is added to the service view:

localchecks srv cached

Configuring Windows

Under Windows, the configuration is also analogous to that of a plug-in. Instead of using a special subdirectory as with Linux & Co, the options are set in a configuration file:

C:\ProgramData\checkmk\agent\check_mk.user.yml
local:
    enabled: yes
    execution:
        - pattern     : $CUSTOM_LOCAL_PATH$\mylocalcheck.bat
          async       : yes
          run         : yes
          cache_age   : 600

As you can see above, under Windows you can configure the asynchronous execution (with async) and the time interval (with cache_age) separately.

Alternatively, on Windows you can also do the configuration in the Agent Bakery.

4. Distribution via the Agent Bakery

CEE If you are already using the Agent Bakery, you can also distribute the scripts with local checks to several hosts this way.

To do this, first create the directory custom on the Checkmk server as site user below ~/local/share/check_mk/agents/ and in it a subdirectory tree for each package of local checks:

OMD[mysite]:~$ cd ~/local/share/check_mk/agents
OMD[mysite]:~$ ~/local/share/check_mk/agents$ mkdir -p custom/mycustompackage/lib/local/

The package directory in the above example is mycustompackage. Below that, the lib directory flags the script as a plug-in or as a local check. The subsequent local directory then allocates the file explicitly. Place the script with the local check in this directory.

Important: On Linux, you can configure asynchronous execution analogously as described in the previous chapter by now creating a directory under custom/mycustompackage/lib/local/ with the number of seconds of the execution interval and placing the script there. Under Windows, you can use the rule sets Set execution mode for plugins and local checks and Set cache age for plugins and local checks. These and other rule sets for local checks under Windows can be found in the Agent Bakery under Agent rules > Windows Agent.

In the configuration environment of Checkmk, the package directory mycustompackage will be shown as a new option: Open Setup > Agents > Windows, Linux, Solaris, AIX, create a new rule with Agents > Agent rules > Generic options > Deploy custom files with agent and select the newly-created package:

localchecks custom

Checkmk will then autonomously integrate the local check correctly into the installation packet for the appropriate operating system. After the changes have been activated and the agent baked, the configuration will be complete. Now the agents only need to be distributed.

5. Error analysis

5.1. Testing the script

If you run into problems with a self-written script, you should check the following potential error sources:

  • Is the script in its correct directory?

  • Is the script executable, and are the access permissions correct? This is especially relevant if you are running the agent or script and you are not the root/System user.

  • Is the output compliant with the given syntax? The output of the local check must conform to the syntax as described in the chapters Creating the script and Extended functions. Otherwise, error-free execution cannot be guaranteed.

    Problems and errors can arise in particular when a local check is intended to perform a task that requires a full-fledged check plug-in, for example when the output of the local check itself contains a section header or the definition of a host name as used when transporting piggyback data.

5.2. Testing agent output on the target host

If the script itself is correct, the agent can be run on the host. With unix-type operating systems such as Linux, BSD, etc., the command below is available. With the -A option the number of additional lines to be displayed following a hit can be specified. This number can be customized to suit the number of expected output lines:

root@linux# check_mk_agent | grep -v grep | grep -A2 "<<<local"
<<<local:sep(0)>>>
P "My service" humidity=37;40:60;30:70 My service output\nA line with details\nAnother line with details
cached(1618580356,600) 2 "My cached service" count=4 Some output of a long running script

In the last line, you can recognize a cached service by the preceding cache information with the current Unix time and the execution interval in seconds.

Under Windows, you can achieve a very similar result with PowerShell and the Select-String 'cmdlet' as with the grep command under Linux. In the following command, the two digits behind the Context parameter determine how many lines are to be output before and after the hit:

PS C:\Program Files (x86)\checkmk\service> ./check_mk_agent.exe test | Select-String -Pattern "<<<local" -Context 0,3
> <<<local:sep(0)>>>
  0 "My 1st service" - This static service is always OK

  cached(1618580520,600) 1 "My cached service on Windows" count=4 Some output of a long running script

5.3. Testing agent output on the Checkmk server

As a last step the processing of the script output can also be tested on the Checkmk server with the cmk command — once for the service discovery:

OMD[mysite]:~$ cmk -IIv --detect-plugins=local mycmkserver
Discovering services and host labels on: mycmkserver
mycmkserver:
...
+ EXECUTING DISCOVERY PLUGINS (1)
  2 local
SUCCESS - Found 2 services, no host labels

… and also the processing of the service output with a similar command:

OMD[mysite]:~$ cmk -nv --detect-plugins=local mycmkserver
Checkmk version 2.0.0p2
+ FETCHING DATA
...
+ PARSE FETCHER RESULTS
Received no piggyback data
My cached service    Some output of a long running script(!!), Cache generated 6 minutes 52 seconds ago, Cache interval: 10 minutes 0 seconds, Elapsed cache lifespan: 68.71%
My service           My service output\, humidity: 37.00 (warn/crit below 40.00/30.00)(!)

For both commands we have shortened the output by lines not relevant for this topic.

If there are errors in a local check, Checkmk will identify them in the service output. This applies as well for erroneous metrics, for false or incomplete information in the script output, or an invalid status. These error messages should aid in quickly identifying errors in a script.

6. Files and directories

6.1. Script directory on the target host

Path name Operating system

/usr/check_mk/lib/local/

AIX

/usr/local/lib/check_mk_agent/local/

FreeBSD

/usr/lib/check_mk_agent/local/

HP-UX, Linux, OpenBSD, OpenWrt and Solaris

%ProgramData%\checkmk\agent\local

Windows

6.2. Cache directory on the target host

Cached data of individual sections, including the local section, is stored here and appended to the agent again with each execution, as long as the data is valid.

Path name Operating system

/tmp/check_mk/cache/

AIX

/var/run/check_mk/cache/

FreeBSD

/var/lib/check_mk_agent/cache/

Linux, OpenWrt and Solaris

On this page