You can find the shiny new documentation here; it is gradually replacing this one.
This article is therefore obsolete and may no longer be valid. However, its replacement is not finished yet!
1. Why not use local checks or MRPE?
Using local checks or MRPE for adding your own self-written checks to Checkmk is easy. Even inventory and performance data are supported. So why should you want to write native checks for Checkmk? Well, there can be several reasons:
- You want to define your check parameters in main.mk or WATO rather than locally on each target host.
- You want to exploit currently unused information sent already by an agent (for example Windows' numerous performance counters)
- You want to implement SNMP based checks.
- You want your check to be easily ported to other installations of Checkmk.
- You want your check to become an official part of Checkmk.
- You are simply interested in how Checkmk works.
If one or more of these reasons apply to you, then you'll find all the information needed for writing your own checks in this article and a couple of further articles.
- Agent based checks - Writing checks based on the Checkmk agent
- The new Check API - Version 1.2.0 offers a new and simpler API for checks
- Error handling - How to (not) handle errors in checks
- SNMP based checks - Writing checks that use SNMP
- SNMP auto detection - Automatic detection of checks with cmk -I needs your help
- Working with counters - How to make use of performance data that comes as a counter
- Include files and shared sections - How to share code with other checks
- Dictionary based parameters - 1.1.11i2 How to handle complex check parameters
- Check Manpages - How to write a man page for a check
- Guidelines for Checks - Checks that are to become an official part of Checkmk must adhere to our guidelines
- Multisite Perf-O-Meter - How to write Perf-O-Meters for Multisite
2. Do I have to learn Python?
Well, to be honest: yes - at least to a certain basic degree. People have suggested changing Checkmk so that checks can be written in other languages as well. I understand this request very well. But from a technical point of view I cannot imagine how such an integration could be done in a clean, simple and performant way. Checkmk's checks are not standalone programs or scripts but are closely integrated into the check mechanism. They need to have access to some of Checkmk's internal functions. In the end, one Python program is created for each host by combining a base and all checks used by this host into one new program. This feature saves about 75% of the CPU resources when compared to directly calling check_mk for checking.
On the other hand, Python is a language which is cleanly designed, elegant and easy to learn. I'm sure you'll like it once you have some experience with it (even if you dislike its style of indentation).
3. How Checkmk's checks work
Each check consists of at least the following three components:
- a unique name
- a data source definition
- a check function
Two further components are optional but strongly recommended:
- an inventory function
- a manual page
If your check outputs performance data, then two further components form a perfect check:
- a PNP graph template
- a Multisite Perf-O-Meter
3.1. The data source
Everything begins with the data source, i.e. the source of the data the check operates on. Currently there are two different kinds of data sources: agent sections (tcp) and SNMP queries (snmp). An agent section is a part of the output of an agent, for example the output of the Linux command df. An SNMP based data source returns data retrieved by one or several SNMP queries on certain OIDs. Both data sources are presented to the check function as a table (a Python list of lists). We will call this data the "agent data".
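To make the table shape concrete, here is a minimal sketch of how an agent section could end up as a list of lists. The sample text and the helper name parse_section are invented for illustration, and splitting each line on whitespace is an assumption, not something this article specifies.

```python
# Hypothetical sample: a few lines as the Linux command df might print them,
# with the heading already removed. The numbers are invented.
SAMPLE_SECTION = """\
/dev/sda1 10321208 4610360 5186560 48% /
/dev/sda2 51606140 9342008 39642692 20% /var/log
"""


def parse_section(text):
    """Turn section text into a table: one list of columns per line.

    Assumption: columns are separated by whitespace.
    """
    return [line.split() for line in text.splitlines() if line.strip()]


info = parse_section(SAMPLE_SECTION)
for row in info:
    print(row)
```

Each inner list is one line of the section, so info[1][5] would be the mount point of the second file system in this sample.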
3.2. The agent plugin
If you write a TCP based check you need a plugin for the agent. This is typically a small executable script which is placed in the plugins directory of the agent. It uses standard operating system methods for retrieving the data of interest.
It is important to understand the philosophy of Checkmk at this point. The plugin should:
- use standard operating system commands generally available
- remove unnecessary output (such as headings)
- not remove any of the actual data, even if it's not needed in the first version of your check
- not decide about the status of a check
- not process the data by any means (other than removing garbage output)
- not break anything on hosts that do not support the used commands
- not run longer than a couple of seconds
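The rules above could be sketched as follows. Agent plugins are usually small shell scripts; Python is used here only to stay consistent with the rest of this article. The helper name build_section, the section name my_df and the "<<<name>>>" header line are assumptions for illustration: the header follows the convention seen in Checkmk agent output, but it is not described in this article.

```python
import shutil
import subprocess


def build_section(name, command, skip_lines=0, timeout=5):
    """Return the lines a plugin would print for one section.

    Hypothetical helper. It only strips heading lines and never
    interprets the data or decides about any status.
    """
    # Do not break anything on hosts that lack the command: output nothing.
    if shutil.which(command[0]) is None:
        return []
    try:
        # The timeout keeps the plugin from running longer than a few seconds.
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return []
    # Remove unnecessary output (the heading), keep all actual data.
    lines = result.stdout.splitlines()[skip_lines:]
    return ["<<<%s>>>" % name] + lines


if __name__ == "__main__":
    # Assumed example: "df -P" prints one heading line, which is skipped.
    for line in build_section("my_df", ["df", "-P"], skip_lines=1):
        print(line)
```

Note that the sketch passes all columns through unchanged, even those a first version of the check would not use.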
3.3. The inventory function
If you want your check to support inventory (which is always a good idea), you have to supply an inventory function. This function examines the agent data of a host and creates a list of all items to be checked on this specific host. An item uniquely identifies a thing to be checked on a host within that type of check. Some examples of items are:
- The check "df" uses the mount point as the item, for example "/var/log".
- The check "services" uses the Windows service name as its item, for example "TnsListener".
- The check "ps" uses an artificial user supplied item.
- The check "local" uses the service description as output by the local check.
Some checks do not need to distinguish items. This is because the thing they check does not exist more than once on a host. An example is the check mem. But Checkmk always requires an item, so these checks simply use None as the item.
Please note that this does not mean you cannot do an inventory on mem. It's just that the number of items the inventory returns is at most one. In some cases it is even zero: when the agent output does not contain the information needed for the check. This is a very useful feature and enables the Nagios administrator to automatically perform the right checks on the right operating systems.
Your inventory function does not need to worry whether a certain item was already configured manually or detected by a previous inventory. Checkmk handles this in a general way, and makes sure that only newly detected items are added to the list of services.
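As a rough sketch, an inventory function for a df-like check and for an item-less check could look like this. The function names, the sample data and the return shape (a list of (item, parameters) pairs) are assumptions for illustration; the exact signature depends on the Checkmk version and is covered in the articles on the check API.

```python
def inventory_my_df(info):
    """Return one entry per mount point found in the agent data.

    Assumptions: the mount point is the last column of each line, and an
    entry is an (item, parameters) pair with None as placeholder parameters.
    """
    return [(line[-1], None) for line in info if line]


def inventory_my_mem(info):
    """Item-less check: at most one entry, with None as the item."""
    if info:
        return [(None, None)]
    # The agent sent nothing usable, so no service is created on this host.
    return []


# Invented agent data for demonstration.
df_info = [
    ["/dev/sda1", "10321208", "4610360", "5186560", "48%", "/"],
    ["/dev/sda2", "51606140", "9342008", "39642692", "20%", "/var/log"],
]
print(inventory_my_df(df_info))
print(inventory_my_mem([]))
```

Neither function checks whether an item is already configured; as described above, Checkmk takes care of that.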
3.4. The check function
When an actual check of a host is done, all services for this host will be checked in turn. When it's your check's turn, Checkmk will call your check function for each item that is automatically or manually configured for the check and host. Your function will be provided with the checked item, the (optional) parameters of the check and the agent data.
The check function then
- extracts the information relevant for the item in question from the agent data
- decides on the Nagios status of the service
- creates one line of text output for Nagios
- optionally computes performance data
- returns status, line of text and performance data as a Python tuple
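The steps above can be sketched as a small check function. The name check_my_df, the column layout, the (warn, crit) parameter pair and the shape of the performance data entry are assumptions for illustration; only the overall flow and the returned tuple follow the description above. The status numbers are the usual Nagios codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
def check_my_df(item, params, info):
    """Check the usage of one file system (hypothetical example).

    Assumptions: params is a (warn, crit) pair in percent, column 4 holds
    the usage such as "48%", and the last column holds the mount point.
    """
    warn, crit = params
    for line in info:
        # Extract the information relevant for the item in question.
        if line[-1] != item:
            continue
        used_percent = float(line[4].rstrip("%"))
        # Decide on the Nagios status of the service.
        if used_percent >= crit:
            status = 2
        elif used_percent >= warn:
            status = 1
        else:
            status = 0
        # One line of text output plus optional performance data.
        text = "%.0f%% used (warn/crit at %.0f%%/%.0f%%)" % (
            used_percent, warn, crit)
        perfdata = [("used_percent", used_percent, warn, crit)]
        return (status, text, perfdata)
    # The item is missing from the agent data: report UNKNOWN.
    return (3, "%s not found in agent data" % item, [])


# Invented agent data for demonstration.
sample_info = [
    ["/dev/sda1", "10321208", "4610360", "5186560", "48%", "/"],
    ["/dev/sda2", "51606140", "9342008", "39642692", "95%", "/var/log"],
]
print(check_my_df("/", (80, 90), sample_info))
print(check_my_df("/var/log", (80, 90), sample_info))
```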
3.5. The manual page
If you want to pass your check along to others, a manual page for the check is strongly recommended. Checkmk has its own concept and syntax of check manuals. You do not need to learn NROFF syntax or anything like that. A check manual is a relatively simple text file named after the check and usually installed in
Please read our article on how to write man pages for further information.
3.6. The PNP template
If your check delivers performance data (i.e. it not only returns a status and an explanatory text but also values like memused=77364), you should provide a template for PNP4Nagios which nicely displays the evolution of the value.
The same holds for the Perf-O-Meter for Multisite. People like Perf-O-Meters. If you do not use Multisite, then Perf-O-Meters are of no use to you. Checks that are to become part of Checkmk must provide Perf-O-Meters (even if some older checks of Checkmk still do not have one).
4. Let's jump to practice: Preparing the agent
Let's now write our first check. For a start we offer two tutorials.