Werk #11500: Microcore: Improved memory efficiency of helper processes

Component Core & Setup
Title Microcore: Improved memory efficiency of helper processes
Date Oct 20, 2020
Checkmk Edition Checkmk Enterprise (CEE)
Checkmk Version 2.0.0i1
Level Major Change
Class New Feature
Compatibility Compatible - no manual interaction needed

In previous versions, the Checkmk Microcore used so-called Checkmk helper processes to execute the "Check_MK" services of the monitored hosts.

In larger installations, these processes consumed a lot of memory, because a) they held the Checkmk configuration in memory and b) you needed to configure a lot of them to scale the performance of your monitoring with the growing number of hosts. This resulted in a resource bottleneck.

Checkmk 2.0 comes with a completely reworked helper model. This introduces two kinds of helper processes.

  • Fetcher: Its only task is to fetch the needed information from the monitored hosts. It handles the network communication with the Checkmk agent, SNMP agent or other special agents. Gathering this information may take some time, and the fetcher may be blocked by network timeouts. However, it consumes only a small amount of memory, so you can configure a lot of these processes without problems.
  • Checker: Its task is to parse, analyze and evaluate the information gathered by the fetcher. It produces the check results for your services. It is a memory-hungry process, because it needs to know all of your Checkmk configuration. It takes only a very short time to process the information from the fetcher. This helper process does no network IO, which makes it pretty fast. You only need a small number of these processes.

This new model splits the two concerns of the previous "Checkmk helpers" into separate pools: a) the network IO bound fetching of information and b) the CPU bound checking of the fetched information. We can now scale these helper types independently of each other.
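The separation into two pools can be illustrated with a minimal Python sketch. This is not the Microcore implementation: the real helpers are separate processes, while this sketch uses threads and in-memory queues to stay short. The names `fetch`, `check` and `run_pipeline` are hypothetical, and the network access is only simulated with a short sleep.

```python
import queue
import threading
import time


def fetch(host):
    # Hypothetical stand-in for the network-bound step (agent, SNMP, ...).
    time.sleep(0.01)
    return f"raw-data-from-{host}"


def check(raw):
    # Hypothetical stand-in for the CPU-bound evaluation step.
    return f"OK - {len(raw)} bytes evaluated"


def run_pipeline(hosts, n_fetchers=13, n_checkers=4):
    """Run all hosts through a fetcher pool, then a checker pool."""
    todo = queue.Queue()      # hosts waiting to be fetched
    fetched = queue.Queue()   # raw data waiting to be checked
    results = {}
    lock = threading.Lock()

    def fetcher():
        while True:
            host = todo.get()
            if host is None:  # sentinel: no more work
                break
            fetched.put((host, fetch(host)))

    def checker():
        while True:
            item = fetched.get()
            if item is None:  # sentinel: no more work
                break
            host, raw = item
            with lock:
                results[host] = check(raw)

    fetchers = [threading.Thread(target=fetcher) for _ in range(n_fetchers)]
    checkers = [threading.Thread(target=checker) for _ in range(n_checkers)]
    for worker in fetchers + checkers:
        worker.start()

    for host in hosts:
        todo.put(host)
    for _ in fetchers:        # one sentinel per fetcher
        todo.put(None)
    for worker in fetchers:
        worker.join()

    for _ in checkers:        # all fetching is done, stop the checkers
        fetched.put(None)
    for worker in checkers:
        worker.join()
    return results
```

Because the two pool sizes are independent parameters, the slow, lightweight stage can be given many workers while the fast, heavyweight stage keeps only a few.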

Bottom line: Checkmk 2.0 consumes significantly less memory (roughly a factor of 4, depending on your configuration) while achieving the same number of checks per second. As a result, Checkmk 2.0 can monitor even more hosts on the same platform than before.
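Where the saving comes from can be seen with a rough back-of-the-envelope calculation. The following sketch assumes that every old-style helper and every checker holds one full copy of the configuration, while a fetcher needs only a small fixed amount of memory. The function names and all numbers in the usage note are purely hypothetical and are not measurements.

```python
def old_model_memory(helpers, config_mb):
    # Assumption: every old-style helper holds the full configuration.
    return helpers * config_mb


def new_model_memory(fetchers, fetcher_mb, checkers, config_mb):
    # Assumption: only checkers hold the configuration; fetchers stay small.
    return fetchers * fetcher_mb + checkers * config_mb
```

With hypothetical values of 400 MB per configuration copy and 50 MB per fetcher, `old_model_memory(20, 400)` gives 8000 MB, while `new_model_memory(13, 50, 4, 400)` gives 2250 MB. The actual ratio depends on the size of your configuration and on how many helpers you run.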

The new model is enabled by default in Checkmk 2.0. It can be configured using the global settings "Use separate fetchers and checkers", "Maximum concurrent Checkmk fetchers" and "Maximum concurrent Checkmk checkers".

All sites start with 13 fetcher processes and 4 checker processes.

After updating, you should have a look at the "Fetcher helper usage" and "Checker helper usage". Both values can be viewed in the "Micro core statistics" snapin and in the detailed output of the "OMD [SITE] performance" services on your Checkmk host. The usage of each pool should not exceed 80%. If it does, you should consider increasing the number of helpers of that type.
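The 80% guideline can be expressed as a small check. This is only a sketch: the function name `helpers_needing_increase` and the dictionary keys are hypothetical, and the usage values are assumed to have been read manually from the snapin or the service output.

```python
def helpers_needing_increase(usage_percent, threshold=80.0):
    """Return the helper types whose usage exceeds the threshold."""
    return [name for name, usage in usage_percent.items() if usage > threshold]
```

For example, `helpers_needing_increase({"fetcher": 85.0, "checker": 42.0})` returns `["fetcher"]`, which suggests raising "Maximum concurrent Checkmk fetchers" while leaving the checkers unchanged.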
