Werk #12908: Add predefined cluster modes for all services

Component: Checks & agents
Title: Add predefined cluster modes for all services
Date: Aug 9, 2021
Checkmk Edition: Checkmk Raw (CRE)
Checkmk Version: 2.1.0b1
Level: Trivial Change
Class: New Feature
Compatibility: Incompatible - Manual interaction might be required

This werk changes the behaviour of (some) services on clusters.

Affected services will go to UNKNOWN. To fix this, users must explicitly select the cluster mode they wish to use. This can be done using the ruleset "Aggregation options for clustered services".

All services written against the old API are affected, as are a few of the modern plugins (see below for a list of the latter).

Since this is the second time the behaviour of clustered services has changed (1.6 to 2.0 and 2.0 to 2.1), we provide an overview. Note that we must consider four types of plugins here: plugins programmed against the old API (referred to as 'legacy') or the new API ('modern'), and plugins developed with their behaviour on clusters explicitly considered ('native cluster mode') or not.

In Checkmk 1.6 and earlier all plugins (now legacy) can be configured to run on a cluster. For plugins without a native cluster implementation, the behaviour is unspecified. They simply operate on the concatenated output of all nodes, which may or may not result in the desired behaviour (or even crash).
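To illustrate why operating on concatenated node output is unspecified, here is a minimal sketch. The parser and the agent lines are hypothetical and much simpler than real legacy check data; the point is only that a plugin written for one host may silently mix up or overwrite values when it is handed the lines of two nodes at once.

```python
def parse_legacy(lines):
    """Hypothetical single-host parser: one 'key value' pair per line."""
    parsed = {}
    for line in lines:
        key, value = line.split()
        parsed[key] = int(value)
    return parsed


# Made-up agent output of two cluster nodes.
node1_lines = ["connections 12", "errors 0"]
node2_lines = ["connections 3", "errors 7"]

# Without a native cluster mode, the plugin simply sees both outputs
# appended to each other.
concatenated = node1_lines + node2_lines

# The values of node2 silently replace those of node1.
parsed = parse_legacy(concatenated)
```

Whether such a result is acceptable depends entirely on the plugin, which is why the behaviour is called unspecified above.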

In Checkmk 2.0 the behaviour for legacy plugins is unchanged. Modern plugins can only be run on a cluster if they natively implement a cluster mode. Otherwise the service will be in a permanent WARNING state, telling the user to change their configuration.

In Checkmk 2.1 even legacy plugins are no longer run on clusters in this unspecified manner. By default, all services on a cluster are run in the native mode (or issue a WARNING if the plugin does not provide one). If the plugin in question does not support a native cluster mode, you can use the ruleset "Aggregation options for clustered services" to select one of three other aggregation modes ("failover", "worst", "best"), where the results of the individual nodes are aggregated in a predetermined way.
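The idea behind the three aggregation modes can be sketched as follows. This is a simplified illustration, not the Checkmk implementation: the function names, the severity ranking (OK, WARN, UNKNOWN, CRIT) and the failover details (use the first reporting node, raise to at least WARN if more than one node reports) are assumptions made for this sketch. The authoritative description is the help text of the ruleset.

```python
from enum import IntEnum
from typing import Mapping, Optional


class State(IntEnum):
    """Monitoring states with their usual numeric values."""
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


# Assumed severity ranking for this sketch, from best to worst.
_SEVERITY = [State.OK, State.WARN, State.UNKNOWN, State.CRIT]


def aggregate_worst(node_states: Mapping[str, State]) -> State:
    """'worst': the most severe node state wins (needs at least one node)."""
    return max(node_states.values(), key=_SEVERITY.index)


def aggregate_best(node_states: Mapping[str, State]) -> State:
    """'best': the least severe node state wins (needs at least one node)."""
    return min(node_states.values(), key=_SEVERITY.index)


def aggregate_failover(node_states: Mapping[str, Optional[State]]) -> State:
    """'failover': only one node is expected to report (None = no data).

    Assumption of this sketch: the first reporting node is used, and the
    result is raised to at least WARN if more than one node reports.
    """
    reporting = [state for state in node_states.values() if state is not None]
    if not reporting:
        return State.UNKNOWN
    if len(reporting) > 1:
        return max((reporting[0], State.WARN), key=_SEVERITY.index)
    return reporting[0]
```

For example, with one node OK and one node CRIT, this sketch yields CRIT for "worst" and OK for "best", while "failover" would complain because both nodes delivered a result.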

For a description of the available modes, please refer to the help text of the ruleset mentioned above.

The native cluster mode should be documented in the plugin's man page.

As a result, some of the native implementations have been removed, because they merely re-implemented one of the other aggregation modes (with fewer options). These are the affected plugins and the cluster modes they were implicitly operating on:

  • apache_status: failover
  • cmk_site_statistics: failover
  • f5_bigip_vcmpguests: worst
  • infoblox_node_services: best
  • infoblox_services: best
  • job, local: The cluster behaviour was previously configurable for these plugins. Choose according to your previous configuration. The options in the plugin-specific rulesets are now ignored.
  • livestatus_status: failover
  • mrpe: best
  • mssql_counters_cache_hits, mssql_counters_file_sizes, mssql_counters_locks, mssql_counters_locks_per_batch, mssql_counters_page_life_expectancy, mssql_counters_pageactivity, mssql_counters_sqlstats, mssql_counters_transactions: worst
  • mssql_datafiles, mssql_transactionlogs: failover
  • netscaler_sslcertificates: worst
  • sap_hana_diskusage, sap_hana_ess, sap_hana_events, sap_hana_instance_status, sap_hana_memrate, sap_hana_proc, sap_hana_replication_status: best

Note to developers: Every plugin you develop will have the predefined cluster modes ready to use; there is nothing you have to do. If none of the three modes (failover, worst or best) suits your needs, you can implement your own native mode using the cluster_check_function. Please refer to the Sphinx documentation for details.
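A native cluster mode could look roughly like the following sketch. It assumes that the cluster check function receives a mapping from node name to that node's parsed section (or None if the node delivered no data). State and Result are defined here as local stand-ins so that the example runs on its own; in a real plugin they come from the agent-based API, and the function is passed as cluster_check_function when registering the check plugin. The plugin, its "workers" value and the chosen states are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterator, Mapping, Optional


# Stand-ins so the sketch is self-contained; a real plugin would import
# the corresponding classes from the Checkmk agent-based API instead.
class State(Enum):
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


@dataclass(frozen=True)
class Result:
    state: State
    summary: str


# Hypothetical parsed section of one node, e.g. {"workers": 2}.
Section = Mapping[str, int]


def cluster_check_workers(
    section: Mapping[str, Optional[Section]],
) -> Iterator[Result]:
    """Hypothetical native mode: sum a counter over all cluster nodes."""
    total = 0
    for node_name, node_section in section.items():
        if node_section is None:
            # This node did not deliver any data.
            yield Result(state=State.WARN, summary=f"[{node_name}] no data")
            continue
        workers = node_section["workers"]
        total += workers
        yield Result(state=State.OK, summary=f"[{node_name}] {workers} workers")

    # The cluster as a whole is fine as long as any node has workers.
    overall = State.OK if total > 0 else State.CRIT
    yield Result(state=overall, summary=f"Total workers: {total}")
```

Such a function is only worth writing if the desired logic (here: a sum across nodes) cannot be expressed by one of the predefined modes.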
