Werk #12908: Add predefined cluster modes for all services

Component

Checks & agents

Title

Add predefined cluster modes for all services

Date

Aug 9, 2021

Level

Trivial Change

Class

New Feature

Compatibility

Incompatible - Manual interaction might be required

Checkmk versions & editions

2.1.0b1	Checkmk Raw (CRE), Checkmk Enterprise (CEE), Checkmk MSP (CME)

This werk changes the behaviour of some services on clusters.

Affected services will go to UNKNOWN. To fix this, users must explicitly select the desired cluster mode in the ruleset "Aggregation options for clustered services".

All services written against the old API are affected, as are a few of the modern plugins (listed below).

Since the behaviour of clustered services is changing for the second time (1.6 to 2.0, and 2.0 to 2.1), we provide an overview. Note that four types of plugins must be considered here: plugins programmed against the old API (referred to as 'legacy') or the new API ('modern'), each either developed with their behaviour on clusters explicitly in mind ('native cluster mode') or not.

In Checkmk 1.6 and earlier, all plugins (now legacy) could be configured to run on a cluster. For plugins without a native cluster implementation, the behaviour is unspecified: they simply operate on the concatenated output of all nodes, which may or may not produce the desired result (or may even crash).
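A minimal Python sketch of how this can go wrong. The section layout, the node names and the naive `parse` function are hypothetical; they stand for a plugin written with a single host in mind. The point is only that joining the nodes' lines discards the information about which node reported what.

```python
from typing import Dict, List

# Hypothetical agent section lines for one service, as delivered by two
# cluster nodes: each line is [item, value].
NODE_OUTPUT: Dict[str, List[List[str]]] = {
    "node1": [["db1", "OK"]],
    "node2": [["db1", "CRIT"]],
}


def concatenate(node_output: Dict[str, List[List[str]]]) -> List[List[str]]:
    """Join the lines of all nodes into one list, dropping the node names."""
    lines: List[List[str]] = []
    for node_lines in node_output.values():
        lines.extend(node_lines)
    return lines


def parse(lines: List[List[str]]) -> Dict[str, str]:
    """Hypothetical naive parser that expects one line per item."""
    return {item: value for item, value in lines}


section = concatenate(NODE_OUTPUT)
print(section)         # [['db1', 'OK'], ['db1', 'CRIT']]
print(parse(section))  # {'db1': 'CRIT'}  (node1's line is silently overwritten)
```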

In Checkmk 2.0, the behaviour of legacy plugins is unchanged. Modern plugins can only be run on a cluster if they natively implement a cluster mode. Otherwise the service will be in a permanent WARNING state, telling the user to change their configuration.

In Checkmk 2.1, even legacy plugins are no longer run on clusters in this unspecified manner. By default, all services on a cluster are run in the native mode (issuing a WARNING if the plugin has none). If the plugin in question does not support a native cluster mode, you can use the ruleset "Aggregation options for clustered services" to select one of three other aggregation modes ("failover", "worst", "best"), in which the results of the individual nodes are aggregated in a predetermined way.

For a description of the available modes, please refer to the help text of the ruleset "Aggregation options for clustered services".
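The mode selection described above can be sketched in Python as follows. This is a hypothetical illustration, not Checkmk's implementation: `State`, `aggregate` and `check_cluster` are invented names, the severity ordering OK < WARN < UNKNOWN < CRIT is assumed, and so is the reading of "failover" (the first node that delivers data is treated as the active one, and more than one active node raises the state to at least WARNING). For brevity the native callback receives node results rather than node sections. The ruleset help remains the authoritative description of each mode.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class State(Enum):
    """Monitoring states with their usual numeric codes."""
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


# Assumed severity ranking used for aggregation: OK < WARN < UNKNOWN < CRIT.
SEVERITY = {State.OK: 0, State.WARN: 1, State.UNKNOWN: 2, State.CRIT: 3}

CheckResult = Tuple[State, str]
# Result per node, or None if the node delivered no data.
NodeResults = Dict[str, Optional[CheckResult]]


def aggregate(mode: str, results: NodeResults) -> CheckResult:
    """Combine per-node results with one of the predefined modes (hypothetical)."""
    present = {node: res for node, res in results.items() if res is not None}
    if not present:
        return State.UNKNOWN, "No node delivered data"
    if mode in ("worst", "best"):
        pick = max if mode == "worst" else min
        node, (state, text) = pick(present.items(), key=lambda kv: SEVERITY[kv[1][0]])
        return state, f"[{node}] {text}"
    if mode == "failover":
        # Assumption: the first node that delivers data is the active one.
        node, (state, text) = next(iter(present.items()))
        if len(present) > 1:
            # More than one active node is unexpected: raise to at least WARN.
            state = max(state, State.WARN, key=SEVERITY.__getitem__)
            text += f"; {len(present)} nodes active, expected 1"
        return state, f"[{node}] {text}"
    raise ValueError(f"unknown mode: {mode}")


def check_cluster(
    mode: str,
    results: NodeResults,
    native: Optional[Callable[[NodeResults], CheckResult]] = None,
) -> CheckResult:
    """Default 'native' mode: use the plugin's own cluster logic, or warn."""
    if mode == "native":
        if native is None:
            return State.WARN, (
                "Plugin has no native cluster mode; select one in "
                "'Aggregation options for clustered services'"
            )
        return native(results)
    return aggregate(mode, results)


results: NodeResults = {
    "node1": (State.OK, "running"),
    "node2": (State.CRIT, "stopped"),
    "node3": None,  # this node delivered no data
}
print(check_cluster("worst", results))     # (<State.CRIT: 2>, '[node2] stopped')
print(check_cluster("best", results))      # (<State.OK: 0>, '[node1] running')
print(check_cluster("failover", results))  # (<State.WARN: 1>, '[node1] running; 2 nodes active, expected 1')
```

Keeping the native dispatch separate from `aggregate` mirrors the text: the predefined modes only need the per-node results, while a native mode is plugin-specific logic.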

The native cluster mode should be documented in the plugin's man page.

As a result, some of the native implementations have been removed, as they merely re-implemented one of the other aggregation modes (with fewer options). These are the affected plugins and the cluster modes they implicitly used:

apache_status: failover
cmk_site_statistics: failover
f5_bigip_vcmpguests: worst
infoblox_node_services: best
infoblox_services: best
job, local: The cluster behaviour of these plugins used to be configurable. Choose according to your previous configuration; the options in the plugin-specific rulesets are ignored.
livestatus_status: failover
mrpe: best
mssql_counters_cache_hits, mssql_counters_file_sizes, mssql_counters_locks, mssql_counters_locks_per_batch, mssql_counters_page_life_expectancy, mssql_counters_pageactivity, mssql_counters_sqlstats, mssql_counters_transactions: worst
mssql_datafiles, mssql_transactionlogs: failover
netscaler_sslcertificates: worst
sap_hana_diskusage, sap_hana_ess, sap_hana_events, sap_hana_instance_status, sap_hana_memrate, sap_hana_proc, sap_hana_replication_status: best

Note to developers: Every plugin you develop automatically has the predefined cluster modes available; there is nothing you have to do. If none of the three modes failover, worst or best suits your needs, you can implement your own native mode using the cluster_check_function. Please refer to the Sphinx documentation for details.
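A minimal, self-contained Python sketch of what a native cluster mode might look like. In the real Check API the function is passed to the plugin registration as `cluster_check_function` and, as far as we understand it, receives the node sections as a dictionary keyed by node name. Here `Section`, `Result` and the session-counting plugin are hypothetical stand-ins so the example runs without Checkmk installed; consult the Sphinx documentation for the actual types and signatures.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Section:
    """Hypothetical parsed agent section of one node."""
    sessions: int


@dataclass
class Result:
    """Stand-in for the Check API's result object (state as a plain string)."""
    state: str
    summary: str


def check_sessions(item: str, section: Section) -> Iterator[Result]:
    """Ordinary check function: sees the section of a single host."""
    yield Result("OK", f"{section.sessions} sessions")


def cluster_check_sessions(
    item: str, section: Dict[str, Optional[Section]]
) -> Iterator[Result]:
    """Hypothetical native cluster mode: sum the sessions of all nodes with data.

    Summing values across nodes is an example of logic that, as we read them,
    none of the predefined modes (failover, worst, best) provides.
    """
    active = {node: sec for node, sec in section.items() if sec is not None}
    total = sum(sec.sessions for sec in active.values())
    yield Result("OK", f"{total} sessions in total on {len(active)} node(s)")


nodes = {"node1": Section(sessions=12), "node2": Section(sessions=30), "node3": None}
for result in cluster_check_sessions("main", nodes):
    print(result.summary)  # 42 sessions in total on 2 node(s)
```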
