Werk #15696: Linux agent: timing problem with 5 minute check interval

Component Checks & agents
Title Linux agent: timing problem with 5 minute check interval
Date Apr 21, 2023
Level Trivial Change
Class Bug Fix
Compatibility Incompatible - Manual interaction might be required
Checkmk versions & editions
2.3.0b1 (not yet released) Checkmk Raw (CRE), Checkmk Enterprise (CEE), Checkmk Cloud (CCE), Checkmk MSP (CME)
2.2.0b5 Checkmk Raw (CRE), Checkmk Enterprise (CEE), Checkmk Cloud (CCE), Checkmk MSP (CME)
2.1.0p27 Checkmk Raw (CRE), Checkmk Enterprise (CEE), Checkmk MSP (CME)

If you set the check interval of a Linux host to 5 minutes, you may recently have seen sporadic connection errors. They cause a critical Check_MK service whose summary contains

[agent] Communication failed: [Errno 111] Connection refused

or similar.

The cause was that the agent controller checks for an active connection registry every 5 minutes and, in doing so, briefly closes its socket.
This 5-minute timer restarts whenever a connection has been accepted successfully. As a result, the next check falls due at almost the same moment as the next request from a host with a 5-minute check interval. This gives a significant chance that the request arrives exactly while the socket is closed.
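The race can be reproduced with a small timing model. The following Python sketch is hypothetical and simplified. It assumes a connection was accepted at t=0, that the 5-minute timer restarts after every accepted connection and after every check, and that each check keeps the socket closed for a fixed window (`closed_ms`; the 50 ms used below is an invented value). The function `refused_polls` is for illustration only and is not part of the agent controller.

```python
from typing import List, Optional

# Hypothetical timing model of the race described above; this is NOT the
# agent controller's code. Times are integer milliseconds to avoid
# floating-point rounding.
RELOAD_PERIOD_MS = 5 * 60 * 1000  # the 5-minute check


def refused_polls(poll_times: List[int], closed_ms: int,
                  period: int = RELOAD_PERIOD_MS) -> List[int]:
    """Return the request times that arrive while the socket is closed.

    Assumptions (illustrative only): a connection was accepted at t=0;
    every accepted connection restarts the timer; every check closes the
    socket for `closed_ms` and schedules the next check `period` later.
    """
    refused: List[int] = []
    next_check = period
    for t in poll_times:
        closed_until: Optional[int] = None
        # Run every check that falls due before this request arrives;
        # only the last one can still have the socket closed.
        while next_check <= t:
            closed_until = next_check + closed_ms
            next_check += period
        if closed_until is not None and t < closed_until:
            refused.append(t)  # client sees "[Errno 111] Connection refused"
        else:
            next_check = t + period  # accepted: the timer restarts
    return refused


# Requests roughly every 5 minutes, drifting by a few tens of milliseconds.
polls = [299_990, 600_020, 900_090, 1_200_070]
print(refused_polls(polls, closed_ms=50))  # [600020]
# A 1-minute interval restarts the timer before the check ever falls due.
print(refused_polls(list(range(60_000, 1_200_001, 60_000)), closed_ms=50))  # []
```

In this model, only the request that lands just after a check is refused, which matches the sporadic nature of the failures. With shorter check intervals, every request restarts the timer before the check falls due. Setting `closed_ms=0` models a check that never closes the socket, and no request is refused.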

To fix this, the configuration reload is now done without temporarily closing the socket.
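The general idea of reloading without closing the socket can be illustrated with a minimal Python sketch. It assumes a listener whose configuration comes from a caller-supplied loader function. A reload only swaps the in-memory configuration under a lock, and the listening socket is created once and stays open. Because of that, a client connecting during a reload is placed in the kernel's accept queue instead of being refused. The class `ReloadableListener` and its methods are hypothetical and do not reflect how the agent controller is actually structured.

```python
import socket
import threading
from typing import Any, Callable, Dict

Config = Dict[str, Any]


class ReloadableListener:
    """Hypothetical listener whose configuration can be reloaded in place."""

    def __init__(self, port: int, load_config: Callable[[], Config]) -> None:
        self._load_config = load_config
        self._lock = threading.Lock()
        self._config = load_config()
        # Created once; reload() never closes or re-creates it.
        self.sock = socket.create_server(("127.0.0.1", port))

    @property
    def config(self) -> Config:
        with self._lock:
            return self._config

    def reload(self) -> None:
        # Read the new configuration first, then swap it in atomically.
        # The listening socket is left untouched, so clients connecting
        # right now still reach the accept queue.
        new_config = self._load_config()
        with self._lock:
            self._config = new_config

    def close(self) -> None:
        self.sock.close()


versions = iter(range(1, 100))
listener = ReloadableListener(0, lambda: {"registry_version": next(versions)})
port = listener.sock.getsockname()[1]  # port 0 picks a free ephemeral port
listener.reload()
with socket.create_connection(("127.0.0.1", port), timeout=1):
    pass  # connecting succeeds: the socket was never closed
print(listener.config)  # {'registry_version': 2}
listener.close()
```

Reading the new configuration before taking the lock keeps the critical section short. The key design point is that the reload touches only the configuration object, never the socket.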

To apply this fix, you need to update the agent once on each affected host.
Since agent communication fails only sporadically, an automatic agent update can handle this, if one is configured.
