Werk #19986: Redfish: retry transient Systems failures and keep previous data instead of dropping services

Component

Checks & agents

Title

Redfish: retry transient Systems failures and keep previous data instead of dropping services

Date

Jun 12, 2026

Level

Trivial Change

Class

Bug Fix

Compatibility

Compatible - no manual interaction needed

Checkmk versions & editions

3.0.0b1 Not yet released	Checkmk Community, Checkmk Pro, Checkmk Ultimate, Checkmk Cloud, Checkmk Ultimate MT
2.5.0p9	Checkmk Community, Checkmk Pro, Checkmk Ultimate, Checkmk Cloud, Checkmk Ultimate MT
2.4.0p34	Checkmk Community, Checkmk Pro, Checkmk Ultimate, Checkmk Cloud, Checkmk Ultimate MT

Some management controllers (notably Dell iDRAC) intermittently answer the central GET /redfish/v1/Systems request with a transient error such as HTTP 503 or 404, while all other endpoints respond normally. Because that request is the parent of every system-scoped section, a single such failure caused the agent to report success with an incomplete dataset. The affected services (CPUs, memory, storage, drives, volumes and network interfaces) then briefly vanished from the monitoring (Item not found in monitoring data).

The special agent now retries the system-data request on a transient failure before giving up. By default it retries 3 times with a 2-second delay between attempts. This is configurable per rule under Redfish Compatible Management Controller → Retry fetching system data (set the number of retries to 0 to disable retrying).
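The retry behaviour described above can be illustrated with a minimal sketch. It is not the agent's actual code: `fetch_with_retry`, `FetchError` and `TRANSIENT_STATUS` are hypothetical names, and treating exactly HTTP 404 and 503 as transient is an assumption based only on the two examples this Werk mentions. It also assumes "retries 3 times" means three additional attempts after the first one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: status codes treated as transient. The Werk only names
# HTTP 503 and 404 as examples; the real agent's list may differ.
TRANSIENT_STATUS = frozenset({404, 503})


class FetchError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_with_retry(
    fetch: Callable[[], T],
    retries: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on a transient failure, retry up to `retries` more times.

    retries=0 disables retrying, so the first failure is re-raised at once.
    Non-transient errors are never retried.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except FetchError as err:
            if err.status not in TRANSIENT_STATUS or attempt >= retries:
                raise  # give up: the caller decides what happens next
            attempt += 1
            sleep(delay)  # wait before the next attempt
```

Injecting `sleep` keeps the function testable without real waiting; in normal use the default `time.sleep` applies the configured delay.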

If the request still fails after the configured retries, the agent aborts the run rather than publishing an incomplete dataset. As a result, the monitoring keeps the previously collected data and the system services no longer disappear during a short controller hiccup; instead, the Check_MK service of the affected host turns CRITICAL for that interval.
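The "abort instead of publishing" rule amounts to writing no agent output at all when the parent request fails, and exiting with a non-zero status. The following sketch assumes `fetch_systems` already wraps the retry logic; `run_agent` and the section name `example_redfish_systems` are hypothetical and do not reflect the real agent's function or section names.

```python
import json
import sys
from typing import Callable, TextIO


def run_agent(
    fetch_systems: Callable[[], object],
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> int:
    """Emit agent output only if the parent Systems request succeeded."""
    try:
        # Assumed to include the retry logic already; an exception here
        # means all configured retries were exhausted.
        systems = fetch_systems()
    except Exception as exc:
        err.write(f"GET /redfish/v1/Systems failed: {exc}\n")
        return 1  # abort: non-zero exit, nothing written to stdout
    # Hypothetical section header; only reached with complete data.
    out.write("<<<example_redfish_systems:sep(0)>>>\n")
    out.write(json.dumps(systems) + "\n")
    return 0
```

Because nothing reaches stdout on failure, no partial dataset with missing system sections is ever produced.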

Per-section resilience is unchanged: a failure of an individual section (e.g. a single drive) still only affects that section.
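Per-section isolation can be pictured as each section being fetched in its own error boundary, so that one failing child request drops only its own section. This is a simplified sketch; `collect_sections` and the section names used with it are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def collect_sections(
    fetchers: Dict[str, Callable[[], object]],
) -> Tuple[Dict[str, object], List[str]]:
    """Fetch each section independently.

    A failing section is recorded and skipped; the others are still returned.
    """
    results: Dict[str, object] = {}
    failed: List[str] = []
    for name, fetch in fetchers.items():
        try:
            results[name] = fetch()
        except Exception:
            failed.append(name)  # only this section is lost
    return results, failed
```

For example, if only the fetcher for a single drive raises, the memory and volume sections are still returned and only the drive appears in `failed`.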

No user action is required. To tune the behaviour, adjust the new rule option.
