Werk #18747: agent_elasticsearch: Handle HTTP errors and produce deterministic section output

Component: Checks & agents
Title: agent_elasticsearch: Handle HTTP errors and produce deterministic section output
Date: May 7, 2026
Level: Trivial Change
Class: Bug Fix
Compatibility: Compatible - no manual interaction needed
Checkmk versions & editions:
  2.6.0b1 (not yet released): Checkmk Community, Checkmk Pro, Checkmk Ultimate, Checkmk Cloud, Checkmk Ultimate MT
  2.5.0p6 (not yet released): Checkmk Community, Checkmk Pro, Checkmk Ultimate, Checkmk Cloud, Checkmk Ultimate MT
  2.4.0p32 (not yet released): Checkmk Community, Checkmk Pro, Checkmk Ultimate, Checkmk Cloud, Checkmk Ultimate MT

The Elasticsearch special agent could intermittently produce partial or empty output when one of the queried sections returned a non-2xx HTTP response. The agent passed the response body to the JSON decoder without checking the status code, so an error response was parsed as if it were valid data. The resulting validation error was swallowed by the outer exception handler and aborted all remaining sections.
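The failure mode can be illustrated with a minimal sketch. This is not the agent's actual code: the function names, the one-key "validation" and the error body are hypothetical stand-ins, and the sketch models only how later sections are lost, not how output is written.

```python
import json

# Hypothetical stand-in for schema validation: one expected top-level key per section.
REQUIRED_KEY = {"cluster_health": "status", "nodes": "nodes", "stats": "indices"}


def validate(section: str, data: dict) -> dict:
    """Raise ValueError if the decoded body lacks the key the section expects."""
    required = REQUIRED_KEY[section]
    if required not in data:
        raise ValueError(f"{section}: missing key {required!r}")
    return data


def run_unchecked(responses: list) -> list:
    """Simplified old behaviour: decode every body, one handler around the whole loop.

    `responses` is a list of (section, status_code, body) tuples.
    Returns the names of the sections that were processed successfully.
    """
    processed = []
    try:
        for section, _status, body in responses:  # the status code is never inspected
            validate(section, json.loads(body))   # an error body is still valid JSON
            processed.append(section)
    except ValueError:
        pass  # swallowed: nothing is reported and later sections never run
    return processed
```

With a 400 response for nodes placed before a healthy stats response, `run_unchecked` returns without ever reaching stats.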

Three issues are fixed:

  • The nodes endpoint is narrowed from /_nodes/_all/stats to /_nodes/stats/process. The agent only ever consumed the process sub-tree of the response, so requesting the rest needlessly increased the payload and exposed the agent to upstream serialization bugs in unused stats categories. This was the trigger seen on AWS OpenSearch, which occasionally returns HTTP 400 from /_nodes/_all/stats because of negative byte counts (integer overflow) in stats categories the agent does not read. If a future check needs JVM, filesystem or other categories, the URL must be broadened again.

  • The HTTP status code is now checked before the response is decoded as JSON. Non-200 responses are logged to stderr and the affected section is skipped, letting the remaining sections run.

  • The list of sections to query is now iterated in a fixed order (cluster_health, nodes, stats) instead of in the iteration order of a set(), which for strings can differ between runs. A failure in one section can no longer retroactively suppress the output of an earlier successful section.
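Taken together, the three fixes can be sketched roughly as follows. This is an illustration under assumptions, not the agent's implementation: `Response`, `collect_sections` and the injected `fetch` callable are hypothetical, and the cluster_health and stats paths are placeholders. Only the nodes path /_nodes/stats/process is taken from this Werk.

```python
import json
import sys
from typing import Callable, NamedTuple


class Response(NamedTuple):
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    text: str


# A tuple keeps the query order fixed: cluster_health, nodes, stats.
# The cluster_health and stats paths are assumed for illustration only.
SECTIONS = (
    ("cluster_health", "/_cluster/health"),
    ("nodes", "/_nodes/stats/process"),  # narrowed endpoint from this Werk
    ("stats", "/_stats"),
)


def collect_sections(fetch: Callable[[str], Response]) -> dict:
    """Query each section in order, skipping any that does not return HTTP 200."""
    results = {}
    for name, path in SECTIONS:
        response = fetch(path)
        if response.status_code != 200:
            # Report the failure on stderr and move on to the next section.
            print(
                f"{name}: HTTP {response.status_code} from {path}, section skipped",
                file=sys.stderr,
            )
            continue
        results[name] = json.loads(response.text)
    return results
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library. A failing nodes request then costs only the nodes section, while cluster_health and stats are still collected.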
