Werk #19950: BI: add new downtime handling computation option
| Component | BI |
| Title | BI: add new downtime handling computation option |
| Date | May 7, 2026 |
| Level | Trivial Change |
| Class | New Feature |
| Compatibility | Compatible - no manual interaction needed |
| Checkmk versions & editions |
|
A new computation option, Show downtime only when all current problems are covered, is now available in the BI aggregation configuration under Computation options. The option is disabled by default, so existing aggregations are unaffected after upgrading.
Default behavior (option disabled)
A BI aggregation is marked as in downtime as soon as the nodes that have an active maintenance window are sufficient to satisfy the aggregation function — regardless of whether other components are also failing without any maintenance window. This means a partial maintenance window can silently mask an unplanned outage elsewhere in the aggregate.
This default is appropriate when you treat the aggregate as a single service unit, where maintenance on any key component means the whole branch is under planned maintenance and alerting should be suppressed.
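The default rule can be illustrated with a small model. This is a minimal sketch, not Checkmk's implementation: the `State`, `Node`, `worst`, and `default_in_downtime` names are hypothetical, and it assumes that "sufficient to satisfy the aggregation function" means the function would return CRIT if only the nodes in downtime were failing.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Sequence


class State(IntEnum):
    OK = 0
    WARN = 1
    CRIT = 2


@dataclass(frozen=True)
class Node:
    name: str
    state: State
    in_downtime: bool


def worst(states: Sequence[State]) -> State:
    """Toy aggregation function: the aggregate is as bad as its worst node."""
    return max(states)


def default_in_downtime(
    nodes: Sequence[Node],
    aggregate: Callable[[Sequence[State]], State] = worst,
) -> bool:
    """Assumed default rule: look only at the nodes in downtime.

    Nodes in downtime are treated as CRIT and all others as OK. If the
    aggregation function then yields CRIT, the downtimes alone are enough
    to take the aggregate down, so it is marked as in downtime. The real
    state of nodes outside any downtime is never consulted.
    """
    hypothetical = [State.CRIT if n.in_downtime else State.OK for n in nodes]
    return aggregate(hypothetical) == State.CRIT


nodes = [
    Node("web server", State.CRIT, in_downtime=False),  # unplanned failure
    Node("database", State.CRIT, in_downtime=True),     # planned maintenance
    Node("cache", State.OK, in_downtime=False),
]

print(default_in_downtime(nodes))  # True
```

In this model the web server's unplanned CRIT state has no influence on the result, which is the masking effect described above.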
With this option enabled
The aggregate is only marked in downtime when every currently failing component has an active maintenance window that explains its failure. If even one failing component is not under a maintenance window, the aggregate is shown as not in downtime — making the unplanned failure visible to on-call teams.
This mode is intended for aggregates where you want to guarantee that no active, unplanned failure can be hidden behind a partial maintenance window.
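The coverage check can be sketched as a search for failing components that no maintenance window explains. This is an illustrative model only: `Node`, `State`, and `uncovered_failures` are hypothetical names, and treating every non-OK state (including WARN) as "failing" is an assumption, since the text does not define it.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Sequence


class State(IntEnum):
    OK = 0
    WARN = 1
    CRIT = 2


@dataclass(frozen=True)
class Node:
    name: str
    state: State
    in_downtime: bool


def uncovered_failures(nodes: Sequence[Node]) -> list[str]:
    """Return the names of failing nodes that have no active downtime.

    Assumption: a node counts as failing when its state is not OK.
    A non-empty result means the aggregate must not be shown as in downtime.
    """
    return [
        n.name
        for n in nodes
        if n.state != State.OK and not n.in_downtime
    ]


nodes = [
    Node("web server", State.CRIT, in_downtime=False),  # unplanned failure
    Node("database", State.CRIT, in_downtime=True),     # planned maintenance
    Node("cache", State.OK, in_downtime=False),
]

print(uncovered_failures(nodes))  # ['web server']

# Once the web server also gets a maintenance window, nothing is uncovered.
all_covered = [Node(n.name, n.state, n.state != State.OK) for n in nodes]
print(uncovered_failures(all_covered))  # []
```

Returning the list of uncovered nodes rather than a single boolean keeps the sketch neutral about the case where nothing is failing at all, which the text does not specify.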
Example
Consider a BI aggregate for a web application with three components: a web
server, a database, and a cache. The database is scheduled for maintenance and
is currently CRIT while in downtime.
- Default behavior: if the aggregation function's threshold is satisfied by the database node alone (e.g. WORST), the whole aggregate is marked in downtime, even if the web server is also CRIT without any maintenance window. The web server failure goes unnoticed.
- With this option enabled: because the web server is CRIT and has no maintenance window, the aggregate is not marked in downtime. On-call teams see the unplanned failure immediately.
Note: when this option is enabled, the Escalate downtimes as WARN option has no effect, as the downtime coverage check replaces the threshold-based logic entirely.