Ep. 5: Using metrics and graphs in Checkmk 2.0

[0:00:01] Hi, today we're discussing metrics and graphs in Checkmk.
[0:00:14] Welcome back to the Checkmk channel. You've already seen them in previous episodes, but today I'm going to take a closer look at metrics and graphs.
[0:00:22] When Checkmk monitors a service, it not only determines whether the state is OK, WARN or CRIT, it also stores metrics, for example the CPU utilization or the bytes transferred per second.
[0:00:34] This is interesting and useful, on the one hand to recognize and diagnose failures, and on the other hand for planning. For example, when a certain threshold is reached, you can decide it's time to scale up your resources. And I'll quickly show you how this looks in Checkmk.
[0:00:51] In order to show you more realistic data, we'll be doing this episode in our own internal monitoring system at tribe29. So let me just quickly pick a host.
[0:01:05] Now here in the service overview, you sometimes see this graph icon. This indicates that there are metrics available for this service. And if you hover over one of these icons, you'll see a preview of all the graphs.
[0:01:20] And when I click on one, you'll get to an overview where all the values of all the metrics are depicted in time-series graphs. Checkmk stores the values of the metrics for up to 4 years.
[0:01:34] With a measurement interval of 1 minute, this can quickly add up to a lot of data.
[0:01:38] So that you can store this data on a normal hard drive, Checkmk compresses it. For 2 days you will have data at a 1-minute interval, for 10 days the interval will be 5 minutes, for 90 days there will be a data point every 30 minutes, and for 4 years you will have a data point every 6 hours.
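As a rough illustration of the retention scheme just described, the following Python sketch counts how many data points each tier holds for a single metric and compares that with keeping 1-minute data for the full 4 years. The names `TIERS`, `points_per_tier` and `uncompressed_points` are made up for this example, and a year is assumed to be 365 days; this is arithmetic on the numbers from the video, not Checkmk's own storage code.

```python
# Retention tiers as stated in the video:
# (label, retention in days, interval in minutes).
# Assumption: one year is counted as 365 days.
TIERS = [
    ("1-minute", 2, 1),
    ("5-minute", 10, 5),
    ("30-minute", 90, 30),
    ("6-hour", 4 * 365, 6 * 60),
]

MINUTES_PER_DAY = 24 * 60


def points_per_tier(tiers=TIERS):
    """Return the number of data points each tier holds for one metric."""
    return {
        label: days * MINUTES_PER_DAY // interval
        for label, days, interval in tiers
    }


def uncompressed_points(days=4 * 365, interval_minutes=1):
    """Data points needed if every 1-minute value were kept for 4 years."""
    return days * MINUTES_PER_DAY // interval_minutes


if __name__ == "__main__":
    per_tier = points_per_tier()
    for label, count in per_tier.items():
        print(f"{label:>10}: {count} data points")
    print("tiered total:", sum(per_tier.values()))
    print("uncompressed:", uncompressed_points())
```

Under these assumptions the four tiers together hold 15,920 data points per metric, compared with 2,102,400 if every 1-minute value were kept for 4 years.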
[0:02:01] For each of these intervals, Checkmk stores the minimum, maximum and average values.
[0:02:06] This means that for 4 years you have access to all 3 of these aggregated values. Here in the top graph, you see the values of 2 metrics. The green part is the input bandwidth and the blue part is the output bandwidth.
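To make the minimum, maximum and average aggregation mentioned at 0:02:01 concrete, here is a minimal Python sketch that collapses 1-minute samples into 5-minute data points. The function name `consolidate` and the sample values are invented for illustration; this is not how Checkmk implements it internally, only the general idea.

```python
def consolidate(samples, factor):
    """Collapse every `factor` consecutive samples into one
    (minimum, maximum, average) triple.

    A trailing group with fewer than `factor` samples is dropped,
    which is a simplification made for this sketch.
    """
    buckets = []
    for start in range(0, len(samples) - factor + 1, factor):
        window = samples[start:start + factor]
        buckets.append((min(window), max(window), sum(window) / len(window)))
    return buckets


if __name__ == "__main__":
    # Hypothetical CPU utilization in percent, one value per minute.
    cpu_percent = [10, 12, 11, 95, 12, 20, 22, 21, 19, 18]
    for low, high, avg in consolidate(cpu_percent, factor=5):
        print(f"min={low} max={high} avg={avg}")
```

In this made-up data the first five minutes contain a short spike to 95 percent. The average of that group is only 28, so keeping the maximum alongside the average is what lets a peak like this remain visible after the data has been compressed.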
[0:02:20] When you scroll with your mouse wheel while hovering over the graph, you can zoom in on the x-axis. This allows you to investigate a smaller time period. Here on the right side of the graph you have five predefined time ranges, each with a preview.
[0:02:37] If you click on one of them, for example "last 8 days", you'll see the data for the last 8 days. You can also move along the x-axis by clicking and holding down the left mouse button and then moving left or right while on the graph. This allows you to move through time.
[0:02:55] When you move your mouse up and down while holding down the left mouse button, you can zoom in on the y-axis. So if you want to investigate a small peak, you can zoom in on the x-axis and then zoom in on the y-axis as well.
[0:03:13] You can resize the graph by clicking and holding down this bottom right corner. Let's collapse the sidebar. So when you drag it to the size that you want, you'll notice that the rest of the graphs are still the same size and you can fix that simply by reloading the page.
[0:03:32] So now the graphs are all the same size. Many services come with more than one metric, so if you scroll down you see all the metrics that are part of this service. Network cards and switch ports have this metric called "number of packets per second". If you look at the top graph, you'll see that the input bandwidth is a lot higher than the output bandwidth.
[0:03:58] But if you compare that to the number of packets per second, you'll see that the input and output are roughly the same. There's also another important metric called "number of errors".
[0:04:10] You'll see that right now it's empty, like it's supposed to be. But if you look a bit further back in time, you'll see that there were some errors. Whether or not these were important is hard to determine from this graph.
[0:04:25] But it can be an indication that you should do some more investigation to discover the underlying problem here.
[0:04:33] You can do a whole lot more with these graphs than what I just showed you. There are graph collections, to which you can add multiple individual graphs in order to compare them more easily. You can also add graphs to dashboards and to reports.
[0:04:48] You can even create your own graphs, where you pick specific metrics from multiple hosts and services and combine them into one graph.
[0:04:55] There's also a Grafana connector, which allows you to use data from Checkmk directly in Grafana and display it in a Grafana dashboard.
[0:05:05] All the information we discussed in this video about metrics and graphs can also be found in our documentation.
[0:05:11] That's it for today. If this video was helpful to you, please like and subscribe. See you next time.
