Ep. 13: Scheduling downtimes in Checkmk
[0:00:00] | Welcome back to the Checkmk channel. In this episode, we're taking a look at scheduled downtimes. |
[0:00:15] | Maintenance times: what are they, and how does Checkmk handle them? You want your monitoring system to detect problems and alert you when there are hardware or software failures. However, there can be planned outages, often related to maintenance. For example, when you want to upgrade the firmware of a switch, you know that the device won't be available for a period of time. |
[0:00:38] | And you can schedule these maintenance times in Checkmk. When you do this, certain things will happen. For example, an icon will appear on the host or service, the problems won't appear on any of the problem views or dashboards, and you won't receive any notifications during the downtime. Also, when you do an availability analysis later on, these planned outages will be handled differently from the unplanned ones, if you choose to do so. |
[0:01:09] | Also, at the beginning and end of a downtime, all affected people will get a notification informing them about what is happening. How you set this up in Checkmk is what I'll show you next. |
[0:01:21] | If you watched our episode on acknowledging problems, then some of this might already look familiar to you. To schedule a downtime, you first have to choose the host or service that will be affected. For now, let's pick DB_server 1. It's important to know that when you apply a downtime to a host, it will also be applied to all its services. |
[0:01:45] | But if I applied it here, it would only be applied to all the services and not to the host itself. So we click on the hostname here. And here we have two options: we can either click this button, 'Schedule downtimes', or we can go to the Commands menu and click on 'Schedule downtimes'. |
[0:02:05] | Now, this is everything you can configure regarding a downtime. The comment is required, and you can use it to inform your colleagues of the reason for the downtime, so I will enter 'Needs firmware upgrade'. Then you can configure when the downtime should start and how long it will take. |
[0:02:26] | You can say from now for 60 minutes, or whatever you type in there, or you can say from now for 2 hours, or today, the rest of the week, this month, the rest of the year, or you can set a custom time range. For example, if you know that the downtime will be tomorrow for two hours, from 2:00 to 4:00, then you can configure that here. |
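The two kinds of time range mentioned here, a relative one ("from now for 2 hours") and a custom one ("tomorrow from 2:00 to 4:00"), can be pictured as simple start and end timestamps. The following is only a minimal Python sketch to illustrate that arithmetic. The function names `from_now` and `fixed_window` and the example date are made up for illustration and are not part of Checkmk.

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def from_now(now: datetime, minutes: int) -> Tuple[datetime, datetime]:
    """Relative range: starts immediately and lasts the given number of minutes."""
    return now, now + timedelta(minutes=minutes)


def fixed_window(now: datetime, days_ahead: int, start: time, end: time) -> Tuple[datetime, datetime]:
    """Custom range: a fixed start and end time on a day relative to today."""
    day = (now + timedelta(days=days_ahead)).date()
    return datetime.combine(day, start), datetime.combine(day, end)


# Hypothetical "current time", chosen only for this example.
now = datetime(2022, 3, 1, 10, 30)

two_hours = from_now(now, 120)
tomorrow_window = fixed_window(now, 1, time(2, 0), time(4, 0))
```

Either way, the result is one start time and one end time, which is what you see later in the list of downtimes.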
[0:02:54] | But for now, let's just set it from now for 2 hours, press 'Confirm' here, and go back to the host details. Now you see this icon here. It indicates that this host is now in downtime, and whenever there is a problem with this host, it won't show up here in the overview in the sidebar. |
[0:03:17] | But to test it out, let's quickly make sure that this host goes down so we can see it in action. We can now clearly see that this host is down, but it doesn't show up under unhandled problems here in the overview. |
[0:03:29] | If you want to learn more about the downtime, or you want to edit it, you can click on this icon here, and it will show you a list of all current and future downtimes. |
[0:03:45] | You can see when it starts and when it ends, and you can see the comment here as well. Because not everything always goes to plan, it often happens that you want to extend the downtime. |
[0:04:03] | To do that, you can edit it. However, if there are multiple downtimes here, then whatever you do with remove or edit downtimes will apply to all the downtimes in the list. |
[0:04:15] | So if that's the case, you would have to use the checkboxes and select the scheduled downtime that you want to edit or remove. So now let's edit it. Now you can see that we can easily extend the planned maintenance or downtime by 30 minutes, one hour, or a custom time period. Let's say we need one more hour. We can simply add it like this, and once again we press 'Confirm'. |
[0:04:50] | And let's go back to the list. Okay, and now you see that the end will be in 176 minutes instead of 116 minutes. As you might have seen before, there were a few other interesting options that we could configure when we were creating the downtime, so let's head back to our host. |
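The numbers in the list follow directly from the extension: 116 minutes remaining plus one extra hour gives 176 minutes. Here is a minimal Python sketch of that arithmetic. The helper names and the example timestamp are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta


def extend_downtime(end: datetime, extra_minutes: int) -> datetime:
    """Move the end of a downtime further into the future."""
    return end + timedelta(minutes=extra_minutes)


def minutes_remaining(now: datetime, end: datetime) -> int:
    """Whole minutes left until the downtime ends."""
    return int((end - now).total_seconds() // 60)


# Hypothetical "current time"; the downtime originally ends in 116 minutes.
now = datetime(2022, 3, 1, 10, 0)
original_end = now + timedelta(minutes=116)

# Extend by one hour, as in the video.
new_end = extend_downtime(original_end, 60)
remaining = minutes_remaining(now, new_end)
```

With these values, `remaining` comes out at 176, matching what the list shows after the edit.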
[0:05:26] | And configure a new downtime. We already covered the comment section and the time period, so now let's talk about these three checkboxes. The first one lets you configure a flexible time window in which the downtime should start. So if you know the downtime will last one hour, but you don't know exactly when it will start, you can configure a time window here, and then whenever the host goes down within that window, that's when the scheduled downtime will start and run for one hour. |
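To make the flexible option a bit more concrete, here is a minimal Python sketch of the behaviour as described in this episode: a window in which the downtime may start, and a fixed duration that begins when the host actually goes down. This is an illustration under those assumptions only, not Checkmk's implementation, and the function name `flexible_downtime` is made up.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def flexible_downtime(
    window_start: datetime,
    window_end: datetime,
    duration: timedelta,
    went_down_at: Optional[datetime],
) -> Optional[Tuple[datetime, datetime]]:
    """Return the effective (start, end) of a flexible downtime.

    The downtime is only triggered if the host goes down inside the window.
    It then lasts for the configured duration, counted from that moment.
    """
    if went_down_at is None:
        return None  # host never went down, so the downtime never started
    if not (window_start <= went_down_at <= window_end):
        return None  # outage outside the window is treated as unplanned
    return went_down_at, went_down_at + duration


# Hypothetical example: window from 14:00 to 18:00, downtime lasts one hour.
window_start = datetime(2022, 3, 1, 14, 0)
window_end = datetime(2022, 3, 1, 18, 0)
one_hour = timedelta(hours=1)

inside = flexible_downtime(window_start, window_end, one_hour, datetime(2022, 3, 1, 15, 20))
outside = flexible_downtime(window_start, window_end, one_hour, datetime(2022, 3, 1, 19, 0))
```

In this sketch, the host going down at 15:20 gives a downtime from 15:20 to 16:20, while an outage at 19:00 falls outside the window and is not covered.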
[0:05:57] | Then there is the second option, 'also set the downtime on all the child hosts'. This is especially useful when you perform maintenance on a router or a switch which has other hosts connected to it. From the point of view of the monitoring system, if that router or switch were to go down, then all of the connected hosts would also go down, and this can trigger a bunch of problems and notifications that you might not want. |
[0:06:28] | And you can also do this recursively if you have multiple levels of connected hosts. Then there is the third option. With this one, you can make the downtime a recurring event. For example, if you know that you want to reboot a certain server every week at a given time, then you can schedule this downtime to match the reboot, and you don't have to reconfigure it every week. However, this option is only available in the Enterprise Edition and the free version of the Enterprise Edition. That's why it says it only works with the Micro Core. |
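The difference between setting the downtime on the direct child hosts and doing it recursively can be sketched as a walk through the parent/child relationships. The topology and host names below are invented for illustration, and the function is a plain breadth-first traversal, not how Checkmk does it internally.

```python
from collections import deque
from typing import Dict, List


def affected_hosts(parent: str, children: Dict[str, List[str]], recursive: bool = False) -> List[str]:
    """List the hosts a downtime on `parent` would cover.

    Without `recursive`, only the direct children are included.
    With `recursive`, children of children are followed as well.
    """
    result = [parent]
    seen = {parent}
    queue = deque(children.get(parent, []))
    while queue:
        host = queue.popleft()
        if host in seen:
            continue  # guard against hosts that are reachable twice
        seen.add(host)
        result.append(host)
        if recursive:
            queue.extend(children.get(host, []))
    return result


# Hypothetical topology: a core switch with an access switch behind it.
topology = {
    "core-switch": ["access-switch-1", "db-server-1"],
    "access-switch-1": ["web-1", "web-2"],
}

direct_only = affected_hosts("core-switch", topology)
all_levels = affected_hosts("core-switch", topology, recursive=True)
```

Here `direct_only` contains the core switch and its two direct children, while `all_levels` also picks up the two web servers behind the access switch.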
[0:07:11] | When you have a large number of hosts with a recurring maintenance schedule following the same principle, then it can be quite cumbersome to configure the downtime for each host individually. |
[0:07:22] | This is especially true when you add a host and would have to configure that downtime once more. That's why you can schedule recurring maintenance using rules. So open the Setup menu and search for 'downtime'. |
[0:07:38] | You can see these two options: one to set up recurring downtimes for services and one for hosts. Here you can create a rule like you would create any other rule. We already covered this in a previous episode. |
[0:07:53] | Now for the configuration, we have to set up the comment as always, so 'Reboot of database servers'. Then we have to set the first occurrence of the downtime, so let's set that to today at 5 pm. |
[0:08:12] | You can also set the last occurrence. So if you know that this will only be for one year, then you can say, okay, we set this for 2022, and that will be the last time this downtime will occur. But let's not do that for now. |
[0:08:28] | Then we need to set the interval. This can be every hour, every day, every second week. But let's stick to every week. Then comes the duration of our downtime. A reboot should not take more than, let's say, 10 minutes. And we can also configure the flexible starting time. |
[0:08:49] | So maybe not every server will be rebooting at the exact same moment, so let's set 30 minutes. And then we need to set the condition, so to which hosts this will apply. We can use a host tag for this. We already added host tags in a previous episode. |
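The timing part of such a rule (first occurrence, interval, duration, and an optional last occurrence) can be sketched as a small generator of start and end times. This is only an illustration of the schedule described above. The function name `occurrences` and the concrete dates are made up, and the flexible starting time is deliberately left out of the sketch.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def occurrences(
    first_start: datetime,
    interval: timedelta,
    duration: timedelta,
    count: int,
    last_start: Optional[datetime] = None,
) -> List[Tuple[datetime, datetime]]:
    """Return up to `count` (start, end) pairs of a recurring downtime.

    Generation stops early once a start time would fall after `last_start`.
    """
    result = []
    start = first_start
    while len(result) < count and (last_start is None or start <= last_start):
        result.append((start, start + duration))
        start += interval
    return result


# Hypothetical date standing in for "today at 5 pm".
first = datetime(2022, 3, 1, 17, 0)

# Every week, 10 minutes each, no last occurrence set.
weekly = occurrences(first, timedelta(weeks=1), timedelta(minutes=10), count=3)

# Same rule, but with a last occurrence one week after the first.
limited = occurrences(
    first, timedelta(weeks=1), timedelta(minutes=10), count=5,
    last_start=datetime(2022, 3, 8, 17, 0),
)
```

In this sketch, `weekly` holds three downtimes at 17:00 one week apart, and `limited` stops after two because of the last occurrence.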
[0:09:14] | For example, we set it to all our database servers. And now this rule should be applied to all hosts with the host tag application database. And the last thing I wanted to show you is the historical overview of all downtimes. |
[0:09:36] | So if you go to the Monitor menu, you can search for 'downtime', and here you see the downtime history. This is an overview of all the downtimes of all your servers. It's a rather short list for now, but of course this will grow over time. |
[0:09:54] | So that was it for this episode. Thanks for watching. If this was helpful to you, please subscribe to the channel and like the video. I hope to see you in the next episode. |