Ep. 13: Scheduling downtimes in Checkmk
|[0:00:00]||Welcome back to the Checkmk channel and in this episode, we're taking a look at scheduled downtimes.|
|[0:00:15]||Maintenance times: what are they, and how does Checkmk handle them? You want your monitoring system to detect problems and alert you when there are hardware or software failures. However, there can be planned outages, often related to maintenance, for example when you want to upgrade the firmware of a switch. Then you know that the device won't be available for a period of time.|
|[0:00:38]||And you can schedule these maintenance times in Checkmk. When you do this, certain things will happen. For example, an icon will appear on the host or service, the problems won't appear on any of the problem views or dashboards, and you won't receive any notifications during the downtime. Also, when you do an availability analysis later on, these planned outages will be handled differently from the unplanned ones, if you choose to do so.|
|[0:01:09]||Also, at the beginning and end of a downtime, all affected people will get a notification informing them about what is happening. And how you set this up in Checkmk is what I'll show you next.|
|[0:01:21]||If you watched our episode on acknowledging problems, then some of this might already look familiar to you. To schedule a downtime, you first have to choose the host or service that will be affected. So for now let's pick DB_server 1. It's important to know that when you apply a downtime to a host, it will also be applied to all its services.|
|[0:01:45]||But if I applied it here, then it would only be applied to all the services but not to the host itself. So we click on the hostname here. And here we have two options: we can either click this button 'Schedule downtimes' or we can go to the commands menu and click on 'Schedule downtimes'.|
|[0:02:05]||Now, this is everything you can configure regarding a downtime. The comment is required, and you can use it to inform your colleagues of the reason for the downtime, so I will pick 'Needs firmware upgrade'. Then you can configure when it should start and how long it will take.|
|[0:02:26]||You can say from now for 60 minutes or whatever you type in there or you can say from now for 2 hours or today, the rest of the week, this month, the rest of the year or you can set a custom time range. For example, if you know that the downtime will be tomorrow for two hours from 2:00 to 4:00 then you can configure that here.|
|[0:02:54]||But for now let's just set it from now for 2 hours, press 'Confirm' here, and go back to the host details. Now you see this icon here. It indicates that this host is now in downtime, and whenever there is a problem with this host it won't show up here in the overview in the sidebar.|
|[0:03:17]||But to test it out let's quickly make sure that this host goes down so we can see it in action. We can now clearly see that this host is down but it doesn't show up under unhandled problems here in the overview.|
|[0:03:29]||If you want to learn more about the downtime or you want to edit it you can click on this icon here and it will show you a list of all current and future downtimes.|
|[0:03:45]||You can see when it starts, when it ends and you can see the comment here as well. Because not everything always goes to plan, often it happens that you want to extend the downtime.|
|[0:04:03]||And to do that you can edit it. However, if there are multiple downtimes here, then whatever you do with remove or edit downtimes will apply to all the downtimes in the list.|
|[0:04:15]||So if that's the case you would have to use the checkboxes and select the scheduled downtime that you want to edit or remove. So now let's edit it. And now you can see we can easily extend the planned maintenance or downtime by 30 minutes, one hour or a custom time period. So let's say we need one hour more we can simply add it like this, once again we press 'Confirm'.|
|[0:04:50]||And let's go back to the list. Okay, and now you see that the end will be in 176 minutes instead of 116 minutes. As you might have seen before, there were a few other interesting options that we could configure when we were creating the downtime, so let's head back to our host.|
|[0:05:26]||And configure a new downtime. We already covered the comment section and the time period, so now let's talk about these three checkboxes. The first one lets you configure a flexible time window in which the downtime should start. So if you know the downtime will last one hour but you don't know exactly when it will start, you can configure a time window here, and then whenever the host goes down, that's when the scheduled downtime will start for one hour.|
|[0:05:57]||Then there is the second option: also set the downtime on all the child hosts. This is especially useful when you perform maintenance on a router or a switch which has other hosts connected to it, because from the point of view of the monitoring system, if that router or switch were to go down, then all of the connected hosts would also go down, and this can trigger a bunch of problems and notifications that you might not want.|
|[0:06:28]||And you can also do this recursively if you have multiple levels of connected hosts. Then, with the third option, you can make this downtime a recurring event. For example, if you know that you want to reboot a certain server every week at a given time, then you can schedule this downtime to match the reboot and you don't have to reconfigure it every week. However, this option is only available in the Enterprise Edition and the free version of the Enterprise Edition; that's why it says it only works with the Micro Core.|
|[0:07:11]||When you have a large number of hosts with a recurring maintenance schedule following the same principle, then it can be quite cumbersome to configure the downtime for each host individually.|
|[0:07:22]||Especially when you add a host and you would have to configure that downtime once more. And that's why you can schedule recurring maintenance using rules. So if you open the setup menu and search for 'downtime',|
|[0:07:38]||you can see these two options: one to set up recurring downtimes for services and one for hosts. Now here you can create a rule like you would create any other rule; we already covered this in a previous episode.|
|[0:07:53]||Now for the configuration, we have to set up the comment as always, so 'Reboot of database servers'. Then we have to set the first occurrence of the downtime, so let's set that to today at 5 pm.|
|[0:08:12]||You can also set the last occurrence. So if you know that this will only be for one year, then you can set it to 2022, and that will be the last time this downtime occurs. But let's not do that for now.|
|[0:08:28]||Then we need to set the interval so this can be every hour, every day, every second week. But let's stick to every week and then the duration of our downtime. And for a reboot that should not take more than let's say 10 minutes and we can also configure the flexible starting time.|
|[0:08:49]||So maybe not every server will be rebooting at the exact same moment, so let's set 30 minutes. And then we need to set the condition, that is, which hosts this will apply to. We can use a host tag for this; we already added host tags in a previous episode.|
|[0:09:14]||For example we set it to all our database servers. And now this rule should be applied to all servers or all hosts with the host tag application database. And the last thing I wanted to show you is the historical overview of all downtimes.|
|[0:09:36]||So if you go to the monitoring menu you can search for downtime and here you see the downtime history. And this is an overview of all the downtimes of all your servers. So this is rather a short list but of course, this will grow over time.|
|[0:09:54]||So that was it for this episode. Thanks for watching. If this was helpful to you, please subscribe to the channel and like the video. I hope to see you in the next episode.|
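The downtime options walked through in the transcript (a fixed window "from now", extending an existing downtime, a flexible start window, and a recurring schedule) can be illustrated with a small Python sketch. This is only a model of the behaviour as described in the video, not Checkmk code or its API: the `Downtime` class and the helper functions `from_now`, `flexible` and `occurrences` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


@dataclass
class Downtime:
    """Hypothetical model of one scheduled downtime (not a Checkmk class)."""
    comment: str
    start: datetime
    end: datetime

    def extend(self, minutes: int) -> None:
        # Editing a downtime: move the end further into the future.
        self.end += timedelta(minutes=minutes)

    def remaining_minutes(self, now: datetime) -> int:
        # Whole minutes left until the downtime ends.
        return int((self.end - now).total_seconds() // 60)


def from_now(comment: str, minutes: int, now: datetime) -> Downtime:
    # "From now for N minutes": a fixed window that starts immediately.
    return Downtime(comment, now, now + timedelta(minutes=minutes))


def flexible(comment: str, window_start: datetime, window_end: datetime,
             duration: timedelta, went_down_at: datetime) -> Optional[Downtime]:
    # Flexible start: the downtime only begins when the host actually goes
    # down inside the window, and then lasts for the configured duration.
    if window_start <= went_down_at <= window_end:
        return Downtime(comment, went_down_at, went_down_at + duration)
    return None


def occurrences(first: datetime, interval: timedelta, duration: timedelta,
                count: int, last: Optional[datetime] = None
                ) -> Iterator[Tuple[datetime, datetime]]:
    # Recurring downtime: repeat the window every `interval`, stopping after
    # `count` occurrences or once a start would fall after `last`.
    for i in range(count):
        start = first + i * interval
        if last is not None and start > last:
            break
        yield start, start + duration
```

For a two-hour downtime checked four minutes after it starts, `remaining_minutes` returns 116, and after `extend(60)` it returns 176, which matches the numbers shown in the video. Calling `occurrences` with a first occurrence at 5 pm, an interval of `timedelta(weeks=1)` and a duration of `timedelta(minutes=10)` mirrors the weekly reboot rule from the example.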
Ep. 1: Installing Checkmk 2.0 and monitoring your first host
In this video, Baris explains how to get started with Checkmk and start monitoring your first host within a few minutes.
Ep. 2: The Checkmk 2.0 user interface
In this video, Baris takes you through the new user interface in Checkmk 2.0. He explains the various components of the user interface, such as the new navigation menus, the sidebar, the main dashboard, the tactical overview, how to switch between the Checkmk interface themes, and much more.
Ep. 3: Using SNMP to monitor network devices in Checkmk 2.0
In this episode, Baris explains how to monitor network devices with Checkmk. SNMP is a protocol that many switches, routers, printers, UPSs, hardware sensors and other devices have implemented with the purpose of being able to monitor them easily.
Ep. 4: Monitoring Windows in Checkmk
In this video of our Getting started with Checkmk series, Baris explains how to install a Checkmk agent on a Windows host system and add that into your monitoring environment.
Ep. 5: Using metrics and graphs in Checkmk 2.0
In the 5th episode of the Getting started with Checkmk series, Baris explains using various metrics that you can monitor in Checkmk such as CPU utilization, CPU load etc. You can also see graph visualizations for these metrics or create and customize your own as per your requirements.
Ep. 6: Updating Checkmk 2.0 and using multiple instances
In this video, Baris explains how to update your Checkmk instance. It is very easy and can be done within minutes. You can run multiple Checkmk instances with different versions on the same system. This gives you the flexibility to test the new version before using it in production.
Ep. 7 (part 1): Working with rules and setting thresholds in Checkmk
In the following three-part video series, Baris explains rule-based monitoring with Checkmk. In the first part, he shows you how you can work with rules and set threshold values. Rule-based configuration is one of the key features of Checkmk, which helps you to scale your monitoring easily within minutes.
Ep. 7 (part 2): Smart rules with Host Tags in Checkmk
In the second part of this video series, Baris explains how to use smart rules with host tags in Checkmk. These are features that you can use to build your rules even more intelligently and to better organize your monitoring.
Ep. 7 (part 3): Managing Hosts in Folder in Checkmk
In this final part of our episode on Rule-based monitoring in Checkmk, Baris demonstrates how to manage hosts in folders in Checkmk. This helps you to apply your monitoring configurations at scale and organize your hosts according to your needs.
Ep. 8: Working with Host and Service Groups in Checkmk
In this episode, Baris demonstrates how to create host and service groups in Checkmk, so you can perform actions on an entire group instead of configuring each of them individually.
Ep. 9: Using the Quicksearch function in Checkmk
In this episode of the Checkmk tutorials, Baris shows how you can use the Quicksearch function in Checkmk. You can use it to easily find and manage certain hosts or services. He also explains some examples of filters. In Checkmk 2.0 you can use the same syntax in the Search function found in the monitor menu to get identical results.
Ep. 10: Detecting configuration errors with the Analyze Configuration feature
With the Analyze Configuration feature, you can check if there are any configuration errors in your installation. Checkmk checks for a number of possible security risks and potential performance restrictions and indicates if there are any problems.
Ep. 11: View creation and customization in Checkmk
In this video, Baris demonstrates how to customize headers, columns, and more in Views in Checkmk for yourself or other users. He also explains how to create custom views and add desired information to these views.
Ep. 12: Acknowledging problems in Checkmk
In this video, Baris explains how you can acknowledge problems in Checkmk. This function helps you to qualify the states of hosts and services. This allows you to keep track of messages in the main dashboard and, for example, you can add comments to problems.
Ep. 14: Distributed monitoring with Checkmk
In this video, Baris explains how you can connect several Checkmk instances to a monitoring system and then manage it.
Ep. 15: MKPs and Plugins in Checkmk
In the 15th episode of our Getting started with Checkmk tutorial series, Baris explains what Checkmk Extension Packages (MKPs) are and how easy it is to integrate them into your Checkmk monitoring environment. MKPs are the preferred format when you make your own extensions, as they are easy to share with other users or deploy in distributed environments.
Ep. 16: Working with 'Bulk Actions' in Checkmk
In this episode of our Checkmk tutorials series, Baris explains how you can save a lot of time with bulk actions. With this feature you can perform various tasks such as deleting, renaming, service discovery etc. on a large number of hosts simultaneously.
Ep. 17: Working with network topologies in Checkmk
In this video of our Getting started with Checkmk series, Baris explains how to map network topologies in Checkmk. This feature is quite helpful for managing your network and preventing unnecessary notifications from the devices in your network.
Ep. 18: Creating and customizing dashboards in Checkmk
In this video of our Getting started with Checkmk series, Mathias explains how you can create and customize dashboards in Checkmk 2.0, so you can get insights into your monitoring according to your requirements. Find out more in this video.
Ep. 19: Monitoring websites and their certificates with Checkmk
In this episode, Bastian demonstrates how to monitor a website and its certificate with Checkmk. You can also monitor specific web pages with Checkmk, using several options to suit your use case. Learn more in this video.
Ep. 20: Configuring dashboard elements in Checkmk
Learn how to add data visualization elements of the various metrics into your Checkmk Dashboard. In this video, Mathias explains how you can configure these elements and create a dashboard as per your requirements.
Ep. 21: Setting up notifications in Checkmk
Learn how to set up notifications in Checkmk and assign relevant contacts and contact groups to be notified for various events. Later in this video, our presenter Bastian also demonstrates how you can set up rule-based notifications according to different conditions for hosts and services.
Ep. 22: Monitoring logfiles with Checkmk
Monitor your logfiles with Checkmk using its Logwatch plugin. It is very useful when you want to monitor your logfiles, regardless of whether you are using a UNIX/Linux or a Windows-based system. Learn more in this video.
Ep. 24: 3 Rules for efficient network monitoring
In this video, Bastian demonstrates 3 rules that will help you to efficiently monitor your network interfaces. With Checkmk 2.0, with just three rules, you can set up an efficient network monitoring that will not only monitor all of your network interfaces but also simultaneously provide a detailed overview of all of your ports.
Ep. 25: New UX and security improvements in Checkmk 2.1
Checkmk 2.1 comes with many UX improvements such as pre-built dashboards for Linux and Windows, faster core performance and much more. Security features such as two-factor authentication were also added in this new version. Watch this video to learn how to use these new features and enhancements in Checkmk.
Ep. 28: Working with InfluxDB integration in Checkmk
Learn how to send data to InfluxDB from Checkmk. As InfluxDB introduced a new protocol to send data to it, a new connector was developed with Checkmk to talk natively with it. Learn more about it in this video.
Ep. 29: New agent architecture in Checkmk 2.1
With Checkmk 2.1, the agent architecture was modified to enable performance improvements and add new features such as TLS encryption, data compression, and the reversal of direction of communication from the agent. This will enable push mode and pull mode.
Ep. 30: Clustering the Checkmk appliance
In this video, Robin demonstrates how you can cluster your Checkmk appliance to make it resilient against hardware failures. If you are using the Checkmk hardware appliance, it may be helpful to cluster your appliance to maintain high availability.
Ep. 32: Working with the Agent bakery in Checkmk
In this video, Robin demonstrates how to roll out agent packages with the required configuration for different monitored systems using the agent bakery in Checkmk. The "Automatic agent update" is quite a helpful feature as it pulls the latest configurations for an agent automatically and you don't need to manually update all of your agents deployed on different systems.
Ep. 33: Monitoring Docker containers with Checkmk
Learn how to monitor Docker containers with Checkmk. In this video, Robin demonstrates the process of setting up a rule to configure the Docker plugin and bake an agent with the desired settings for the Docker host.
Ep. 34: Introduction to Checkmk Ansible collection
Last year the Checkmk Ansible collection was created to interact with the Checkmk REST API. In this video, Robin demonstrates how you can use this Ansible collection to automate your monitoring with Checkmk.
Ep. 35: Monitoring SQL databases with Checkmk
In this video, Robin demonstrates how you can configure your Checkmk site to monitor your SQL databases. Although there are many flavours of SQL databases, the process is mostly the same.