Ep. 12: Acknowledging problems in Checkmk

[0:00:00]	Welcome back to the Checkmk channel and in this episode, we're going to take a look at acknowledging problems in Checkmk.
[0:00:16]	So what is a problem? Simply put, it's whenever a host or service does not have the status OK, so when a host has the state DOWN or a service has the state CRIT, WARN or UNKNOWN. Checkmk distinguishes between handled and unhandled problems. Whenever a problem is handled, that means that the user knows about it and that someone has taken care of it. Marking these problems as handled is what we call acknowledging the problem.
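As a rough illustration of this definition, the following Python sketch models a service with a state and an acknowledged flag. The `Service` class and both helper functions are hypothetical names made up for this example and are not part of Checkmk. The sketch also simplifies by treating acknowledgement as the only way a problem becomes handled.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a monitored service (not a Checkmk class)."""
    name: str
    state: str = "OK"           # one of "OK", "WARN", "CRIT", "UNKNOWN"
    acknowledged: bool = False


def is_problem(service: Service) -> bool:
    # A problem is any service that is not in the OK state.
    return service.state != "OK"


def is_unhandled(service: Service) -> bool:
    # In this simplified model, a problem counts as handled once acknowledged.
    return is_problem(service) and not service.acknowledged


services = [
    Service("Filesystem /", "CRIT"),
    Service("CPU load", "WARN", acknowledged=True),
    Service("Memory", "OK"),
]
unhandled = [s.name for s in services if is_unhandled(s)]
print(unhandled)  # ['Filesystem /']
```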
[0:00:45]	Whenever you acknowledge a problem it will have three effects: you will see an icon next to the service name indicating that the problem is acknowledged, you will stop receiving any notifications and it will be removed from certain views. For example, it will disappear from the unhandled problem section in the sidebar.
[0:01:03]	Now let's take a look at how all of this works. Acknowledging problems works via commands. To execute a command, you first have to choose the service on which you want to execute it. So let's go to the service problems view; we can get there by using the sidebar and clicking on the unhandled problems here.
[0:01:26]	Here you see all the unhandled problems for this system. Let's choose the root file system of our Checkmk server for example. Now to acknowledge a problem we can simply click on the acknowledge problems button or we can go to the commands menu and click on acknowledge problems here.
[0:01:48]	And you see that there is a required field called Comment. You could use this, for example, to let your colleagues know how you are going to fix the problem. So let's type, for example, 'ordered new disk' and click Acknowledge. Yes, confirm.
[0:02:10]	And let's go back to the view. Now you see that there are two new icons here: this one indicates that the problem has been acknowledged, and then there's the speech bubble. If you hover over it, you'll see the message that we just typed in, 'ordered new disk'.
[0:02:30]	Now if we refresh the page or wait a little, we'll also see that the number of unhandled problems here is one less than before. So it's four now instead of five. So let's go back to that view again.
[0:02:47]	Now you see that there are just four unhandled problems. You can also acknowledge multiple problems at the same time, and you can do that by simply clicking 'Acknowledge problems' here in this view.
[0:03:02]	This would acknowledge the problems for all of the services in this view. So let's do that here: we type, for example, 'Load is OK' and click Acknowledge, and now you see this message: 'Do we really want to acknowledge the problems for the following 4 services?' So let's confirm.
[0:03:30]	And now let's go back to the view, and you see that all the problems have disappeared here. And once again, in the sidebar there are no unhandled problems anymore. You might run into a situation where you want to remove an acknowledgement. To do that, let's go to Monitor and then 'Service problems'.
[0:03:54]	Here in this view you will see all problems, both handled and unhandled. Once again, you can remove the acknowledgement for a single problem or for all at once. To remove it for all at once, simply click on 'Acknowledge problems' and then click the 'Remove acknowledgement' button.
[0:04:17]	Once again you see the message 'Do you really want to remove the acknowledgement for all 5 services?' Yes, we want to, so press 'Confirm'. Let's head back to the service problems view, and now you'll see that the two icons have disappeared. So now we also have five unhandled problems again. Like I said, you can do this for one service at a time or for all at once, but you can also make a sub-selection and perform the command on a few problems at a time.
[0:04:48]	To do that you could use a filter to filter out some of the problems and then apply the acknowledgement on that filtered view or you can use the checkboxes like this. And you could, for example, pick these three, press 'Acknowledge problems', type in your comment and press Confirm.
[0:05:23]	Now you have two links here instead of one. If you want to return to the previous view with the select boxes still checked then you can press 'Back to view', if you want to reset the checkboxes press the second link. So if you want to perform another command on the same services, you could go back to view with the checkboxes still selected.
[0:05:52]	You might have noticed that there were some additional options when acknowledging problems which I haven't explained yet. So now let's take a look at those because they can be quite useful. So let's go back to the unhandled problems again.
[0:06:07]	And click on 'Acknowledge problems'. Now you see that there are three checkboxes here and the possibility to enter a time span in days, hours and minutes. Let's start with this. When you set a time here, you basically set an expiration time for the acknowledgement.
[0:06:27]	For example, when you expect a problem to be resolved in one day you can configure that here. Then after that day if the problem still exists it will reappear under unhandled problems.
[0:06:38]	And you will once again start receiving notifications. However, this feature is only available in the enterprise edition and in the free version of the enterprise edition.
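The expiration behaviour can be pictured with a small Python sketch. The function name `acknowledgement_active` and the timestamps are arbitrary choices for this example and do not come from Checkmk; the sketch only assumes that an acknowledgement stops counting once its expiration time has passed.

```python
from datetime import datetime, timedelta
from typing import Optional


def acknowledgement_active(expires_at: Optional[datetime], now: datetime) -> bool:
    """Return True while the acknowledgement still counts.

    expires_at=None models an acknowledgement without an expiration time.
    """
    return expires_at is None or now < expires_at


# Example values only: acknowledge now, expect the fix within one day.
acknowledged_at = datetime(2024, 1, 1, 9, 0)
expires_at = acknowledged_at + timedelta(days=1)

after_two_hours = acknowledgement_active(expires_at, acknowledged_at + timedelta(hours=2))
after_two_days = acknowledgement_active(expires_at, acknowledged_at + timedelta(days=2))
print(after_two_hours, after_two_days)  # True False
```

Once the function returns False for a problem that still exists, the problem would count as unhandled again in this model.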
[0:06:48]	Then there is also this checkbox called 'sticky'. This is a bit more subtle. It can happen that a service first goes to WARN and later to CRIT. The question then is: what happens with the acknowledgement? For example, let's say that a file system on the server goes to WARN when it's 80% full, but then later on it goes to CRIT.
[0:07:13]	Should it be a new unhandled problem, or should it still be acknowledged? When you want it to appear as a new problem, you need to make sure that this checkbox is not selected.
[0:07:26]	If it is selected, the problem will stay acknowledged until the service goes back to OK. The 'send notifications' checkbox simply means that a notification will be sent to the responsible people, notifying them that the problem has been acknowledged.
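The sticky option described a moment ago can be summarised in a few lines of Python. This is only a sketch of the behaviour as explained in this episode: `Acknowledgement` and `acknowledgement_after_change` are hypothetical names, and the function assumes it is called only when the service state has actually changed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acknowledgement:
    """Hypothetical model of an acknowledgement (not a Checkmk class)."""
    comment: str
    sticky: bool


def acknowledgement_after_change(
    ack: Acknowledgement, new_state: str
) -> Optional[Acknowledgement]:
    """Return the acknowledgement if it survives the state change, else None."""
    if new_state == "OK":
        return None      # recovery always ends the acknowledgement
    if ack.sticky:
        return ack       # sticky: survives WARN -> CRIT and similar changes
    return None          # not sticky: the new state is a new unhandled problem


sticky_ack = Acknowledgement("ordered new disk", sticky=True)
plain_ack = Acknowledgement("ordered new disk", sticky=False)

# The file system goes from WARN to CRIT:
print(acknowledgement_after_change(sticky_ack, "CRIT") is not None)  # True
print(acknowledgement_after_change(plain_ack, "CRIT") is not None)   # False
```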
[0:07:46]	We'll make a separate video about notifications sometime in the future. Then lastly we have the persistent comment feature. If this is selected that means that the comment will not disappear when the acknowledgement does.
[0:08:03]	Normally they are connected in such a way that when a service goes back to OK both the acknowledgement and the comment disappear but with this you can basically pin the comment to the service. This also means that you would have to remove it manually when you don't need it anymore.
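To make the difference concrete, here is a minimal Python sketch of what remains once an acknowledgement ends. The `Comment` class and the function `comments_after_acknowledgement_ends` are hypothetical names for this example, not Checkmk identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    """Hypothetical model of an acknowledgement comment (not a Checkmk class)."""
    text: str
    persistent: bool


def comments_after_acknowledgement_ends(comments: List[Comment]) -> List[Comment]:
    # Only comments marked as persistent stay attached to the service;
    # they would have to be removed manually later.
    return [c for c in comments if c.persistent]


comments = [
    Comment("ordered new disk", persistent=True),
    Comment("Load is OK", persistent=False),
]
remaining = [c.text for c in comments_after_acknowledgement_ends(comments)]
print(remaining)  # ['ordered new disk']
```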
[0:08:24]	So that was it for acknowledging problems in Checkmk. I hope this episode was helpful to you. If so, please subscribe to the channel and like the video. See you in the next episode.