Ep. 17: Working with network topologies in Checkmk



Welcome back to the Checkmk channel. In this episode, we're taking a look at parents and network topology.

[0:00:16] Today's subjects are parents and network topology, and I'll show you how you can use them to prevent unnecessary notifications.
[0:00:24] Once again, we'll be using our test system for this video: our small distributed setup with a total of 10 hosts. Let's quickly go to Monitor and then All hosts.
[0:00:38] We have four database servers, two Checkmk servers, a web server, and three switches. Now let's go to Monitor and then Network topology. Here we have this cloud-like diagram.
[0:00:56] Each of our Checkmk sites is depicted by a purple node, and all the nodes around it are the hosts monitored by that site.
[0:01:10] All the green nodes are hosts that are up and currently don't have a problem. Then there is this red host, which is down. Other than that, this diagram doesn't tell me a lot. So let's create a more interesting topology and talk about why we would do this.
[0:01:31] Besides our database servers, web server, and Checkmk servers, we also have three switches. Let's assume that the four database servers are only accessible via switch1. What would happen if that switch went down?
[0:01:49] Well, then our four database servers would no longer be accessible to the monitoring system and would also get the status 'Down'. That would result in a lot of notifications, not only for the switch but also for each of the database servers.
[0:02:06] It would be more useful to receive a single notification about the switch, including the information that the database servers downstream are no longer accessible to the monitoring system. For this, Checkmk has a host status called unreachable.
[0:02:21] If the database servers are unreachable, you will normally not receive any notifications for them, because you have already received one for the switch.
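To make the difference between 'Down' and unreachable concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Checkmk's implementation: we assume a host is UP if it answers the monitoring system, UNREACH if it doesn't answer and its parent is not UP, and DOWN otherwise, and that only DOWN hosts trigger a host notification. The `PARENT` table, the ping results, and the functions `host_state` and `hosts_to_notify` are hypothetical names chosen for this example.

```python
# Simplified model of 'Down' vs. 'unreachable' for hosts with one parent.
# Illustration only, not Checkmk's implementation. Host names follow the
# video; the ping results used below are made up.

# Hypothetical parent table: None means "connected directly to the site".
PARENT = {
    "switch1": None,
    "db-server-1": "switch1",
    "db-server-2": "switch1",
    "db-server-3": "switch1",
    "db-server-4": "switch1",
}


def host_state(host, parent, answers):
    """Return "UP", "DOWN" or "UNREACH" for one host (assumes no parent loops)."""
    if answers[host]:
        return "UP"
    p = parent.get(host)
    if p is not None and host_state(p, parent, answers) != "UP":
        # The path to the host is broken, so we can't blame the host itself.
        return "UNREACH"
    return "DOWN"


def hosts_to_notify(parent, answers):
    """In this model only DOWN hosts notify; UNREACH hosts are suppressed."""
    return sorted(h for h in parent if host_state(h, parent, answers) == "DOWN")


if __name__ == "__main__":
    # switch1 fails, so nothing behind it answers either.
    answers = {h: False for h in PARENT}
    print(hosts_to_notify(PARENT, answers))  # ['switch1']
    # Same outage without any parents configured: every host is DOWN.
    no_parents = {h: None for h in PARENT}
    print(len(hosts_to_notify(no_parents, answers)))  # 5
```

With parents configured, the switch outage produces one notification in this model instead of five.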
[0:02:30] It could even be the case that the database servers are working perfectly fine and are accessible to the users, just not to the monitoring system. To make this work, we need to tell Checkmk that the database servers are accessed via the switch, and for that there is a property called Parents. So let's go ahead and edit one of our hosts.
[0:02:56] As always, go to Setup and then Hosts, then to the Munich folder and then Servers, and let's edit db-server-2.
[0:03:11] Here is the property called Parents. Let's check the box, and now we can select the parent host; we'll pick switch1. Now let's save and then activate our changes. If we now head back to our network topology, via Monitor and then Network topology, you see that it's still the same diagram.
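If you'd rather script this step than click through Setup, the same Parents attribute can be set through Checkmk's REST API. The sketch below only builds the request, it does not send it. It is based on our reading of the REST API documentation and assumes the `PUT /objects/host_config/{host_name}` endpoint, an `update_attributes` body field, a `parents` list, and an `If-Match` header carrying the ETag from a previous GET; verify these against the API docs of your Checkmk version. `SERVER`, `SITE`, the user name, the secret, and the function `set_parents_request` are hypothetical placeholders.

```python
# Sketch: build (but do not send) a Checkmk REST API request that sets the
# Parents attribute of one host. Assumptions to verify against the REST API
# docs of your Checkmk version: the PUT /objects/host_config/{host_name}
# endpoint, the "update_attributes" body field and the If-Match ETag header.
import json
import urllib.request

# Hypothetical placeholders: replace with your server, site and credentials.
SERVER = "monitoring.example.com"
SITE = "mysite"
API_URL = f"http://{SERVER}/{SITE}/check_mk/api/1.0"


def set_parents_request(host_name, parents, etag, user="automation", secret="SECRET"):
    """Return a urllib Request that would update the host's parents."""
    body = {"update_attributes": {"parents": list(parents)}}
    return urllib.request.Request(
        url=f"{API_URL}/objects/host_config/{host_name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Accept": "application/json",
            "Content-Type": "application/json",
            # ETag returned by a previous GET on the same host object.
            "If-Match": etag,
        },
    )


if __name__ == "__main__":
    req = set_parents_request("db-server-2", ["switch1"], etag='"0123abcd"')
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` would send it. Just as in the GUI, the resulting change still has to be activated before the monitoring uses it.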
[0:03:44] In order to see the children, we have to increase the number of hops. Now you see that db-server-2 is connected to switch1, and the direction of the lines indicates the direction in which the data is flowing.
[0:04:03] You see here, it goes from db-server-2 to switch1 and then to the monitoring site. Now Checkmk knows that in order to reach the database server, it needs to go through this switch. So if the switch goes down, Checkmk knows that it cannot reach the database server either. Now let's do the same for all the other database servers. Let's head back to the host configuration.
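The chain shown in the diagram, from host to parent to site, can be modeled by simply following the parent of each host until there is none left. This is an illustrative sketch, not how Checkmk draws the view: `PARENT`, `SITE`, `path_to_site` and `hops` are hypothetical names. If, as we assume here, the hop setting counts links outward from the site node, db-server-2 only appears once the view shows at least two hops.

```python
# Sketch: follow the parent chain from a host up to its monitoring site and
# count the hops. Illustrative model only; SITE is a hypothetical site name.

SITE = "mysite"

# None means "connected directly to the site".
PARENT = {
    "switch1": None,
    "db-server-2": "switch1",
}


def path_to_site(host, parent, site=SITE):
    """Return [host, parent, grandparent, ..., site]."""
    path = [host]
    while (p := parent.get(path[-1])) is not None:
        if p in path:
            raise ValueError(f"parent loop detected at {p}")
        path.append(p)
    path.append(site)
    return path


def hops(host, parent):
    """Number of links between the site node and the host."""
    return len(path_to_site(host, parent)) - 1


if __name__ == "__main__":
    print(" -> ".join(path_to_site("db-server-2", PARENT)))  # db-server-2 -> switch1 -> mysite
    print(hops("db-server-2", PARENT), hops("switch1", PARENT))  # 2 1
```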
[0:04:35] First, we will remove the parent from db-server-2 and save. Instead of setting the parent explicitly on every server, we can go back to the Munich folder and edit the Servers folder. Let's set Parents to switch1 on the folder level, then again save and activate the changes.
[0:05:16] Now let's go back to our network topology and increase the number of hops again. You see that every single host in that folder is connected to switch1. Whenever we add a new host to that folder, Checkmk will automatically know that it's only accessible via switch1.
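Setting Parents on the folder works because host attributes are inherited from folders. The sketch below is a simplified model of that lookup, not Checkmk's code: a value set on the host itself wins, otherwise the nearest folder on the way up to the root that sets the attribute provides it. The folder paths, the hosts `new-db-server` and `other-host`, and the names `FOLDER_ATTRS`, `HOSTS`, `folder_chain` and `effective_attr` are hypothetical.

```python
# Sketch: resolve a host's effective Parents setting through folder
# inheritance. Simplified model: a value set on the host itself wins;
# otherwise the nearest folder (walking up to the root) that sets the
# attribute provides it. Folder paths and host names are hypothetical.

FOLDER_ATTRS = {
    "": {},  # root folder
    "munich": {},
    "munich/servers": {"parents": ["switch1"]},
}

HOSTS = {
    "db-server-1": {"folder": "munich/servers", "attrs": {}},
    "new-db-server": {"folder": "munich/servers", "attrs": {}},  # added later
    "other-host": {"folder": "munich", "attrs": {}},
}


def folder_chain(folder):
    """'munich/servers' -> ['munich/servers', 'munich', '']"""
    parts = folder.split("/") if folder else []
    return ["/".join(parts[:i]) for i in range(len(parts), -1, -1)]


def effective_attr(host, attr):
    """Return the host's own value, else the nearest folder's value, else None."""
    entry = HOSTS[host]
    if attr in entry["attrs"]:
        return entry["attrs"][attr]
    for folder in folder_chain(entry["folder"]):
        folder_attrs = FOLDER_ATTRS.get(folder, {})
        if attr in folder_attrs:
            return folder_attrs[attr]
    return None


if __name__ == "__main__":
    for name in HOSTS:
        print(name, effective_attr(name, "parents"))
```

A new host dropped into the Servers folder picks up switch1 without any per-host setting, while a host outside that folder gets no parents at all.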
[0:05:48] The next thing we should do is test whether this actually works. How can we do that? Well, switch1 has to go down so that we can see what happens. But because this is a test system, I cannot simply turn off the switch, so we have to use a little trick.
[0:06:11] Right-click the switch and go to the host details, then click the host name and choose Hosts configuration to edit its properties. Instead of turning off the switch, we will simply change its IP address a little, so that Checkmk can no longer reach it. Then activate the changes.
[0:06:44] Now Checkmk should consider the switch to be down. Increase the number of hops, and now you see that switch1 is down. You also see that db-server-1 is now unreachable. It was previously down, but now Checkmk assumes it's unreachable, because its parent switch1 is down.
[0:07:10] You might ask yourself why the other three database servers did not go to the unreachable state. In a real system they probably would, but because this is a test system, Checkmk is still able to ping these hosts, so they remain up.
[0:07:27] So if Checkmk is still able to reach a host even though its parent is down, the host remains up and Checkmk keeps monitoring it as usual.
[0:07:42] But what if you have a system with redundant switches? If the hosts are accessible via more than one switch, then as long as one of the switches is functional, the hosts behind it should still be accessible.
[0:07:56] In that case, you don't have a tree structure but a meshed structure, and you can configure this in Checkmk as well. So once again let's go to the host configuration, Setup and then Hosts, and back to the folder we were editing.
[0:08:12] You might have noticed this before: under Parents you can select multiple parents. So let's add a second one, switch2. Once again, save and activate. Now let's head back to the network topology and increase the number of hops.
[0:08:42] You see that it has become a bit more cluttered, because all these servers are now connected to both switch1 and switch2. Both switches are connected to the site, and all the servers are connected to both switches.
[0:09:00] You'll also see that db-server-1 is no longer unreachable but down. Checkmk knows that it should be reachable via either switch1 or switch2, and one of them is still functional, so the problem must lie with db-server-1 itself and not with the switches.
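With several parents, the rule changes from "is the parent up?" to "is any parent up?". The sketch below models that under our own simplifying assumptions, not Checkmk's code: a host that doesn't answer is UNREACH only if none of its parents is UP; if at least one parent is UP, a path exists and the fault is blamed on the host itself. `PARENTS`, `answers` and `host_state` are hypothetical names, and the scenario mirrors the one in the video.

```python
# Sketch: 'Down' vs. 'unreachable' with redundant parents. Simplified model,
# not Checkmk's code: a host that doesn't answer is UNREACH only if none of
# its parents is UP. If at least one parent is UP, the path exists, so the
# fault is blamed on the host itself (DOWN). Assumes no parent loops.

PARENTS = {
    "switch1": [],
    "switch2": [],
    "db-server-1": ["switch1", "switch2"],
}


def host_state(host, parents, answers):
    """Return "UP", "DOWN" or "UNREACH" for one host."""
    if answers[host]:
        return "UP"
    host_parents = parents.get(host, [])
    if host_parents and not any(
        host_state(p, parents, answers) == "UP" for p in host_parents
    ):
        return "UNREACH"
    return "DOWN"


if __name__ == "__main__":
    # switch1 fails, switch2 is fine, db-server-1 has its own problem.
    answers = {"switch1": False, "switch2": True, "db-server-1": False}
    print(host_state("db-server-1", PARENTS, answers))  # DOWN
    answers["switch2"] = False  # now both switches are gone
    print(host_state("db-server-1", PARENTS, answers))  # UNREACH
```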
[0:09:25] As you can see, Checkmk gives you the ability to map your network topology, primarily to avoid unnecessary alerts. When we talk about network topology in Checkmk, we mean the topology from the point of view of the monitoring system.
[0:09:40] That is, how Checkmk reaches the hosts, not your network topology in general. If you have a distributed setup, every site has its own topology, because the monitoring takes place locally from each site.
[0:09:56] If you configure your network topology in Checkmk correctly, you can avoid a lot of unnecessary alerts. This is highly recommended, especially if you have a few sites with a large number of hosts that are only accessible via a few routers or switches.
[0:10:14] That's it for this episode about network topology. I hope it was helpful to you. If so, subscribe to our channel and like the video. See you next time.
