Ep. 29: New agent architecture in Checkmk 2.1


[0:00:00] Welcome to the Checkmk channel. Today, we are taking a look at the new agent architecture.
[0:00:15] So, the Checkmk agent has been around for quite some time and there haven't been major architecture changes. Of course, the agent has been extended but the general idea of how the agent is working never really changed.
[0:00:28] Let me quickly explain how it worked. So, first there was a super server called xinetd, which was used to listen on port 6556. And if a connection came in, this super server would then start the Checkmk agent script, which in turn would collect all the monitoring data and send it back on that socket before it gets closed.
[0:00:53] Some years ago, systemd came around, which is nowadays the de facto init system for all major Linux distributions. So, we added support for that but the idea behind it is still the same.
[0:01:06] So, there is a systemd unit listening on port 6556 and if a connection comes in, it runs the agent script, fetches the information, gives it back to the Checkmk server.
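The legacy flow described above (connect to port 6556, receive the agent output, connection closes) can be sketched in Python. This is a minimal illustration, not Checkmk code: `fetch_agent_output`, `serve_once`, `demo` and the sample payload are hypothetical names and data, and a small local stand-in plays the role of the super server.

```python
import socket
import threading


def fetch_agent_output(host: str, port: int = 6556, timeout: float = 5.0) -> bytes:
    """Connect, then read until the agent side closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def serve_once(listener: socket.socket, payload: bytes) -> None:
    """Stand-in for the super server: accept one connection, send, close."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(payload)


def demo() -> bytes:
    # Hypothetical sample output; a real agent sends many more sections.
    payload = b"<<<check_mk>>>\nVersion: 2.1.0\n"
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener, payload))
    worker.start()
    try:
        return fetch_agent_output("127.0.0.1", port)
    finally:
        worker.join()
        listener.close()
```

Against a real host in legacy mode, `fetch_agent_output("myhost")` would be the whole client side, with `myhost` standing in for the monitored system.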
[0:01:17] So, this is how it's been until version 2.1. With Checkmk 2.1, we changed this architecture. We have two new components, one on the Checkmk server, and the other one on the Checkmk agent. On the server side, we have the so-called Agent Receiver, which listens on a new port.
[0:01:40] It has to be separated from the web interface, so it's not port 80 and 443. It's a port starting at 8000.
[0:01:48] And on the agent side, we have the Agent Controller component, which takes care of all the communication between the Checkmk agent and the Checkmk server.
[0:02:02] On the agent side then, the Agent Controller talks through a Unix socket to the actual agent script, which is running as a daemon on the Linux system that is being monitored. So the communication and the checking logic are separated by this Unix socket, so to speak.
[0:02:23] So, the Agent Controller really only does the communication. It encrypts the communication using asymmetrical TLS encryption between the Checkmk server and the Checkmk agent.
[0:02:35] It also compresses the data that is being sent for a more efficient transport and it is possible to extend the Agent Controller in several ways.
[0:02:46] But right now, the two features that we implemented are the compression and, most importantly, the TLS encryption. So, on the checking side, nothing really has changed.
[0:02:57] It is still the agent that you have known for years now. It's still the shell script that is doing the heavy lifting of monitoring. We just created clean systemd services and units that are taking care of all this. I already talked about the socket.
[0:03:13] The Checkmk agent is running as a service. We have a service that is taking care of asynchronous sections of the agent, and all of that comes together communicating through the Unix socket.
[0:03:22] So, if you are talking to the agent from the Checkmk server, that goes through the agent controller. But it is more or less the same as you've known it before. There is still the connection on port 6556 and that's all there really is at the moment.
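The split described above, with the agent script on one end of a Unix socket and the Agent Controller on the other, can be sketched as follows. This is a simplified illustration under stated assumptions: `zlib` is assumed as the compression algorithm, the TLS layer is left out entirely, and `agent_side`, `controller_side`, `demo` and the sample data are hypothetical.

```python
import socket
import zlib


def agent_side(sock: socket.socket, output: bytes) -> None:
    """Checking logic: write the collected data to the socket and close it."""
    sock.sendall(output)
    sock.close()


def controller_side(sock: socket.socket) -> bytes:
    """Communication logic: read everything, then compress it for transport."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the agent side closed its end
            break
        chunks.append(data)
    sock.close()
    # Assumption: zlib stands in for whatever compression the controller uses.
    return zlib.compress(b"".join(chunks))


def demo() -> tuple[bytes, bytes]:
    # socketpair() gives two connected sockets (Unix domain sockets on Linux).
    agent_sock, controller_sock = socket.socketpair()
    # Hypothetical, deliberately repetitive sample section.
    raw = b"<<<df>>>\n" + b"/dev/sda1 ext4 1000000 500000 500000 50% /\n" * 20
    agent_side(agent_sock, raw)
    packed = controller_side(controller_sock)
    return raw, packed
```

The point of the design is visible in the sketch: the agent side knows nothing about transport, so encryption or compression can change without touching the checking logic.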
[0:03:38] For the future, there will be a special Checkmk edition that will include a so-called push agent which will be able to push data to the Checkmk server, reversing the way of communication that you know.
[0:03:51] That is targeted at Checkmk installations in the cloud where you do not want the Checkmk server to be able to query the agent directly, so there we can reverse the direction of communication and have the Checkmk agent push this information to the server.
[0:04:07] Okay, this was quite some technical talk and you didn't really see much. So, let's have a look at what is actually going on with the new agent.
[0:04:17] So, now we are looking at a freshly upgraded Checkmk site and a freshly upgraded Checkmk agent. And we can see here that there is one unmonitored service, which is new.
[0:04:30] It's the Checkmk agent service. And if we take a look at that service, or rather we first inventorize it, just real quickly, then we can see that there is a warning state regarding the agent. So, let's take a look at what the agent is telling us.
[0:04:51] And here we can see the agent is running on version 2.1. This is still a beta version at the time of recording. And it says TLS is not activated on monitored host.
[0:05:03] So, what the agent actually does is it checks whether it would be possible to enable TLS encryption, and if it is, it will print this warning.
[0:05:13] So, what are the requirements for that? The requirement really just is that the agent is running or that the agent is at version 2.1 and it's running on a system with systemd because the agent controller has a requirement for systemd.
[0:05:27] Without systemd, there is no agent controller. That means on systems without systemd the Checkmk agent would be running in the legacy mode, which I explained earlier.
[0:05:39] And if systemd is available, then the Checkmk agent controller will be enrolled and activated and then we will get this warning. So, what do we do about it?
[0:05:52] First, of course, you can configure the service to just ignore this warning and to run in legacy mode. Because, of course, the agent will continue to work after the upgrade.
[0:06:03] It will not stop working if you upgrade your existing system, so you will still be able to monitor, but we definitely encourage you to enable the TLS encryption to be protected against man-in-the-middle attacks that would be listening in on your monitoring data.
[0:06:20] So, how do we do that? How do we activate TLS? For that, we need to go to the command line of the monitored system. And there we have cmk-agent-ctl, the agent controller. So, this is the binary managing everything related to the agent controller.
[0:06:39] Now let's take a look at what it can do. There are a lot of commands and sub-commands that you see here. The daemon mode is the one in which the agent controller runs in the background, and it's already running in the background right now.
[0:06:54] And we have the register command here, which is the one that we want to use. There is also a proxy-register, which can be used to create registration information for air-gapped systems.
[0:07:06] So, if there is a system that cannot directly register at your monitoring server, you can do that from any other system that can talk to the monitoring server and then import this configuration on the air-gapped system.
[0:07:20] But let's keep it simple in this example. We are just using the register command directly because we can talk to the Checkmk server. So, we have the register command, and let's have a look at what this command can do.
[0:07:35] And there we can see several options again. The important ones are the OPTIONS on the bottom. So, we need a hostname. Let's start with that. I called this system localhost, so the hostname really just is localhost.
[0:08:00] I need to give the site name with -i. I called my site "agent", pretty basic. And with -u, we need to give the user that is allowed to register the agent. Now let's do this. Now we are presented with the server-side certificate of the Checkmk server.
[0:08:22] We get information about it. We could actually verify the whole certificate against the server. But in this case I know who I'm talking to, so I'm going to confirm this here. Yes.
[0:08:33] Now we're being asked for the password for the user that we just gave. Let me provide that. And if there's no output in Linux, everything worked.
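The registration call from the demo can be sketched as an argument list built in Python. The long option names `--hostname`, `--server`, `--site` and `--user` are given here as recalled from the Checkmk documentation and should be treated as an assumption; `cmk-agent-ctl register --help` on the monitored host is the authoritative list. `build_register_command` and the user name `cmkadmin` are hypothetical, and the server value `localhost` is taken from the status output shown later in the video.

```python
import shlex


def build_register_command(hostname: str, server: str, site: str, user: str) -> list[str]:
    """Assemble the argv for a registration call without executing it."""
    return [
        "cmk-agent-ctl", "register",
        "--hostname", hostname,  # name of the host as known in the Checkmk site
        "--server", server,      # address of the Checkmk server (assumed option)
        "--site", site,          # site name, given with -i in the video
        "--user", user,          # user that is allowed to register the agent
    ]


# Values from the demo, except the user name, which is hypothetical.
command = build_register_command("localhost", "localhost", "agent", "cmkadmin")
print(shlex.join(command))
```

The sketch only prints the command line. Running it for real would additionally involve the interactive certificate confirmation and password prompt described above.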
[0:08:43] So, we successfully registered the agent controller for TLS encryption with the Checkmk server. And now let's take a look at the status of the agent.
[0:08:56] Because we also have the status command for the agent controller. And there we can see the connection here is to localhost port 8000 to the agent site.
[0:09:09] The connection has a UUID and there is some more information about the certificate and the registration overall. And we can see the registration state is operational, which means everything is working as expected.
[0:09:24] You can actually register your host to several Checkmk servers. So, if you have, for example, a test site running, you can, of course, register the agent with that site too so that site would be able to query this agent too.
[0:09:38] On the other hand, if you have a site that is not registered with this agent, then it cannot query the agent.
[0:09:45] So, if there's no registration, the agent will not communicate with that site, unless you're running in legacy mode, as said earlier. Then the old rules still apply: if you have an IP whitelist, that will be in effect, and the classical encryption will also be in effect, of course.
[0:10:04] But once you start registering with the first registration, the agent controller then only accepts registered connections.
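The acceptance rule just described can be written down as a small decision function. This is a simplified model, not the controller's actual logic: `ControllerState` and `accepts` are hypothetical, a site is identified by a plain name rather than by its TLS certificate, and an empty allowlist is assumed to mean "no IP restriction".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControllerState:
    registered_sites: set = field(default_factory=set)  # sites the agent registered with
    ip_allowlist: set = field(default_factory=set)      # legacy-mode IP whitelist


def accepts(state: ControllerState, *, site: Optional[str] = None, source_ip: str = "") -> bool:
    """Decide whether an incoming query is answered."""
    if state.registered_sites:
        # After the first registration, only registered sites are served.
        return site in state.registered_sites
    # Legacy mode: the old rules apply, including the IP whitelist if set.
    return not state.ip_allowlist or source_ip in state.ip_allowlist


legacy = ControllerState(ip_allowlist={"192.0.2.10"})
registered = ControllerState(registered_sites={"agent"}, ip_allowlist={"192.0.2.10"})
```

With these two states, `accepts(legacy, source_ip="192.0.2.10")` is `True`, while `accepts(registered, source_ip="192.0.2.10")` is `False` because the query does not come from a registered site.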
[0:10:13] So, after we did this, let's switch back to the web interface. And now we can see the Checkmk agent service already went green.
[0:10:21] So, there's nothing to complain about, which means we are communicating with the agent over an encrypted connection. The communication is secured and no one else can intercept this communication. Alright, that concludes this quick demonstration.
[0:10:38] I hope you could get some insight into the new agent. If you want to know more about it, take a look at our official guide. The link will be in the description.
[0:10:47] And with that, let me thank you for watching this video. Be sure to subscribe to never miss a video in the future. And I will see you next time.
