Proxmox is a popular Linux-based server virtualization platform. The open source-based software combines KVM (Kernel-based Virtual Machine) and LXC (Linux containers), two virtualization techniques for operating virtual machines (VMs) and containers. Containers, VMs, storage and virtual networks on the host systems can be easily managed via the Proxmox configuration interface. You can operate Proxmox either as a single node or run several Proxmox servers in a cluster, which is generally the practice.
To ensure smooth operation of the containers and VMs managed via Proxmox, it is advisable or even necessary to include the Proxmox nodes or clusters in your IT infrastructure monitoring. With Checkmk you have the capability to monitor all the key metrics of your Proxmox environment and thus identify problems or bottlenecks in a timely manner.
Preparing a Proxmox server for monitoring
To monitor a Proxmox server, first install the Checkmk Linux monitoring agent on each of your Proxmox nodes. In our manual, you will find instructions on how to install the Linux agent. You can then query the monitoring data via port 6556 if the node is in the local network or, if the node is not in the local network, in encrypted form via SSH using its public IP address. In addition to the agent, you should configure and use the Proxmox VE special agent in Checkmk.
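To give an idea of what a query on port 6556 returns, here is a minimal Python sketch that reads the raw agent output over TCP and groups it by section header. It assumes an agent that answers in plain text on that port, which may not hold for agents registered for TLS; the function names are made up for this illustration.

```python
import re
import socket

# Agent sections start with a header line such as <<<df>>> or <<<df:sep(9)>>>.
SECTION_HEADER = re.compile(r"^<<<([^<>]+)>>>$")


def read_agent_output(host: str, port: int = 6556, timeout: float = 10.0) -> str:
    """Read everything the agent sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def split_sections(agent_output: str) -> dict[str, list[str]]:
    """Group the agent output lines by their <<<section>>> header."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in agent_output.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            # Options such as ":sep(9)" follow the section name after a colon.
            current = match.group(1).split(":", 1)[0]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

Calling split_sections(read_agent_output("pve-xyz-001.checkmk.com")) would then show which sections a node delivers, provided the node is reachable from where the script runs.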
First, however, you will need to do some preliminary work in the Proxmox configuration interface and set up API access. To do this, create a user 'checkmk' with the realm 'pve' for your cluster under the 'Permissions' tab. This user is authenticated by Proxmox and not by the underlying Debian.
Next, assign the created user to the group 'Read only'. If this group does not exist yet, create it first.
Finally, click on 'Permissions' (still at the cluster level) and define the desired path for the 'Read only' group. This path determines the starting point for the API, so to speak. In our example, this is /, i.e. the root of the resource tree. We define 'PVEAuditor' as the role for the group here. In combination with the root path, the user can now read everything in the cluster but has no further rights.
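Before moving on, you may want to verify that the new user can actually log in to the API. The following Python sketch requests a ticket from the Proxmox VE API and then lists the nodes visible to the user. It assumes the default API port 8006 and the user 'checkmk' in the realm 'pve' from above; the function names and the password are placeholders for this illustration.

```python
import json
import urllib.parse
import urllib.request


def build_ticket_request(host: str, user: str, realm: str, password: str,
                         port: int = 8006) -> urllib.request.Request:
    """Build the login request; Proxmox expects the user as 'name@realm'."""
    url = f"https://{host}:{port}/api2/json/access/ticket"
    body = urllib.parse.urlencode(
        {"username": f"{user}@{realm}", "password": password}
    ).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def list_nodes(host: str, user: str, realm: str, password: str,
               context=None) -> list[str]:
    """Log in and return the names of the nodes the user is allowed to see.

    Pass an ssl.SSLContext as 'context' if the node uses a certificate
    that your system does not trust by default.
    """
    login = build_ticket_request(host, user, realm, password)
    with urllib.request.urlopen(login, context=context) as resp:
        ticket = json.load(resp)["data"]["ticket"]

    # Read-only GET requests only need the ticket as a cookie.
    req = urllib.request.Request(
        f"https://{host}:8006/api2/json/nodes",
        headers={"Cookie": f"PVEAuthCookie={ticket}"},
    )
    with urllib.request.urlopen(req, context=context) as resp:
        return [node["node"] for node in json.load(resp)["data"]]
```

If list_nodes(...) returns the names of your cluster nodes, the user, group and role assignment should be working as intended.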
Note: To simplify monitoring at a later point, it is recommended that you optimize your network monitoring with Checkmk, including an efficient monitoring of your interfaces. This way, all interfaces in your IT infrastructure will have a meaningful name. In our blog article 'Three rules to rule them all' you will find detailed instructions on how to set up an efficient network monitoring in Checkmk with just three rules.
Adding Proxmox nodes to the monitoring
Once you have completed the preliminary work, you can now add your Proxmox nodes to the monitoring with Checkmk. It is important that you select Configured API integrations and Checkmk agent in the 'Monitoring agents' section when adding the host.
Then configure the API integration (special agent). To do this, search for 'Proxmox' in the Setup menu and select the Proxmox VE rule under Agents ➳ VM, Cloud, Container. Now enter the user you created in Proxmox under 'Username'. Under 'Explicit hosts', enter the nodes of your Proxmox cluster that Checkmk should monitor via the API integration. After saving the rule and activating the changes, Checkmk should now find all of the services on your Proxmox hosts, including the various interfaces.
Because I followed the 'Three rules to rule them all' blog article mentioned above, the interfaces do not end up with names like 'Interface 1' or 'Interface 2'. Instead, they carry the same names as in the Linux kernel, for example 'Interface ens16'. This allows you to see which interfaces belong to the local node (these start with 'bond' or 'eno', for example) and which have been created by the virtualization for the VMs. The VM interfaces start with 'fw' or 'tap', depending on the configuration.
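The prefix convention described here can be written down as a small Python sketch. The prefixes are the ones named in this article; your environment may use others, so treat the two tuples as assumptions to adapt. The function name is made up for this illustration.

```python
# Prefixes taken from the examples in this article; extend them as needed.
NODE_PREFIXES = ("bond", "eno")
VM_PREFIXES = ("fw", "tap")


def classify_interface(service_name: str) -> str:
    """Return 'vm', 'node' or 'other' for a service such as 'Interface tap100i0'."""
    name = service_name.removeprefix("Interface ").strip()
    if name.startswith(VM_PREFIXES):
        return "vm"
    if name.startswith(NODE_PREFIXES):
        return "node"
    return "other"
```

For example, classify_interface("Interface bond0") returns 'node', while classify_interface("Interface tap100i0") returns 'vm'.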
Configuring a Proxmox cluster in the monitoring
As already mentioned, Proxmox servers can be operated either as single nodes or in a cluster, which is the standard practice. You should also map the Proxmox cluster in your monitoring, otherwise the VM interfaces mentioned above will 'wander' between the nodes when you move a VM or a container from one node to another.
To do this, go to Hosts in the Setup menu and then click on Create new cluster under the Hosts menu. There you define a name for your cluster under Hostname and then add all the Proxmox servers you want to assign to the cluster under Nodes. It is important that you set the drop-down menu under IP address family to No IP. Although the individual Proxmox nodes have their own IP addresses, the Proxmox cluster itself does not. Therefore, it is neither necessary nor possible to ping the cluster.
It is also important that you set the Checkmk agent / API integrations option under Monitoring agents to the same setting that you have already set for the individual nodes: Configured API integrations and Checkmk agent. If the cluster and nodes do not have the same data source, you will receive an error message. Finally, save the cluster.
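If you prefer to script this step, the cluster settings described above could be expressed as a payload for the Checkmk REST API. The following Python sketch only builds that payload. The field names and the tag IDs 'no-ip' and 'all-agents' are assumptions on my part and should be checked against the REST API documentation of your site; the cluster name 'pve-cluster' is hypothetical.

```python
import json


def build_cluster_payload(cluster_name: str, nodes: list[str],
                          folder: str = "/") -> dict:
    """Build a cluster host definition that mirrors the settings in the GUI."""
    if not nodes:
        raise ValueError("a cluster needs at least one node")
    return {
        "host_name": cluster_name,
        "folder": folder,
        "nodes": list(nodes),
        "attributes": {
            # Assumed tag ID for 'No IP': the cluster itself has no address.
            "tag_address_family": "no-ip",
            # Assumed tag ID for 'Configured API integrations and Checkmk
            # agent'; it must match the setting used on the nodes.
            "tag_agent": "all-agents",
        },
    }


if __name__ == "__main__":
    payload = build_cluster_payload(
        "pve-cluster",
        ["pve-xyz-001.checkmk.com", "pve-xyz-002.checkmk.com"],
    )
    print(json.dumps(payload, indent=2))
```

Keeping the payload in a function like this makes it easy to reuse the same agent setting for nodes and cluster, which avoids the data source mismatch mentioned above.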
Assigning services to a cluster
Now enter 'clustered' into the search field in the Setup menu and under Service monitoring rules select the Clustered services rule. In the following menu, activate the checkbox for Explicit hosts and add your Proxmox nodes to the row. In our example screenshot, these are the nodes 'pve-xyz-001.checkmk.com', 'pve-xyz-002.checkmk.com' and 'pve-xyz-003.checkmk.com'.
Now activate the checkbox for Services and enter the search prefixes Interface fw and Interface tap in the fields. As already mentioned, these are the interfaces of a VM or a container. As the rule description states, Checkmk applies the rule to the nodes and not to the cluster host. It therefore assigns the matching interfaces found on the nodes to the cluster.
By assigning the container or VM interfaces to a cluster, the services can move between the individual nodes, as already mentioned, without Checkmk showing these interfaces as Vanished Services on the former node or recognizing them as new services on the 'new' node. This way you do not have to remove the interface services on one node and add them on the other, as they remain part of the cluster. Once you have saved the rule and activated the changes, Service Discovery will now recognize the interfaces in question on the cluster host.
Note: Assigning the interfaces at the cluster level only works if you have followed the instructions in the 'Three rules to rule them all' article referenced above. Only then do the interfaces receive the names required to filter them in a meaningful way in Checkmk.
With this configuration, Checkmk offers you another advantage: In the cluster's host view, under Service, you will now see something like the interface 'fwbr' with a numeric suffix. This number is the VM ID. In the Summary, you can also see on which node the interface is located. For example, you can see at a glance that the VM with the ID 100 is located on node 'pve-xyz-002.checkmk.com'. In this way, Checkmk provides you with a comprehensive monitoring of your Proxmox environment.
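Since the VM ID is part of the interface name, it can also be extracted automatically, for instance in a report script. The following Python sketch assumes names of the form seen in this article, such as 'fwbr100i0' or 'tap666i0'; the exact pattern and the function name are assumptions for this illustration.

```python
import re
from typing import Optional

# Assumed naming: 'fw' plus two letters or 'tap', then the VM ID,
# then 'i' or 'p' and the index of the network device.
VM_INTERFACE = re.compile(r"^Interface (?:fw[a-z]{2}|tap)(\d+)[ip]\d+$")


def vm_id_from_service(service_name: str) -> Optional[int]:
    """Return the VM ID encoded in the service name, or None for other services."""
    match = VM_INTERFACE.match(service_name)
    return int(match.group(1)) if match else None
```

With this, vm_id_from_service("Interface fwbr100i0") yields 100, while a node interface such as 'Interface bond0' yields None.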
The special case of volatile VMs
If you have very volatile VMs in your Proxmox environment that you do not want to monitor, test VMs for instance, I would recommend the following procedure: Create these VMs or containers in Proxmox with a custom VM ID that is outside your normal range of numbers, such as 600-699. The interfaces of these VMs appear in Checkmk as 'Interface tap666i0', for example. With the Disabled services rule in the Setup menu, you can use the regular expressions 'Interface fw6\d\d' and 'Interface tap6\d\d' to match these interfaces and thus exclude them from your monitoring.
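To check such patterns before putting them into a rule, you can try them out in a few lines of Python. Checkmk matches service patterns against the beginning of the service name, which re.match imitates here. The helper name and the alternative pattern FW_ANY are my own additions for this illustration and not part of Checkmk.

```python
import re

# The two patterns from the text, for the ID range 600-699.
PATTERNS = [r"Interface fw6\d\d", r"Interface tap6\d\d"]

# Hypothetical alternative that also allows letters between 'fw' and the ID,
# as in 'fwbr666i0'.
FW_ANY = r"Interface fw[a-z]*6\d\d"


def is_disabled(service_name: str, patterns=PATTERNS) -> bool:
    """Return True if any pattern matches the beginning of the service name."""
    return any(re.match(pattern, service_name) for pattern in patterns)


if __name__ == "__main__":
    for name in ("Interface tap666i0", "Interface tap100i0", "Interface fwbr666i0"):
        print(name, is_disabled(name), is_disabled(name, PATTERNS + [FW_ANY]))
```

Note that a name such as 'Interface fwbr666i0' is not matched by 'Interface fw6\d\d', because letters sit between 'fw' and the ID. If your firewall interfaces are named like this, a pattern along the lines of FW_ANY may be the better fit, so it is worth testing against the actual service names of your hosts.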
Further advantages of Proxmox monitoring with Checkmk
Thanks to the piggyback mechanism provided by the API integration, Checkmk can also retrieve information from the VMs – the storage space or memory used by the VM for instance. You also get insight into how much of the allocated memory the container is already using or what its backup status is. If you are not permitted to install an agent in a VM, Checkmk will at least provide you with the most important information about the VM and you will always have an overview of the current backup status.
If you use Ceph storage for your Proxmox server, I also recommend the Ceph statistics plug-in from our Exchange. This provides information on Ceph statistics, pools, OSDs and the general status. Checkmk additionally comes with some official plug-ins for Proxmox monitoring. You can customize the default thresholds for your Proxmox services by clicking on Parameters for this service in the hamburger menu for the respective service.