1. What is SNMP?
1.1. SNMP instead of a Checkmk agent
Routers, switches, firewalls, printers, appliances, UPSs, hardware sensors, and many other devices do not allow the installation of a Checkmk agent. They do however already have a built-in interface for monitoring provided by their manufacturer – an SNMP agent. This agent can be accessed via the Simple Network Management Protocol (SNMP). Checkmk uses SNMP to monitor such devices. The benefit for you is that setting up the monitoring is very easy.
Incidentally, there are also SNMP agents for Windows and Linux. However using these instead of the Checkmk agent is not recommended. SNMP is not very performant, so using this for monitoring generally means that the Checkmk server needs more CPU and memory per host than when working with its own agent. In addition the data provided via SNMP are incomplete.
Monitoring SNMP devices with Checkmk is very easy. If you just want to get started quickly with SNMP, the short section on SNMP in the Getting Started Guide is probably all you need. This article goes into much more depth and covers the details and special cases of SNMP monitoring with Checkmk.
1.2. SNMP Versions
The SNMP protocol is available in different versions. These are all incompatible with each other, and so the monitoring system and the monitored device must always use the same protocol version consistently. Checkmk supports versions v1, v2c, and v3. In practice, an estimated 99% of installations use v2c. Here is an overview of all relevant versions of SNMP:
|Version||Features||Checkmk||Description and application in practice|
|v1||-||yes||Use only on very old (say, 10 years and older) devices that do not support v2c, or whose support for v2c is defective.|
|v2c||Bulk queries, 64-bit counters||yes||This is the standard in practice. v2c is a ‘light’ variant of v2 and the ‘c’ here stands for Community, which performs the role of a password in SNMP. The 64-bit counters are essential in monitoring switch ports with 1 Gbps and more. The bulk queries accelerate monitoring by up to a factor of 10.|
|v2||Security||no||Version 2 offers even better security options in addition to the features of v2c. Version 2 of SNMP is not found in practice, therefore Checkmk does not support this protocol version. If you need security, use version 3 instead. Attention: Since the ‘real’ version 2 has no relevance, many dialogs in Checkmk refer simply to v2, but always really mean v2c.|
|v3||Security, Contexts||yes||Version 3 is used when SNMP traffic needs to be encrypted. With v2c and v1 the traffic runs in plain text, including the community. In practice, version 3 is rather less common, because this version requires significantly more computing power, and the configuration effort is also significantly higher than with v2c. Contexts are a concept in which different information is visible in the same area of the SNMP data structure (OID), depending on the context ID. This is used for the partitioning of Fibre Channel switches, for example.|
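Why the 64-bit counters matter at 1 Gbps and above can be checked with a little arithmetic. The following is a minimal sketch, assuming an octet counter (as used for interface traffic) on a link running at full line rate; the function name `wrap_seconds` is made up for this illustration.

```python
def wrap_seconds(counter_bits: int, link_bps: float) -> float:
    """Seconds until an octet counter of the given width wraps at full line rate."""
    max_octets = 2 ** counter_bits      # number of octets the counter can count
    return max_octets * 8 / link_bps    # 8 bits per octet


# Assumed link speeds, chosen only for illustration.
for label, bps in (("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)):
    print(f"{label}: a 32-bit counter wraps after {wrap_seconds(32, bps):.1f} s")
```

At 1 Gbps a 32-bit counter can wrap in roughly 34 seconds, which is shorter than a one-minute check interval, so the rate between two polls can no longer be determined reliably. A 64-bit counter does not have this problem in practice.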
Checkmk uses active requests for SNMP monitoring. Checkmk sends a UDP packet (port 161) with an SNMP request to the device requesting the provision of specific data. The device then responds with a UDP packet containing the response data (or an error message).
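To make the request/response exchange more concrete, here is a hedged sketch that builds a single SNMP v2c GET request by hand and shows how it could be sent to UDP port 161. It is not how Checkmk implements SNMP internally; the helper names (`ber`, `build_get`, `send_get`) are invented for this example, and only non-negative integers and short messages are handled.

```python
import socket


def ber(tag: int, payload: bytes) -> bytes:
    """Encode one BER type-length-value element."""
    n = len(payload)
    if n < 128:
        length = bytes([n])                      # short form
    else:
        raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(raw)]) + raw  # long form
    return bytes([tag]) + length + payload


def ber_int(value: int) -> bytes:
    """Encode a non-negative INTEGER (enough for this sketch)."""
    return ber(0x02, value.to_bytes((value.bit_length() + 8) // 8, "big"))


def ber_oid(oid: str) -> bytes:
    """Encode an OBJECT IDENTIFIER given in dotted notation."""
    parts = [int(p) for p in oid.strip(".").split(".")]
    body = bytearray([parts[0] * 40 + parts[1]])  # first two arcs share one byte
    for sub in parts[2:]:
        chunk = [sub & 0x7F]
        sub >>= 7
        while sub:                                # base-128, high bit = "more follows"
            chunk.append(0x80 | (sub & 0x7F))
            sub >>= 7
        body.extend(reversed(chunk))
    return ber(0x06, bytes(body))


def build_get(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMP v2c GetRequest for a single OID."""
    varbind = ber(0x30, ber_oid(oid) + ber(0x05, b""))        # OID + NULL value
    pdu = ber(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0)
              + ber(0x30, varbind))                           # 0xA0 = GetRequest
    return ber(0x30, ber_int(1) + ber(0x04, community.encode()) + pdu)  # 1 = v2c


def send_get(host: str, community: str, oid: str, timeout: float = 1.0) -> bytes:
    """Send the request to UDP port 161 and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_get(community, oid), (host, 161))
        data, _ = sock.recvfrom(65535)
        return data


# Build (but do not send) a request for sysDescr with the community "public".
packet = build_get("public", ".1.3.6.1.2.1.1.1.0")
print(len(packet), packet.hex())
```

`send_get` is defined but deliberately not called here, since it needs a reachable device. In practice you would use a tool such as snmpget or a maintained SNMP library rather than encoding packets yourself.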
SNMP also has a second variant: SNMP traps. These are spontaneous messages that devices send to configured addresses via UDP (port 162). Traps have many disadvantages compared to active requests, which is why they play only a minor role in monitoring. Some of the disadvantages are:
- Traps are not reliable. UDP packets can be lost. There is no confirmation of receipt.
- Mostly only Error messages are sent, but no Recovery messages. Thus the current status in the monitoring is unclear.
- If thousands of switches simultaneously send traps (for example, if an important upstream service is not available for them), the trap receiver will not be able to handle it and will break under the load. Monitoring is then overloaded when you need it most.
- When changing the trap receiver’s IP address, all devices must be reconfigured.
- Traps are hard to test. Few devices even have a function to send a generic test trap – let alone test real error messages. Therefore it is difficult to predict whether an important trap will be processed correctly when its first invocation is after a few months or years.
However if you want or need to work with traps, the Event Console provides a solution. This can receive traps and generate events from them.
2. Setting-up SNMP in Checkmk
2.1. Preparing a device
The first step is to prepare the device. Each device that supports SNMP has its own dialog for this somewhere in its configuration. Make the following settings in this dialog:
- Go to Configuration for Active Queries (SNMP GET). (Please do not confuse this with the traps – the terminology in the configuration dialogs can be very confusing).
- Enable SNMP for reading requests.
- Enter the addresses of your Checkmk servers as the permitted IP addresses. It may also be useful to provide a test instance of Checkmk here. Important: If you have multiple redundant Checkmk servers, do not forget to also specify the IP address(es) used after a failover. In the case of the Checkmk appliance in particular, this uses the IP address of the active node as the source IP address for outgoing connections – and not the service IP address. In a distributed environment the IP address of the slave site from which the device is monitored is critical.
- Assign a Community if protocol versions v1 and v2c are to be used.
The Community is a kind of password, except that there is no user name for SNMP. By convention the community is public. This is the default for many devices, and also for Checkmk. Of course you can argue that this is insecure and that you should specify another community. This certainly makes sense, but you should know that SNMP transmits the community in plain text (except for SNMP version 3). Anyone who can listen to packets can therefore very easily identify the community. On the other hand, you have restricted access to read-only requests, and most of the information that can be retrieved via SNMP is not very critical.
Furthermore, the use of different communities per device is very cumbersome to handle. After all these must not only be maintained in the devices, but also in the monitoring system. That is why in practice users usually use the same community everywhere – or at least everywhere within a region, department, computer center, etc.
Tip: If you want to increase the security even without SNMP version 3, it makes sense to extend the network concept so that you put the traffic with the management services, and thus also SNMP, in a separate management VLAN and secure the access with the firewall.
2.2. Adding a device into Checkmk
As usual, add the monitored devices as hosts in Checkmk. If you have chosen your folder structure so that only one folder contains SNMP devices, you can make the other settings directly in the folder. This makes it easier to add additional hosts later, and also avoids errors.
Now in the properties of the host (or folder), in the Data sources box set Check_MK Agent to No agent. An exception to this would be if you want to monitor a host simultaneously with a normal Checkmk agent and SNMP. There is occasionally a reason for this – namely, that on a server you have installed a manufacturer’s hardware monitoring agent which provides its data via SNMP, which is the case with Fujitsu ServerView for example.
In the same box, also activate SNMP and as the SNMP Protocol select SNMP v2 or v3. Protocol version 1 is a makeshift solution only for very old devices. You should use it only if you know that v2 is really not supported, or that the implementation on the device is defective (in practice, only in isolated cases). Above all, SNMP version 1 is very slow because it does not support bulk accesses. This difference is very significant.
The third and final setting is called SNMP credentials. Here again a choice of the protocol version is necessary, since v2c and v3 differ here. We will discuss version 3 below. If you do not have very high security requirements, you will be well-served by version 2c. Version 2c requires the entry of the Community as discussed above.
There is an alternative way to configure the SNMP credentials if you cannot easily assign them through your folder structure: the Access to Agents ➳ SNMP credentials of monitored hosts rule set. This allows you to assign the credentials based on host tags, labels and similar properties. The principle is that a community set directly at the host or folder always takes precedence over the rules.
When you have finished with the settings, you can make a short detour via the diagnostics page. To do this save with the Save & Test button. Here is an example of the diagnostics for a switch. Various protocol versions of SNMP are tried simultaneously, namely:
- SNMP v1
- SNMP v2c
- SNMP v2c without Bulk Queries
- SNMP v3
A normal, modern device should respond to all four variants with the same data – however this may be limited depending on the configuration. The result will look like this:
The four information outputs are described here:
|sysDescr||The description of the device as it is hard-coded in the device firmware by the manufacturer. This text is very important to Checkmk for automatic service discovery.|
|sysContact||This field is for specifying a contact person and is defined by you in the device configuration.|
|sysName||Here is the host name of the device. This field is also configured on the device. For the actual monitoring the name plays no further role and is only displayed for information. However, it makes sense and is helpful if the host name here matches the host name in Checkmk.|
|sysLocation||This is a field for a free text entry – purely for information – in which you can enter the location of the device.|
2.4. The service configuration
Special features of SNMP devices
After saving the host properties (and optionally the diagnostics), the usual next step is the configuration of services. There are some peculiarities here, because internally the service discovery works very differently for SNMP devices than for hosts monitored with the Checkmk agent. For those hosts Checkmk can simply look at the agent’s output and find the items of interest using the individual check plug-ins. With SNMP a little more work is necessary. Checkmk could generate a full output of all SNMP data (an SNMP walk) during discovery and look for interesting information in it, but there are devices for which a single discovery would then take several hours!
However Checkmk has a smarter approach. Initially, it only retrieves the very first two records (OIDs) from the device – the sysDescr and sysObjectID. Thereafter, as needed, further queries are invoked. Based on the results, each of the nearly 1,000 supplied SNMP check plug-ins decides whether the device actually supports this plug-in. Checkmk calls this phase the SNMP scan, and as a result the software produces a list of check plug-ins that serve as candidates for the actual service discovery.
In a second step the actual detection runs. The plug-ins found retrieve the exact data they need using local SNMP queries, and use this data to determine the services to be monitored. The data retrieved are precisely those which will later be fetched regularly for monitoring.
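The two phases described above can be sketched roughly as follows. This is a simplified model and not Checkmk's actual plug-in API: the `Plugin` class, the function names, the sample device data and the enterprise number 99999 are all assumptions made for illustration. A plain dictionary stands in for the device, so no network access is needed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

SYS_DESCR = ".1.3.6.1.2.1.1.1.0"
SYS_OBJECT_ID = ".1.3.6.1.2.1.1.2.0"
IF_DESCR = ".1.3.6.1.2.1.2.2.1.2"       # ifDescr column of the interface table

Walk = Dict[str, str]                    # OID -> value


@dataclass
class Plugin:
    name: str
    detect: Callable[[Walk], bool]       # phase 1: decide from sysDescr/sysObjectID
    subtree: str                         # phase 2: the data this plug-in needs


def snmp_scan(plugins: List[Plugin], device: Walk) -> List[Plugin]:
    """Phase 1: fetch only the two basic OIDs and pick candidate plug-ins."""
    basics = {oid: device.get(oid, "") for oid in (SYS_DESCR, SYS_OBJECT_ID)}
    return [p for p in plugins if p.detect(basics)]


def discover(candidates: List[Plugin], device: Walk) -> List[str]:
    """Phase 2: each candidate fetches its own subtree and derives services."""
    services = []
    for plugin in candidates:
        prefix = plugin.subtree + "."
        for oid, value in device.items():
            if oid.startswith(prefix):
                services.append(f"{plugin.name} {value}")
    return services


# Hypothetical device data and plug-ins, for illustration only.
device = {
    SYS_DESCR: "JetStream 24-Port Gigabit L2 Managed Switch",
    SYS_OBJECT_ID: ".1.3.6.1.4.1.99999.1",
    IF_DESCR + ".1": "port 1",
    IF_DESCR + ".2": "port 2",
}
plugins = [
    Plugin("Interface", lambda basics: True, IF_DESCR),
    Plugin("Printer supply",
           lambda basics: "printer" in basics[SYS_DESCR].lower(),
           ".1.3.6.1.2.1.43.11.1.1.6"),
]

candidates = snmp_scan(plugins, device)
services = discover(candidates, device)
print([p.name for p in candidates])
print(services)
```

The point of the split is that the expensive part (fetching whole subtrees) only happens for plug-ins that the cheap first phase has already accepted.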
For devices in a LAN the whole process usually does not take very long; more than a few seconds would be an exception. If you monitor devices over high-latency WAN links however, the entire scan may take several minutes. A scan also takes longer for switches with hundreds of ports of course. It would be very impractical if you had to wait that long every time you open the services page.
Therefore WATO normally skips the scan, and does the detection only with the check plug-ins already in use at the host. The SNMP Walks are then already available as cache files through the normal monitoring, and their detection thus takes fractions of a second. With this you will be able to find new items from existing plug-ins (for example, new switch ports, hard disks, sensors, VPNs, etc.), but not find brand new plug-ins.
The Full scan button forces an SNMP scan and fetches fresh data via SNMP. As a result services from completely new plug-ins are also found. It may be necessary to wait for slow-responding devices.
No matter which device you monitor via SNMP – as a minimum the following three services should appear in the configuration:
The first service is a check that monitors the network ports. The device must have at least one port, and it must be active, otherwise SNMP would not function. In general Checkmk is preset so that it includes all ports that are active at the time of service detection (operational status ‘up’) in the monitoring. You can influence this with the Parameters for discovered services ➳ Discovery – automatic service detection ➳ Network Interface and Switch Port Discovery rule set.
By the way, in the beginner’s manual you will find a chapter on best practices when monitoring switch ports.
The second is the SNMP Info service, which displays the same four pieces of information that you saw in the diagnostics. It is purely informational and is always OK.
3. When devices create problems
3.1. A defective SNMP-Implementation
It actually seems as if any conceivable mistake that can theoretically be made when implementing SNMP has already been made by some manufacturer at some point! And so there are devices with which SNMP works reasonably well, but certain parts of the protocol do not, or have been incorrectly implemented.
No response for a request to sysDescr
One possible error is when SNMP agents fail to respond to the request for standard information, for example no reply to the sysDescr. These devices are as good as dead in a diagnosis, and they will not deliver any results to a service discovery if you don’t help them with a special configuration. To do this, for affected hosts create a rule under Access to agents ➳ Hosts without system description OID with Positive outcome. Checkmk then simply assumes that everything is fine and skips the test with the sysDescr. Although no check plug-ins will be detected that expect specific parts in this text, in practice this does not matter as the affected plug-ins are designed to accommodate such a condition.
V2c works, but bulk-requests fail
Some devices support version v2c – and will provide an answer to this in the diagnostics – however, the implementation of the GetBulk command is missing in the protocol. This is used by Checkmk to get as much information as possible with a single request and is very important for the performance.
With such a host, some simple SNMP checks will work – such as SNMP Info or SNMP Uptime, but other services will be missing – especially the network interfaces that must be present on each device.
If you actually have a host where this is the case, you can run it with v2c, but without bulk requests. Configure such a host as follows:
- Set the SNMP version for the host properties to SNMP v1
- In the Access to agents ➳ Legacy SNMP devices using SNMP v2c rule set, create a rule for the host and set the value to Positive match (Add matching hosts to the set).
This forces the host to use the SNMP v2c protocol – although version 1 has been set – however without Bulkwalk. Incidentally, we do not recommend the use of SNMP v1 – even if that is supported – because it does not support 64-bit counters. This can lead to missing or erroneous measurement data for network ports which are subject to heavy traffic.
Devices which respond very slowly
There are some devices with which some SNMP queries need a very long time. This is partly due to incorrect implementations. Here it can sometimes help to go back to SNMP v1 – which is usually much slower, but can still sometimes be faster than a broken SNMP v2c. Before you try this however, you should check whether the manufacturer provides a firmware upgrade that solves the problem.
A second cause may be that the device has very many switch ports, and also a slow SNMP implementation. If you only want to monitor very few of the ports (only the first two ports, for example), you can manually limit Checkmk to polling only specified ports. Details can be found below in Performance.
3.2. Only the standard services are found
You have included an SNMP device in the monitoring but Checkmk recognizes only the SNMP Info and SNMP Uptime services and the interfaces. This can be due to a number of causes:
a) There are no plug-ins
Checkmk provides nearly 1,000 check plug-ins for SNMP devices, but even this list is naturally never complete. Over and over again it is found that for certain devices Checkmk does not provide any specific plug-in, meaning you can only monitor the standard services as mentioned. Here you have the following options:
- You might find a suitable plug-in on the Checkmk Exchange, where users can upload their own plug-ins.
- You can develop your own plug-ins. Information on writing your own plug-ins can be found in several articles in the manual.
- You contact our support team or one of our partners and request that they develop suitable plug-ins.
b) The plug-ins cannot be recognised
It sometimes occurs that a new firmware version for a device results in Checkmk plug-ins no longer recognizing the device – e.g. because a text has changed in the system description for the device. In such a case the existing plug-ins must be adapted. Contact our support team for this.
c) The device does not deliver the required data
Some (few) devices have the ability to individually-configure access to specific information areas in their SNMP configuration. Your device may be set to deliver the default information, but not that for the device-specific services.
On a few devices you must use SNMP v3 and Contexts to get the data you want.
3.3. Devices that do not respond at all to SNMP
If none of the SNMP protocol versions work, there can be several possible causes:
- The device is not reachable via IP at all. You can check this with the Ping Test (first box).
- The device does not support SNMP at all.
- SNMP access is not configured correctly on the device (activation, allowed addresses, Community).
- A firewall blocks SNMP. UDP port 161 must be open.
4. SNMP v3
By default SNMP is unencrypted, and authentication is very weak: it relies on a Community that is transmitted as plain text over the network. This level may still be sufficient for a local, isolated network, since monitoring only requires read-only access.
If you still want a higher level of security you will need SNMP version 3. This provides the possibility of encryption and genuine authentication. For this however a corresponding configuration is also necessary.
SNMP v3 recognises various levels of security:
|noAuthNoPriv||No real, user-based authentication, no encryption. Nonetheless, the advantage over v2c is that the password is no longer transmitted in plain text, instead it is hashed.|
|authNoPriv||User-based authentication with a name (Security name) and a password, but no encryption.|
|authPriv||User-based authentication as with authNoPriv, and additionally all data is encrypted. Here you have to manually exchange a Key – that is, deposit the Key both in the device and in Checkmk.|
The necessary setting in Checkmk is made in the same place where you also defined the Community: either in the host or folder properties, or in the SNMP credentials of monitored hosts rule set. There, instead of SNMP Community, select one of the three levels of v3 and configure the necessary values:
SNMP v3 introduces the concept of Contexts. A device can show different information at one and the same point in the SNMP tree – depending on which Context ID is given in the query.
If you have a device that works with such contexts, you will need two settings in Checkmk:
- First, the device must be queried using SNMP v3 (as described in the previous section).
- Then you need another rule in the rule set SNMPv3 contexts to use in requests. Here you select the check plug-in for which contexts are to be activated, and then the list of contexts that should be queried in the monitoring.
Luckily there are very few situations in which you have to work with contexts, because unfortunately it is not possible for the monitoring to recognize them automatically. A manual configuration of the contexts is always necessary.
5. Performance and Timing
Performance always plays a role – especially in environments with many hosts – and monitoring with SNMP consumes more CPU and memory than with Checkmk agents.
While the Raw Edition makes SNMP requests in the classic way via the snmpget or snmpbulkwalk command-line commands, the Enterprise Editions have a built-in SNMP engine that performs SNMP requests very efficiently without generating any extra processes. With this, CPU consumption for SNMP processing is approximately halved. The shorter polling times also reduce the number of Checkmk processes needed concurrently, and thus the memory usage.
5.2. Check intervals for SNMP checks
If your resources reach their limits, or if it takes more than 60 seconds to poll a single device, you can reduce the frequency at which Checkmk queries the host(s) by lengthening the check interval.
With the Normal check interval for service checks rule set, which you apply specifically to the Check_MK services of hosts, you can extend the general interval of one minute to, for example, 2 or 5 minutes.
Especially for SNMP checks, there is also the rule set Check intervals for SNMP checks. This allows you to lengthen the interval for individual check plug-ins. It is important to know that you can never set the interval shorter than the interval for general monitoring by the Check_MK service.
5.3. Timing settings for SNMP access
By default Checkmk expects a response to an SNMP request within one second. It also tries a total of three times before giving up. For devices that respond very slowly, or that can only be reached over a very slow network, it may be necessary to change these parameters. You do this through the Timing settings for SNMP access rule set:
Please note that these settings apply to an individual SNMP request. The complete process of monitoring a host consists of many separate requests. The total timeout is therefore a multiple of the settings specified here.
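A rough upper bound for the time spent waiting can be estimated as follows. This is a simplified sketch: it assumes that every attempt waits for the full timeout and that requests are sent one after another. The figure of 40 requests per host is invented for the example.

```python
def worst_case_seconds(requests: int, timeout_s: float = 1.0, attempts: int = 3) -> float:
    """Upper bound on waiting time if every attempt of every request times out."""
    return requests * timeout_s * attempts


# Defaults from the text: 1 second timeout, 3 attempts in total.
print(worst_case_seconds(1))     # a single unanswered request
print(worst_case_seconds(40))    # assumed: 40 requests for one host
# Doubling the timeout doubles the bound.
print(worst_case_seconds(40, timeout_s=2.0))
```

Under these assumptions, raising the timeout for a host that needs many requests quickly pushes the worst case beyond a one-minute check interval, so it is worth changing the values only for the hosts that really need it.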
5.4. Bulk walk: Number of OIDs per bulk
By default SNMP transmits 10 responses in one packet per GetBulk request. Try the experimental Bulk walk: Number of OIDs per bulk rule set to see if a higher value performs better. However this will only be the case when large tables are transferred, e.g., from a switch with many ports.
SNMP always fills the packets up to the specified number, including any records following the actual required ones. And if only a few of these records are really needed, extra data is transferred uselessly and the overhead increases.
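The trade-off can be illustrated with a small calculation. This is a simplified model, not a description of any particular SNMP engine: it assumes the device always returns exactly the requested number of records, and it ignores the extra request needed to detect the end of a table. The table sizes are example values.

```python
import math


def bulk_requests(rows: int, bulk_size: int = 10) -> int:
    """Number of GetBulk requests needed to fetch `rows` records."""
    return max(1, math.ceil(rows / bulk_size))


def surplus_records(rows: int, bulk_size: int = 10) -> int:
    """Records transferred in addition to the ones actually needed."""
    return bulk_requests(rows, bulk_size) * bulk_size - rows


for rows in (480, 3):                      # a large table and a tiny one
    for size in (10, 60):
        print(f"{rows} rows, bulk size {size}: "
              f"{bulk_requests(rows, size)} requests, "
              f"{surplus_records(rows, size)} surplus records")
```

For the large table the higher bulk size cuts the number of round trips considerably, while for the small one it only increases the amount of data that is transferred and then discarded.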
5.5. Limiting OID-Ranges
Checkmk normally works by always getting the information on all switch ports, even though not all are actually being monitored. This is a good thing anyway, since this is normally faster because single queries cannot be done with the efficient bulk queries. In addition, from our point of view, it is always advisable to monitor all ports in order to find faulty ports or cables with high error rates. If ports are not reliably UP, you can also flag the link status DOWN as being OK.
However, there are isolated cases where switches have very many ports, and which for some reason respond very slowly, or process SNMP very inefficiently, so that it is no longer possible to monitor with complete retrieval of all port information.
For such cases, there is the Limit SNMP OID ranges rule set. This allows you to statically limit the list of queried data (e.g., ports). In the rule’s value, for each particular check plug-in you specify which indexes of the respective table are to be fetched.
The usual plug-in for switchports is called SNMP interface check with 64bit counters. The following example shows a setting in which only the first two ports are fetched via SNMP:
Note: This filtering is then in effect before the service detection and monitoring. Depending on the switch port discovery setting, this does not automatically mean that these two ports really are monitored.
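What limiting the indexes means for the queries can be sketched like this. The code only models the idea of fetching selected table rows; it is not Checkmk's implementation. The three columns are standard interface OIDs picked as examples, and the assumption that the first two ports have the indexes 1 and 2 does not hold on every device.

```python
from typing import Iterable, List

# Example columns of the standard interface tables.
IF_DESCR = ".1.3.6.1.2.1.2.2.1.2"
IF_OPER_STATUS = ".1.3.6.1.2.1.2.2.1.8"
IF_HC_IN_OCTETS = ".1.3.6.1.2.1.31.1.1.1.6"


def oids_to_fetch(columns: Iterable[str], indexes: Iterable[int]) -> List[str]:
    """Build the individual OIDs for the selected rows of each column."""
    index_list = list(indexes)
    return [f"{column}.{index}" for column in columns for index in index_list]


# Assumed: the first two ports carry the table indexes 1 and 2.
limited = oids_to_fetch([IF_DESCR, IF_OPER_STATUS, IF_HC_IN_OCTETS], [1, 2])
for oid in limited:
    print(oid)
```

Instead of walking three complete columns, only six single OIDs are requested here. This is slower per record, because bulk queries cannot be used, but it can be faster overall when the table is very large.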
6. Simulation through SNMP-Walks
The Checkmk SNMP engine has a very handy feature: you can have a monitored device write a complete snapshot of all its SNMP data to a file, an SNMP walk. You can use this file later to simulate monitoring the device on another Checkmk server, even if this other server has no actual network connection to the device.
We use this feature very intensively, for example, when our support team is developing new check plug-ins for our customers. Our developers therefore do not need access to your devices – just an SNMP walk.
6.2. Creating a Walk via the GUI
You can create an SNMP walk directly from the GUI. This function can be found in the context menu of the host’s Check_MK service, and also as the Download SNMP walk option in the host’s menu:
6.3. Creating a Walk from the Command line
Alternatively, you can also create walks from the command line. Log on to the instance from which the device is being monitored. The creation of the walk is simply done with the cmk --snmpwalk command and the specified host (which must be configured in monitoring):
OMD[mysite]:~$ cmk --snmpwalk myswitch01
Also use the -v switch to see more detailed output on the progress:
OMD[mysite]:~$ cmk -v --snmpwalk myswitch01
myswitch01:
Walk on ".1.3.6.1.2.1"...3664 variables.
Walk on ".1.3.6.1.4.1"...5791 variables.
Wrote fetched data to /omd/sites/mysite/var/check_mk/snmpwalks/myswitch01.
The file will be placed in the var/check_mk/snmpwalks directory, where it simply carries the name of the host. It is a text file. If you are curious you can view this – e.g., with less – and quit the program with the Q key:
OMD[mysite]:~$ less var/check_mk/snmpwalks/myswitch01
.1.3.6.1.2.1.1.1.0 JetStream 24-Port Gigabit L2 Managed Switch with 4 Combo SFP Slots
.1.3.6.1.2.1.1.2.0 .18.104.22.168.4.1.11822.214.171.124
.1.3.6.1.2.1.1.3.0 560840147
.1.3.6.1.2.1.1.4.0 email@example.com
.1.3.6.1.2.1.1.5.0 MKSW001
.1.3.6.1.2.1.1.6.0 Core Switch Serverraum klein
.1.3.6.1.2.1.1.7.0 3
.18.104.22.168.22.214.171.124.0 27
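Because the walk is a plain text file, it is easy to inspect with a few lines of code. The sketch below assumes that every line consists of an OID, a single space and the value, as in the excerpt above; values that span several lines are not handled. The sample lines use the standard OIDs of sysDescr, sysName and sysLocation together with values from the excerpt.

```python
from typing import Dict

# Sample content, assumed to follow the "OID value" line format.
SAMPLE = """\
.1.3.6.1.2.1.1.1.0 JetStream 24-Port Gigabit L2 Managed Switch with 4 Combo SFP Slots
.1.3.6.1.2.1.1.5.0 MKSW001
.1.3.6.1.2.1.1.6.0 Core Switch Serverraum klein
"""

NAMES = {
    ".1.3.6.1.2.1.1.1.0": "sysDescr",
    ".1.3.6.1.2.1.1.5.0": "sysName",
    ".1.3.6.1.2.1.1.6.0": "sysLocation",
}


def parse_walk(text: str) -> Dict[str, str]:
    """Split each non-empty line at the first space into OID and value."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        oid, _, value = line.partition(" ")
        result[oid] = value
    return result


walk = parse_walk(SAMPLE)
for oid, value in walk.items():
    print(f"{NAMES.get(oid, oid)}: {value}")
```

To read a real walk, you could pass the content of the file to the same function, for example `parse_walk(open("var/check_mk/snmpwalks/myswitch01").read())` when run from the site directory.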
The command cmk --snmpwalk has some more useful options:
|--extraoid||When Checkmk performs a walk on a host, it generally retrieves two subtrees from the SNMP data area. These are specified in the SNMP tree using so-called OIDs (object identifiers). These are MIB-2 and enterprises, that is, on the one hand a standardized area that is the same for all SNMP devices, and on the other hand a manufacturer-specific area. If SNMP is implemented correctly, this should cause the device to send all data that it provides. If this is not the case and you are looking for a specific range, you can add its OID to the walk with this option, e.g. cmk --snmpwalk --extraoid .126.96.36.199 myswitch01. Don’t forget the period at the beginning of the OID.|
|--oid||This option is similar to --extraoid, but only retrieves the specified OID. This is of interest for testing purposes. Note, however, that the walk will be incomplete.|
|-v||The v stands for verbose and will output some interesting information during the walk.|
|-vv||The vv stands for very verbose and outputs much more information.|
6.4. Using saved walks for simulations
If you want to use this walk on a different (or the same) Checkmk instance for a simulation, then save the walk file with the name of the host on this instance under var/check_mk/snmpwalks.
Now create a rule in the Simulating SNMP by using a stored SNMP walk rule set that applies to the affected host(s).
From now on, only the saved file will be used to monitor the host. There is no longer network access to the host – except the ping for the host check, and possibly any configured active checks. You can simply redirect these to the Checkmk server by giving the IP address 127.0.0.1 to the hosts.
7. Files and directories
|var/check_mk/snmpwalks||Here SNMP walk files are generated or also expected if you want to use them to simulate SNMP data.|