Monitoring via SNMP

1. What is SNMP?

1.1. SNMP instead of a Checkmk agent

Routers, switches, firewalls, printers, appliances, UPSs, hardware sensors, and many other devices do not allow the installation of a Checkmk agent. They do however already have a built-in interface for monitoring provided by their manufacturer – an SNMP agent. This agent can be accessed via the Simple Network Management Protocol (SNMP). Checkmk uses SNMP to monitor such devices. The benefit for you is that setting up the monitoring is very easy.

Incidentally, there are also SNMP agents for Windows and Linux. However using these instead of the Checkmk agent is not recommended. SNMP is not very performant, so using this for monitoring generally means that the Checkmk server needs more CPU and memory per host than when working with its own agent. In addition the data provided via SNMP are incomplete.

Monitoring SNMP devices with Checkmk is very easy. If you just want to get started quickly with SNMP, you will probably only need to read the short section on SNMP in the Getting Started Guide. This article goes into much more depth and shows you all of the details and special cases of SNMP monitoring with Checkmk.

1.2. SNMP Versions

SNMP is available in different versions. These are all incompatible with each other, so the monitoring system and the monitored device must always use the same protocol version. Checkmk supports versions v1, v2c, and v3. In practice, an estimated 99% of installations use v2c. Here is an overview of all relevant versions of SNMP:

Version | Features | Checkmk | Description and application in practice
v1 | - | yes | Use only on very old (say, 10 years and older) devices that do not support v2c, or whose support for v2c is defective.
v2c | Bulk queries, 64-bit counters | yes | This is the standard in practice. v2c is a ‘light’ variant of v2, and the ‘c’ stands for Community, which performs the role of a password in SNMP. The 64-bit counters are essential for monitoring switch ports with 1 Gbps and more. The bulk queries accelerate monitoring by up to a factor of 10.
v2 | Security | no | Version 2 offers even better security options in addition to the features of v2c. Version 2 of SNMP is not found in practice, therefore Checkmk does not support this protocol version. If you need security, use version 3 instead.
v3 | Security, contexts | yes | Version 3 is used when SNMP traffic is to be encrypted. With v2c and v1 the traffic runs in plain text, including the community. In practice, version 3 is rather less common, because it requires significantly more computing power, and the configuration effort is also significantly higher than with v2c. Contexts are a concept in which different information is visible in the same area of the SNMP data structure (OID) depending on the context ID. This is used for partitioning Fibre Channel switches, for example.

Attention: Since the ‘real’ version 2 has no relevance, many dialogs in Checkmk refer simply to v2, but always really mean v2c.
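Why the 64-bit counters matter can be checked with a little arithmetic. The following Python sketch estimates how long an octet counter lasts on a fully saturated link before it wraps around. The function name is invented for this illustration and is not part of Checkmk.

```python
def seconds_until_wrap(counter_bits: int, link_bits_per_second: float) -> float:
    """Time until an octet counter overflows on a fully saturated link."""
    max_octets = 2 ** counter_bits                 # counter wraps after this many bytes
    octets_per_second = link_bits_per_second / 8   # 8 bits per octet
    return max_octets / octets_per_second

# 32-bit counter (all that SNMP v1 offers) on a 1 Gbps port, in seconds
print(round(seconds_until_wrap(32, 1e9), 1))

# 64-bit counter (available from v2c) on the same port, in years
print(round(seconds_until_wrap(64, 1e9) / (3600 * 24 * 365)))
```

A 32-bit counter on a saturated 1 Gbps port wraps after roughly 34 seconds. That is shorter than a one-minute check interval, so the traffic between two polls can no longer be determined reliably.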

1.3. SNMP-Traps

Checkmk uses active requests for SNMP monitoring. Checkmk sends a UDP packet (port 161) with an SNMP request to the device requesting the provision of specific data. The device then responds with a UDP packet containing the response data (or an error message).
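To make the request/response exchange more concrete, here is a minimal Python sketch that builds an SNMP v2c GetRequest for a single OID by hand and sends it to UDP port 161. It is an illustration under simplifying assumptions (short payloads, small OID components, small request IDs), not the way Checkmk itself issues requests, and the helper names are made up for this example. In practice you would use an existing tool such as snmpget.

```python
import socket

def tlv(tag: int, payload: bytes) -> bytes:
    """Encode one BER element; this sketch only handles payloads under 128 bytes."""
    assert len(payload) < 128
    return bytes([tag, len(payload)]) + payload

def encode_oid(oid: str) -> bytes:
    """Encode a dotted OID; components after the first two must be below 128 here."""
    parts = [int(p) for p in oid.strip(".").split(".")]
    assert all(p < 128 for p in parts[2:])
    return tlv(0x06, bytes([parts[0] * 40 + parts[1]] + parts[2:]))

def build_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMP v2c GetRequest message for a single OID."""
    assert 0 <= request_id < 128                      # keeps the integer one byte long
    varbind = tlv(0x30, encode_oid(oid) + tlv(0x05, b""))   # OID plus NULL placeholder
    pdu = tlv(
        0xA0,                                         # 0xA0 = GetRequest PDU
        tlv(0x02, bytes([request_id]))                # request ID
        + tlv(0x02, b"\x00")                          # error status
        + tlv(0x02, b"\x00")                          # error index
        + tlv(0x30, varbind),                         # list of variable bindings
    )
    version = tlv(0x02, b"\x01")                      # 1 means v2c (0 would be v1)
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)

def query(host: str, packet: bytes, timeout: float = 1.0) -> bytes:
    """Send the request to UDP port 161 and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, 161))
        return sock.recvfrom(65535)[0]

packet = build_get_request("public", ".1.3.6.1.2.1.1.1.0")   # sysDescr
print(packet.hex())
# response = query("192.0.2.10", packet)   # 192.0.2.10 is a placeholder address
```

Note that the community string appears unencrypted inside the packet, which is exactly why v1 and v2c offer no real protection against eavesdropping.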

But SNMP has a second variation: SNMP traps. These are spontaneous messages sent by devices via UDP (port 162) to configured addresses. Traps have many disadvantages compared with active requests, which is why they are not very important to monitoring. Some of the disadvantages are:

  • Traps are not reliable. UDP packets can be lost. There is no confirmation of receipt.
  • Usually only error messages are sent, but no recovery messages. Thus the current status in the monitoring is unclear.
  • If thousands of switches simultaneously send traps (for example, if an important upstream service is not available for them), the trap receiver will not be able to handle it and will break under the load. Monitoring is then overloaded when you need it most.
  • When changing the trap receiver’s IP address, all devices must be reconfigured.
  • Traps are hard to test. Few devices even have a function to send a generic test trap – let alone test real error messages. Therefore it is difficult to predict whether an important trap will be processed correctly when it first occurs after a few months or years.

However if you want or need to work with traps, the Event Console provides a solution. This can receive traps and generate events from them.

2. Setting up SNMP in Checkmk

2.1. Preparing a device

The first step is to prepare the device. Each device that supports SNMP has its own dialog for this somewhere in its configuration. Make the following settings in this dialog:

  1. Go to the configuration for active queries (SNMP GET). Please do not confuse this with the traps; the terminology in the configuration dialogs can be very confusing.
  2. Enable SNMP for read requests.
  3. Enter the addresses of your Checkmk servers as permitted IP addresses. It may also be useful to include a test instance of Checkmk here. Important: If you have multiple redundant Checkmk servers, do not forget to also specify the IP address(es) used after a failover. The Checkmk appliance in particular uses the IP address of the active node as the source IP address for outgoing connections, not the service IP address. In a distributed environment, the IP address that matters is that of the slave site from which the device is monitored.
  4. Assign a community if you use protocol version v1 or v2c.

The community is a kind of password, except that there is no user name in SNMP. By convention the community is public. This is the default for many devices – and also for Checkmk. Of course you can argue that this is insecure and that you should specify another community. This certainly makes sense, but you should know that SNMP transmits the community in plain text (except for SNMP version 3). Anyone who can capture packets can therefore read the community very easily. On the other hand, access is limited to read-only operations, and most of the information that can be retrieved via SNMP is not very critical.

Furthermore, using a different community per device is very cumbersome. After all, the communities must be maintained not only in the devices, but also in the monitoring system. That's why in practice users usually use the same community everywhere – or at least everywhere in a region, department, data center, etc.

Tip: If you want to increase security even without SNMP version 3, it makes sense to extend the network concept so that traffic for management services, including SNMP, runs in a separate management VLAN to which access is protected by a firewall.

2.2. Adding a device into Checkmk

As usual, add the monitored devices as hosts in Checkmk. If you have chosen your folder structure so that only one folder contains SNMP devices, you can make the other settings directly in the folder. This makes it easier to add additional hosts later, and also avoids errors.

Now in the properties of the host (or folder), in the Data sources box set Check_MK Agent to No agent. An exception to this would be if you want to monitor a host simultaneously with a normal Checkmk agent and SNMP. There is occasionally a reason for this – namely, that on a server you have installed a manufacturer’s hardware monitoring agent which provides its data via SNMP, which is the case with Fujitsu ServerView for example.

In the same box, also activate SNMP and select SNMP v2 or v3 as the SNMP protocol. Protocol version 1 is an emergency solution only for very old devices. You should use it only if you know that v2 is really not supported, or that the device's implementation of it is defective (in practice, only in isolated cases). Above all, SNMP version 1 is very slow because it does not support bulk accesses. This difference is very significant.

The third and final setting is called SNMP credentials. Here again a choice of the protocol version is necessary, since v2c and v3 differ here. We will discuss version 3 below. If you do not have very high security requirements, you will be well-served by version 2c. Version 2c requires the entry of the community as discussed above.

There is an alternative way to configure the SNMP credentials if you cannot easily assign them through your folder structure: the Access to Agents ➳ SNMP credentials of monitored hosts rule set. This allows you to assign the credentials based on host characteristics, labels and similar properties. The principle is that a community that is set directly at the host or folder always takes precedence over the rules.
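The precedence described here can be pictured as a simple lookup order. The following Python sketch is only a model of that principle: the function and parameter names are hypothetical, and it assumes that a host setting beats a folder setting and that the first matching rule wins, which the text above does not state explicitly.

```python
from typing import List, Optional

def effective_community(host_value: Optional[str],
                        folder_value: Optional[str],
                        rule_values: List[str]) -> str:
    """Pick the community for a host; explicit host/folder settings beat rules."""
    if host_value is not None:       # set directly at the host
        return host_value
    if folder_value is not None:     # inherited from the folder
        return folder_value
    if rule_values:                  # assumption: the first matching rule wins
        return rule_values[0]
    return "public"                  # conventional default community

print(effective_community(None, "dc-east", ["from-rule"]))   # folder setting is used
print(effective_community(None, None, ["from-rule"]))        # rule is used
print(effective_community(None, None, []))                   # falls back to the default
```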

2.3. Diagnostics

When you have finished with the settings, you can make a short detour via the diagnostics page. To do this, save with the Save & Test button. The diagnostics try various protocol versions of SNMP simultaneously, namely:

  • SNMP v1
  • SNMP v2c
  • SNMP v2c without Bulk Queries
  • SNMP v3

A normal, modern device should respond to all four variants with the same data, although this may be limited depending on the configuration.

The four information outputs are described here:

sysDescr | The description of the device as it is hard-coded in the device firmware by the manufacturer. This text is very important to Checkmk for automatic service discovery.
sysContact | This field is for specifying a contact person and is defined by you in the device configuration.
sysName | Here is the host name of the device. This field is also configured on the device. For the actual monitoring the name plays no further role and is only displayed for information.
sysLocation | This is a field for a free text entry – purely for information – in which you can enter the location of the device.

2.4. The service configuration

Special features of SNMP devices

After saving the host properties (and optionally running the diagnostics), the usual next step is the configuration of services. There are some peculiarities here, because internally service discovery works very differently for SNMP devices than for hosts monitored with the Checkmk agent. With an agent, Checkmk can simply look at the agent's output and find the interesting items using the individual check plug-ins. With SNMP a little more work is necessary. Checkmk could generate a full output of all SNMP data (an SNMP walk) and look for interesting information in it, but there are devices for which a single walk would take several hours!

However, Checkmk is smarter. Initially, it retrieves only the very first two records (OIDs) from the device – the sysDescr and sysObjectID. Further queries follow as needed. Based on the results, each of the nearly 1,000 supplied SNMP check plug-ins decides whether the device actually supports this plug-in. Checkmk calls this phase the SNMP scan, and as a result receives a list of check plug-ins that serve as candidates for the actual service discovery.

In a second step the actual discovery runs. The plug-ins found retrieve precisely the data they need using targeted SNMP queries, and use them to determine the services to be monitored. The retrieved data are precisely those which will later be fetched regularly for monitoring.
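The two phases can be sketched in a few lines of Python. This is a simplified model, not Checkmk's actual plug-in API: the plug-in names, the registry layout and the detection conditions are all invented for illustration. Only the two OIDs for sysDescr and sysObjectID are the standard ones.

```python
from typing import Callable, Dict, List

SYS_DESCR = ".1.3.6.1.2.1.1.1.0"
SYS_OBJECT_ID = ".1.3.6.1.2.1.1.2.0"

# Hypothetical plug-ins: a detection function for the scan (phase 1) and
# the OID subtrees the plug-in would fetch during discovery (phase 2).
PLUGINS: Dict[str, dict] = {
    "example_interfaces": {
        "detect": lambda info: True,                          # fits every device
        "fetch": [".1.3.6.1.2.1.2.2.1"],
    },
    "example_jetstream": {
        "detect": lambda info: info[SYS_OBJECT_ID].startswith(".1.3.6.1.4.1.11863."),
        "fetch": [".1.3.6.1.4.1.11863"],
    },
    "example_printer": {
        "detect": lambda info: "printer" in info[SYS_DESCR].lower(),
        "fetch": [".1.3.6.1.2.1.43"],
    },
}

def snmp_scan(get: Callable[[str], str]) -> List[str]:
    """Phase 1: fetch only the two identifying OIDs and ask every plug-in."""
    info = {oid: get(oid) for oid in (SYS_DESCR, SYS_OBJECT_ID)}
    return [name for name, plugin in PLUGINS.items() if plugin["detect"](info)]

def oids_for_discovery(candidates: List[str]) -> List[str]:
    """Phase 2: only the candidates' own subtrees are queried."""
    return [oid for name in candidates for oid in PLUGINS[name]["fetch"]]

# A dictionary stands in for the real device here
device = {
    SYS_DESCR: "JetStream 24-Port Gigabit L2 Managed Switch",
    SYS_OBJECT_ID: ".1.3.6.1.4.1.11863.1.1.3",
}
candidates = snmp_scan(device.__getitem__)
print(candidates)
print(oids_for_discovery(candidates))
```

The point of the design is that the expensive part, walking whole subtrees, happens only for plug-ins that have already been identified as candidates by two cheap queries.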

For devices in the LAN the whole process usually does not take very long – more than several seconds would be an exception. If you monitor devices over high-latency WAN links however, the entire scan may take several minutes. A scan also takes longer for switches with hundreds of ports of course. Now it would be very impractical if you had to wait that long every time you open the services page.

Therefore WATO normally skips the scan, and does the detection only with the check plug-ins already in use at the host. The SNMP Walks are then already available as cache files through the normal monitoring, and their detection thus takes fractions of a second. With this you will be able to find new items from existing plug-ins (for example, new switch ports, hard disks, sensors, VPNs, etc.), but not find brand new plug-ins.

The Full scan button forces an SNMP scan and fetches fresh data via SNMP. As a result services from completely new plug-ins are also found. It may be necessary to wait for slow-responding devices.

Standard services

No matter which device you monitor via SNMP – at minimum the following three services should appear in the configuration:

The first service is a check that monitors the network ports. The device must have at least one port, and it must be active – otherwise SNMP would not work at all. In general Checkmk is preset so that it includes all ports that are active at the time of service discovery (operational status ‘up’) in the monitoring. You can influence this with the Parameters for discovered services ➳ Discovery – automatic service detection ➳ Network Interface and Switch Port Discovery rule set.

By the way, the beginner's manual contains a chapter on best practices for monitoring switch ports.

The second is the SNMP Info service, which displays the same four pieces of information that you saw in the diagnostics. It is purely informational and is always OK.

Finally there is the SNMP Uptime service, which shows you when the device was last restarted. This service is always OK by default, but you can set upper and lower thresholds for the uptime.

3. When devices create problems

3.1. A defective SNMP implementation

It actually seems as if any conceivable mistake that can theoretically be made when implementing SNMP has already been made by some manufacturer at some point! And so there are devices with which SNMP works reasonably well, but certain parts of the protocol do not, or have been incorrectly implemented.

No response for a request to sysDescr

One possible error is when SNMP agents fail to respond to the request for standard information, for example the sysDescr. These devices appear dead in the diagnostics, and service discovery will not give any results either unless you help with a special configuration. To do this, create a rule for the affected hosts under Access to agents ➳ Hosts without system description OID with Positive outcome. Checkmk then simply assumes that everything is fine and skips the test with the sysDescr. No check plug-ins that expect specific parts in this text will then be detected, but in practice this does not matter, as the affected plug-ins are designed to accommodate such a condition.

v2c works, but bulk queries do not

Some devices support version v2c – and will also provide a response in the diagnostics – but their implementation of the protocol lacks the GetBulk command. Checkmk uses this command to get as much information as possible with a single request, and it is very important for performance.

With such a host, some simple SNMP checks will work – such as SNMP Info or SNMP Uptime, but other services are missing – especially the network interfaces that must be present on each device.

If you actually have a host where this is the case, you can run it with v2c, but without bulk requests. Configure such a host as follows:

  • Set the SNMP version for the host properties to SNMP v1
  • In the Access to agents ➳ Legacy SNMP devices using SNMP v2c rule set, create a rule for the host, and set the value to Positive match (Add matching hosts to the set).

This forces the host to use the SNMP v2c protocol – although version 1 has been set – however without Bulkwalk. Incidentally, we do not recommend the use of SNMP v1 – even if that is supported – because it does not support 64-bit counters. This can lead to missing or erroneous measurement data for network ports which are subject to heavy traffic.
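Because this combination of settings is easy to get wrong, here is the resulting behavior written out as a small Python sketch. The function is a hypothetical model of the two settings as described above, not Checkmk code, and it deliberately covers only the combinations mentioned in the text.

```python
from typing import Tuple

def effective_protocol(host_version: str, legacy_v2c_rule: bool) -> Tuple[str, bool]:
    """Return (protocol actually spoken, whether GetBulk is used)."""
    if host_version == "v1":
        # The rule lifts a host configured as v1 to v2c, but bulk requests stay off
        return ("v2c", False) if legacy_v2c_rule else ("v1", False)
    if host_version == "v2c" and not legacy_v2c_rule:
        return ("v2c", True)
    raise ValueError("combination not covered by this sketch")

print(effective_protocol("v2c", False))   # normal case: v2c with bulk requests
print(effective_protocol("v1", True))     # workaround: v2c without bulk requests
print(effective_protocol("v1", False))    # plain v1, no 64-bit counters
```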

Devices which respond very slowly

There are some devices for which some SNMP queries take a very long time. This is partly due to incorrect implementations. Here it can sometimes help to go back to SNMP v1 – which is usually much slower, but can still sometimes be faster than a broken SNMP v2c. Before you try this however, you should check whether the manufacturer provides a firmware upgrade that solves the problem.

A second cause may be that the device has very many switch ports, and also a slow SNMP implementation. If you only want to monitor very few of the ports (for example, only the first two), you can manually limit Checkmk to polling only specified ports. Details can be found below in Performance.

3.2. Only the standard services are found

You have included an SNMP device in the monitoring but Checkmk recognizes only the SNMP Info and SNMP Uptime services and the interfaces. This can be due to a number of causes:

a) There are no plug-ins

Checkmk provides nearly 1,000 check plug-ins for SNMP devices, but even this list is naturally never complete. Over and over again it is found that for certain devices Checkmk does not provide any specific plug-in, meaning you can only monitor the standard services as mentioned. Here you have the following options:

  • You might find a suitable plug-in on the Checkmk Exchange, where users can upload their own plug-ins.
  • You develop your own plug-ins. Information for this can be found in several articles in the manual.
  • You contact our support team or one of our partners and request that they develop suitable plug-ins.

b) The plug-ins cannot be recognised

It sometimes occurs that a new firmware version for a device results in Checkmk plug-ins no longer recognizing the device – e.g. because a text has changed in the system description for the device. In such a case the existing plug-ins must be adapted. Contact our support team for this.

c) The device does not deliver the required data

A few devices allow access to specific information areas to be configured individually in their SNMP configuration. Your device may be set to deliver the default information, but not that for the device-specific services.

On a few devices you must use SNMP v3 and contexts to get the data you want.

3.3. Devices that do not respond at all to SNMP

If none of the SNMP protocol versions work, there are several possible causes:

  • The device is not reachable via IP at all. You can check this with the ping test (first box).
  • The device does not support SNMP at all.
  • SNMP access is not configured correctly on the device (activation, allowed addresses, community).
  • A firewall blocks SNMP. UDP port 161 must be open.

4. SNMP v3

4.1. Security

By default SNMP is unencrypted and only very weakly authenticated, by a community transmitted as plain text over the network. This level may still be sufficient for a local, isolated network, since the monitoring is limited to read-only operations.

If you still want a higher level of security you will need SNMP version 3. This provides the possibility of encryption and genuine authentication. For this however a corresponding configuration is also necessary.

SNMP v3 recognises various levels of security:

noAuthNoPriv | No real, user-based authentication and no encryption. Only a security name is given; no password is exchanged.
authNoPriv | User-based authentication with a name (Security name) and a password, but no encryption.
authPriv | User-based authentication as with authNoPriv, and additionally all data is encrypted symmetrically with a DES56 algorithm. Here you have to manually exchange a key – that is, deposit the key both in the device and in Checkmk.

The necessary setting in Checkmk is made in the same place where you also defined the community – either under the host properties or the SNMP credentials of monitored hosts rule set. There, instead of SNMP Community, select one of the three levels of v3 and configure the necessary values.
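Which values are needed depends on the chosen level. The following Python sketch models this as a lookup table with a small check. The field names are hypothetical and do not correspond to Checkmk's internal configuration format; the set of fields per level follows the general SNMP v3 user-based security model.

```python
# Hypothetical field names; each level adds to the requirements of the previous one
REQUIRED_FIELDS = {
    "noAuthNoPriv": ["security_name"],
    "authNoPriv": ["security_name", "auth_password"],
    "authPriv": ["security_name", "auth_password", "privacy_password"],
}

def missing_fields(level: str, credentials: dict) -> list:
    """Return the fields that are still missing for the given security level."""
    return [field for field in REQUIRED_FIELDS[level] if not credentials.get(field)]

credentials = {"security_name": "monitoring", "auth_password": "example-secret"}
print(missing_fields("authNoPriv", credentials))   # nothing missing
print(missing_fields("authPriv", credentials))     # the privacy password is missing
```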

4.2. Contexts

SNMP v3 introduces the concept of contexts. A device can show different information at one and the same point in the SNMP tree – depending on which Context ID is given in the query.

If you have a device that works with such contexts, you will need two settings in Checkmk:

  • First, the device must be queried using SNMP v3 (as described in the previous section).
  • Then you need another rule in the rule set SNMPv3 contexts to use in requests. Here you select the check plug-in for which contexts are to be activated, and then the list of contexts that should be queried in the monitoring.

Luckily there are very few situations in which you have to work with contexts, because unfortunately it is not possible for the monitoring to recognize them automatically. A manual configuration of the contexts is always necessary.

5. Performance and Timing

5.1. Inline-SNMP

Performance always plays a role – especially in environments with many hosts. And monitoring with SNMP needs more CPU and memory than with Checkmk agents.

While the Raw Edition makes SNMP requests in the classic way via the snmpget or snmpbulkwalk command-line commands, the Enterprise Editions have a built-in SNMP engine that performs SNMP requests very efficiently without generating any extra processes. With this, CPU consumption for SNMP processing is approximately halved. The shorter polling times also reduce the number of Checkmk processes needed concurrently, and thus the memory usage.

If you are curious about the difference, you can use the Hosts not using Inline-SNMP rule set to turn off inline SNMP for all or even individual hosts.

5.2. Check intervals for SNMP checks

If your resources reach their limits, or if it takes more than 60 seconds to poll a single device, you can lengthen the interval at which Checkmk queries the host(s), so that they are polled less often.

With the Normal check interval for service checks rule set, which you apply specifically to the Check_MK services of hosts, you can extend the general interval of one minute to, for example, 2 or 5 minutes.

Especially for SNMP checks, there is also the rule set Check intervals for SNMP checks. This allows you to lengthen the interval for single check plug-ins. It's important to know that you can never set it shorter than the general monitoring interval of the Check_MK service.

Overall, however, we recommend that the monitoring be designed so that the standard interval of one minute can be maintained, and only increased in exceptional cases for individual hosts or checks.

5.3. Timing settings for SNMP access

By default Checkmk expects a response to an SNMP request within one second. It also tries a total of three times before giving up. For devices that respond very slowly, or that can only be reached over a very slow network, it may be necessary to change these parameters. You do this through the Timing settings for SNMP access rule set.
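How the two parameters interact can be illustrated with a short Python sketch. It is a generic retry loop using the defaults named above (one second, three attempts), not Checkmk's actual implementation; the function names and the simulated device are assumptions made for this example.

```python
from typing import Callable, Optional

def request_with_retries(send_once: Callable[[float], Optional[bytes]],
                         timeout: float = 1.0,
                         attempts: int = 3) -> Optional[bytes]:
    """Try a request up to `attempts` times; None means the device never answered."""
    for _ in range(attempts):
        response = send_once(timeout)    # expected to return None on a timeout
        if response is not None:
            return response
    return None

def worst_case_wait(timeout: float = 1.0, attempts: int = 3) -> float:
    """Longest time a single request can block before it is given up."""
    return timeout * attempts

# Simulated device that drops the first two requests and answers the third
answers = iter([None, None, b"ok"])
print(request_with_retries(lambda timeout: next(answers)))
print(worst_case_wait())           # 3.0 seconds with the defaults
print(worst_case_wait(5.0, 3))     # a 5-second timeout can block much longer
```

Keep in mind that raising the timeout multiplies the time an unreachable device can hold up a check, since every attempt waits for the full timeout.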

5.4. Bulk walk: Number of OIDs per bulk

By default SNMP transmits 10 responses in one packet per GetBulk request. With the experimental Bulk walk: Number of OIDs per bulk rule set you can try whether a higher value performs better. However, this will only be the case when large tables are retrieved from the host – e.g., if it is a switch with many ports.

SNMP always fills the packets up to the specified number, including any records following the actual required ones. And if only a few of these records are really needed, extra data is transferred uselessly and the overhead increases.

On the other hand, in practice devices occasionally have problems with the default value of 10 OIDs per bulk. Then it can be useful to reduce the number.
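The trade-off can be made visible with some simple arithmetic. The Python sketch below uses a deliberately simplified model: one table column with a known number of rows, and a device that always fills each response up to the configured number. The function names are made up for this illustration.

```python
import math

def bulk_requests_needed(rows: int, oids_per_bulk: int) -> int:
    """Number of GetBulk round trips to fetch `rows` records."""
    return math.ceil(rows / oids_per_bulk)

def surplus_records(rows: int, oids_per_bulk: int) -> int:
    """Records transferred beyond the ones that were actually needed."""
    return bulk_requests_needed(rows, oids_per_bulk) * oids_per_bulk - rows

# A 48-port switch: larger bulks save round trips
for oids_per_bulk in (10, 20, 60):
    print(oids_per_bulk,
          bulk_requests_needed(48, oids_per_bulk),
          surplus_records(48, oids_per_bulk))

# Only 2 records needed: a large bulk size mostly transfers unused data
print(surplus_records(2, 60))
```

For the 48-port table, going from 10 to 60 OIDs per bulk reduces the round trips from 5 to 1. For a request that needs only 2 records, the same setting transfers 58 records for nothing.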

5.5. Limiting OID-Ranges

Checkmk normally works by always getting the information on all switch ports, even though not all are actually being monitored. This is usually the faster approach, because queries for single ports cannot use the efficient bulk requests. In addition, from our point of view, it is always advisable to monitor all ports in order to find faulty ports or cables with high error rates. If ports are not reliably UP, you can also have the link status DOWN evaluated as OK.

However, there are isolated cases where switches have very many ports, and which for some reason respond very slowly, or process SNMP very inefficiently, so that it is no longer possible to monitor with complete retrieval of all port information.

For such cases, there is the Limit SNMP OID ranges rule chain. This allows you to statically limit the list of queried data (e.g., ports). In the rule's value, for each particular check plug-in you specify which indexes of the respective table are to be fetched.

The usual plug-in for switch ports is called SNMP interface check with 64bit counters. With this rule you can, for example, configure that only the first two ports are fetched via SNMP.

Note: This filtering takes effect before service discovery and monitoring. Depending on the switch port discovery setting, this does not automatically mean that these two ports really are monitored.
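The effect of such a limit can be pictured as follows: instead of walking entire table columns, only the cells for the listed indexes are requested. The Python sketch below illustrates this with two standard interface columns. The function name is invented, and the assumption that the first two ports have the indexes 1 and 2 does not hold for every device.

```python
from typing import Iterable, List

IF_DESCR = ".1.3.6.1.2.1.2.2.1.2"             # ifDescr column
IF_HC_IN_OCTETS = ".1.3.6.1.2.1.31.1.1.1.6"   # ifHCInOctets column (64-bit counter)

def oids_to_fetch(columns: Iterable[str], indexes: Iterable[int]) -> List[str]:
    """Build the individual OIDs for the given table columns and row indexes."""
    index_list = list(indexes)
    return [f"{column}.{index}" for column in columns for index in index_list]

# Assumption: the first two ports have the indexes 1 and 2
for oid in oids_to_fetch([IF_DESCR, IF_HC_IN_OCTETS], [1, 2]):
    print(oid)
```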

6. Simulation through SNMP-Walks

6.1. Principle

The Checkmk SNMP engine has a very handy feature – you can have a monitored device write a complete snapshot of all its SNMP data to a file (an SNMP walk). You can use this file later to simulate monitoring the device on another Checkmk server, even if this other server has no network connection to the device at all.

We use this feature very intensively, for example, when our support team is developing new check plug-ins for our customers. Therefore our developers do not need access to your devices – just an SNMP walk.

6.2. Creating a Walk via the GUI

You can create an SNMP walk directly from the GUI. This function can be found in the context menu of the host’s Check_MK service, and also as the Download SNMP walk option in the host’s menu.

The creation of the walk takes a few seconds in the best case, but a few minutes are not uncommon. When the build is done you can download the file via the Result line.

6.3. Creating a Walk from the Command line

Alternatively, you can also create walks from the command line. Log on to the instance from which the device is being monitored. The creation of the walk is simply done with the cmk --snmpwalk command and the specified host (which must be configured in monitoring):

OMD[mysite]:~$ cmk --snmpwalk myswitch01

Add -v to get debug output from the process:

OMD[mysite]:~$ cmk -v --snmpwalk myswitch01
myswitch01:
Walk on ".1.3.6.1.2.1"...3664 variables.
Walk on ".1.3.6.1.4.1"...5791 variables.
Wrote fetched data to /omd/sites/mysite/var/check_mk/snmpwalks/myswitch01.

The file is then placed in the var/check_mk/snmpwalks directory where it simply carries the name of the host. It is a text file. If you are curious you can view this – e.g., with less (quit with the Q key):

OMD[mysite]:~$ less var/check_mk/snmpwalks/myswitch01
.1.3.6.1.2.1.1.1.0 JetStream 24-Port Gigabit L2 Managed Switch with 4 Combo SFP Slots
.1.3.6.1.2.1.1.2.0 .1.3.6.1.4.1.11863.1.1.3
.1.3.6.1.2.1.1.3.0 560840147
.1.3.6.1.2.1.1.4.0 bi@mathias-kettner.de
.1.3.6.1.2.1.1.5.0 MKSW001
.1.3.6.1.2.1.1.6.0 Core Switch Serverraum klein
.1.3.6.1.2.1.1.7.0 3
.1.3.6.1.2.1.2.1.0 27
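Since the walk is plain text, it is easy to process with your own scripts. The following Python sketch reads lines in the format shown above and prints the well-known entries of the system group. It assumes one OID and value per line, which may not hold for every value in a real walk, and the function name is made up for this example.

```python
from typing import Dict, Iterable

# Standard OIDs of the system group and their names
SYSTEM_GROUP = {
    ".1.3.6.1.2.1.1.1.0": "sysDescr",
    ".1.3.6.1.2.1.1.3.0": "sysUpTime",
    ".1.3.6.1.2.1.1.4.0": "sysContact",
    ".1.3.6.1.2.1.1.5.0": "sysName",
    ".1.3.6.1.2.1.1.6.0": "sysLocation",
}

def parse_walk(lines: Iterable[str]) -> Dict[str, str]:
    """Map each OID to its value; the value is everything after the first space."""
    walk = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue                      # skip empty lines
        oid, _, value = line.partition(" ")
        walk[oid] = value
    return walk

sample = """\
.1.3.6.1.2.1.1.1.0 JetStream 24-Port Gigabit L2 Managed Switch with 4 Combo SFP Slots
.1.3.6.1.2.1.1.3.0 560840147
.1.3.6.1.2.1.1.5.0 MKSW001
"""
walk = parse_walk(sample.splitlines())
for oid, name in SYSTEM_GROUP.items():
    if oid in walk:
        print(f"{name}: {walk[oid]}")

# sysUpTime counts hundredths of a second since the last restart
uptime_days = int(walk[".1.3.6.1.2.1.1.3.0"]) / 100 / 86400
print(f"uptime: {uptime_days:.1f} days")
```

To read a real file instead of the sample, pass an open file object to parse_walk, for example one opened on var/check_mk/snmpwalks/myswitch01.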

6.4. Using saved walks for simulations

If you want to use this walk on a different (or the same) Checkmk instance for a simulation, then save the walk file with the name of the host on this instance under var/check_mk/snmpwalks.

Now create a rule in the Simulating SNMP by using a stored SNMP walk rule set that matches the affected host(s).

From now on, only the saved file will be used to monitor the host. There is no longer any network access to the host, except the ping for the host check and any configured active checks. You can simply redirect these to the Checkmk server itself by giving the host the IP address 127.0.0.1.

7. Files and directories

File path | Description
var/check_mk/snmpwalks | Here SNMP walk files are generated, or expected if you want to use them to simulate SNMP data.