IT Monitoring with SNMP

SNMP (Simple Network Management Protocol) is a de facto standard, especially for network monitoring. Many devices support monitoring via SNMP; on some, it is one of several monitoring options. Because SNMP offers a standardized way of providing monitoring data, many different monitoring solutions support it.


What is SNMP?

SNMP (Simple Network Management Protocol) is one of the most common monitoring protocols for network devices. It allows controlling and configuring devices remotely to some extent, and is also used for error detection and error notification.

Although it was developed in the late 1980s, it is still in common use due to its simplicity. A central management unit exchanges information with the SNMP agent on the respective network device via network packets. SNMP standardizes these data packets and the communication between the central management unit and the agent. As a result, it became a standard supported by many vendors.

Services of a firewall in Checkmk

Benefits of SNMP

Collect all kinds of data from your devices

SNMP helps you find out all kinds of information about the devices in your network. Since many hardware manufacturers support the protocol, you can collect data from network switches, routers, UPSs, NAS appliances, printers, etc. and use it in your monitoring software.

Suitable for small and large infrastructures

Thanks to SNMP, the monitoring software is able to retrieve the data from almost all devices – such as the CPU load of the firewall, the toner level of the network printer, the temperature in the server room, or all information on the interfaces of a switch. This means that SNMP can be used to set up comprehensive monitoring in small network environments, as well as in large infrastructures with many components.

Straightforward monitoring

Most manufacturers do not allow the installation of third-party software on their hardware, but you can still rely on SNMP. It is a good way to monitor such devices without needing to install an extra agent on the hardware.

Monitoring IT with SNMP

Graphs showing details about network bandwidth

The data that can be retrieved via SNMP is suitable for getting information on the status of the devices in the network and is thus ideal for IT monitoring. With SNMP monitoring, you can keep an eye on the most important parameters of your switches, access points, and routers, as well as your appliances, hardware sensors, printers, and other network components.

The monitoring solution acts as a central processor that retrieves data in a targeted manner from the SNMP agents on the devices to be monitored, or serves as a recipient of event messages. Despite agents being present, monitoring via SNMP is referred to as agentless monitoring. This is because the manufacturers have already implemented the protocol on their SNMP devices and the administrator therefore does not need to install any additional software, i.e. an agent for providing the monitoring data, on the device.

This has the advantage that the required information can be easily retrieved via SNMP without the manufacturer having to grant further access rights on the device. Likewise, setting up monitoring via SNMP is usually relatively simple.

If, on the other hand, you use the agents of the monitoring tool, you will need to install and manage these agents on each monitored system. Depending on the monitoring tool, working with agents is more or less time-consuming, but there are also some advantages to using agents. Especially in server monitoring, agents provide deeper insights into the state and health of the monitored platform, and even require fewer resources to do so.

Which variants are available for SNMP?

The monitoring of IT components via SNMP supports two methods: active requests, so-called SNMP polls, and messages triggered by events, the SNMP traps. With SNMP polling, the monitoring solution sends a request to the respective device asking it to provide specific data. The device then usually responds with a packet containing the data or an error message. For packet transmission, SNMP uses UDP (User Datagram Protocol). The SNMP agent receives the request from the monitoring solution on UDP port 161. The monitoring solution can send the request from any source port.
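To make the polling exchange more tangible, here is a minimal sketch of what a single poll could look like on the wire, using only the Python standard library. It assumes SNMP v2c and the standard sysDescr.0 OID; the function names (build_get_request, poll) are illustrative and not part of any monitoring product, and decoding the reply is left out.

```python
import socket


def ber_length(length: int) -> bytes:
    """Encode a BER length: short form below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    raw = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value in a BER tag-length-value triple."""
    return bytes([tag]) + ber_length(len(value)) + value


def ber_int(value: int) -> bytes:
    """Encode a non-negative INTEGER (tag 0x02)."""
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def ber_oid(dotted: str) -> bytes:
    """Encode a dotted OID (tag 0x06); sub-identifiers use base-128 digits."""
    parts = [int(part) for part in dotted.strip(".").split(".")]
    body = bytearray([parts[0] * 40 + parts[1]])
    for sub in parts[2:]:
        digits = [sub & 0x7F]
        sub >>= 7
        while sub:
            digits.append(0x80 | (sub & 0x7F))
            sub >>= 7
        body.extend(reversed(digits))
    return tlv(0x06, bytes(body))


def build_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMP v2c GetRequest message for a single OID."""
    varbind = tlv(0x30, ber_oid(oid) + tlv(0x05, b""))  # OID plus NULL placeholder
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0) + tlv(0x30, varbind))
    version = ber_int(1)  # the value 1 selects v2c
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)


def poll(host: str, community: str, oid: str, timeout: float = 2.0) -> bytes:
    """Send one request to UDP port 161 and return the raw, undecoded reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # with UDP, a reply may never arrive
        sock.sendto(build_get_request(community, oid), (host, 161))
        reply, _sender = sock.recvfrom(65535)
    return reply
```

A call such as poll("192.0.2.10", "public", "1.3.6.1.2.1.1.1.0") would send the request from a source port chosen by the operating system; the address here is only a documentation placeholder. In practice you would rely on your monitoring tool or an SNMP library rather than hand-built packets.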

SNMP traps are spontaneous messages that devices send to configured addresses whenever an event occurs. The trap recipient listens on UDP port 162. Traps have some disadvantages compared to active SNMP requests, however. For example, traps are unreliable because there is no acknowledgment of receipt for the UDP packets sent, so a lost packet goes unnoticed. In addition, traps usually only report errors, not recoveries, so the current monitoring status remains unclear. Another disadvantage is the potential flood of traps, which can overload the monitoring should a key component of the infrastructure fail. Furthermore, the administrator must reconfigure all devices if the IP address of the trap recipient changes.
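The transport side of trap reception can be sketched in a few lines of Python. This is only an illustration of the fire-and-forget nature of UDP, not a trap decoder: the function names are made up for this example, the payloads are treated as opaque bytes, and binding to port 162 normally requires elevated privileges.

```python
import socket


def open_trap_socket(port: int = 162, address: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on which trap datagrams can arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((address, port))
    return sock


def receive_datagrams(sock: socket.socket, count: int, timeout: float = 1.0) -> list[bytes]:
    """Collect up to `count` datagrams; stop early if nothing arrives in time."""
    sock.settimeout(timeout)
    received = []
    for _ in range(count):
        try:
            payload, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing arrived; the sender is never told either way
        received.append(payload)
    return received
```

Note that nothing in this exchange goes back to the sending device. If a datagram is dropped on the way, neither side notices, which is why traps are better suited as a supplement than as the only source of monitoring data.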

SNMP traps are therefore often a supplement to monitoring with active SNMP queries. A good monitoring solution should also be able to handle both variants – the active querying of monitoring data, and event-based monitoring via SNMP traps.

What SNMP versions are available?

SNMP has been developed continually over the decades, so there are now several versions that are not compatible with each other. The protocol version configured in the monitoring system must therefore match the version used on the device. In our experience, v2c is used in 99 percent of applications in practice. SNMP v2c enables bulk queries that significantly speed up monitoring, and it has 64-bit counters, which are essential for monitoring switch ports with 1 Gbit/s and more. Security-wise, SNMP v2c is on par with SNMP v1. The 'c' in the version name stands for community, which takes on the role of an access password in SNMP.
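Why 64-bit counters matter at 1 Gbit/s can be shown with a little arithmetic. The following sketch assumes an interface octet counter (such as ifInOctets with 32 bits or ifHCInOctets with 64 bits) on a link running at full line rate; the function names are illustrative.

```python
def seconds_until_wrap(counter_bits: int, link_bits_per_second: float) -> float:
    """Time until an octet counter overflows when the link runs at full speed."""
    bytes_per_second = link_bits_per_second / 8
    return 2 ** counter_bits / bytes_per_second


def rate_from_samples(previous: int, current: int, interval_seconds: float,
                      counter_bits: int) -> float:
    """Bytes per second between two samples, assuming at most one wrap."""
    delta = (current - previous) % (2 ** counter_bits)
    return delta / interval_seconds


print(round(seconds_until_wrap(32, 1e9), 1))   # 32-bit counter at 1 Gbit/s: 34.4
print(round(seconds_until_wrap(32, 1e8), 1))   # 32-bit counter at 100 Mbit/s: 343.6
```

A 32-bit counter on a saturated 1 Gbit/s port wraps roughly every 34 seconds. If the monitoring polls less often than that, for example once a minute, the counter may have wrapped more than once between two samples and the calculated rate becomes meaningless. A 64-bit counter would take thousands of years to wrap at the same speed.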

SNMP v1, on the other hand, is used on old devices, on devices that do not support v2c, or on devices whose v2c support is faulty. Devices with a faulty v2c implementation may nevertheless work with v2c as long as bulk requests (bulkwalk) are not used. Another version is SNMP v2, which has additional security functions compared to v2c but is hardly used in practice. SNMP v3 is used when the SNMP data traffic is to be encrypted. With SNMP v1 and v2c, the data traffic, including the community string, is transmitted in plain text, i.e. without encryption. Since SNMP v3 requires significantly more computing power due to the encryption and also significantly increases the effort required to configure the monitoring, it is less common than v2c.

Live Webinar: Introduction to Checkmk

How does SNMP work?

Setting up SNMP monitoring should normally take little effort. Network devices in particular often already have an SNMP agent installed, which you only need to activate. To do this, configure the device to be monitored so that it answers active requests (SNMP GET), i.e. enable SNMP read access. These options are usually available in the configuration of the devices.

In most IT monitoring solutions, such as Checkmk, you can easily add these devices as hosts to the monitoring. In addition to the correct SNMP version, you must also specify the community. On most devices, the default community is 'public'.

You can of course change the community. But we do not recommend using different communities – at least for devices in a network environment – because they only make monitoring unnecessarily complex.

Services of a switch monitored via SNMP

Once the host has been created, the monitoring solution can retrieve the services on the device via SNMP. This can be done, for example, via a complete pull of all SNMP data (SNMP walk), but this has the disadvantage that it can run for several hours on some devices.
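To illustrate why a full walk can take so long, here is a small simulation of how a walk proceeds: the manager repeatedly asks for the next OID after the one it just received until it leaves the requested subtree. The FakeAgent class and its sample table are stand-ins invented for this sketch; a real agent would answer each request over the network.

```python
from bisect import bisect_right


def parse_oid(dotted: str) -> tuple[int, ...]:
    """Turn '1.3.6.1' into (1, 3, 6, 1) so that OIDs sort numerically."""
    return tuple(int(part) for part in dotted.strip(".").split("."))


class FakeAgent:
    """In-memory stand-in for a device; counts the requests it receives."""

    def __init__(self, table: dict[str, object]):
        self._rows = sorted(
            ((parse_oid(oid), value) for oid, value in table.items()),
            key=lambda row: row[0],
        )
        self._keys = [oid for oid, _value in self._rows]
        self.requests = 0

    def get_next(self, oid: tuple[int, ...]):
        """Return the first (oid, value) pair that sorts after `oid`, or None."""
        self.requests += 1
        index = bisect_right(self._keys, oid)
        return self._rows[index] if index < len(self._rows) else None


def walk(agent: FakeAgent, root: str):
    """Yield every (oid, value) pair below `root`, one request per row."""
    base = parse_oid(root)
    current = base
    while True:
        row = agent.get_next(current)
        if row is None or row[0][:len(base)] != base:
            return  # end of the table or left the subtree
        yield row
        current = row[0]
```

Each row costs one request and one reply, plus a final request that detects the end of the subtree. On a device with many thousands of OIDs and a slow agent, these round trips add up, which is also why the bulk queries of SNMP v2c speed things up: they return several rows per request.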

Checkmk, for example, takes a different approach to service discovery: it first retrieves only the very first two records (OIDs) on a device, sysDescr and sysObjectID. Based on this information, further queries are made as required. Based on the results obtained, the software then decides which of the more than 1,000 supplied SNMP plug-ins the device supports. A plug-in is an extension of the monitoring system that ensures that information is transferred to the monitoring system as services. If the supplied plug-ins support the device to be monitored, Checkmk uses these for the actual service detection.
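The general idea of such a two-step discovery can be sketched as follows. This is not Checkmk's implementation: the plug-in names, detection rules and sample devices are hypothetical, and only the two system OIDs and the subtree roots (Cisco enterprise subtree, Printer MIB, interface table) are standard values.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SYS_DESCR = "1.3.6.1.2.1.1.1.0"      # free-text description of the device
SYS_OBJECT_ID = "1.3.6.1.2.1.1.2.0"  # vendor-assigned identifier of the device type


@dataclass(frozen=True)
class Plugin:
    name: str
    detect: Callable[[str, str], bool]  # (sysDescr, sysObjectID) -> applies?
    subtree: str                        # OID subtree the plug-in would query later


# Hypothetical registry; a real monitoring system ships its own rules.
PLUGINS = [
    Plugin("cisco_hardware", lambda descr, obj: obj.startswith("1.3.6.1.4.1.9."), "1.3.6.1.4.1.9"),
    Plugin("printer_supplies", lambda descr, obj: "printer" in descr.lower(), "1.3.6.1.2.1.43"),
    Plugin("interfaces", lambda descr, obj: True, "1.3.6.1.2.1.2.2"),
]


def discover(snmp_get: Callable[[str], Optional[str]]) -> list[str]:
    """Fetch only the two system OIDs, then select the matching plug-ins."""
    descr = snmp_get(SYS_DESCR) or ""
    object_id = snmp_get(SYS_OBJECT_ID) or ""
    return [plugin.name for plugin in PLUGINS if plugin.detect(descr, object_id)]


# Made-up sample device; snmp_get is simulated with a dictionary lookup.
switch = {SYS_DESCR: "Example switch OS 15.2", SYS_OBJECT_ID: "1.3.6.1.4.1.9.1.1"}
print(discover(switch.get))  # ['cisco_hardware', 'interfaces']
```

The point of this design is that two cheap requests are enough to decide which parts of the OID tree are worth querying at all, instead of walking the entire device first.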

The plug-ins use targeted SNMP queries to retrieve exactly the data they need for the services to be monitored. This is the same data that will later be retrieved regularly for monitoring the devices.

Volume Berlin - 94.5% used
Volume Munich - 87.8% used
Diagnosis Status - ok
CPU utilization - 8.1% used
CPU1 - Intel Xeon 1.80GHz
Temperature CPU1 - 45.0 °C
Power Consumption - 70 Watt
Physical Disk 0:1:0 - Offline
CPU utilization - 40% utilization
Power Supply - Normal
Interface 001 - Up
Interface Uplink-Rack 18 - Down
Input Phase 1 - 231.4V, 13.4 A
Output Phase 1 - 230.7V, 19.6 A
UPS Alarms - No alarms
Battery Charge - On mains

What do OIDs and MIBs have to do with SNMP?

To fully understand how SNMP works, you have to look at the role of the OID (Object Identifier) and the MIB (Management Information Base). They play a crucial role in providing the right information to the monitoring tool. This could be, for example, information on the bandwidth of a switch port, the status of a fan in a server, the CPU load on a firewall, or the toner level in a printer. Each of these items of information is an object with a unique identifier. This identifier is called an OID.

An OID is a long sequence of numbers separated by periods. The OIDs are arranged hierarchically in a tree structure. Most parts of the tree are standardized, so that specific information should always be found under the same branch. In addition, each manufacturer can define its own product-specific OIDs in a separate subtree.

The MIB is required to read the OIDs. It provides names, definitions and descriptions of the objects. Since many hardware manufacturers define their own OIDs, the corresponding MIB is required in order to understand and translate the numbers.
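What a MIB does for you can be sketched as a simple lookup: the longest known OID prefix is replaced by its name, and the remainder identifies the instance, for example the interface index. The table below contains only a handful of standard entries; real MIB files are far more extensive and also carry types and descriptions.

```python
# A tiny excerpt of standard OID names; real MIBs define thousands of objects.
MIB_NAMES = {
    "1.3.6.1.2.1.1.1": "sysDescr",
    "1.3.6.1.2.1.1.2": "sysObjectID",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.2.2.1.2": "ifDescr",
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets",
    "1.3.6.1.4.1": "enterprises",  # root of all manufacturer-specific subtrees
}


def translate(oid: str) -> str:
    """Replace the longest known prefix of a numeric OID with its name."""
    parts = oid.strip(".").split(".")
    for cut in range(len(parts), 0, -1):
        name = MIB_NAMES.get(".".join(parts[:cut]))
        if name:
            suffix = ".".join(parts[cut:])
            return f"{name}.{suffix}" if suffix else name
    return oid  # nothing known about this branch


print(translate("1.3.6.1.2.1.2.2.1.10.3"))  # ifInOctets.3
print(translate("1.3.6.1.4.1.9.9.13"))      # enterprises.9.9.13
```

The second call shows what happens without the manufacturer's MIB: the tool can tell that the OID lies in a vendor subtree, but the remaining numbers stay unexplained.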

Example of an OID tree of a Cisco device

The advantages of SNMP

Different graphs of a switch in Checkmk

The great strength of SNMP is that with a powerful SNMP monitoring tool, you can relatively easily retrieve data from all devices on your network, as long as they support SNMP. In addition to information on the health of a device, SNMP also gives you 'chassis' or 'housing' data, such as readings from cooling, voltage, or temperature sensors.

For server monitoring, SNMP provides data about the server hardware, such as temperature, which monitoring agents can only retrieve to a limited extent or via detours.

Since there are different types of devices in any network infrastructure – such as switches, printers, UPSs, IoT sensors or gateways – this gives you a quick overview of the health of your IT. It also allows you to put them in context, identify problems and make decisions based on the monitoring data – all regardless of the size of your network environment.

Combined with an SNMP-compatible monitoring tool, you can also use dashboards to get an overview of the state of your network environment and immediately identify potential performance bottlenecks. Furthermore, such a tool can help you structure your environment efficiently, set thresholds for alerts so that you are notified in the case of a problem, generate reports, and view statistics.

Problems and disadvantages of SNMP

SNMP does allow a comprehensive monitoring of network devices to be set up with relatively little effort. Nevertheless, problems will still occur from time to time. As already mentioned, the monitoring of devices that only send out SNMP traps can quickly cause problems, since the traps sent as UDP packets can be lost. As a result, a problem that has occurred may go undetected. Similarly, it is not ideal if an important upstream service fails and the monitoring system subsequently goes down under a resulting flood of trap messages – unless the application has an integrated system for the processing of SNMP traps.

But even monitoring with SNMP polling does not always run smoothly. This is often because manufacturers have left out certain parts of the protocol or implemented them incorrectly on their devices. As a result, the agent delivers incorrect data, or the request times out. If this is the case, the administrator must manually fine-tune the monitoring to remove the incorrect information. This can be avoided if the monitoring tool in use in most cases already knows which values are incorrect and automatically ignores them, as is the case with Checkmk, for example.

Server monitoring via SNMP is also largely limited to hardware data. Information on the operating system or applications on the server is only provided by the protocol to a limited extent or not at all. This means that for these cases, holistic monitoring also requires agents that can provide this data.

Furthermore, SNMP can have a significant impact on monitoring performance, especially in environments with many hosts, so that requests are processed very slowly and/or time out. Monitoring with SNMP consumes significantly more CPU and memory than monitoring with our Checkmk agents, for example. However, since there is often no way around SNMP for network monitoring, we have integrated an SNMP engine into our Checkmk Enterprise Edition, which halves the CPU consumption.

Ready to explore the Checkmk Enterprise Edition?

Download the free trial of the Checkmk Enterprise Edition and see it in action.