In the blog post ‘Network monitoring with SNMP: Stories from hell’ we presented some problems that occur in SNMP monitoring, which are often the result of poor implementations of the protocol by manufacturers. We also provided detailed instructions for debugging SNMP problems. In this post we take a look into the crystal ball: after all, SNMP has been in use for over 30 years as the de facto standard for monitoring network components – with all of its advantages and disadvantages.
Although SNMP is considered slow and insecure for various reasons, and also reaches its limits when it comes to scalability, there has been no successful attempt to replace the protocol in all this time. SNMP v2c and v3 are only extensions of it. Although various manufacturers offer proprietary interfaces for remote access to their network devices, a uniform approach, such as the one that has been available for server monitoring since 2015 with Redfish, is still missing for network monitoring.
Redfish is a specification for the remote maintenance of server systems via a REST API, which is expected to replace the Intelligent Platform Management Interface (IPMI) sooner or later. It provides a standardized programming interface for remote server maintenance, delivers schema-based output that is also human-readable, and can be used by client applications as well as browser-based GUIs. Data is transmitted in JSON format over HTTPS.
It is surprising that there has not yet been a widespread attempt to introduce a new uniform standard for monitoring network components as well. After all, network trends such as Software Defined Networking (SDN) or Machine Learning (ML) are becoming increasingly important in the strategies of manufacturers. Intent-based networking (IBN), for example, is intended to use AI (artificial intelligence) to automate manual routine tasks such as configuration, and to detect and resolve network problems automatically. In addition, manufacturers that follow an IBN approach offer real-time insight into network activities. However, even here it is apparent that each manufacturer follows its own definition of IBN, while a single-vendor ‘monoculture’ is rarely the reality in corporate networks.
Many manufacturers also offer proprietary protocols for their network components. Integrating such an interface requires development effort, however – Checkmk already offers numerous so-called special agents for such interfaces.
An SNMP alternative is still to be found
A real alternative to SNMP is yet to be found. Nevertheless, it can be assumed that SNMP will be replaced in the medium term. SNMP is already reaching its limits when it comes to real-time monitoring. This has to do with how the protocol works. As explained in earlier parts of the blog series, SNMP polling and SNMP traps are two different types of monitoring.
During polling, the monitoring instance queries the status of the network components at specified time intervals. If there is an incident on the device between two polls – such as a brief change of status of an interface – the administrator does not find out about it. To get close to real-time queries, the administrator would have to significantly shorten the time intervals between the polls across the entire monitoring environment, which would noticeably increase the load on the monitored devices.
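The blind spot between two polls can be illustrated with a small simulation. This is only a sketch under assumptions: the interface flap, its timestamps and the helper names (state_at, poll, observed_changes) are made up for illustration, and no real SNMP queries are sent.

```python
from bisect import bisect_right

def state_at(events, t, initial="up"):
    """Return the interface state at time t, given sorted (timestamp, state) changes."""
    times = [ts for ts, _ in events]
    i = bisect_right(times, t)
    return initial if i == 0 else events[i - 1][1]

def poll(events, interval, duration, initial="up"):
    """Sample the state every `interval` seconds, as a poller would."""
    return [(t, state_at(events, t, initial)) for t in range(0, duration + 1, interval)]

def observed_changes(samples):
    """Count the state transitions that are visible in the polled samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a[1] != b[1])

# Hypothetical flap: the interface goes down at t=70s and comes back at t=85s.
flap = [(70, "down"), (85, "up")]

coarse = poll(flap, interval=60, duration=180)  # one query per minute
fine = poll(flap, interval=10, duration=180)    # one query every ten seconds

print("60s interval:", observed_changes(coarse), "changes seen in", len(coarse), "queries")
print("10s interval:", observed_changes(fine), "changes seen in", len(fine), "queries")
```

With the 60-second interval the flap falls entirely between two polls and goes unnoticed. The 10-second interval catches it, but needs roughly five times as many queries against the device over the same period.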
With SNMP traps, however, the network devices send a notification to the monitoring instance if an event occurs. The advantage is that the administrator does not have to initiate regular queries, and receives a notification if an incident occurs. The disadvantage is that SNMP traps send the data as UDP packets and there is no acknowledgement of receipt, so it is not possible to trace whether the packet has even arrived. It can also happen that the recipient, i.e. the monitoring instance, is flooded with notifications if a central component fails.
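The missing acknowledgement can be tried out on the loopback interface. The following sketch assumes a placeholder payload rather than a real BER-encoded SNMP trap, and the helper names are hypothetical. It sends a UDP datagram to a port on which nothing is listening.

```python
import socket

def unused_udp_port() -> int:
    """Ask the OS for a free UDP port, then release it so that nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]

def send_fire_and_forget(payload: bytes, host: str, port: int) -> int:
    """Send one UDP datagram. The return value is the number of bytes
    handed to the OS, not a confirmation that anyone received them."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))

# Placeholder text standing in for a trap; a real trap would be BER-encoded.
payload = b"linkDown ifIndex=3"

port = unused_udp_port()
sent = send_fire_and_forget(payload, "127.0.0.1", port)
print("bytes handed to the OS:", sent, "of", len(payload))
```

From the sender's point of view the call succeeds even though no receiver exists. A network device sending a trap is in the same position: it cannot tell from the send alone whether the monitoring instance ever saw the notification.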
A combination of status-based and event-based monitoring is therefore useful, but does not solve the problem of missing scalability, or of blind spots in the network when the UDP packets carrying traps are lost. Streaming Telemetry is a newer approach to network monitoring that is attracting interest from many companies with large infrastructures, and it is also finding growing support among network manufacturers. Arista, Cisco, Juniper Networks and Nokia, among others, are already working with Streaming Telemetry, and some run open source projects for the technology, such as Pipeline (Cisco), OpenNTI (Juniper) and GoArista (Arista).
Network monitoring with Streaming Telemetry
In contrast to SNMP monitoring, with Streaming Telemetry the network components, such as a router or switch, continuously stream data to the monitoring instance using a push model. Operators and applications can also subscribe to specific data elements. The administrator can specify what information is required, at what frequency, and where the network device should send the data. In this way, Streaming Telemetry should be able to provide real-time access to data that is also prepared for ML or analysis purposes. The technology can thus help to advance topics such as automation, troubleshooting or traffic optimization in large network environments.
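The subscription idea described above (what, how often, and where to) can be sketched as a toy model. Everything in it is an assumption for illustration: the class names, the data paths and the simulated clock do not correspond to any vendor's implementation or to a specific telemetry protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Subscription:
    path: str                      # which data element is wanted
    interval_s: int                # how often the device should push it
    sink: Callable[[dict], None]   # where the update is delivered

class SimulatedDevice:
    """Toy device that pushes subscribed values on its own schedule."""

    def __init__(self, readings: Dict[str, Callable[[int], int]]):
        self.readings = readings
        self.subscriptions: List[Subscription] = []

    def subscribe(self, sub: Subscription) -> None:
        if sub.path not in self.readings:
            raise KeyError(f"unknown path: {sub.path}")
        self.subscriptions.append(sub)

    def run(self, duration_s: int) -> None:
        """Advance a simulated clock and push every update that is due."""
        for now in range(duration_s + 1):
            for sub in self.subscriptions:
                if now % sub.interval_s == 0:
                    value = self.readings[sub.path](now)
                    sub.sink({"path": sub.path, "ts": now, "value": value})

# Hypothetical data paths and synthetic readings.
device = SimulatedDevice({
    "interfaces/eth0/in-octets": lambda t: 1000 * t,
    "system/cpu-percent": lambda t: 20 + t % 5,
})

received: List[dict] = []  # stands in for the monitoring instance
device.subscribe(Subscription("interfaces/eth0/in-octets", 10, received.append))
device.subscribe(Subscription("system/cpu-percent", 30, received.append))
device.run(60)

print(len(received), "updates received without a single query")
```

The monitoring side never asks for data here. It only declares its interest once, and the device decides when an update is due, which is the main difference from the polling model.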
It should be noted, however, that Streaming Telemetry has not yet been standardized. There are different options and variables, which each manufacturer currently still interprets differently. For example, there are a variety of transport options available, such as TCP, UDP or gRPC. There are also several options for the data formats.
It therefore remains to be seen whether standardization will take place, and which implementation may prevail. In any case, it seems that Streaming Telemetry is becoming an option for network monitoring in large environments. Depending on how the technology evolves and gains relevance, we will also deal with such topics and investigate a possible integration into Checkmk.
Regardless of this, it can be assumed that there is still no end in sight for SNMP. Despite its shortcomings, no successor has emerged in over 30 years. Streaming Telemetry is a potential competitor, but it is still too early to say whether it will develop into a serious successor over time, or ‘only’ into a useful addition. Whether it is Streaming Telemetry or another technology, a lot of time will pass before any newcomer is mature enough to knock SNMP off the throne. SNMP is therefore likely to remain the dominant protocol in network monitoring, and even if an heir to the throne does appear, SNMP will continue to be used – at least in parallel – for a long time to come.