One topic that comes up often in our conversations with customers is agent vs. agentless monitoring.

It is an interesting topic to debate, so in this article we explore it in more depth. More specifically, we look at agentless monitoring and why we believe the term is often misused.

IT departments typically use monitoring software to, among other things, get a better overview of the health of their IT infrastructure and be notified as early as possible about potential issues that could disrupt daily operations or cause outages.

As companies grow, so does the complexity of their internal systems. Today, system administrators need to monitor all sorts of devices, both physical (e.g. servers, routers, switches, storage devices) and virtual (e.g. virtual machines, containers and cloud infrastructure), as well as middleware and applications.

To provide an accurate overview of a system's performance, IT monitoring software needs a way to collect data from all connected devices. This typically involves installing an agent, a small executable provided by the vendor of your chosen IT monitoring solution, on the target device (hence the term agent-based monitoring).

The agent's job is to collect data about the device's performance and report it back to the IT monitoring software. Pretty straightforward, right?
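To make the agent's job more concrete, here is a minimal sketch of what an agent does at its core: gather a few local metrics and render them as plain text that the monitoring server can fetch. The `<<<name>>>` section headers are loosely modeled on the plain-text output of agents such as Checkmk's; the section names and the `collect_sections` function are hypothetical, and a real agent collects far more.

```python
import os
import shutil
import time


def collect_sections() -> str:
    """Gather a few local metrics and render them as text sections.

    Hypothetical, simplified agent: each section starts with a
    <<<name>>> header followed by one record per line.
    """
    lines = []

    # Basic host information: timestamp of this run and number of CPUs.
    lines.append("<<<host_info>>>")
    lines.append(f"timestamp {int(time.time())}")
    lines.append(f"cpu_count {os.cpu_count() or 0}")

    # Usage of the root filesystem in bytes: mount point, total, used, free.
    usage = shutil.disk_usage("/")
    lines.append("<<<disk_usage>>>")
    lines.append(f"/ {usage.total} {usage.used} {usage.free}")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A real agent would serve this text to the monitoring server
    # (e.g. over a TCP port) instead of printing it.
    print(collect_sections(), end="")
```

The monitoring server only has to fetch and parse this text; everything device-specific stays inside the agent.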

So, what is agentless monitoring?

As with any technology, there are exceptions to the approach described above: devices that do not require the installation of a 3rd party agent in order to be monitored.

These systems provide data either through an application-specific API (e.g. VMware) or through a standardized technology. Here we primarily mean two technologies, SNMP (Simple Network Management Protocol) and WMI (Windows Management Instrumentation), although many others can be used. Without going into too much detail:

- WMI is Microsoft's technology for monitoring and managing Windows-based systems. IT monitoring solutions use WMI to query various parameters and status values. However, this approach can have a significant impact on the monitored system's performance and cause issues.

- SNMP is more general and allows users to monitor a wider range of systems (Windows, Linux, Unix) and other devices (routers, switches, etc.). It comes with its own drawbacks, e.g. data consistency issues and the difficulty of securing it properly.

Both have advantages and disadvantages, so a detailed discussion of the pros & cons of WMI and SNMP will have to be part of another article.
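On the agentless side, the monitoring server does the asking. The following is a minimal sketch, assuming the Net-SNMP command-line tools are installed: it builds an `snmpget` call for the standard `sysUpTime.0` object (OID `1.3.6.1.2.1.1.3.0`) using SNMP v2c and a community string, then extracts the timer value from the output. The community `public`, the helper names and the exact output line being parsed are illustrative assumptions. Note that v2c community strings travel in clear text, which is one reason SNMP is cumbersome to secure.

```python
import re
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # SNMPv2-MIB::sysUpTime.0


def build_snmpget(host: str, community: str = "public",
                  oid: str = SYS_UPTIME_OID) -> list[str]:
    """Build a Net-SNMP snmpget command line for SNMP v2c."""
    return ["snmpget", "-v2c", "-c", community, host, oid]


def parse_timeticks(output: str) -> float:
    """Extract a Timeticks value from snmpget output and return seconds.

    Timeticks count hundredths of a second, e.g. a line such as
    'SNMPv2-MIB::sysUpTime.0 = Timeticks: (12345) 0:02:03.45'.
    """
    match = re.search(r"Timeticks: \((\d+)\)", output)
    if match is None:
        raise ValueError(f"no Timeticks value in: {output!r}")
    return int(match.group(1)) / 100


def poll_uptime(host: str, community: str = "public") -> float:
    """Query a device's uptime in seconds (needs snmpget and network access)."""
    result = subprocess.run(build_snmpget(host, community),
                            capture_output=True, text=True,
                            check=True, timeout=10)
    return parse_timeticks(result.stdout)
```

Against a real device you would call `poll_uptime("192.0.2.1")`. Nothing extra is installed on the target; the data comes from the SNMP service the device ships with.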

Now, we say that there's no such thing as agentless monitoring because when software vendors describe their solution as having “agentless monitoring” capabilities, what they typically mean is: you don't need to install our 3rd party agent in order to monitor that target device.

There is, of course, an exception: when you monitor exclusively via APIs, you are in fact monitoring without an agent. However, this is typically only possible for applications and virtualized systems. Almost no hardware or operating systems offer such an API, so we leave this case out of this article.

So, why is that?

It is a matter of definition: on some devices (e.g. routers, switches) you don't need to, or simply cannot, install a 3rd party agent. Instead, the vendors of such devices ship a built-in monitoring agent by default (based on defined industry standards) that can provide data about the device's performance.

Technically speaking, you can perform agentless monitoring on these devices. In reality, however, you're receiving data from a native software agent that is simply built into the platform (which eliminates the need to install a 3rd party monitoring agent).

You're still using a monitoring agent to collect data about the device; it's just not the one provided by your chosen IT monitoring software vendor. We therefore differentiate between using a dedicated monitoring agent (agent-based monitoring) and using a native, built-in one (agentless monitoring).

Practical differences between agent and agentless discovery

To determine the state of the network and which devices to monitor, both agent-based and agentless methods need a discovery step, which is a two-step process. In Checkmk, the first step auto-discovers the assets present on the network by sending TCP/IP packets to every possible device. These can be laptops, servers, desktops, printers, virtual machines, routers, and so on. In the second step, once the assets have been identified, the services of each discovered device (for example, as reported by a previously installed agent) are added to the inventory for monitoring.
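The first step can be sketched as a simple network sweep. This is a minimal illustration, assuming a plain TCP connect probe: it walks every usable address in a subnet and checks whether a given port accepts connections. Using port 6556 (the Checkmk agent's default TCP port) and using TCP rather than ICMP ping are assumptions for the example; real discovery tools combine several probes and run them in parallel.

```python
import ipaddress
import socket


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat as "nothing answering here".
        return False


def discover_hosts(network: str, port: int = 6556,
                   timeout: float = 0.5) -> list[str]:
    """Probe every usable address in a subnet; return those that answer."""
    found = []
    for address in ipaddress.ip_network(network, strict=False).hosts():
        if is_port_open(str(address), port, timeout):
            found.append(str(address))
    return found
```

For example, `discover_hosts("192.168.1.0/28")` would probe 14 addresses one after another. Probing sequentially keeps the sketch short but is slow on large networks.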

Even though these steps are executed in both agent-based and agentless monitoring solutions, they work quite differently. With an agent-based method, the installed agent takes care of collecting information about the device and replies to the queries of the main monitoring instance. Without an agent, the main monitoring server has to query the operating system of each asset through various protocols or APIs, and it needs sufficient permissions to collect the desired metrics.

In short, the practical difference between agent-based and agentless discovery is who does the work. If an agent is installed, it is responsible for collecting information about the device and sending it back to the monitoring server, either periodically or on request. Without an agent, the inventory step takes place only when the main server requests the data itself. Additionally, an agentless monitoring solution usually sends more data over the network than an agent-based one, since values often have to be requested individually.
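For the second step, here is a sketch of how a server might turn an agent's reply into services to monitor. It assumes the agent answers in a simple sectioned text format (a `<<<name>>>` header followed by records); the section names and the service-naming rules in `discover_services` are hypothetical. The point is where the work happens: the agent has already collected everything, so the server parses one reply per host, whereas without an agent it would have to issue separate queries to build the same picture.

```python
def parse_sections(agent_output: str) -> dict[str, list[str]]:
    """Split agent output into {section_name: [lines]} using <<<name>>> headers."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in agent_output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            current = line[3:-3]
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


def discover_services(agent_output: str) -> list[str]:
    """Derive service names from agent output (hypothetical naming rules)."""
    sections = parse_sections(agent_output)
    services = []
    if "host_info" in sections:
        services.append("CPU")
    # One service per filesystem: the first field of each disk_usage record.
    for record in sections.get("disk_usage", []):
        mount_point = record.split()[0]
        services.append(f"Filesystem {mount_point}")
    return services
```

Given a reply with a `host_info` section and two `disk_usage` records for `/` and `/home`, this yields the services `CPU`, `Filesystem /` and `Filesystem /home`.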

The pros and cons of agentless monitoring

Although we find the common use of this term misleading, we'd like to continue by discussing the pros & cons of this approach.

Advantages of agentless monitoring

Using the native, built-in monitoring agent has a couple of advantages, which can be summed up as follows:

  • Streamlined implementation: it is a 'less intrusive' way of monitoring your devices (although it also raises serious security concerns, which we'll talk about in a bit)
  • Fast deployment: it is easy & fast to deploy (with fewer moving parts in your monitoring environment)
  • Maintenance: lower maintenance costs, since there's no need to constantly upgrade/update agents
  • Costs: a lower total cost of ownership (TCO)

Disadvantages of agentless monitoring

Here are some of the shortcomings of agentless monitoring and why an agent-based monitoring approach makes more sense:

  • Security issues: agentless monitoring uses interfaces that allow remote access. With WMI or SNMP, a user can often get not only a device's performance data but also management capabilities (e.g. rebooting a server). While a proper configuration helps reduce the risk of unwanted behavior, this approach leaves the responsibility entirely to the user and their understanding of the agentless technologies they're using. Furthermore, if the main monitoring server is compromised, every device that granted it access is compromised too. In security-conscious environments, agentless monitoring is therefore often considered a single point of vulnerability.
  • In-depth metrics: while agentless monitoring is less intrusive, it is also limited in the data it collects. As an IT admin, you will most likely want to go beyond the standard metrics for your IT infrastructure, and this is where the agent-based monitoring approach comes into play. It provides broader & deeper monitoring capabilities and a richer set of information for analysis.
  • Configuration flexibility: in an agent-based monitoring setup, agents can often be extended to monitor more than one thing (e.g. collect general data about the operating system and application-specific data). Extending the monitoring capabilities of an agentless solution to include custom application and service monitors is either very difficult to implement or simply not possible.
  • Overhead: with an agent-based monitoring solution, it is straightforward to calculate the overhead of each agent: you simply check the resource usage of the agent process. When going agentless, this is much harder, as there is no longer a single process under which the monitoring runs; the load is spread across the monitored system's built-in services and the monitoring server.
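To illustrate the overhead point: when the agent is a single process, its footprint can be read from that process's own accounting. This minimal sketch uses Python's standard `resource` module (Unix only) to report the CPU time and peak memory of the current process, as a hypothetical agent might do after a collection cycle. In practice you would usually inspect the agent's process with OS tools such as `ps` or `top`. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource


def agent_overhead() -> dict[str, float]:
    """Report the resource usage of the current process (Unix only).

    Meant to run inside a (hypothetical) agent process to show how
    much CPU time and memory it has consumed so far.
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user_cpu_s": usage.ru_utime,       # CPU seconds spent in user mode
        "system_cpu_s": usage.ru_stime,     # CPU seconds spent in the kernel
        "max_rss": float(usage.ru_maxrss),  # peak memory: KiB on Linux, bytes on macOS
    }
```

There is no equivalent single number for an agentless setup, because the cost is split between the remote service answering the queries and the server sending them.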

The pros and cons of agent-based monitoring

We also collected some advantages and disadvantages of agent-based monitoring.

Advantages of agent-based monitoring

Unsurprisingly, the agent-based approach focuses on exactly the shortcomings of the agentless one, namely:

  • Security: an agent paired with the main monitoring tool can focus on transferring metrics securely. With an agentless solution, you have to rely on how well the interface was designed from a security standpoint; with an agent-based one, the vendor can develop and improve the agent itself and keep its security up to date. TLS/SSL connections over TCP are common with agent-based tools.
  • Detailed metrics: agentless monitoring tools are generally limited in what they can return. This may be sufficient in some cases, but in modern IT infrastructures it often is not. An agent installed on various hosts and devices can collect a broader spectrum of deeper, more granular metrics, which greatly helps in spotting potential issues and troubleshooting real ones.
  • Flexibility: an agent can be developed and configured in many directions. If a specific metric isn't reported yet, it can be added in a newer version or by writing a plug-in. This flexibility is hard to match.
  • Performance: agents in agent-based monitoring solutions have a small footprint and put little load on the underlying hardware. Despite collecting more data, they can operate without using many resources.
  • Scope: whereas agentless solutions focus on a specific area to monitor, agent-based monitoring tools can collect data from multiple types of devices and hosts, bringing it together to provide an overview of the whole infrastructure. Components like cloud environments, containers and virtual servers can be monitored, something that is often difficult with a purely agentless tool.
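The flexibility argument largely comes down to plug-ins. The following is a minimal sketch, assuming a hypothetical agent that runs every executable file in a plug-in directory and appends each one's standard output to its reply. Adding a new metric then means dropping a script into that directory rather than upgrading the agent. The directory convention, the timeout and the `run_plugins` function are illustrative assumptions, not any specific product's implementation.

```python
import os
import subprocess
from pathlib import Path


def run_plugins(plugin_dir: str, timeout: float = 30.0) -> str:
    """Run each executable file in plugin_dir and concatenate their output.

    Each plug-in is expected to print its own <<<section>>> header
    followed by its data (hypothetical convention).
    """
    directory = Path(plugin_dir)
    if not directory.is_dir():
        return ""
    outputs = []
    for plugin in sorted(directory.iterdir()):
        # Skip anything that is not an executable regular file.
        if not plugin.is_file() or not os.access(plugin, os.X_OK):
            continue
        try:
            result = subprocess.run([str(plugin)], capture_output=True,
                                    text=True, timeout=timeout)
        except (subprocess.TimeoutExpired, OSError):
            # A hanging or broken plug-in must not block the whole agent.
            continue
        if result.returncode == 0:
            outputs.append(result.stdout)
    return "".join(outputs)
```

A shell script that prints `<<<custom_app>>>` followed by `queue_length 3` would, once made executable and placed in the directory, show up in the agent's reply on the next run. Running plug-ins with a timeout, and skipping those that fail, keeps one faulty extension from stalling data collection for the whole host.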

Disadvantages of agent-based monitoring

Not all is rosy in the land of agent-based monitoring tools, and a few disadvantages exist:

  • Configurability: more options and a broader scope pay off in the long run, but in the short run they mean more steps and components to take care of before monitoring can start. Most modern solutions aim to streamline the configuration step, though.
  • Maintenance: more parts to install and configure mean higher maintenance costs. Not only the monitoring tool needs to be upgraded, but also the various agents installed across the network, which is a tedious operation. It also requires administrator access to all the hardware that needs to be monitored.
  • Costs: some agent-based monitoring solutions may have a higher up-front license cost.

What monitoring approach should I use?

So, which approach is better? Should you go for agent-based or agentless monitoring?

This depends very much on the systems you want to monitor; e.g. most hardware can only be monitored via SNMP. On the other hand, some systems do not come with a pre-installed monitoring agent; on Linux, for example, you have to install and configure an SNMP agent yourself. Your choice is therefore quite restricted, and deciding between one approach or the other becomes largely irrelevant. A good monitoring solution has to incorporate both agent-based and agentless monitoring options.

We therefore think the following aspects are more relevant:

  • Does the monitoring system use pre-installed agents / APIs in case they already exist and are reliable, secure and deliver enough information?
  • Does the monitoring system have its own agents, which are lightweight, secure and easy to maintain, for all other cases?
  • Does the monitoring system have a broad coverage of the things you want to monitor - without breaking your budget?

As you can see, at such a generic level the question is irrelevant. In the next articles in this series, we will therefore go deeper into topics like: