The Coronavirus has triggered a true digitalization offensive around the world. Since its outbreak, numerous companies have switched to home office for their employees. Even companies that were opposed to a home office arrangement in the past have tried to adapt their IT and work style to work from home, or to work remotely, as quickly as possible. This is not surprising, since working from home helps to keep the business going while minimizing the risk of your employees contracting Covid-19.
However, the short-term relocation of many workplaces to the home office presents numerous internal IT departments with great challenges. In many places there is either no infrastructure, or an insufficient one, for remote access to the company network and company-specific applications, and there are IT security concerns when an employee accesses the company network from the kitchen table at home. A classic VPN (Virtual Private Network) can help here. It forms a tunnel from the employee's device to the company network via the Internet. On the one hand, the tunnel uses encryption to protect the transmitted data from outside access; on the other hand, it makes the private IP addresses of the company network reachable over the Internet connection.
In this way it is possible for the employee to access the company infrastructure from their end device in their home network. However, it is all the more annoying if there are problems because the provided VPN capacity is insufficient, or the PC wants to connect to the wrong DNS server. It is just as disruptive if the VPN connection is significantly slower than the usual network connection. For example, this may be due to the gateway being overloaded, or simply because necessary licenses are missing.
Since the data traffic transmitted via a VPN tunnel is encrypted at one end and decrypted at the other end, the VPN gateway requires significant CPU time for the encryption and decryption processes. If the existing performance is no longer sufficient due to the rapidly growing number of VPN connections, the gateway quickly becomes a bottleneck in the remote infrastructure. This not only leads to frustration for the users, but also reduces the productivity of the employees in the home office.
It is therefore important for companies to know how much VPN capacity, connections and bandwidth they need. They should also take into account that different departments may need different capacities. In addition to maintaining the necessary licenses, and monitoring the status of VPN connections and the amount of data transmitted, it is therefore also sensible to monitor the VPN gateway. A VPN gateway can be, for example, a router, a server or a firewall.
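A rough capacity estimate along these lines can be sketched in a few lines of Python. This is only an illustration: the department names, user counts, per-user bandwidth figures and the 25 percent headroom are invented assumptions, not recommendations, and `required_capacity` is a hypothetical helper rather than part of any product.

```python
from dataclasses import dataclass


@dataclass
class Department:
    name: str
    remote_users: int      # employees expected to be connected at the same time
    mbit_per_user: float   # assumed average bandwidth per connected user


def required_capacity(departments, headroom=0.25):
    """Return (concurrent sessions, Mbit/s including a safety margin)."""
    sessions = sum(d.remote_users for d in departments)
    mbit = sum(d.remote_users * d.mbit_per_user for d in departments)
    return sessions, mbit * (1 + headroom)


# Hypothetical example figures, for illustration only.
departments = [
    Department("engineering", 40, 2.0),
    Department("sales", 25, 0.5),
]
sessions, mbit = required_capacity(departments)
print(f"{sessions} concurrent sessions, roughly {mbit:.1f} Mbit/s")
```

Comparing such an estimate with the licensed number of connections and the measured throughput of the gateway gives a first indication of whether the existing capacity is likely to be sufficient.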
Identifying gateway problems
Monitoring the CPU load and utilization of a VPN gateway does not resolve potential problems or bottlenecks that can occur when using VPN. However, monitoring these parameters can help identify a problem early so that the administrator can either fix it quickly, or avoid it in advance. Both values – i.e. CPU load and CPU utilization – say something about the CPU usage.
On Unix systems, the load is the number of processes that the CPU is currently executing plus those that are waiting for CPU time. For a system with one core, the load should not exceed 1.00. A value of 1.00 means that the processor is exactly fully occupied: one process is running and none is waiting. A value below 1.00 indicates that the processor still has spare capacity, while a value above 1.00 means that processes have to wait and the CPU is overloaded. Many system administrators use 0.70 as a guideline in order to leave some headroom. If the CPU load regularly exceeds this value, the administrator should take action. On machines with multicore processors, the load value that signals full utilization depends on the number of cores available: 2.00 for two cores, 4.00 for four cores, and so on.
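The rule of thumb above can be expressed as a small Python sketch that divides the load by the number of cores before comparing it with the thresholds. The function names and the state labels are assumptions made for this illustration; `os.getloadavg()` and `os.cpu_count()` are standard library calls, with `os.getloadavg()` being available on Unix systems only.

```python
import os


def classify_load(load, cores, warn=0.70, crit=1.00):
    """Classify a load value relative to the number of CPU cores."""
    per_core = load / cores
    if per_core >= crit:
        return "CRIT"   # at or above full utilization, processes may have to wait
    if per_core >= warn:
        return "WARN"   # the 0.70 guideline has been reached
    return "OK"


def current_load_state():
    """Classify the 5-minute load average of the local machine (Unix only)."""
    _load1, load5, _load15 = os.getloadavg()
    return classify_load(load5, os.cpu_count() or 1)
```

For example, a load of 3.2 on a four-core machine corresponds to 0.8 per core and would be reported as `WARN`, whereas the same load on a single core would be well beyond the limit.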
CPU utilization is the percentage of the available computing time that is actually spent computing. A CPU utilization of 100 percent means that the running processes occupy all logical CPU cores for the entire measurement interval. Among other things, this value shows how much headroom a system still has. A high utilization of the processor cores can, for example, increase the response time of different applications.
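As a minimal sketch of how such a percentage can be derived, the following Python code compares two snapshots of the cumulative busy and idle time counters. It assumes a Linux system that exposes these counters in the first line of `/proc/stat`; the function names are made up for this example, and iowait is counted as idle time here, which is one common convention among several.

```python
import time


def read_cpu_ticks(path="/proc/stat"):
    """Return (busy, idle) tick counters from the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        # Fields: user nice system idle iowait irq softirq steal
        fields = [int(x) for x in f.readline().split()[1:9]]
    idle = fields[3] + fields[4]          # idle + iowait
    return sum(fields) - idle, idle


def cpu_utilization(prev, curr):
    """Utilization in percent between two (busy, idle) snapshots."""
    busy = curr[0] - prev[0]
    idle = curr[1] - prev[1]
    total = busy + idle
    return 100.0 * busy / total if total else 0.0


def sample(interval=1.0):
    """Measure utilization of the local machine over `interval` seconds."""
    before = read_cpu_ticks()
    time.sleep(interval)
    return cpu_utilization(before, read_cpu_ticks())
```

Because the counters are cumulative, the result always refers to the interval between the two snapshots, which is why utilization is reported per measurement period rather than as an instantaneous value.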
Using the example of an ice cream parlor, the distinction between CPU load and CPU utilization – which applies to all IT systems and not just VPN gateways – can be clarified: there, for example, four employees are supposed to serve the ice cream behind the counter. Utilization is the percentage of employees serving ice cream at a given time, while the load is the length of the line of customers waiting for the ice cream.
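The analogy can be played through with a few numbers. The following Python sketch is purely illustrative; `parlor_metrics` is a hypothetical helper that follows the simplified picture above, in which the load is the length of the waiting line.

```python
def parlor_metrics(customers, employees=4):
    """Return (utilization in percent, length of the waiting line)."""
    serving = min(customers, employees)         # employees busy right now
    utilization = 100.0 * serving / employees
    waiting = max(customers - employees, 0)     # customers not yet being served
    return utilization, waiting


for customers in (2, 4, 10):
    utilization, waiting = parlor_metrics(customers)
    print(f"{customers} customers: {utilization:.0f}% utilization, {waiting} waiting")
```

With two customers, half of the employees are busy and nobody waits. With ten customers, utilization cannot rise above 100 percent, but the line keeps growing. This is why the load can reveal an overloaded system even when utilization alone only shows that it is fully busy.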
Monitoring of other parameters
In addition to monitoring the CPU utilization on a VPN gateway, it also makes sense to monitor other parameters. These can be, for example, the number of active VPN sessions, the number of active VPN tunnels, and the number of bytes transmitted and received via VPN. It therefore makes sense for companies to think carefully about what they want to monitor with VPN monitoring, and how.
Checkmk also offers a number of options for monitoring the VPN connections in order to receive information about their current status. Various plug-ins are already available for this.
With Checkmk 1.6 FP2, for example, the number of active WebVPN connections on a Cisco ASA (Adaptive Security Appliance) can be monitored. The monitoring software is also able to check the existence of IPsec and remote access VPN tunnels on a Cisco ASA device. The solution creates a separate service for each IPsec and remote access tunnel. If Checkmk finds a configured tunnel, it is possible to configure a name and a status for it. You can also configure a general status for a tunnel that no longer exists. Checkmk can also monitor the number of active SVC sessions, such as AnyConnect, on a Cisco server.
A new check also makes it possible to monitor the total number of current SSL/VPN connections on BIG-IP load balancers from F5 Networks.
From Checkmk 1.6 FP2, Fortinet users can monitor the number of available IPSec and VPN tunnels of FortiGate firewalls and simultaneously check configured SSL VPN tunnels on the FortiGate devices. SSL VPN can also be enabled or disabled and deployed per virtual domain, so that there is one service per domain.
Checkmk's other functions include querying the VPN status on the genuscreen VPN appliance (version 5.1) from Genua, monitoring the current status of the VPN tunnels on Juniper ScreenOS and Check Point firewalls, and monitoring the status of client connections as well as the incoming and outgoing data traffic with OpenVPN.
Avoiding bottlenecks in remote-workplace environments
Another home office scenario is the provision of remote workplaces via a VDI (Virtual Desktop Infrastructure) environment. In this way, a company can, for example, provide a secure virtual work environment that the employees can also access over any network from their private device. Citrix is a popular provider of such a remote workplace solution.
The Citrix infrastructure is designed so that users can access the Citrix environment’s server architecture behind the company firewall from the outside via the NetScaler Gateway. In this way, Citrix enables administrators to set up access control at the application level, and allows users to access their centrally-hosted virtual workstation from anywhere. The connection to the virtual NetScaler Gateway or to the NetScaler-VPX device is secured with TLS. The gateway sits in the DMZ (Demilitarised Zone), i.e. an area of the company infrastructure that is accessible from the outside and is shielded from other networks, such as the Internet or the LAN, by means of one or more firewalls. From there, it provides central access through the company firewall.
If performance problems arise in such a VDI environment, it is not necessarily helpful to scale the number of servers, since the bottleneck may be in the DMZ, and not in the company infrastructure. If, for example, the gateway is overloaded by the many requests, this cannot be compensated for with more servers. It is instead necessary to scale the gateway. Monitoring the NetScaler gateway is a good way to find such problems, e.g. by monitoring the CPU, which is heavily used when encrypting and decrypting the data traffic. In addition to various monitoring functions for the Citrix NetScaler load balancing appliance, Checkmk also offers a check for monitoring its CPU utilization.
Another reason for performance problems with Citrix can be an overloaded Internet connection, especially in Germany. Very often, however, the virtual platform on which Citrix is running proves to be a bottleneck, since the available resources, such as CPU, RAM or storage I/O, are busy and have no capacity left for scaling. It is therefore extremely important to cover all areas with the monitoring solution used in order to be able to take the right measures for frictionless operation.
With its Unified Access Gateway (formerly VMware Access Point), VMware also offers a solution that is intended to enable external access to corporate applications and resources, such as the VMware Horizon Desktop and Apps. The Unified Access Gateway (UAG) is also located in the DMZ, where it forwards authentication requests to the respective server, or blocks unauthorized requests.
Due to the Corona crisis, tribe29 is currently working on providing more options for monitoring IT environments that are necessary for the smooth operation of home office and remote work. These include, for example, the UAG, for which there has also been a check since Checkmk 1.6.0p12. This enables the monitoring of the CPU, memory and the VMware tunnel server.
From Checkmk 1.6 FP2, Pulse Secure appliances can be monitored, e.g. the number of logged-in web users. Pulse Secure offers various solutions for secure remote access, ranging from the classic VPN client through a workspace solution for BYOD (bring your own device) to a sophisticated NAC solution. Another check monitors CPU, memory and hard disk utilization, as well as the temperature and log file usage on the appliances.
In addition to functioning access to the company network, the connection should be protected against external attacks with a firewall or a similar solution. With Checkmk it is possible to monitor various parameters (CPU load, bandwidth, etc.) for the firewall solution used, thus avoiding a possible bottleneck at the transition point between the company network and the Internet. Only if the firewall works perfectly can a company guarantee the provision of the most secure possible home office connection for its employees.
Thanks to the large number of check plug-ins, Checkmk already provides a lot of relevant data for monitoring. In addition, it offers the option of creating your own local check and setting up your own service in this way. These local check plug-ins calculate the status directly on the host on which you want to access the data. This has the advantage for the user that they do not have to program a complex check in Python, but are completely free in the choice of scripting language. In this way it is also possible to receive data from devices that, for example, cannot supply the required data on the SNMP side.
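A local check essentially prints one line per service, consisting of a status (0 for OK, 1 for WARN, 2 for CRIT), a service name, optional metrics and a free text. The following Python sketch shows what such a script might look like for a count of active VPN sessions. The service name, the thresholds and the `count_sessions` function are hypothetical placeholders; on a real host the value would have to come from the VPN software itself, and the script would be placed in the local directory of the Checkmk agent (on Linux typically `/usr/lib/check_mk_agent/local`).

```python
#!/usr/bin/env python3


def local_check_line(service, metric, value, warn, crit):
    """Build one local check line: <status> <service> <metric data> <text>."""
    if value >= crit:
        status = 2      # CRIT
    elif value >= warn:
        status = 1      # WARN
    else:
        status = 0      # OK
    return f"{status} {service} {metric}={value};{warn};{crit} {value} active sessions"


def count_sessions():
    # Hypothetical placeholder: replace with a query against the VPN software.
    return 42


if __name__ == "__main__":
    print(local_check_line("VPN_Sessions", "sessions", count_sessions(), 80, 100))
```

Since the status is calculated in the script itself, the same approach works in any language that can write a line of text to standard output.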
Conclusion
The relocation of many workplaces to a home office has caused problems in numerous companies, since often the necessary infrastructures for remote access to company resources were not available. Even companies that already had appropriate infrastructures, such as VPN solutions or remote workspaces, had to adapt their capacities to the increased requirements. Even if these are sufficient, the firewall or gateway used can lead to a further bottleneck when connecting from the home office. It is therefore advisable to set up extensive monitoring of all different areas in order to identify problems at an early stage, and to be able to take effective countermeasures.
Checkmk already offers numerous options for responding to current requirements. Our developers are currently trying to prioritize feature requests related to VPN monitoring. In addition, Feature Pack 2 already includes some new features for VPN monitoring, so that Checkmk users can easily monitor the challenges to IT infrastructures arising in these times.