Until a few years ago, the ARM platform was considered the first choice for hobbyist projects because of its low price, and ideal for battery-powered devices such as smartphones or tablets because of its low energy requirements. Since the introduction of the 64-bit instruction set in 2012, there have been several attempts to make ARM suitable for servers, but only now is a breakthrough on the horizon.
Taken as a whole, the history of the ARM processor platform is a success story. What began as a processor for home computers is now the leading processor platform in the smartphone sector, and it is installed in billions of embedded devices. In recent years, the ARM family has branched out downwards (microcontrollers), and for ten years it has included a (theoretically) server-capable 64-bit variant. This variant is also the basis for Apple's processors, which bring ARM to the desktop.
Nevertheless, it took a while for ARM to gain a foothold in the server market. For example, the ARM modules for HPE's Moonshot platform, delivered from autumn 2014, did not meet the high expectations placed on them, and three years later hardly anyone was still talking about them. Ampere Computing, which was founded in 2017 and supplies processors for Google's and Oracle's cloud platforms, among others, is now bringing new momentum to the market.
Waiting for ARM to be adopted in the server space before purchasing ARM hardware or porting software was therefore a legitimate decision, both for Checkmk and for many of our customers.
A respectable 5% market share
In the meantime, the ARM platform has grown to a 5% market share in the cloud sector; when Amazon's internal infrastructure is taken into account, it could even be as high as 15%. There is still further potential for growth. The fact that players like Google, Microsoft and Amazon are in the game (partly with their own processors, but mostly with the Ampere processors mentioned above) confirms that the platform is here to stay.
Starting with an agent
For us, this means focusing first on the monitoring of ARM-based systems, which is already possible with the legacy agent. In the future, a Linux agent package should make it easy, with as little configuration effort as possible, to use the full range of functions of the agent controller introduced in Checkmk 2.1 (encryption, mutual trust, optional push mode, auto-registration, etc.). Agent packages that run on ARM are therefore a priority for us.
Nevertheless, we do not want to ignore ARM for the Checkmk server either, especially since the platform, with its massive possibilities for parallelization, suits the architecture of Checkmk. Our focus will be on the data center, because operators of large Checkmk installations are the ones most likely to benefit from savings in energy costs.
Not all ARMs are created equal
If we are porting to a new processor family anyway, we see no reason to ensure backward compatibility with its older generations. There are indications that the ARMv8.2-A instruction set not only has good market penetration, but also offers features that will benefit the performance of Checkmk.
As a comparison: Cortex-A72 cores are very popular for single board computers (SBCs) such as the Raspberry Pi. These still use the very first version of ARM's 64-bit instruction set, ARMv8-A, introduced in 2012. Accepting a performance penalty on current professional hardware in order to remain compatible with SBCs is simply not justifiable for customers with large Checkmk installations.
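To make the difference between instruction set levels concrete, here is a minimal sketch of how native code can report, at compile time, which baseline it was built for. It is an illustration only and not taken from the Checkmk code base: the function name `describe_build_target` is hypothetical, and the choice of LSE atomics as the example feature is an assumption, since the text above does not say which ARMv8.2-A features matter for Checkmk. The sketch relies on the `__aarch64__` and `__ARM_FEATURE_ATOMICS` predefined macros, which GCC and Clang set according to the selected target (for example `-march=armv8.2-a`).

```cpp
#include <string>

// Hypothetical helper, not part of Checkmk: reports the architecture
// baseline this translation unit was compiled for.
std::string describe_build_target() {
#if defined(__aarch64__)
  #if defined(__ARM_FEATURE_ATOMICS)
    // LSE atomics are mandatory from ARMv8.1-A onwards, so a build that
    // targets ARMv8.2-A has them; a build for the original ARMv8-A
    // baseline (e.g. Cortex-A72) does not.
    return "aarch64 with LSE atomics";
  #else
    return "aarch64 baseline without LSE atomics";
  #endif
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#else
    return "other";
#endif
}
```

A binary built for the newer baseline may use instructions that older cores do not implement, which is why a single package cannot be tuned for current server processors and still run on every SBC.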
Intel for the Edge
The operation of edge sites in distributed monitoring was often cited as an argument for a version that also runs on ARMv8-A (Raspberry Pi 4, Rockchip 3399...). The memory and CPU performance of a Raspberry Pi may in fact be sufficient for this purpose. In practice, however, many factors argue against the use of SBCs. For example, mass storage and network interfaces are often inefficiently connected, sometimes via an internal USB bus. SD cards do not have wear levelling, meaning that the data security appropriate for the application cannot be provided.
The computing power of such systems is often sufficient for pure monitoring. However, as soon as the UI is accessed in parallel, resource bottlenecks can quickly occur. This impairs the user experience and leads to unnecessary frustration.
For smaller installations, we therefore recommend Intel processors. At prices close to that of a Raspberry Pi, many barebone systems with a power-saving Intel Celeron and NVMe support are available, without the need to search for a suitable power supply and a housing with sufficient cooling. If you need more power in a small space, choose a NUC; even a Kubernetes cluster can run on these when required. For those who still want to use a Raspberry Pi, community builds of the Raw Edition are available.
From Edge to Cloud without a detour
In addition, some developments already included in 2.2.0, such as the Checkmk agent's push mode, reduce the need for small edge sites. The work for which some users would like to use Raspberry Pis is thus handled more securely and reliably by an inexpensive rented virtual server.
Porting and Continuous Integration (CI)
One challenge is the port itself and its eventual incorporation into our Continuous Integration (CI). Two of our components (Checkmk Micro Core and Livestatus) contain optimizations that have to be adapted to the particular platform used. Simply recompiling on platforms other than x86-64 therefore means that compiler optimizations that result in instruction reordering cannot be used. Thus, we have the choice between accepting performance below that of the x86-64 counterpart and developing analogous optimizations for ARM. Differences in memory management on ARM require further modifications to guarantee functionally safe operation.
Further effort will be required for inclusion in the CI. Even if we only consider a single Checkmk version, continuous builds will be necessary. To put it in numbers: at least 13 installation packages for four distributions are currently built from Checkmk 2.2.0 in each of its three editions.
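The text does not say which optimizations are affected, so the following is only a generic sketch of the kind of issue that reordering raises, not Checkmk code. x86-64 has a comparatively strong hardware memory model, while ARM allows the processor to reorder more loads and stores, so code that hands data between threads must state its ordering requirements explicitly. All names here (`Handoff`, `publish`, `consume`, `run_handoff`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example type, not part of Checkmk.
struct Handoff {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Producer: write the payload, then set the flag with release semantics,
// so the payload write cannot be reordered after the flag store.
void publish(Handoff& h, int value) {
    h.payload = value;
    h.ready.store(true, std::memory_order_release);
}

// Consumer: wait until the flag is observed with acquire semantics,
// which guarantees that the payload write is visible as well.
int consume(Handoff& h) {
    while (!h.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return h.payload;
}

// Runs one producer and one consumer thread and returns the value
// the consumer observed.
int run_handoff(int value) {
    Handoff h;
    int seen = 0;
    std::thread consumer([&] { seen = consume(h); });
    std::thread producer([&] { publish(h, value); });
    producer.join();
    consumer.join();
    assert(h.ready.load());  // the flag must be set once both threads finish
    return seen;
}
```

Calling `run_handoff(42)` returns 42 on any conforming platform. Had both flag accesses used `std::memory_order_relaxed` instead, the C++ standard would no longer guarantee that the consumer sees the payload, and a weakly ordered processor such as ARM is where that kind of latent bug is more likely to surface.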
Nearly 100 additional daily builds
If we wanted to make ARM support just as comprehensive as the existing Checkmk builds, taking the various branches into account, almost 100 more daily builds would be added, in what would ultimately be a complete duplicate of the build infrastructure on ARM.
When prioritizing the Checkmk roadmap, we always take into account the four-way balance of reach, impact, confidence and effort.
The ongoing effort for the CI alone, together with the significant one-time effort for the port, means that in the short term we cannot prioritize this task on the roadmap.
We will therefore focus on monitoring ARM infrastructures for the time being.
- Checkmk recognizes the potential of ARM as well as the needs of our customers
- Our focus is on seamless monitoring, so agents for ARM will be the starting point
- Operating the Checkmk server: In the development of Checkmk, we continue to work towards platform neutrality and will thus achieve ARM compatibility over time. However, we are not currently focused on this task and therefore cannot give a concrete time frame
- A focus on modern server processors: For reasons of optimal performance, no support for single board computers (such as the Raspberry Pi) can be envisaged
Your feedback is welcome
Are you already using Linux on ARM servers, and would you like to monitor the existing infrastructure optimally or even use it to operate a Checkmk server? If so, please tell us about the hardware you are using, the distributions you are working with and the ARM share of your current system environment, as well as the importance you attach to the operation of Checkmk on ARM servers. Let us know by sending an e-mail to firstname.lastname@example.org!