The last session at the Checkmk Conference #6 once again belonged to Lars Michelsen, who, as our chief developer, gave the participants an overview of the innovations implemented in Checkmk to date, and presented our roadmap to Checkmk 2.0 and beyond.
In the opinion of our chief developer, we are well on schedule for Checkmk 2.0, which we would like to release in a stable version this autumn, even if, as Lars said, we still have a lot to do after the conference. Especially when it comes to the new user experience (UX), the project that Mathias Kettner presented in his session, we still have to try out and test the developed concepts so that they can be included in version 2.0, Lars explained.
We have already built a first version of the Prometheus integration, which has been included in the published Feature Pack 2 for version 1.6. We look forward to your feedback, such as how well the integration of Prometheus in Checkmk has worked and in which scenarios you use it. Only with your feedback can we continue to develop Checkmk according to your needs, emphasized Lars.
With the ntop integration, the majority of the implementation has been completed, but we have to integrate the feature better into the Checkmk look and feel, as Lars explained. Alerting is also not yet fully completed.
With regard to the automation and expandability of Checkmk, Lars also drew a positive conclusion in his lecture. For the Check-API and the related programming APIs (such as the inventory API and the bakery API), we only need to clarify some details of the implementation. He therefore expects that the basic work on the API will be completed within the next few weeks, and that we can then start migrating the plug-ins.
With the REST-API, on the other hand, we are not quite there yet. Here we have chosen the technological foundation and implemented some important API calls, but we still have to develop many more API calls so that we can offer a useful feature set for Checkmk 2.0. Our goal is to be able to cover most of the use cases of the existing Web-API with the new REST-API.
Profound changes must be carefully implemented
In terms of performance, according to Lars, we are making fundamental changes to core components in some cases. For example, the modification of the helper processes is a serious intervention in the helper architecture, which must be carried out with appropriate care. Currently we are still in the process of rebuilding and testing a lot, as Lars reported. He expects, however, that we will soon be able to start connecting the new helpers to the Micro Core, and that in May or June we will be able to assess whether the new features also deliver the expected performance increase.
We have also almost completed the preparatory work on the matter of Activate Changes, so that we can soon incorporate an incremental synchronization of the configuration in distributed monitoring set-ups.
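To illustrate what an incremental synchronization can mean in principle, here is a minimal sketch. It is not how Checkmk implements Activate Changes; it only assumes that the central site and a remote site each hold a set of configuration files, and that comparing content hashes is enough to decide what has to be transferred. The file names and the function names `fingerprint` and `plan_sync` are hypothetical.

```python
import hashlib
from typing import Dict, Set, Tuple


def fingerprint(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def plan_sync(central: Dict[str, bytes], remote: Dict[str, bytes]) -> Tuple[Set[str], Set[str]]:
    """Return (paths to send, paths to delete) to bring the remote site in line.

    Only files that are new or whose content differs are sent, instead of
    transferring the complete configuration on every activation.
    """
    central_fp = fingerprint(central)
    remote_fp = fingerprint(remote)
    to_send = {path for path, digest in central_fp.items() if remote_fp.get(path) != digest}
    to_delete = set(remote_fp) - set(central_fp)
    return to_send, to_delete


# Hypothetical example data: one unchanged, one changed, one new, one obsolete file.
central_site = {"hosts.mk": b"a", "rules.mk": b"b2", "new.mk": b"n"}
remote_site = {"hosts.mk": b"a", "rules.mk": b"b1", "old.mk": b"o"}
send, delete = plan_sync(central_site, remote_site)
```

In this example only the changed and the new file would be transferred, and the obsolete file would be removed on the remote site; the unchanged file is skipped.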
Lars also reported that we have already implemented the checks planned for version 2.0. In the coming weeks until the release of Checkmk 2.0, however, various checks arising from customer orders may be added.
The return of the Innovation Release
In addition to the content, Lars also discussed the changes to our release cycle – the new Feature Packs. We want to make useful functions that we have already completed available in a stable version to all Checkmk users. We deliver the features as MKPs, which are deactivated in Checkmk by default. In the Enterprise Edition, the desired functions can be activated or deactivated via WATO. Raw Edition users activate the features via the CLI.
Lars also talked about our Innovation Releases. We have used these in the past to make large features testable for our users during development. As Lars further explained, these innovation releases are a good tool for us to get feedback from Checkmk users before the beta phase, and at the same time to give them a preview of upcoming functions. We have planned an innovation release for version 2.0.
We have already introduced two feature packs for Checkmk 1.6. However, it is still unclear whether we will release a third one. “Depending on the new features available, there may also be a third feature pack for version 1.6 in the summer,” said Lars. What is certain, however, is that we want to start with the innovation releases for Checkmk 2.0 in early summer. For us, this means that we need to have implemented most of the functions by this point, but will not yet have carried out a feature freeze. This enables us to make further changes to features in this phase if necessary.
The feature freeze for version 2.0 is thus planned for summer, so that we can then concentrate fully on testing. Since we are making numerous large-scale changes to Checkmk with the Python 3 migration, the introduction of the new UX design, the new REST-API, the new Check-API and the helper split, we want the testing process for 2.0 to be significantly more comprehensive than for previous versions in order to ensure high stability, emphasized Lars.
For this we also need you, the community. At this point Lars renewed the request to the participants and Checkmk users to make use of the test options, and to send us their feedback. We will then inform you through our usual channels of when we will be starting the various test phases.
Checkmk 2.1
Furthermore, Lars also gave us a look at what ideas and plans we already have for version 2.1. An important topic after 2.0 is still the improvement of the User Experience. Although in Version 2.0 the most important workflows and elements in Checkmk have been redesigned, the next step, according to Lars, will be to make the detailed dialogues and views more consistent and intuitive for users. We also want to continue optimizing the dashboarding and reporting that started with 2.0.
Another point is the expansion of the monitoring of cloud environments. Here we have provided integrations for Azure and AWS already. An integration for Google Cloud Platform (GCP) will be added in the future. As we already do with Azure and AWS, this will be carried out by a special agent and enable the monitoring of the most important services. We also want to expand our existing portfolio with regard to Azure and AWS based on the needs of the community.
We will also deepen the integration with Prometheus. With Checkmk 2.0 we are already able to integrate the most important exporters. In the future we want to gradually expand the number of exporters based on feature requests, and also be able to connect to Prometheus exporters directly with Checkmk without having to go via Prometheus itself.
2,000 plug-ins and two-factor authentication
Likewise, the goal for 2.1 is to complete the REST-API introduced with 2.0, so that it can map all functions in Checkmk. We also want to drive forward the performance improvements that have already started. Lars also discussed the development of new plug-ins in his lecture. Here it is our goal to continue implementing the wishes of customers and the community.
With regard to security around Checkmk, according to Lars we are considering introducing two-factor authentication (2FA) in order to additionally secure the login. For example, this could take the form of an optional two-factor authentication on the GUI that supports the U2F standard. We are also considering whether Checkmk itself should provide a local validation server on which the token for 2FA is registered. In the next step, connecting company-wide validation servers could be an option.
Monitoring of modern hypervisors and E2E monitoring
In addition, Lars covered a wide range of topics that we have in mind for the future, for example the monitoring of modern hypervisors. The world of containerisation is growing ever closer to the world of virtualisation, as Lars noted using the example of VMware vSphere 7. He assumes that other hypervisors will follow this development. With Checkmk, it is possible to monitor container environments and hypervisors very well, but we also want to be able to optimally monitor the ‘merged’ environments, and to make useful statements about conditions and relationships.
When it comes to the cloud, Checkmk has so far been able to monitor the things our users do in the cloud. In addition, however, we also want to consider how we can better support Checkmk deployments in the cloud in the future. One consideration that Lars outlined is, for example, the provision of standard images on the large cloud platforms such as GCP, AWS and Azure, so that the user can launch a Checkmk instance on a platform at the push of a button.
Lars also went into the topic of end-to-end monitoring, which we want to focus on in the future. Based on the ntop model, Lars mentioned the possibility that we could provide reasonable integrations for this, for example through articles in our user guide, how-to tutorials or in-product integrations. The aim is to enable you to perform end-to-end monitoring with Checkmk at the heart.
More intelligence and automation in Checkmk
Lars also spoke a lot about ‘Automated Dependencies’. The merging of information should help to show dependencies in IT, for example so that the user can solve problems faster.
We are currently imagining the topic as a puzzle consisting of many sources of information about the interrelationship of services, systems or individual applications, as Lars explained. We want to bring this information together in the future, and to build a model on it that can represent these dependencies and their consequences for the overall system. Based on this model with all the information related to the dependencies, Checkmk should then help the user to understand the complexity of IT more easily and to solve problems faster. Cascading problems can thus be identified and solved at an early stage, before they escalate. Furthermore, unnecessary alarms can be avoided. We want to take the well-known concept of ‘Service Dependencies’ in Nagios much further, as many dependencies can be recognized automatically, particularly through network information.
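How a dependency model can help to avoid unnecessary alarms can be shown with a small sketch. It is purely illustrative and not Checkmk code: the dependency map, the service names and the functions `root_causes` and `suppressed` are assumptions made for this example, and only direct dependencies are considered.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each service lists what it directly depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "webshop": ["app-server"],
    "app-server": ["database", "core-switch"],
    "database": ["core-switch"],
    "core-switch": [],
}


def root_causes(failed: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failed services none of whose direct dependencies have failed."""
    return {
        service
        for service in failed
        if not any(dep in failed for dep in depends_on.get(service, []))
    }


def suppressed(failed: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failed services whose alarms could be suppressed as consequences."""
    return failed - root_causes(failed, depends_on)


# All four services fail at once; only the switch has no failed dependency.
outage = {"webshop", "app-server", "database", "core-switch"}
causes = root_causes(outage, DEPENDS_ON)
follow_ups = suppressed(outage, DEPENDS_ON)
```

In this scenario only the failure of the switch would be reported as the cause, while the three dependent failures are treated as consequences of it.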
In addition, we want to continue working on the subject of ‘Analytics’ in Checkmk. On the one hand, Checkmk should automatically detect anomalies and present them to the user. On the other hand, if an anomaly occurs in a service, Checkmk should automatically show other services with similar behavior. Since such a correlation is not necessarily a causal connection, it is the combination with ‘Dependencies’ that makes such analysis really powerful. That is why we see the two topics ‘Analytics’ and ‘Dependencies’ as closely connected and complementary to each other. Together, both tools give the user an unprecedented insight into the data pool of monitoring, and help to find the root of a problem more quickly.
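The idea of showing other services with similar behavior can be sketched with a simple correlation measure. The following example assumes that a Pearson correlation over equally long metric series is a reasonable notion of similarity; the metric names, the threshold and the function names are hypothetical and do not describe an existing Checkmk feature.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def similar_services(
    target: str, series: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[str]:
    """Return the names of all other series that correlate strongly with the target."""
    reference = series[target]
    return sorted(
        name
        for name, values in series.items()
        if name != target and pearson(reference, values) >= threshold
    )


# Hypothetical metric series sampled at the same points in time.
metrics = {
    "db_latency": [1, 1, 1, 5, 6],
    "web_latency": [2, 2, 2, 10, 12],
    "disk_free": [9, 8, 7, 6, 5],
}
related = similar_services("db_latency", metrics)
```

Here the web latency follows the database latency and is reported as related, while the steadily falling free disk space is not. Whether the one causes the other cannot be derived from the correlation alone.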
Away from the classic SNMP approach?
Lars also spoke about the trend in network monitoring of moving away from the classic SNMP approach. The current approach is that Checkmk regularly polls the devices to find out their current status. This has the advantage of yielding a defined state at the time of each query. The disadvantage, however, is that the polling generates a continuous load on the systems. It is also unclear what happens on the system between the queries. The latter can be solved by SNMP traps, which send event-based information to the monitoring system, but these come with their own issues.
Some hardware manufacturers are currently pushing streaming telemetry as an alternative to SNMP on their platforms. It includes various protocols and APIs with the aim of combining the advantages of polling and event-based monitoring. Another advantage of streaming telemetry is that the protocols prepare the data in a modern way, so that it is available in a language-neutral format. This could, for example, make it possible for Checkmk to get information faster and thus become more performant. According to Lars, we will continue to deal with topics that further simplify and improve the work of and with Checkmk.
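The blind spot between two queries can be made concrete with a small simulation. It does not use SNMP at all; it only assumes a timeline of device states, one entry per time unit, and compares what a poller with a fixed interval sees with what event-based reporting of state changes would deliver. All names and values are made up for illustration.

```python
from typing import List, Tuple


def poll(timeline: List[str], interval: int) -> List[Tuple[int, str]]:
    """Sample the device state every `interval` time units, starting at 0."""
    return [(t, timeline[t]) for t in range(0, len(timeline), interval)]


def state_changes(timeline: List[str]) -> List[Tuple[int, str]]:
    """Report every point in time at which the state differs from the previous one."""
    return [
        (t, timeline[t])
        for t in range(1, len(timeline))
        if timeline[t] != timeline[t - 1]
    ]


# Hypothetical device: up, down for two time units (4 and 5), then up again.
timeline = ["up"] * 4 + ["down"] * 2 + ["up"] * 6

polled = poll(timeline, interval=3)
events = state_changes(timeline)
```

With a polling interval of three time units, every query in this example finds the device up, so the short outage goes unnoticed. The event-based view reports both the failure and the recovery, although it says nothing about the state while no event arrives.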
Summing up: according to Lars, we will have many topics on the agenda for next year that we want to tackle together with you. During the lecture, for example, you, the users, made extensive use of voting on the points that Lars raised. As always, we will incorporate these results into the further development. In conclusion, we still have a lot to do with Checkmk. You can look forward to it!