Rockefeller University monitors its research IT infrastructure with Checkmk
Who is Rockefeller University?
Rockefeller University is the world’s leading biomedical research university. The institute draws top scientists and graduate students from around the world in pursuit of one mission: To conduct science for the benefit of humanity. The labs employ the latest technology to answer the toughest questions in their fields. And right outside is New York City, a mecca for culture and ideas, and a burgeoning hub for biomedical science.
We need monitoring that allows us to counteract issues before they impact the performance of our IT. We found it with Checkmk.
The IT team decided to replace the existing Nagios monitoring because it did not meet their requirements and was too difficult to maintain. One issue was the number of email alerts, sometimes more than 800 per day.
The university decided to switch to Checkmk. Since 2022, the IT team has been monitoring more than 850 hosts with the Checkmk Enterprise Edition.
High performance is essential
Rockefeller University is one of the leading institutes in the field of biochemical research. Since its foundation in 1901, a total of 26 Nobel Prize winners have attended the institution. To continue conducting science at such a high level of innovation, the university treats a functioning IT infrastructure as one of its top priorities.
The IT team aims to provide all of the necessary resources to ensure the best performance of all research facilities. Because of its high standards, and because the university has no undergraduate students and its users are mostly researchers, it has had to develop a large IT infrastructure.
Most of the hardware resources are hosted in data centers with a focus on high-performance computing. The IT team runs server clusters with around 6,000 CPU cores in total. The IT infrastructure also has to support healthcare devices in the university hospital, which is also on campus. Altogether, the facilities comprise more than 50 labs from different scientific fields. Besides biochemical labs, this also includes maker spaces and manufacturing workshops. In all of them, users expect the IT to run at peak performance.
The challenge
Rockefeller University has to provide IT infrastructure for research laboratories. The former monitoring solution was outdated and kept sending imprecise alarms, which were often ignored. The IT team therefore started looking for a monitoring tool that can handle high-performance systems, has good alerting, and is easy to maintain.
Checkmk can store data for a long time. This allows us to understand the impacts of actions, regardless of whether they occurred recently or in the past. We can always trace issues back and gain insights to improve our IT infrastructure and business processes.
Monitoring is crucial for Rockefeller University. The university previously used Nagios, but the solution generated too many false alerts, took a lot of effort to maintain, and did not provide the desired visibility. The IT team chose the Checkmk Raw Edition to replace Nagios.
After using Checkmk for a few years, in 2022 the IT team decided to switch to the Checkmk Enterprise Edition to make better use of the collected monitoring data. The additional features are helping the IT team to act more proactively.
Checkmk monitors all assets in the data centers and the networks, such as enterprise applications, servers, switches, and storage. The monitoring includes more than 850 hosts with around 17,000 services. Over the years, the IT team has developed a few of its own plug-ins, but it monitors most of its systems with the official Checkmk plug-ins. The implementation went smoothly, and the IT team is pleased that Checkmk only notifies them when they really need to take action. Checkmk has significantly reduced both the number of false alerts and the working hours spent on monitoring.
The solution
Rockefeller University started using the Checkmk Enterprise Edition in 2022 and put an end to the alert fatigue in its IT team. Checkmk replaced Nagios and is now the main monitoring tool. Several teams use it to monitor all assets on campus, including the network and the high-performance data center.
More transparency and efficient communication
The biggest advantage of Checkmk is the improved visibility and transparency in the monitoring. Each department can adjust the monitoring of its hosts to its needs. At the same time, every user can have read-only access to the complete Checkmk environment and can see all hosts.
As a result, everyone is on the same page and can check whether an issue with their systems is related to an issue in a different department. This is possible because Checkmk has granular access control, and the IT team grants admin rights to different user groups by using a simple folder structure in Checkmk. This enables teams for areas such as IT security, applications, or the help desk to create and manage their own hosts in Checkmk.
Rockefeller University is very pleased with Checkmk. The number of support calls has dropped since the implementation of the monitoring. The IT team also spends less time on monitoring and can focus on more important tasks.
The communication and support from Checkmk is great. We are treated well, and we are glad that we have chosen Checkmk.
The advantages
Checkmk eases the burden on the IT team, which now needs less time to manage the monitoring. The notifications are precise and let the right people know where and when they have to take action. With Checkmk, the IT team is also able to collect and store real-time data, which enables them to act proactively. Most issues and anomalies can be detected before they impact the performance of the IT infrastructure.