Ep. 30: Clustering the Checkmk appliance
|[0:00:00]||Welcome to the Checkmk channel. Today, I'm going to show you how to cluster the Checkmk appliance.|
|[0:00:15]||If you have one of our hardware appliances, you want to make sure you're resilient against hardware failure. To do so, we deliver the possibility to cluster two Checkmk appliances, so if one of those fails, the other one takes over.|
|[0:00:29]||A word of warning at the beginning: in my demo here, you will see that I'm using virtual appliances. That is not something you want to do in production.|
|[0:00:39]||If you're running in production, the clustering feature is aimed at hardware appliances, to make sure you are resilient against hardware failure, and not at virtual environments. If you are running our appliance virtually, then the hypervisor will take care of the high availability for you.|
|[0:00:54]||So, without further ado, let's dive into the configuration. Okay, so first, we are going to create a network bond with the four available network interfaces.|
To do so, we navigate to the Device Settings, Network Settings, and there we need to enable the advanced mode. Because here you can see we can only set one IP configuration, which makes sense if you're running in standalone mode or in a virtual machine.
|[0:01:20]||But in this case we want to configure a bonding interface. So, we go to the advanced mode. We have to say that we really want to go there.|
|[0:01:32]||And now we can see all the four interfaces that we can use for configuration. Now to create our first bond, I'm going to click on Create Bonding.|
|[0:01:42]||We can use the default name, which makes sense. Then we want to use the first two interfaces as members of that bond. The default settings for Bonding-Mode and MAC address failover are fine here.|
|[0:01:54]||And now we want to set an IP configuration. So, I'm just going to use the IP configuration that you just saw. Because this is my first bond and I still want to access the web interface here.|
|[0:02:09]||And then we are good to go. So, I'm going to save these settings. And now you can see the configuration was changed, but the changes haven't been activated. So, I'm just going to go ahead and create a second bond with the remaining two interfaces.|
|[0:02:28]||The defaults are fine. We just need to add a simple IP configuration here. And this time with a different network. This one's important for clustering but I'm going to get to that later. So, let me just save this configuration too.|
|[0:02:43]||And now we can see all our network interfaces are members of a bond. We have two bonds here. Nothing is active at this point because we need to activate changes.|
|[0:02:54]||So, I'm going to do this right now. And then we need to move over to the second appliance and do the very same changes there too.|
|[0:03:05]||Okay, so as mentioned, now we are on our second appliance and there we need to configure the very same settings too, with different IP addresses of course.|
|[0:03:13]||But the settings all in all are the same because we want to have the same network configuration on both devices. So, again we have the same mode, the same configuration, all in all, the only difference here is the IP address that we are using.|
|[0:03:39]||And I'm just going to save that onto the second bond. You can see I'm a lazy person, so, I prepared everything, I just have to fill in the gaps. And that's that. So, we can activate the changes on the second device too. And then we can go back to the first device and start creating the cluster.|
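The bonding layout described so far (two bonds per appliance, two member interfaces each, each bond in its own network) can be sanity-checked on paper before it is entered in the web interface. The following is only a sketch: `check_bond_plan` is a hypothetical helper, not something the appliance provides, and the interface names and addresses are placeholders from documentation ranges, not the values used in the video.

```python
import ipaddress

def check_bond_plan(bonds):
    """Return a list of problems found in a planned bonding layout.

    bonds maps a bond name to {"members": [...], "address": "ip/prefix"}.
    """
    problems = []
    seen_members = set()
    seen_networks = {}
    for name, cfg in bonds.items():
        members = cfg["members"]
        # A bond only protects against a link failure with two or more members.
        if len(members) < 2:
            problems.append(f"{name}: needs at least two member interfaces")
        for nic in members:
            if nic in seen_members:
                problems.append(f"{name}: {nic} is already used by another bond")
            seen_members.add(nic)
        # Each bond is expected to live in its own network.
        network = ipaddress.ip_interface(cfg["address"]).network
        for other_name, other_net in seen_networks.items():
            if network.overlaps(other_net):
                problems.append(f"{name}: network {network} overlaps with {other_name}")
        seen_networks[name] = network
    return problems

# Hypothetical plan for one appliance (placeholder names and addresses).
plan = {
    "bond0": {"members": ["nic1", "nic2"], "address": "192.0.2.10/24"},
    "bond1": {"members": ["nic3", "nic4"], "address": "198.51.100.10/24"},
}
print(check_bond_plan(plan))  # prints []
```

Running the same check against the plan for the second appliance, with its own addresses, helps confirm that both devices end up with the same layout.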
|[0:04:04]||Okay, so after a few moments, we can see the bonds are active, up and running. Everything's looking good. So, we can head over to the cluster configuration. For that, we go back to the Main Menu and there you can see the Clustering menu here. So, we're going to open that.|
|[0:04:22]||We're going to click on create a cluster with a non discovered device. Okay, so first, we need the IP address of the partner. That's the 202 IP address you saw earlier.|
|[0:04:33]||The Data Sync Interface can stay at bond1. This is actually why we are using bonding interfaces with the hardware: in case a network link fails physically, the bond will make sure all the traffic goes over the other interface and the communication is still possible.|
|[0:04:52]||Next, we have the Cluster Communication Interfaces. Those are the interfaces that are generally available for communication between the cluster nodes.|
|[0:04:59]||So, we want both bond interfaces there. Then we have to give the cluster an IP configuration, that is, the cluster address under which the cluster will be available.|
|[0:05:12]||Of course, we need a Netmask here too. And for this communication, we will take bond0 as you can see the IP address here is in the 56 network and that was on bond0, so this is the configuration that we want to have here.|
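The reasoning here is that the cluster address has to sit in the same network as the interface it is bound to (bond0 in the demo) and must not collide with the nodes' own addresses. A minimal sketch of that check, assuming placeholder addresses from a documentation range rather than the ones shown in the video; `cluster_ip_fits` is a hypothetical helper name:

```python
import ipaddress

def cluster_ip_fits(cluster_ip, netmask, bond_address, node_ips):
    """True if the cluster IP is in the bond's network and not used by a node."""
    bond_net = ipaddress.ip_interface(bond_address).network
    cluster_if = ipaddress.ip_interface(f"{cluster_ip}/{netmask}")
    # Same network and same netmask as the bond the cluster address runs on.
    same_network = cluster_if.network == bond_net
    # The cluster address must differ from the nodes' own addresses.
    unused = cluster_if.ip not in {ipaddress.ip_address(ip) for ip in node_ips}
    return same_network and unused

# Placeholder values: bond0 of the first node plus both node addresses.
print(cluster_ip_fits("192.0.2.50", "255.255.255.0",
                      "192.0.2.10/24", ["192.0.2.10", "192.0.2.11"]))
```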
|[0:05:26]||And then we need the ping targets. So, before I enter anything here, let me just quickly explain what it does. The ping targets help the cluster to decide whether a node has been isolated from the other node.|
|[0:05:38]||So, both nodes try to reach those ping targets and if the ping targets go down, one side of the cluster knows that it's isolated from the network and can take appropriate action.|
|[0:05:50]||So, if it's the standby side, that doesn't make any difference, it goes offline but the primary side keeps running. If the primary side goes down and realizes itself that it's been isolated, then it will stop action and the second node will take over.|
|[0:06:04]||So, in general you want these ping targets to be some highly available IP addresses within your network, like a core switch, a domain controller, something like that. But it shouldn't be the gateway of the IP network that we are using here. Because both appliances will be able to ping this interface all the time.|
|[0:06:23]||So, I'm using it actually because in my demo environment, I do not have too many IP addresses to choose from. But make sure in a production environment, you give at least two or three highly available IP addresses, so that the appliances can decide whether the cluster can keep working or if a failover has to take place.|
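The ping-target behaviour described above can be written down as a small decision function. This is a simplified model of what was just said, not the appliance's actual implementation: treating a node as isolated only when none of its ping targets answer is an assumption of this sketch, and the role names and target addresses are hypothetical.

```python
def is_isolated(ping_results):
    """Assumption: a node is isolated only if no ping target answered."""
    return not any(ping_results.values())

def next_action(role, ping_results):
    """Return what a node does, given its role and its ping results."""
    if not is_isolated(ping_results):
        return "keep running"
    if role == "active":
        # The isolated active node stops so that the other node can take over.
        return "stop services so the standby node can take over"
    # An isolated standby node goes offline; the active node is unaffected.
    return "go offline"

# Placeholder targets: one core switch and one domain controller.
print(next_action("active", {"192.0.2.1": True, "192.0.2.2": False}))
print(next_action("active", {"192.0.2.1": False, "192.0.2.2": False}))
print(next_action("standby", {"192.0.2.1": False, "192.0.2.2": False}))
```

In this model, one failed target does not count as isolation as long as another target still answers, which fits the advice to configure two or three of them.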
|[0:06:42]||Okay, that being said, the configuration is done. We can save the configuration. This takes a few moments.|
|[0:06:50]||Now we need to log into the remote appliance because, of course, we need to authenticate ourselves that we are allowed to overwrite the configuration of the second appliance.|
|[0:07:02]||That's what we do with the password here. Then we click on Connect. And after we have clicked on Connect, we are asked one more time if we really want to synchronize with the second appliance. Because it will overwrite all the data that is there, we want to check the IP addresses to make sure it's really the right appliance. I'm going to confirm that.|
|[0:07:27]||And then it takes a few seconds for the two appliances to communicate with each other to start synchronizing their state. So, if you click on Back, in the first moments, we will not see any information on the cluster state.|
|[0:07:42]||We just get the IP address of the partner and the cluster IP address. That's quite obvious information here. And if I refresh the site, then we see the cluster is starting to build, to be created.|
|[0:07:54]||At this point, the communication hasn't started yet, so let me just refresh one more time. And now we can see both nodes are shown as online. The network reachability is fine.|
|[0:08:08]||At this point, File Synchronization reports unknown, synchronization is inactive. And if we update this page one more time, now we can see everything is working.|
|[0:08:24]||One side of the cluster is already active and we could start creating our first site and start monitoring. But for good measure, we are going to wait on the synchronization process which we see here is synchronizing.|
|[0:08:36]||And it takes about 1 hour in this environment. That can depend on your environment, how long it will take to synchronize. And after that's done, everything will be green and we can start creating our first site. Okay, so now you can see the cluster status is completely green.|
|[0:08:57]||The data synchronization has finished and the cluster is up and running, so we could start creating our monitoring site and start monitoring.|
|[0:09:07]||So, that concludes today's video. Thanks for watching. Make sure to subscribe and see you next time.|
More Checkmk Videos
Ep. 1: Installing Checkmk 2.0 and monitoring your first host
In this video, Baris explains how to get started with Checkmk and start monitoring your first host within a few minutes.
Ep. 2: The Checkmk 2.0 user interface
In this video, Baris takes you through the new user interface in Checkmk 2.0. He explains the various components of the user interface such as the new navigation menus, the sidebar, main dashboard, tactical overview, how to switch between the Checkmk interface themes and much more.
Ep. 3: Using SNMP to monitor network devices in Checkmk 2.0
In this episode, Baris explains how to monitor network devices with Checkmk. SNMP is a protocol that many switches, routers, printers, UPSs, hardware sensors and other devices have implemented with the purpose of being able to monitor them easily.
Ep. 4: Monitoring Windows in Checkmk
In this video of our Getting started with Checkmk series, Baris explains how to install a Checkmk agent on a Windows host system and add that into your monitoring environment.
Ep. 5: Using metrics and graphs in Checkmk 2.0
In the 5th episode of the Getting started with Checkmk series, Baris explains using various metrics that you can monitor in Checkmk such as CPU utilization, CPU load etc. You can also see graph visualizations for these metrics or create and customize your own as per your requirements.
Ep. 6: Updating Checkmk 2.0 and using multiple instances
In this video, Baris explains how to update your Checkmk instance. It is very easy and can be done within minutes. You can run multiple Checkmk instances with different versions on the same system. This gives you the flexibility to test the new version before using it in production.
Ep. 7 (part 1): Working with rules and setting thresholds in Checkmk
In the following three-part video series, Baris explains rule-based monitoring with Checkmk. In the first part, he shows you how you can work with rules and set threshold values. Rule-based configuration is one of the key features of Checkmk, which helps you to scale your monitoring easily within minutes.
Ep. 7 (part 2): Smart rules with Host Tags in Checkmk
In the second part of this video, Baris explains using Smart rules with host tags in Checkmk. In the first part, he shows you how you can work with rules and set threshold values. These are features that you can use to build your rules even more intelligently and to better organize your monitoring.
Ep. 7 (part 3): Managing Hosts in Folder in Checkmk
In this final part of our episode on Rule-based monitoring in Checkmk, Baris demonstrates how to manage hosts in folders in Checkmk. This helps you to apply your monitoring configurations at scale and organize your hosts according to your needs.
Ep. 8: Working with Host and Service Groups in Checkmk
In this video, Baris demonstrates how to create host and service groups in Checkmk, so you can perform actions on an entire group instead of configuring each of them individually.
Ep. 9: Using the Quicksearch function in Checkmk
In this episode of the Checkmk tutorials, Baris shows how you can use the Quicksearch function in Checkmk. You can use it to easily find and manage certain hosts or services. He also explains some examples of filters to you. In Checkmk 2.0 you can use the same syntax in the Search function found in the monitor menu to get identical results.
Ep. 10: Detecting configuration errors with the Analyze Configuration feature
With the Analyze Configuration feature, you can check if there are any configuration errors in your installation. Checkmk checks for a number of possible security risks or potential performance restrictions and indicates if there are any problems.
Ep. 11: View creation and customization in Checkmk
In this video, Baris demonstrates how to customize headers, columns, and more in Views in Checkmk for yourself or other users. He also explains how to create custom views and add desired information to these views.
Ep. 12: Acknowledging problems in Checkmk
In this video, Baris explains how you can acknowledge problems in Checkmk. This function helps you to qualify the states of hosts and services. This allows you to keep track of messages in the main dashboard and, for example, you can add comments to problems.
Ep. 13: Scheduling downtimes in Checkmk
In this episode of our Getting started with Checkmk series, Baris explains how you can manage the maintenance times of your systems in Checkmk. Such scheduled downtimes prevent your monitoring from sending false alarms when a host or service goes to WARN or CRIT during maintenance work. You can also inform the users concerned about the maintenance via Checkmk.
Ep. 14: Distributed monitoring with Checkmk
In this video, Baris explains how you can connect several Checkmk instances to a monitoring system and then manage it.
Ep. 15: MKPs and Plugins in Checkmk
In the 15th episode of our Getting started with Checkmk tutorial series, Baris explains what Checkmk Extension Packages (MKPs) are and how easy it is to integrate them into your Checkmk monitoring environment. MKPs are the preferred format when you make your own extensions, as they are easy to share with other users or deploy in distributed environments.
Ep. 16: Working with 'Bulk Actions' in Checkmk
In this episode of our Checkmk tutorials series, Baris explains how you can save a lot of time with bulk actions. With this feature you can perform various tasks such as deleting, renaming, service discovery etc. on a large number of hosts simultaneously.
Ep. 17: Working with network topologies in Checkmk
In this video of our Getting started with Checkmk series, Baris explains how to map network topologies in Checkmk. This feature is quite helpful to manage your network and prevent any unnecessary notifications from the devices in your network.
Ep. 18: Creating and customizing dashboards in Checkmk
In this video of our Getting started with Checkmk series, Mathias explains how you can create and customize dashboards in Checkmk 2.0, so you can get insights into your monitoring according to your requirements. Find out more in this video.
Ep. 19: Monitoring websites and their certificates with Checkmk
In this episode, Bastian demonstrates how to monitor a website and its certificate with Checkmk. You can also monitor specific web pages with Checkmk by using the several options that will suit your use case. Learn more in this video.
Ep. 20: Configuring dashboard elements in Checkmk
Learn how to add data visualization elements of the various metrics into your Checkmk Dashboard. In this video, Mathias explains how you can configure these elements and create a dashboard as per your requirements.
Ep. 21: Setting up notifications in Checkmk
Learn how to set up notifications in Checkmk and assign relevant contacts and contact groups to be notified for various events. Later in this video, our presenter Bastian also demonstrates how you can set up rule-based notifications according to different conditions for hosts and services.
Ep. 22: Monitoring logfiles with Checkmk
Monitor your logfiles with Checkmk using its Logwatch plugin. It is very useful when you want to monitor your logfiles regardless of whether you are using a UNIX/Linux or a Windows-based system. Learn more in this video.
Ep. 24: 3 Rules for efficient network monitoring
In this video, Bastian demonstrates 3 rules that will help you to efficiently monitor your network interfaces. With Checkmk 2.0, with just three rules, you can set up an efficient network monitoring that will not only monitor all of your network interfaces but also simultaneously provide a detailed overview of all of your ports.
Ep. 25: New UX and security improvements in Checkmk 2.1
Checkmk 2.1 comes with many UX improvements such as pre-built dashboards for Linux and Windows, faster core performance and much more. Security features such as two-factor authentication were also added in this new version. Watch this video to learn how to use these new features and enhancements in Checkmk.
Ep. 28: Working with InfluxDB integration in Checkmk
Learn how to send data to InfluxDB from Checkmk. As InfluxDB introduced a new protocol to send data to it, a new connector was developed with Checkmk to talk natively with it. Learn more about it in this video.
Ep. 29: New agent architecture in Checkmk 2.1
With Checkmk 2.1, the agent architecture was modified to enable performance improvements and add new features such as TLS encryption, data compression, and the reversal of direction of communication from the agent. This will enable push mode and pull mode.
Ep. 32: Working with the Agent bakery in Checkmk
In this video, Robin demonstrates how to roll out agent packages with the required configuration for different monitored systems using the agent bakery in Checkmk. The "Automatic agent update" is quite a helpful feature as it pulls the latest configurations for an agent automatically and you don't need to manually update all of your agents deployed on different systems.
Ep. 33: Monitoring Docker containers with Checkmk
Learn how to monitor Docker containers with Checkmk. In this video, Robin demonstrates the process of setting up a rule to configure the Docker plugin and bake an agent with the desired settings for the Docker host.
Ep. 34: Introduction to Checkmk Ansible collection
Last year the Checkmk Ansible collection was created to interact with the Checkmk REST API. In this video, Robin demonstrates how you can use this Ansible collection to automate your monitoring with Checkmk.
Ep. 35: Monitoring SQL databases with Checkmk
In this video, Robin demonstrates how you can configure your Checkmk site to monitor your SQL databases. As there are many flavours of SQL databases, the process is mostly the same.