Ep. 31: Upgrading the Checkmk appliance to version 1.5

[0:00:00] Welcome to the Checkmk channel. Today, we are taking a look at how to upgrade your Checkmk appliance to version 1.5.
[0:00:17] Upgrading the Checkmk appliance is quite a simple task. You simply acquire the firmware file, which you then upload to your appliance.
[0:00:28] You confirm that you actually want to upgrade your appliance. Then the appliance reboots twice, and after that your appliance is up and running again and you're good to continue monitoring.
[0:00:40] With the upgrade to version 1.5, there is one minor catch: the upgrade from version 1.4 to 1.5 includes a distribution upgrade of the underlying Debian Linux, which makes the old Checkmk installation packages incompatible.
[0:01:03] So, what does that mean? It's really simple in the end: after you have upgraded the firmware of your Checkmk appliance, you simply need to upload a compatible Checkmk version, and then your sites continue running.
[0:01:18] That sounds worse than it is, so I suggest we just take a look and I'll show you how to go through this process.
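To make the catch concrete, here is a minimal Python sketch that tells a routine firmware update apart from one that crosses into a new release line. It assumes, purely for illustration, that a change in the first two version components signals a platform change like the one described above; the video only confirms this for the step from 1.4 to 1.5. The function names are hypothetical and not part of any Checkmk tooling.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.4.19' into (1, 4, 19)."""
    return tuple(int(part) for part in version.split("."))


def crosses_release_line(old: str, new: str) -> bool:
    """Return True when the first two version components differ.

    Assumption for illustration only: the video confirms a platform
    change for 1.4 -> 1.5, not for release lines in general.
    """
    return parse_version(old)[:2] != parse_version(new)[:2]


print(crosses_release_line("1.4.19", "1.4.20"))  # False: routine update
print(crosses_release_line("1.4.20", "1.5"))     # True: plan for a new Checkmk package
```

When the function returns True for the 1.4 to 1.5 step, the extra work is the one described above: uploading a compatible Checkmk version after the firmware update.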
[0:01:26] So, we are looking at my running appliance over here. We have a Checkmk site that is running at the moment and doing its job. And now we've decided we want to upgrade the firmware to version 1.5.
[0:01:41] So, what we need to do first is stop the Checkmk site. So, let's go ahead and do that, because that's generally a requirement for upgrading your Checkmk appliance's firmware: there can be no running Checkmk sites.
[0:01:59] Now that we did that, we can go back to the Main Menu and navigate to the Firmware Update menu. You can see I'm at the most recent version of the 1.4 firmware, at the time of recording, that is.
[0:02:15] And now we need to select a new firmware file. Now let's browse there. Here I already have the 1.5 firmware file. Let me open that. Click on Upload & Install.
[0:02:26] That will take a few seconds, in which the firmware is uploaded to the appliance and verified by the appliance. And we will be presented with a prompt that's asking us if we are actually ready to perform the upgrade.
[0:02:39] In this case we get an additional dialogue. So, you might be familiar with the yellow box, that's telling you what kind of update is going to be performed and if you actually want to go ahead with it.
[0:02:51] And the green box in the upper part of the page now tells you that there are Checkmk versions which are incompatible with the new 1.5 version. That is not a problem at all. This is just for your information. Here you can see which sites are affected.
[0:03:07] But, as I already said, the process is really simple, so I suggest we just go ahead, confirm the upgrade, and wait for a few moments until the appliance is rebooted.
[0:03:18] Now while we are waiting, maybe just a few words on what is happening in the background. So, first the error message that you see here about a connection being reset, that is expected because the appliance needs to reboot to install the firmware update.
[0:03:31] And it does that really quickly, so this error message is just your browser telling you the appliance is not reachable anymore. So, this is totally expected, no worries there.
[0:03:41] Now what is happening in the background while we are waiting for the appliance to reboot again? The appliance reboots into an upgrade mode.
[0:03:48] It then upgrades the complete firmware volume to the new firmware version. And then it performs another reboot to boot the freshly installed new firmware version.
[0:04:01] And after that has happened the appliance boots back and we can log into the web interface again. Let me just try if we already are able to do that.
[0:04:14] That's looking good. We are already or we are still logged in actually. And you can see the current firmware version here is version 1.5. So, that worked like a charm, really easy.
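The waiting step can also be scripted. The following Python sketch polls a TCP port until the appliance's web interface accepts connections again. The function name, the default timings, and the choice of port are assumptions for illustration; the video itself simply reloads the page in the browser.

```python
import socket
import time


def wait_until_reachable(host: str, port: int, timeout: float = 300.0,
                         interval: float = 5.0) -> bool:
    """Poll a TCP port until it accepts a connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening again.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, reset, or timed out: the host is probably still rebooting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A call such as `wait_until_reachable("appliance.example.com", 443)` would block until the (hypothetical) host answers on port 443 or five minutes have passed. A successful connection only shows that the port is open again; the firmware version still has to be checked in the web interface, as shown in the video.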
[0:04:25] So, let's get back to the Main Menu because we have to take a look at the site management here. Because as I said earlier, with this major firmware update, we have Checkmk versions that have become incompatible, the warning message also stated that.
[0:04:40] So, let's take a look at the Site Management. There we can see the error message, that the currently installed version is incompatible.
[0:04:49] We already talked about that. So, let's take a look at the versions tab. And here we can see the old version is still installed, but we see it's before the 1.5 firmware and it's incompatible, so let's fix that.
[0:05:02] I already downloaded the installation file for the new Checkmk appliance version, that is denoted by the little 3 in here, just so you notice.
[0:05:15] The download can be found on the download page, where you also download your Checkmk versions. So, there you can find the firmware version for the appliance version 1.5.
[0:05:25] I already prepared that, so let me just click on Upload & Install. This will take a few moments and then we will see we have a compatible Checkmk version.
[0:05:36] And then we can navigate back to the site overview, take a look at the site, make sure that everything's looking good and then we can restart the site. So, there really is no danger in performing this upgrade.
[0:05:48] It might look a little frightening, but it really is a simple process and it only takes two minutes more than your average upgrade from, let's say, version 1.4.19 to 1.4.20 or something like that.
[0:06:05] Okay, so the version was installed. You can see nothing really changed here, only that the warning message about the incompatible platform has vanished because this is not an additional version.
[0:06:13] The incompatible version has been removed, the compatible version has been installed. And if we move back to the site overview, you can now see our site again, nothing has changed here. The only thing is it's still stopped. So, let's go ahead and start the site, so we can continue monitoring.
[0:06:33] Alright, with that, that concludes the first part of the video, which takes care of a single appliance firmware upgrade. In a few moments, we will take a look at the cluster upgrade because there we also have a little difference in comparison to the upgrade process before version 1.5.
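The order of operations for the single appliance can be summarized in a small Python model. This is a sketch under stated assumptions, not how the appliance is implemented: the class, its fields, and the rule that a site refuses to start with an incompatible package are hypothetical simplifications of what the web interface showed.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    firmware: str           # firmware line, e.g. "1.4" or "1.5"
    package_built_for: str  # firmware line the installed Checkmk package targets
    site_running: bool = True

    def stop_site(self) -> None:
        self.site_running = False

    def update_firmware(self, new_version: str) -> None:
        # From the video: no Checkmk site may be running during a firmware update.
        if self.site_running:
            raise RuntimeError("stop the site before updating the firmware")
        self.firmware = new_version

    def install_package(self, built_for: str) -> None:
        self.package_built_for = built_for

    def start_site(self) -> None:
        # Modelling assumption: an incompatible package blocks the site start.
        if self.package_built_for != self.firmware:
            raise RuntimeError("installed Checkmk version is incompatible")
        self.site_running = True


appliance = Appliance(firmware="1.4", package_built_for="1.4")
appliance.stop_site()             # 1. stop the site
appliance.update_firmware("1.5")  # 2. upload and install the firmware
appliance.install_package("1.5")  # 3. upload a compatible Checkmk version
appliance.start_site()            # 4. start the site again
```

Skipping step 1 or step 3 raises an error in this model, which mirrors the two points where the procedure differs from simply clicking through an update.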
[0:06:53] All right, now let's take a look at the upgrade process for a clustered appliance which consists of two appliances. So, before version 1.5, you would simply put one node of the cluster into maintenance mode, you would upgrade that node.
[0:07:09] Make sure it's running again. Then you would shift the cluster from one to the other. You would put the other node in maintenance mode, upgrade that node, and afterwards, your whole cluster would be on the new firmware version and your monitoring continues running.
[0:07:23] With the update to version 1.5, we actually need to disband the cluster because it's an operating system upgrade, so we are changing the underlying Debian operating system from one version to another.
[0:07:42] And that also has effects on the clustering mechanisms, and that's why we cannot simply upgrade it node by node. So, let's take a look at how we go about that.
[0:07:52] Now we are looking at our clustered appliance. We can take a look at the Clustering page to see the state. And there we can see both nodes are synchronized.
[0:08:05] The monitoring site is running on the main node of the cluster on CLUSTER1 here, on the right side. And everything is healthy, everything is looking good. So, let's talk about how we perform the upgrade here.
[0:08:20] So, the first thing I have to mention is, I said a few minutes ago, we have to disband the cluster. That is not right. We just need to disconnect the cluster, so we don't need to recreate the cluster completely after the upgrade.
[0:08:32] We simply need to reconnect the cluster. But, for the time being, we need to disconnect the cluster, so both nodes are standalone. The monitoring site will remain on the main node but it will be stopped.
[0:08:45] And then we perform the upgrade on both nodes, and restart, and reconnect the cluster. So, let's go right ahead.
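The cluster sequence just described (disconnect, upgrade both nodes, reconnect) can be sketched as a small Python model. Everything here is illustrative: the class is hypothetical, the second node name CLUSTER2 is assumed, and the check that both nodes must run the same firmware before reconnecting is a modelling assumption rather than something the video states.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Cluster:
    firmware: Dict[str, str]  # node name -> firmware version
    connected: bool = True

    def disconnect(self) -> None:
        self.connected = False

    def update_node(self, node: str, version: str) -> None:
        # From the video: for the 1.5 upgrade the cluster must be disconnected first.
        if self.connected:
            raise RuntimeError("disconnect the cluster before this firmware update")
        self.firmware[node] = version

    def reconnect(self) -> None:
        # Modelling assumption: both nodes should be on the same firmware.
        if len(set(self.firmware.values())) != 1:
            raise RuntimeError("nodes run different firmware versions")
        self.connected = True


cluster = Cluster(firmware={"CLUSTER1": "1.4", "CLUSTER2": "1.4"})
cluster.disconnect()
for node in ("CLUSTER1", "CLUSTER2"):  # order does not matter while disconnected
    cluster.update_node(node, "1.5")
cluster.reconnect()
```

Because the nodes are standalone while disconnected, the two `update_node` calls are independent of each other in this model.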
[0:08:54] We are going to disconnect the cluster. We are asked if we really want to do this. Of course, we want to be sure that we know what we are doing, so I'm going to confirm this here.
[0:09:02] That takes a few seconds to inform the other node that we are leaving the cluster, disconnecting it. And after that has been done, we can take a look at the monitoring sites.
[0:09:17] So, here it says both nodes have been disconnected, perfect. So, let's go back. We can see the cluster information is still there but the cluster has been disconnected.
[0:09:26] Now let's go to the Main Menu, to the Site Management. And there we can see, while the cluster is disconnected, we cannot start or stop sites. So, the site is stopped which means we can go about our firmware update.
[0:09:42] So, let me start by going to the dialog. I'm going to select the firmware update file, which we downloaded earlier.
[0:09:50] I'm going to say Upload & Install. As this will take a few moments, I'm going to switch over to the second appliance, take a look at the clustering page.
[0:10:00] Actually, logging in again because our session expired. Now taking a look at the cluster page and we can see the second node also was informed about the disconnect. It's also in a disconnected state.
[0:10:12] So, let's head right for the Firmware Update page. And again, we go for the update file, select it, click on Upload & Install. We can actually do this side by side because there are no ramifications.
[0:10:26] The two appliances are not talking to each other at this point because the cluster is disconnected.
[0:10:33] So, now we are presented again with the confirm update dialog that we saw earlier already. And we also get the information about the clustered mode, which we already talked about. That's why I'm showing you what's going on. The same warnings here.
[0:10:48] So, let's confirm this. The appliance needs to install and reboot. As said earlier, this connection reset information is expected. So, let's head over to the second appliance. So, let's go about the update again, press Yes, and receive the connection reset.
[0:11:07] And now the very same process happens in the background on both appliances. We simply do it side by side with two appliances at once. But, in the end, it's the same process you already saw.
[0:11:17] So, let's wait a few seconds until the appliances have rebooted and we can move on. Okay, let's try and see whether the appliance is back online again.
[0:11:32] There we are, we can see the current firmware version is 1.5, so the upgrade was successful. So, let's get back to the Main Menu. Let's go to the Clustering page, and just really quickly check whether the second appliance is also available again.
[0:11:49] All right, version 1.5. So, both nodes have been upgraded to the new firmware version. So, on the main node, I'm going to click on Reconnect Cluster.
[0:12:01] This will again take a few seconds in which the communication between the second node and the first node is being re-established and the cluster can ensure its proper state.
[0:12:16] And there we are, the cluster is already talking or, better said, the nodes are starting to talk to each other. This can take a few moments until the cluster is synchronized again. So, we can see both nodes have firmware version 1.5 over here.
[0:12:32] We can see both are online, Network Reachability has started. And let's refresh the page again. And there we can see file synchronization. Everything is up to date, everything is fully synchronized.
[0:12:45] Mostly, everything is running. The only thing is the monitoring site, which is stopped. And again, we already talked about that, we need to upload a compatible Checkmk version.
[0:12:55] So, if we go to Checkmk versions, we see this dialog, which we already know. So, I am gonna just upload a compatible Checkmk version. Upload & Install. This will just take a few moments.
[0:13:14] And then we have the compatible version and can restart our monitoring site. And then we will see that everything is back in order. There we are, the version has been uploaded.
[0:13:29] So, now let's go to the sites menu. We can see our monitoring site, it has already been restarted after a compatible version was there.
[0:13:38] So, let's take a last look at the clustering information. And there we can see the cluster is in the expected state. Everything is up and running, the monitoring site is working.
[0:13:49] And with that I'm going to conclude this video. You just learned how to upgrade your Checkmk appliance to version 1.5 and also how to do that with a clustered setup.
[0:14:01] So, with that, thank you for watching. Be sure to subscribe and I will see you next time.

