1. OMD - The Open Monitoring Distribution
The Checkmk Monitoring System uses the Open Monitoring Distribution (OMD). Founded by Mathias Kettner, OMD is an open source project which revolves around the convenient and flexible installation of a monitoring solution made up of various components. The abbreviation OMD might already be familiar to you from the RPM/DEB package installation.
An OMD-based installation is distinguished by a number of characteristics:
- the ability to run multiple instances (or ‘sites’) in parallel
- the ability to operate instances with differing versions of the monitoring software
- an intelligent and easy to operate upgrade/downgrade-mechanism
- uniform file paths – regardless of which Linux-platform is installed
- a clear separation of data and software
- a very simple installation – with no dependence on third-party software
- a perfect preconfiguration of all components
2. Creating instances (or ‘sites’)
Perhaps the best thing about OMD is that it can manage any number of monitoring instances on a single server. These instances are also referred to as sites. Each instance is a self-contained monitoring system which runs independently of the others.
An instance always has a distinct name, specified at its creation. This name is the same as that of the Linux-user which is created at the same time. The instance's name conforms to the same conventions as user names under Linux.
The creation is performed with the omd create command. This must be executed as root:
root@linux# omd create mysite
Adding /opt/omd/sites/mysite/tmp to /etc/fstab.
Restarting Apache...OK
Creating temporary filesystem /omd/sites/mysite/tmp...OK
Created new site mysite with version 2014.11.17.mmk.
The site can be started with omd start mysite.
The default web UI is available at http://Klappfisch/mysite/
The admin user for the web applications is omdadmin with password omd.
Please do a su - mysite for administration of this site.
From version 1.4.0, a randomly generated password is created for the cmkadmin user and displayed when the instance is created.
What takes place during the creation of an instance ‘mysite’?
- An operating system user mysite, and a group mysite will be created.
- A new home directory /omd/sites/mysite will be created and assigned.
- This home directory will be populated with configuration files and sub-directories.
- A basic configuration will be created for the new instance.
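The effects listed above can be checked from the shell. The following is a minimal sketch, not part of OMD: the function name site_artifacts is hypothetical, and it assumes that the group is defined in the local /etc/group file and that sites live under /omd/sites unless another base directory is given.

```shell
#!/bin/sh
# site_artifacts SITE [BASEDIR]
# Hypothetical helper (not an OMD command): reports whether the user, the
# group and the home directory of a site exist.
# Assumption: BASEDIR defaults to /omd/sites; groups are read from /etc/group.
site_artifacts() {
    site="$1"
    base="${2:-/omd/sites}"

    if id -u "$site" >/dev/null 2>&1; then
        echo "user $site: present"
    else
        echo "user $site: missing"
    fi

    if grep -q "^${site}:" /etc/group 2>/dev/null; then
        echo "group $site: present"
    else
        echo "group $site: missing"
    fi

    if [ -d "$base/$site" ]; then
        echo "home $base/$site: present"
    else
        echo "home $base/$site: missing"
    fi
}

# Example call after "omd create mysite":
#   site_artifacts mysite
```

For a site that has just been created, all three lines should report "present".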
2.1. User and group IDs
In some cases it is also desirable to specify the user/group ID of the new user to be created. This is performed with the -u and -g options, e.g.:
root@linux# omd create -u 6100 -g 180 mysite
An overview of the further options can be shown with omd create --help. The most important options are:
|-u UID||The new user will be created with the User-ID ‘UID’.|
|-g GID||The new user's group will be created with the Group-ID ‘GID’.|
|--reuse||OMD assumes that the new user already exists, and does not create it.|
|-t SIZE||The new site's tmpfs will be created with the ‘SIZE’ parameter. SIZE has the suffix M (Megabyte), G (Gigabyte) or % (percentage of RAM). Example: -t 4G|
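Before fixing a user ID with -u, it can help to confirm that the ID is still free. The following sketch is an illustration only: uid_in_use is a hypothetical helper, and it looks at local accounts in /etc/passwd only, so accounts from LDAP or similar directories are not seen.

```shell
#!/bin/sh
# uid_in_use UID [PASSWD_FILE]
# Hypothetical helper: succeeds (exit 0) if UID already appears in the third
# field of the passwd file, fails (exit 1) otherwise.
# Assumption: only local accounts are checked; PASSWD_FILE defaults to /etc/passwd.
uid_in_use() {
    awk -F: -v id="$1" '$3 == id { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Example (as root), using the IDs from the text above:
#   uid_in_use 6100 || omd create -u 6100 -g 180 mysite
```

The same pattern could be applied to group IDs by reading the third field of /etc/group instead.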
3. Instance User (Site User)
The further administration of the instance is always best performed with the rights of the newly-created user. Switching users is done with su:
root@linux# su - mysite
Please note that the ‘minus sign’ following the su is essential. It ensures that switching users processes ALL of the operations that take place during a normal login. In particular, all environment variables will be correctly set, and your session will continue as mysite in the home directory of the /omd/sites/mysite instance.
4. Starting and stopping instances
Your instance is now ready to be started – which can be done as root with omd start mysite. It is fundamentally better though to work with the instance as the instance user (site user):
OMD[mysite]:~$ omd start
Starting Livestatus Proxy-Daemon...OK
Starting rrdcached...OK
Starting CMC Rushing Ahead Daemon...OK
Starting Check_MK Micro Core...OK
Starting dedicated Apache for site mysite...OK
Initializing Crontab...OK
Unsurprisingly, stopping is achieved with omd stop:
OMD[mysite]:~$ omd stop
Removing Crontab...
Stopping dedicated Apache for site mysite....OK
Stopping Check_MK Micro Core...killing 15085...OK
Stopping CMC Rushing Ahead Daemon...killing 15071....OK
Stopping rrdcached...waiting for termination...OK
Stopping Livestatus Proxy-Daemon...killing 15049....OK
Starting and stopping an instance is nothing other than starting or stopping a collection of services. These can also be individually managed by specifying the name of the service, e.g.:
OMD[mysite]:~$ omd start apache
Starting dedicated Apache for site mysite...OK
The names of the various services can be found in the ~/etc/init.d directory. Please note the leading tilde – this represents the home directory for the instance-user (the site-directory). This is not the same as /etc/init.d!
Alongside start and stop, there are also the restart, reload and status commands. Reloading Apache is, for example, always necessary following a manual change to the Apache-configuration. Please note that this does not apply to the global Apache-process on the Linux-server, but rather the site's own dedicated Apache-process:
OMD[mysite]:~$ omd reload apache
Reloading dedicated Apache for site mysite....OK
In order to be able to maintain an overview of state of the site following all of the starts and stops, simply use omd status:
OMD[mysite]:~$ omd status
liveproxyd: stopped
rrdcached: running
cmcrushd: running
cmc: stopped
apache: running
crontab: running
-----------------------
Overall state: partially running
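In scripts it is often enough to know which services are not running. The sketch below filters the status listing for stopped services. It assumes that omd status prints one "name: state" pair per line, as in the example above, and without colour codes when its output is piped; stopped_services is a hypothetical name.

```shell
#!/bin/sh
# stopped_services
# Reads "omd status" output on stdin and prints the name of every service
# whose state is "stopped", one per line.
# Assumption: one "name: state" pair per line, no colour escape sequences.
stopped_services() {
    awk '$2 == "stopped" { sub(/:$/, "", $1); print $1 }'
}

# Run against the live status only where the omd command is available.
if command -v omd >/dev/null 2>&1; then
    omd status | stopped_services
fi
```

Applied to the status shown above, the filter would print liveproxyd and cmc. The "Overall state" line is ignored because its second word is "state:".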
5. Deleting instances
Deleting an instance is as easy as creating one – with the omd rm command. The instance will first be automatically stopped.
root@linux# omd rm mysite
PLEASE NOTE: This action removes all configuration files and variable data of the site.
In detail the following steps will be done:
- Stop all processes of the site
- Unmount tmpfs of the site
- Remove tmpfs of the site from fstab
- Remove the system user <SITENAME>
- Remove the system group <SITENAME>
- Remove the site home directory
- Restart the system wide apache daemon
(yes/NO): yes
It goes without saying that this action also deletes all of the instance's data!
If you are no fan of confirmation prompts, or wish to perform the deletion as part of a script, the deletion can be forced with the -f option. Attention: here the -f must be placed before the rm:
root@linux# omd -f rm mysite
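Because the forced form skips the confirmation prompt, a script may want its own safety catch. The following is one possible sketch: remove_site and the REALLY_DELETE variable are hypothetical names, not OMD features. Unless REALLY_DELETE=yes is set, the function only prints the command it would run.

```shell
#!/bin/sh
# remove_site SITE
# Hypothetical wrapper around "omd -f rm". It performs the deletion only if
# the environment variable REALLY_DELETE is set to "yes"; otherwise it prints
# the command as a dry run.
remove_site() {
    site="$1"
    if [ -z "$site" ]; then
        echo "usage: remove_site SITE" >&2
        return 2
    fi
    if [ "${REALLY_DELETE:-no}" = "yes" ]; then
        # Note the position of -f: before rm, as described above.
        omd -f rm "$site"
    else
        echo "would run: omd -f rm $site"
    fi
}

# Dry run:        remove_site mysite
# Real deletion:  REALLY_DELETE=yes remove_site mysite   (as root)
```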
6. Configuring the components
As already mentioned, OMD is a system that integrates multiple software components into a monitoring system. Some of these components are optional, and for some there are alternatives or different operational settings. All of this can be conveniently configured with omd config, which offers both an interactive mode and a scripting mode. The interactive mode is opened by the site user with:
OMD[mysite]:~$ omd config
If you alter a setting while the site is running, OMD will immediately point out that the site must first be stopped, and will stop it for you as needed.
Those who don't like the interactive mode, or prefer to work with scripts, can set the individual variables using commands. For this there is the omd config set command. The following example sets the CORE variable to cmc:
OMD[mysite]:~$ omd config set CORE cmc
As always, this can be performed as root if the site's name is added as an argument:
root@linux# omd config mysite set CORE cmc
The current configuration of all variables can be viewed using omd config show:
OMD[mysite]:~$ omd config show
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5000
AUTOSTART: off
CMCRUSHD: on
CORE: cmc
[...]
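A script can read a single value out of this listing before deciding whether a change is needed. The sketch below assumes the "NAME: value" layout shown above; config_value is a hypothetical helper name.

```shell
#!/bin/sh
# config_value VAR
# Reads "omd config show" output on stdin and prints the value of VAR.
# Prints nothing if VAR does not occur.
# Assumption: each line has the form "NAME: value".
config_value() {
    awk -v var="$1" -F': *' '$1 == var { print $2 }'
}

# Show the configured core only where the omd command is available.
if command -v omd >/dev/null 2>&1; then
    omd config show | config_value CORE
fi

# Possible use as site user (the site must be stopped before a change):
#   if [ "$(omd config show | config_value CORE)" != "cmc" ]; then
#       omd stop && omd config set CORE cmc && omd start
#   fi
```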
6.2. Commonly used settings
There are numerous settings in omd config. The most important are:
|Variable||Default||Meaning|
|CORE||cmc||Selection of the monitoring core. As well as the Checkmk Micro Core (CMC), the standard Nagios core is still available. In earlier versions this was set as the default.|
|MKEVENTD||on||Activates the Checkmk Event Console, with which the syslog messages, SNMP-Traps and other events can be processed|
|MKNOTIFYD||on||Enterprise Editions: Activates the Notification-Spooler. Firstly, this forwards remotely-generated notifications to a central system. This will require mknotifyd on the central and remote sites respectively. An asynchronous delivery of messages can additionally be performed using this.|
|AUTOSTART||on||Set this to off if you want to suppress an automatic starting of the instance when the computer is started. This is primarily of interest for test installations that should not normally start by themselves.|
|LIVESTATUS_TCP||off||Allows external access to the status data for this site. A distributed monitoring can be constructed with this. The status of this instance can be incorporated into the central instance. Please only activate it in a secure network.|
7. Copying and renaming instances
It is sometimes useful to create a copy of an instance, for testing purposes or for the preparation of an update. Of course one could simply copy the /omd/sites/alt directory to /omd/sites/neu. That will however not work because:
- Many configuration files include the site's name.
- In addition, at numerous locations there are absolute data paths with the /omd/sites/alt prefix.
- Not least, a user and a group with the site's name to which everything belongs, must be available.
To simplify the copying of an instance, there is the omd cp command, which takes all of these factors into consideration. Its use is very simple. As argument simply enter the name of the existing site followed by the name of the new one. For example:
root@linux# omd cp alt neu
The copy can only work if:
- The site has been stopped.
- No processes that belong to the instance user are running.
7.1. Limiting data volume
If a large number of hosts are being monitored, the volume of data to be copied can be quite substantial. The greater part of this is the performance data which is stored in RRD-files. But the log files containing historic events can also produce larger data volumes. If the history is not required (for example, if only testing is being performed), these can be omitted from the copy. In such cases the following options can be added to omd cp:
|--no-rrds||The copy will exclude performance data (RRDs)|
|--no-logs||All log files and remaining historic data will be excluded|
|-N||This is an abbreviation of --no-rrds --no-logs|
The position of the options is important. They must be placed directly after cp, before the site names:
root@linux# omd cp --no-rrds alt neu
7.2. Renaming instances
Renaming an instance is performed with the omd mv command. This functions similarly to the copy command and has the same prerequisites. The options to restrict the data volume are not available since the data is only being moved to another directory and is not being duplicated. For example:
root@linux# omd mv alt neu
7.3. Further options for cp and mv
Both operations will create new Linux-users in exactly the same way as create does, thus some of the options for omd create are also available for use:
|-u UID||The new user will be created with the User-ID UID.|
|-g GID||The new user's group will be created with the Group-ID GID.|
|--reuse||OMD assumes that the new user already exists and does not create it.|
|-t SIZE||The new site's tmpfs will be created with the ‘SIZE’ parameter. SIZE has the suffix M (Megabyte), G (Gigabyte) or % (percentage of RAM). Example: -t 4G|
8. Showing changes with omd diff
When creating a new Checkmk-instance the omd create command populates the etc directory with numerous predefined configuration files. A number of directories will also be created under var and local.
Over time, a number of these files will probably have been customised. If you later wish to determine which files no longer match the originally supplied version, the omd diff command can provide the answer. Amongst other things, this is useful before an update of Checkmk, since your changes could conflict with changes in the default files.
In a request without additional arguments, all changed files will be listed:
OMD[mysite]:~$ omd diff
 * Deleted var/log/nagios.log
 * Changed content var/check_mk/wato/auth/auth.php
 * Changed content etc/htpasswd
 ! Changed permissions etc/htpasswd
 * Changed content etc/diskspace.conf
 * Changed content etc/auth.secret
 * Changed content etc/apache/apache.conf
You can also enter a query for a specific directory:
OMD[mysite]:~$ omd diff etc/apache
 * Changed content etc/apache/apache.conf
If you wish to see the changes in detail, simply enter the complete file name:
OMD[mysite]:~$ omd diff etc/apache/apache.conf
--- /dev/fd/63 2017-01-24 09:14:46.248968199 +0100
+++ /omd/sites/mysite/etc/apache/apache.conf 2017-01-24 09:12:37.705355164 +0100
@@ -66,8 +66,8 @@
 StartServers 1
 MinSpareServers 1
 MaxSpareServers 5
-ServerLimit 128
-MaxClients 128
+ServerLimit 64
+MaxClients 64
 MaxRequestsPerChild 4000
 ###############################################################################
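To review every file whose content differs before an update, the summary listing can be reduced to plain file names. The sketch below assumes the listing format shown earlier in this section ("* Changed content PATH", paths without spaces, no colour codes when piped); changed_files is a hypothetical name.

```shell
#!/bin/sh
# changed_files
# Reads the summary output of "omd diff" on stdin and prints the path of
# every entry marked "Changed content". Deleted files and permission changes
# are skipped.
# Assumption: the path is the last whitespace-separated field of the line.
changed_files() {
    awk '$2 == "Changed" && $3 == "content" { print $NF }'
}

# Possible use as site user, showing the detailed diff of each changed file:
#   omd diff | changed_files | while read -r f; do omd diff "$f"; done
```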
9. Backing-up and restoring instances
9.1. Backing-up instances with omd backup
The site management in Checkmk has a built-in mechanism for backing up and restoring Checkmk-instances. The omd backup and omd restore commands are the basics for packing all of an instance's data into a tar archive, and respectively, extracting that data for a restore.
From Version 1.4.0 Checkmk additionally uses the Backup WATO-module which makes a backup and restore possible without the command line, and which also enables the setting-up of regular backup jobs.
Backing up an instance with omd backup does not require root-permissions. An instance user can perform this. Simply enter as an argument the name for the backup file to be created:
OMD[mysite]:~$ omd backup /tmp/mysite.tar.gz
Please note however:
- The created file type is a gzip-compressed tar archive. Therefore use .tar.gz or .tgz as the file extension.
- Do not store the backup in the instance directory, since this will of course be completely backed up – thus every subsequent backup will contain a copy of ALL of its predecessors!
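The second point can be guarded against in a script. The following sketch rejects backup targets that lie inside the site directory. backup_path_ok is a hypothetical helper; it assumes an absolute target path and does not resolve symbolic links or relative paths.

```shell
#!/bin/sh
# backup_path_ok PATH SITE [BASEDIR]
# Hypothetical helper: succeeds if PATH does not lie inside BASEDIR/SITE.
# Assumptions: PATH is absolute, BASEDIR defaults to /omd/sites, and
# symbolic links are not resolved.
backup_path_ok() {
    case "$1" in
        "${3:-/omd/sites}/$2"/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use as site user, with a date-stamped file name:
#   target="/tmp/mysite-$(date +%Y%m%d).tar.gz"
#   backup_path_ok "$target" mysite && omd backup "$target"
```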
If the backup's target directory is not writable for the instance user, the backup can instead be performed as root. In this case an additional argument is always required, specifying the name of the instance to be backed up:
root@linux# omd backup mysite /var/backups/mysite.tar.gz
The backup contains all of the instance's data, except for the volatile data under tmp/. With the tar tvzf command one can easily have a look at the file's contents:
OMD[mysite]:~$ tar tvzf /tmp/mysite.tar.gz | less
lrwxrwxrwx mysite/mysite 0 2017-01-24 09:02 mysite/version -> ../../versions/2017.01.16.cee
drwxr-xr-x mysite/mysite 0 2017-01-24 09:12 mysite/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/styles/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/scripts/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/templates/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/gadgets/
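As the listing shows, every path in the archive begins with the name of the site. A script can use this to find out which site a backup file belongs to. The sketch below relies on that observation only; backup_site_name is a hypothetical helper and not an OMD command.

```shell
#!/bin/sh
# backup_site_name ARCHIVE
# Prints the first path component of the first entry in a gzip-compressed
# tar archive.
# Assumption: all entries start with "<site>/", as in the listing above.
backup_site_name() {
    tar tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Example:
#   backup_site_name /tmp/mysite.tar.gz
```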
9.2. Backup without history
The lion's share of an instance's data is the performance data retained in the RRDs. The monitoring history can also be very large. If neither is strictly required, both can be omitted with the following options, making the backup smaller and faster. The options must be placed directly after the word backup:
|--no-rrds||Omits backing up the RRD-databases (performance data)|
|--no-logs||Omits the monitoring history stored in the log files|
|-N||An abbreviation of --no-rrds --no-logs|
OMD[mysite]:~$ omd backup -N /tmp/mysite.tar.gz
9.3. Backing up a running instance
A backup does not require the instance to be stopped, and can therefore be executed while the system is running. To keep the RRDs that record the performance data in a consistent state, the omd backup command automatically switches the Round-Robin-Cache to a mode in which running updates are written only to the journal, and no longer to the RRDs. The journal files are backed up last, so that as much as possible of the performance data generated during the backup is also included in it.
9.4. Restoring instances with omd restore
Restoring a backup is as simple as creating one. The omd restore command restores an instance from a backup. This can even be done by the instance user. The instance must be stopped for this procedure. The instance is not newly created (which would require root permissions); instead it is completely emptied and then refilled:
OMD[mysite]:~$ omd stop
OMD[mysite]:~$ omd restore /tmp/mysite.tar.gz
Following the restore the instance can be restarted:
OMD[mysite]:~$ omd start
A restore can also be performed by a root-user. If an instance with the same name already exists, this must first be deleted. This can be performed either with an omd rm, or by simply including the --reuse option. A --kill additionally ensures that the existing instance is first stopped. It is not necessary to use the instance's name with the restore, since this is contained in the backup:
root@linux# omd restore --reuse --kill /var/backup/mysite.tar.gz
root@linux# omd start mysite
When operating as root, you can restore the instance with a different name from that in the backup. Include the desired alternative name as an argument following the restore command:
root@linux# omd restore mysite2 /var/backup/mysite.tar.gz
Restoring site mysite2 from /tmp/mysite.tar.gz...
 * Converted ./.modulebuildrc
 * Converted ./.profile
 * Converted .pip/pip.conf
 * Converted etc/logrotate.conf
The long list of conversions found here serves the same purpose as when renaming instances, as described earlier: the instance's name is included in numerous configuration files, and these occurrences are automatically replaced by the new name.
9.5. Live migration of instances with backup & restore
The omd backup and omd restore commands can – in the good old Unix tradition – instead of files, also work with the standard input/output. Instead of a data path for the tar file, simply enter a hyphen (-).
In this way a pipe can be constructed and the data ‘streamed’ directly to another computer without requiring intermediate files. The larger the backup, the more advantageous this will be since no temporary space in the backed up server's file system will be needed.
The following command backs up an instance to another computer using SSH:
root@linux# omd backup mysite - | ssh user@otherserver "cat > /var/backup/mysite.tar.gz"
If you prefer to reverse the direction of the SSH access, so that the backup server logs in to the Checkmk server, that is also possible, as shown in the following example. For this, an SSH login as the instance user must first be permitted:
root@otherserver# ssh mysite@checkmkserver "omd backup -" > /var/backup/mysite.tar.gz
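A backup that has been streamed over the network can be checked for readability before it is relied upon. The following sketch uses only gzip and tar; archive_ok is a hypothetical helper, and the check confirms only that the file is a readable gzip-compressed tar archive, not that its contents are complete.

```shell
#!/bin/sh
# archive_ok ARCHIVE
# Hypothetical helper: succeeds if ARCHIVE passes the gzip integrity test
# and its table of contents can be listed by tar.
archive_ok() {
    gzip -t "$1" && tar tzf "$1" >/dev/null
}

# Example on the backup server:
#   archive_ok /var/backup/mysite.tar.gz && echo "backup is readable"
```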
If you are clever, and combine the above with an omd restore which reads the data from the standard input, you can copy a complete, running instance from one server to another – and without needing any additional space for a backup file:
root@otherserver# ssh mysite@checkmkserver "omd backup -" | omd restore -
The same procedure also works in the other direction, with the SSH connection opened from the source system to the target system:
root@linux# omd backup mysite - | ssh root@otherserver "omd restore -"