
1. Introduction

Livestatus is the most important interface in Checkmk. It is the fastest way to retrieve all data on the monitored hosts and services, including live data. The data in the Overview, for example, is retrieved directly via this interface. Because the data is read directly from RAM, slow hard drive access is avoided, which provides rapid access to the monitoring data without putting much load on the system.

To give the data a structure, it is arranged in tables and columns. The hosts table, for example, includes the name, the state and numerous other columns. Each row in the hosts table represents a host, each row in the services table a service, and so on. In this way the data can be searched and retrieved easily.

This article should help you to use this interface for your own queries, extensions and customizations. As an instance user you can – using copy and paste – directly test all of the queries and commands in this article.

2. The Livestatus Query Language (LQL)

2.1. Using the LQL in the shell

Livestatus is accessed via a Unix socket using the Livestatus Query Language (LQL). Its syntax is based on HTTP.

There are a number of ways to access the interface from the command line. One possibility is to use the printf and unixcat commands to send a query to the socket. The unixcat tool is already included in Checkmk for the instance user. Important: all input to the socket is case-sensitive, so this must always be observed:

OMD[mysite]:~$ printf "GET hosts\nColumns: name\n" | unixcat ~/tmp/run/live

The interface expects the command and every header on a separate line. You mark such a line break with \n. As an alternative to the command above, you can also use the lq script, which saves you a bit of work by filling in some fields automatically:

OMD[mysite]:~$ lq "GET hosts\nColumns: name"

Or you can start the interactive input stream and enter the command followed by its headers. A blank line executes the command with its headers, and a further blank line ends the socket access. Note that in the example, everything before the first blank line belongs to the command, and everything between the first and second blank lines is the response:

OMD[mysite]:~$ lq
GET hosts
Columns: name

myserver123
myserver124
myserver125

OMD[mysite]:~$

The following examples are always executed with the lq command – in the direct form when the query is short, and as an entry stream for longer queries.

LQL commands

In the first examples you have already seen the first of two commands: with GET you can retrieve data from any of the available tables. The command reference contains a complete listing of all available tables with a description of each, and this article also contains a general explanation of how to use Livestatus.

With COMMAND you can issue commands directly to the core, for example to set a downtime or to completely deactivate notifications. A list of all available commands can be found in the command reference under Commands.

LQL headers

For every GET-command you can insert various headers in order to restrict the results from a query, to output only specific columns for a table, and much more. The following are the two most important headers:

Header    Description
Columns   Only the specified columns will be produced by a query.
Filter    Only the entries which meet a specific condition will be produced.

A list of all headers, each with a short description, can be found in the command reference.
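
As a minimal sketch of how these pieces fit together, the following Python helper assembles a GET query from a table name, a list of columns and a list of filter conditions. The function name build_query and its parameters are chosen for illustration only and are not part of Checkmk; the resulting string could be sent to the socket in the same way as in the printf example above.

```python
def build_query(table, columns=(), filters=()):
    """Compose an LQL GET query (hypothetical helper, not a Checkmk API)."""
    lines = [f"GET {table}"]
    if columns:
        # Column names are separated by single spaces.
        lines.append("Columns: " + " ".join(columns))
    # Every condition becomes its own Filter header on its own line.
    lines.extend(f"Filter: {condition}" for condition in filters)
    # A final line break terminates the last header line.
    return "\n".join(lines) + "\n"


query = build_query("services", ["host_name", "description", "state"], ["state = 2"])
print(query, end="")
```

Keeping the query as a list of lines until the very end makes it hard to forget a line break between two headers.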

Show available columns and tables

Nobody can remember all of the tables and their columns, and access to this manual (with the references in the online version) may not always be possible. You can however quickly create a query which provides the desired information. To receive a list of all available tables, submit the following query and remove the duplicate lines from the output with sort -u. The first four lines of the output are shown as an example:

OMD[mysite]:~$ lq "GET columns\nColumns: table" | sort -u
columns
commands
comments
contactgroups

To query all columns of a table, you must specify that table in a filter. Substitute hosts with the desired table. Here as well, the first four lines of the output are shown as an example:

OMD[mysite]:~$ lq "GET columns\nFilter: table = hosts\nColumns: name"
accept_passive_checks
acknowledged
acknowledgement_type
action_url

2.2. Using LQL in Python

Since Checkmk is based very heavily on Python, scripts in this language are a practical choice. The following script can be used as a basis for accessing the Livestatus socket:

live_example.py
#!/usr/bin/env python3
# Sample program for accessing Livestatus from Python

import json
import os
import socket

# for local site only: file path to socket
address = "%s/tmp/run/live" % os.getenv("OMD_ROOT")
# for local/remote sites: TCP address/port for Livestatus socket
# address = ("localhost", 6557)

# connect to Livestatus
family = socket.AF_INET if isinstance(address, tuple) else socket.AF_UNIX
sock = socket.socket(family, socket.SOCK_STREAM)
sock.connect(address)

# send our request and let Livestatus know we're done
sock.sendall(b"GET status\nOutputFormat: json\n")
sock.shutdown(socket.SHUT_WR)

# receive the reply as raw bytes until the server closes the connection
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
sock.close()

# decode once at the end, so that a multi-byte UTF-8 character
# split across two chunks is still decoded correctly
reply = b"".join(chunks).decode("utf-8")

# print the parsed reply
print(json.loads(reply))

2.3. Using the Livestatus-API

Checkmk provides an API for the Python, Perl and C++ programming languages, which simplifies the access to Livestatus. Example code that explains its use is available for each language. The paths to these examples can be found in the chapter Files and directories.

3. Simple queries

3.1. Column queries (Columns)

Without a Columns header, a query returns ALL columns of a table for ALL of its entries. In practice however, you will probably only require specific columns. With the Columns header, which has already been used above, the output can be limited to these columns. The individual column names are separated by a single space.

OMD[mysite]:~$ lq "GET hosts\nColumns: name address"
myserver123;192.168.0.42
myserver234;192.168.0.73

As can be seen, the individual values in a line are separated by semicolons.

Important: If you use the Columns header, the line with the column names is suppressed in the output. It can be re-inserted in the output with the ColumnHeaders header.

3.2. Setting a simple filter

To limit the query to specific rows, the columns can be filtered for specified contents. If you are only looking for services with a specific status, this can be achieved with a filter:

OMD[mysite]:~$ lq "GET services\nColumns: host_name description state\nFilter: state = 2"
myserver123;Filesystem /;2
myserver234;ORA MYINST Processes;2

In this example all services with a CRIT status are searched for, and the host name, the service description and the status are output. Such filters can of course be combined, for example to restrict the result to services which have a CRIT status and which have not yet been acknowledged:

OMD[mysite]:~$ lq "GET services\nColumns: host_name description state\nFilter: state = 2\nFilter: acknowledged = 0"
myserver234;Filesystem /;2

As can be seen, one can also filter by columns which are not listed in Columns.

Operators and regular expressions

So far the filters have only matched exact numbers. A query can also compare numbers with ‘less than’, or search character strings. The available operators can be found in the Operators chapter in the command reference. Thus you can, for example, filter the columns using regular expressions:

OMD[mysite]:~$ lq "GET services\nColumns: host_name description state\nFilter: description ~~ exchange database|availability"
myserver123;Exchange Database myinst1;1
myserver123;Exchange Availability Service;0
myserver234;Exchange Database myinst3;0

With the right operator you can search the columns in various ways. Unless it has been defined otherwise, Livestatus always interprets such an expression as ‘can appear anywhere in the column’. Mark the start of the value with the ^ character, for example, and its end with the $ character. A comprehensive list of all special characters in Checkmk regular expressions can be found in the article covering Regular expressions.
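
The matching behavior described above can be approximated in Python. This is only a rough analogue under stated assumptions: the ~~ operator is treated as a case-insensitive regular expression match, Python's re module stands in for the regex engine used by Livestatus (the two are not guaranteed to agree on every special character), and the sample descriptions are made up for illustration.

```python
import re

descriptions = [
    "Exchange Database myinst1",
    "Exchange Availability Service",
    "Filesystem /",
]

# Rough analogue of "Filter: description ~~ exchange database|availability":
# case-insensitive, and the pattern may match anywhere in the value.
pattern = re.compile(r"exchange database|availability", re.IGNORECASE)
matches = [d for d in descriptions if pattern.search(d)]
print(matches)

# Anchoring with ^ restricts the match to the start of the value.
anchored = re.compile(r"^availability", re.IGNORECASE)
anchored_matches = [d for d in descriptions if anchored.search(d)]
print(anchored_matches)
```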

4. Complex queries

4.1. Filters for lists

Some columns in a table return not just a single value but a whole list of values. So that such a list can be searched effectively, the operators have a different function in these cases. A complete list of these operators can be found in Operators for lists. The >= operator, for example, has the function ‘contains’. With this you could search for specific contacts:

OMD[mysite]:~$ lq "GET hosts\nColumns: name address contacts\nFilter: contacts >= hhirsch"
myserver123;192.168.0.42;hhirsch,hhirsch,mfrisch
myserver234;192.168.0.73;hhirsch,wherrndorf

As can be seen in the above example, the contacts in the contacts column are listed separated by commas. This clearly distinguishes them from the start of another column. A special feature of the equality operator is that, when used without a value, it checks whether a list is empty:

OMD[mysite]:~$ lq "GET hosts\nColumns: name contacts\nFilter: contacts ="
myserver345;
myserver456;

4.2. Combining filters

Several filters have already been combined above. Intuitively, the data must pass all of the filters in order to be shown, so the filters are linked with a logical AND. To link particular filters with a logical OR, add an Or: header followed by an integer after the filter lines. The integer specifies how many of the preceding filters are combined with OR. In this way groups can be formed and combined as required. The following is a simple example. Here two filters are combined so that all services which have either the status WARN or UNKNOWN are shown:

OMD[mysite]:~$ lq
GET services
Columns: host_name description state
Filter: state = 1
Filter: state = 3
Or: 2

myserver123;Log /var/log/messages;1
myserver123;Interface 3;1
myserver234;Bonding Interface SAN;3

OMD[mysite]:~$

The result of a combination can also be negated, or groups can in turn be combined into other groups. In the example, all services are shown whose status is not OK, and which either have a status other than UNKNOWN or whose description does not contain Filesystem:

OMD[mysite]:~$ lq
GET services
Columns: host_name description state
Filter: state = 3
Filter: description ~ Filesystem
And: 2
Filter: state = 0
Or: 2
Negate:

myserver123;Log /var/log/messages;1
myserver123;Interface 3;1
myserver234;Filesystem /media;2
myserver234;Filesystem /home;2
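
One way to reason about such combinations is to treat the headers as operations on a stack: every Filter pushes a result, And: n and Or: n replace the last n results with their combination, and Negate: inverts the most recent result. The following Python sketch is a mental model under that assumption, not Livestatus code; the predicates and sample rows are invented for illustration.

```python
def evaluate(headers, row):
    """Evaluate Filter/And/Or/Negate headers against one row (a dict)."""
    stack = []
    for name, arg in headers:
        if name == "Filter":
            stack.append(arg(row))  # arg is a predicate on the row
        elif name in ("And", "Or"):
            # Replace the last `arg` results with their combination.
            group = [stack.pop() for _ in range(arg)]
            stack.append(all(group) if name == "And" else any(group))
        elif name == "Negate":
            stack.append(not stack.pop())  # invert the most recent result
    return all(stack)  # whatever remains is linked with AND


# The headers of the query above, expressed as predicates.
headers = [
    ("Filter", lambda r: r["state"] == 3),
    ("Filter", lambda r: "Filesystem" in r["description"]),
    ("And", 2),
    ("Filter", lambda r: r["state"] == 0),
    ("Or", 2),
    ("Negate", None),
]
rows = [
    {"description": "Filesystem /media", "state": 2},
    {"description": "Filesystem /opt", "state": 3},
    {"description": "Interface 3", "state": 1},
    {"description": "Uptime", "state": 0},
    {"description": "Bonding Interface SAN", "state": 3},
]
shown = [r["description"] for r in rows if evaluate(headers, r)]
print(shown)
```

Only the UNKNOWN file system and the OK service are dropped, because the negation excludes exactly what the inner Or: group matches.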

4.3. Specifying an output format

The output format can be specified in two ways. One method is to redefine the separators used in the standard output. The other method is to output conforming to Python or JSON formats.

Customizing csv

As already described, you can precisely customize the standard output format csv (lower case!) and define how the individual elements should be separated from each other. Checkmk recognizes four different separators for structuring the data. Following the colon, specify the decimal ASCII code of each separator so that the header is structured as follows:

Separators: 10 59 44 124

These separators have the following functions:

  1. Separator for the datasets: 10 (line break)

  2. Separator for the columns in a data set: 59 (semicolon)

  3. Separator for the elements in a list: 44 (comma)

  4. Separator for the elements in a service list: 124 (vertical bar)

Each of these values can be changed to structure the output as desired. In the following example the individual columns in a data set have been separated with a tab (9) rather than a semicolon (59):

OMD[mysite]:~$ lq
GET services
Columns: host_name description state
Filter: description ~ Filesystem
Separators: 10 9 44 124

myserver123     Filesystem /opt     0
myserver123     Filesystem /var/some/path       1
myserver123     Filesystem /home        0

Important: The order of the separators is fixed and may not be altered.
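
Because the header takes ASCII codes rather than characters, a small conversion helper can avoid lookup mistakes. The sketch below builds the header from four characters in the fixed order given above and splits an output that uses those separators; the function names are hypothetical and the sample output is shortened from the example.

```python
def separators_header(dataset="\n", column=";", list_item=",", service_list_item="|"):
    """Build a Separators header from four single characters (fixed order)."""
    chars = (dataset, column, list_item, service_list_item)
    return "Separators: " + " ".join(str(ord(c)) for c in chars)


def split_rows(output, dataset="\n", column=";"):
    """Split an output into rows and columns using the given separators."""
    return [line.split(column) for line in output.split(dataset) if line]


print(separators_header(column="\t"))

sample = "myserver123\tFilesystem /opt\t0\nmyserver123\tFilesystem /home\t0\n"
rows = split_rows(sample, column="\t")
print(rows)
```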

Changing output formats

As well as producing output in csv, Livestatus can also output in other formats. These have the advantage of being easier and cleaner to parse in high-level programming languages. Accordingly, the output may be produced in the following formats:

Format    Description
python    Generates the output as a list compatible with Python 2.x. Text is formatted in Unicode.
python3   Likewise generates the output as a list, and when doing so takes account of changes in the data types, for example the automatic conversion of text to Unicode.
json      The output is likewise generated as a list, but only JSON-compatible formatting is used.
CSV       Formats the output conforming to RFC 4180.
csv       See Customizing csv. This is the standard format if no other is specified, and it is based on the official CSV format.

Please do not confuse the CSV format with the csv output from Livestatus, which is used if no output format has been specified. Correct use of upper and lower case is thus absolutely essential. To change the format, specify OutputFormat at the end instead of Separators:

OMD[mysite]:~$ lq
GET services
Columns: host_name description state
Filter: description ~ Filesystem
OutputFormat: json

[["myserver123","Filesystem /opt",0],
["myserver123","Filesystem /var/some/path",1],
["myserver123","Filesystem /home",0]]
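
A JSON reply of this kind can be turned into named records with a few lines of Python. The sketch assumes that the column names are known from the Columns header of the query; the reply string is a shortened sample, not live data.

```python
import json

# Column names in the same order as in the Columns header of the query.
columns = ["host_name", "description", "state"]

# Shortened sample reply in the json output format.
reply = '[["myserver123","Filesystem /opt",0],\n["myserver123","Filesystem /home",0]]'

# Each row is a list; pair it with the column names to get a dict.
services = [dict(zip(columns, row)) for row in json.loads(reply)]
for service in services:
    print(service["host_name"], service["description"], service["state"])
```

Unlike the csv format, numbers arrive as numbers here, so the state does not need to be converted from a string.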

5. Retrieving statistics (Stats)

5.1. Introduction

There will be situations in which you have no interest in the status of a single service or group of services. Far more important then is the number of services currently in a WARN status, or the number of monitored databases. With Stats, Livestatus is able to generate and output statistics.

5.2. Numbers

The Overview receives its data by retrieving statistics for hosts, services and events through Livestatus and displaying them in Checkmk’s interface. With direct access to Livestatus you can produce your own summary:

OMD[mysite]:~$ lq
GET services
Stats: state = 0
Stats: state = 1
Stats: state = 2
Stats: state = 3

34506;124;54;20
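
A summary like this can be generated and labeled programmatically. In the following sketch the state names follow the numbering used in this article (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN); the function names are hypothetical.

```python
STATE_NAMES = ["OK", "WARN", "CRIT", "UNKNOWN"]


def stats_query(table="services"):
    """Build one Stats header per state, in the order of STATE_NAMES."""
    lines = [f"GET {table}"]
    lines += [f"Stats: state = {number}" for number in range(len(STATE_NAMES))]
    return "\n".join(lines) + "\n"


def parse_counts(line):
    """Map the semicolon-separated counts of the reply to state names."""
    values = (int(value) for value in line.strip().split(";"))
    return dict(zip(STATE_NAMES, values))


print(stats_query(), end="")
counts = parse_counts("34506;124;54;20\n")
print(counts)
```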

By the way, such statistics can be combined with all filters.

5.3. Grouping

Statistics can also be combined with AND/OR. The headers are then called StatsAnd and StatsOr. Use StatsNegate if the result should be negated. In the example the total number of hosts is output (the first Stats header), followed by the number of hosts which are marked as stale and which are also not in a scheduled downtime (the second and third Stats headers are linked with a logical AND):

OMD[mysite]:~$ lq
GET hosts
Stats: state >= 0
Stats: staleness >= 3
Stats: scheduled_downtime_depth = 0
StatsAnd: 2

734;23

Do not be confused by the various options for combining the results from filters and statistics. While the Filter header outputs all hosts that meet the conditions, the Stats header outputs a count of how often its condition applies.

5.4. Minimum, maximum, average, etc.

It is also possible to perform calculations on values and, for example, output an average value or a maximum value. A complete list of all of the possible operators can be found in the command reference.

In the following example the output lists the average, maximum and minimum times a host’s check plug-ins require for calculating a status:

OMD[mysite]:~$ lq
GET services
Filter: host_name = myserver123
Stats: avg execution_time
Stats: max execution_time
Stats: min execution_time

0.0107628;0.452087;0.008593

Calculations with metrics are handled in a somewhat special way. Here as well, all of the functions of the Stats header are available for use. These are however applied individually to each of a service’s metrics. In the following example the CPU utilization metrics of all hosts in a host group are added together:

OMD[mysite]:~$ lq
GET services
Filter: description ~ CPU utilization
Filter: host_groups >= cluster_a
Stats: sum perf_data

guest=0.000000 steal=0.000000 system=34.515000 user=98.209000 wait=23.008000
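
The summed metrics come back as a single string of name=value pairs. The following sketch splits such a string into a dictionary, assuming plain numeric values as in the output above; values carrying units or thresholds would need additional handling that is not shown here.

```python
def parse_perf_data(text):
    """Split 'name=value name=value ...' into a dict of floats."""
    metrics = {}
    for item in text.split():
        name, _, value = item.partition("=")
        metrics[name] = float(value)
    return metrics


line = "guest=0.000000 steal=0.000000 system=34.515000 user=98.209000 wait=23.008000"
metrics = parse_perf_data(line)
print(metrics)
```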

6. Limiting an output (Limit)

The number of lines in an output can be intentionally limited. This can be useful if, for example, you only wish to see if you can get any sort of response to a Livestatus query, but want to avoid getting a multi-page output:

OMD[mysite]:~$ lq "GET hosts\nColumns: name\nLimit: 3"
myserver123
myserver234
myserver345

Note that this limit also applies when it is combined with other headers. If, for example, you use Stats to count how many hosts have an UP status and limit the output to 10, only the first 10 hosts will be taken into account.

7. Time limits (Timelimit)

Not only the count of lines to be output can be restricted – the maximum elapsed time that a query is permitted to run can also be limited. This option can prevent a Livestatus query blocking a connection forever if it gets hung up for some reason. The time restriction specifies a maximum time in seconds that a query is permitted to process:

OMD[mysite]:~$ lq "GET hosts\nTimelimit: 1"

8. Activating column headers (ColumnHeaders)

With ColumnHeaders the names of the columns can be added to the output. These are normally suppressed in order to simplify further processing:

OMD[mysite]:~$ lq "GET hosts\nColumns: name address groups\nColumnHeaders: on"
name;address;groups
myserver123;192.168.0.42;cluster_a,headnode
myserver234;192.168.0.43;cluster_a
myserver345;192.168.0.44;cluster_a
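
With the column names in the first line, the output can be read directly into records by Python's csv module. This is a sketch using shortened sample output; it assumes the default separators (semicolon between columns, comma between list elements).

```python
import csv
import io

# Shortened sample output of a query with "ColumnHeaders: on".
output = (
    "name;address;groups\n"
    "myserver123;192.168.0.42;cluster_a,headnode\n"
    "myserver234;192.168.0.43;cluster_a\n"
)

# The first line supplies the field names; values are not quoted.
reader = csv.DictReader(io.StringIO(output), delimiter=";", quoting=csv.QUOTE_NONE)
hosts = []
for row in reader:
    # List columns use a comma between their elements.
    row["groups"] = row["groups"].split(",") if row["groups"] else []
    hosts.append(row)
print(hosts)
```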

9. Authorizations (AuthUser)

If you want to make scripts available based on the Livestatus, the user should probably only see the data for which they are authorized. Checkmk provides the AuthUser header for this function, with the restriction that it may not be used in the following tables:

  • columns

  • commands

  • contacts

  • contactgroups

  • eventconsolerules

  • eventconsolestatus

  • status

  • timeperiods

Conversely, this header may be used in all tables that access the hosts or services tables. Which among these a user is authorized for depends on the user’s contact groups.

In this manner a query will only output data that the contact is also permitted to see. Note here the difference between strict and loose permission settings:

OMD[mysite]:~$ lq "GET services\nColumns: host_name description contacts\nAuthUser: hhirsch"
myserver123;Uptime;hhirsch
myserver123;TCP Connections;hhirsch
myserver123;CPU utilization;hhirsch,kkleber
myserver123;File /etc/resolv.conf;hhirsch
myserver123;Kernel Context Switches;hhirsch,kkleber
myserver123;File /etc/passwd;hhirsch
myserver123;Filesystem /home;hhirsch
myserver123;Kernel Major Page Faults;hhirsch
myserver123;Kernel Process Creations;hhirsch
myserver123;CPU load;hhirsch,kkleber

10. Time delays (Wait)

With the Wait-header you can create queries for specific data sets without needing to know whether the prerequisites for the data have been satisfied. This can be useful when, for example, you need comparison data for a specific error situation, but you don’t want to put a continuous, unnecessary load on the system. Information will therefore only be retrieved when it is really required.

A full list of the Wait headers can be found in the command reference.

In the following example the Disk IO SUMMARY service of an ESXi server is output as soon as the status of the CPU load service of a specific VM changes to CRIT. With the WaitTimeout header the query is executed anyway if the condition has not been satisfied after 10000 milliseconds. This prevents the Livestatus connection from being blocked for a long time:

OMD[mysite]:~$ lq
GET services
WaitObject: myvmserver CPU load
WaitCondition: state = 2
WaitTrigger: state
WaitTimeout: 10000
Filter: host_name = myesxserver
Filter: description = Disk IO SUMMARY
Columns: host_name description plugin_output

myesxserver;Disk IO SUMMARY;OK - Read: 48.00 kB/s, Write: 454.54 MB/s, Latency: 1.00 ms

A further application is to combine this with a command. You can issue a command and retrieve the results as soon as they are available. In the following example we want to query and display the current data from a service. For this, first the command will be submitted, and then a query issued. This checks whether the data from the Check_MK service is newer than that at a particular point in time. As soon as the precondition has been satisfied the status of the Memory service will be output.

OMD[mysite]:~$ lq "COMMAND [$(date +%s)] SCHEDULE_FORCED_SVC_CHECK;myserver;Check_MK;$(date +%s)"
OMD[mysite]:~$ lq
GET services
WaitObject: myserver Check_MK
WaitCondition: last_check >= 1517914646
WaitTrigger: check
Filter: host_name = myserver
Filter: description = Memory
Columns: host_name description state

myserver;Memory;0

Important: Note that the time stamp as used in last_check in the example MUST be substituted with an appropriate one – otherwise the condition will always be satisfied and the output will be produced immediately.
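
In a script, the time stamp can be taken once and reused for both the command and the wait condition, which avoids the manual substitution. The following is a sketch only: the function name and its parameters are hypothetical, and the WaitTimeout line is added here as a precaution in the same way as in the previous example.

```python
import time


def forced_check_and_wait(host, trigger_service, wanted_service, now=None, timeout_ms=10000):
    """Return the COMMAND line and the matching Wait query as strings."""
    now = int(time.time()) if now is None else int(now)
    command = f"COMMAND [{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{trigger_service};{now}\n"
    query = "\n".join([
        "GET services",
        f"WaitObject: {host} {trigger_service}",
        # Same time stamp as in the command: wait for a check newer than that.
        f"WaitCondition: last_check >= {now}",
        "WaitTrigger: check",
        f"WaitTimeout: {timeout_ms}",
        f"Filter: host_name = {host}",
        f"Filter: description = {wanted_service}",
        "Columns: host_name description state",
    ]) + "\n"
    return command, query


command, query = forced_check_and_wait("myserver", "Check_MK", "Memory", now=1517914646)
print(command, end="")
print(query, end="")
```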

11. Time zones (Localtime)

Many monitoring environments query hosts and services worldwide. In such environments, distributed monitoring instances can easily end up working in different time zones. Since Checkmk uses Unix time, which is independent of time zones, this should not be a problem.

Should a server nevertheless be assigned to an incorrect time zone, this difference can be compensated for with the Localtime header. Provide the current time to the query as well. Checkmk will then autonomously round up to the next half-hour, and adjust for the difference. You can provide the time automatically if you invoke the query directly:

OMD[mysite]:~$ lq "GET hosts\nColumns: name last_check\nFilter: name = myserver123\nLocaltime: $(date +%s)"
myserver123;1511173526

Otherwise provide the result from date +%s if you want to use the input stream:

OMD[mysite]:~$ lq
GET hosts
Columns: name last_check
Filter: name = myserver123
Localtime: 1511173390

myserver123;1511173526

12. Status codes (ResponseHeader)

If you write an API you will probably want to receive a status code in the response, so that you can process the output more reliably. The ResponseHeader header supports the values off (default) and fixed16. With fixed16, the first line of the response is a status message exactly 16 bytes long. In the case of an error, the subsequent lines contain a detailed description of the error. These are thus also very useful when looking for an error in a query.

The status report in the first line combines the following:

  • Bytes 1-3: The status code. The complete table of possible codes can be found in the command reference.

  • Byte 4: A simple blank character (ASCII-character: 32)

  • Bytes 5-15: The length of the actual response as an integer. Unnecessary bytes are filled by blank characters.

  • Byte 16: A line feed (ASCII-character: 10)
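
The layout above translates directly into byte offsets. The following sketch parses such a status line and rejects input that does not match the described structure; the function name is hypothetical.

```python
def parse_fixed16(header):
    """Return (status code, response length) from a 16-byte status line."""
    if len(header) != 16 or header[3:4] != b" " or header[15:16] != b"\n":
        raise ValueError("not a fixed16 response header")
    code = int(header[0:3].decode("ascii"))     # bytes 1-3
    length = int(header[4:15].decode("ascii"))  # bytes 5-15, padded with blanks
    return code, length


# Build a sample status line: code 400, response length 33.
sample = f"{400:3d} {33:11d}\n".encode("ascii")
print(parse_fixed16(sample))
```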

In the following example we execute a faulty query in which a column name has mistakenly been used as a header instead of in a filter:

OMD[mysite]:~$ lq "GET hosts\nName: myserver123\nResponseHeader: fixed16"
400          33
Columns: undefined request header

Important: In an error situation the output format is always an error message in text form. This applies regardless of any adaptations you may have made.

13. Keeping a connection alive (KeepAlive)

Particularly with scripts which establish a Livestatus connection over the network, you may want to keep the channel open to save the overhead of repeatedly establishing the connection. You can achieve this with the KeepAlive header, which reserves a channel. Incidentally, following a command a Livestatus connection always stays open. No additional header is needed for this.

Important: Because the channel is blocked to other processes for the duration of the connection, it can become a problem if no other connections are available for use. Other processes must therefore wait until a connection is free. In the standard configuration Checkmk holds 20 connections ready — raise the maximum number of these connections as necessary with Setup > General > Global Settings > Monitoring Core > Maximum concurrent Livestatus connections.

Always combine KeepAlive with the ResponseHeader header, in order to be able to correctly distinguish the individual answers from each other:

OMD[mysite]:~$ lq
GET hosts
ResponseHeader: fixed16
Columns: name
KeepAlive: on

200          33
myserver123
myserver234
myserver345
GET services
ResponseHeader: fixed16
Columns: host_name description last_check
Filter: description = Memory

200          58
myserver123;Memory;1511261122
myserver234;Memory;1511261183

Make sure that there is no empty line between the first answer and the second request. As soon as the KeepAlive header is omitted from a query, the connection is closed as usual by the blank line following the next output.
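
On a kept-alive connection, the length field of the status line is what tells a script where one answer ends and the next begins. The sketch below reads framed answers from a byte stream; an in-memory io.BytesIO object stands in for the file object that sock.makefile("rb") would provide on a real connection, and the helper names are hypothetical.

```python
import io


def frame(code, body):
    """Build a sample answer: 16-byte status line followed by the body."""
    payload = body.encode("utf-8")
    return f"{code:3d} {len(payload):11d}\n".encode("ascii") + payload


def read_response(stream):
    """Read one answer: 16 header bytes, then exactly `length` body bytes."""
    header = stream.read(16)
    if len(header) < 16:
        return None  # stream closed, no further answer
    code = int(header[0:3].decode("ascii"))
    length = int(header[4:15].decode("ascii"))
    return code, stream.read(length).decode("utf-8")


# Two answers arriving back to back on the same connection.
stream = io.BytesIO(
    frame(200, "myserver123\nmyserver234\n")
    + frame(200, "myserver123;Memory;1511261122\n")
)
responses = []
while (response := read_response(stream)) is not None:
    responses.append(response)
print(responses)
```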

14. Log retrieval

14.1. Overview

With the log table in Livestatus you have direct access to the core’s monitoring history, so that you can conveniently filter for particular events using LQL. The availability tables, for example, are generated with the help of this table. To improve clarity and to restrict a query thematically, you have access to the following log classes:

Class   Description
0       All messages not covered by other classes
1       Host and service alerts
2       Important program events
3       Notifications
4       Passive Checks
5       External commands
6       Initial or current status entries (e.g., after a log rotation)
7       Changes in the program’s status

Just by using these log classes you can already restrict very precisely which types of entries are shown. You should additionally restrict the time range covered by the query. This is important since otherwise the instance’s complete history will be searched, which can slow the system down considerably due to the flood of information.

A further sensible restriction is the selection of the columns (Columns) to be shown for an entry. In the example below we search for all notifications that have been logged in the last hour:

OMD[mysite]:~$ lq "GET log\nFilter: class = 3\nFilter: time >= $(($(date +%s)-3600))\nColumns: host_name service_description time state"
myserver123;Memory;1511343365;0
myserver234;CPU load;1511343360;3
myserver123;Memory;1511343338;2
myserver234;CPU load;1511342512;0

Important: Note that shell substitutions such as those used in the example cannot be used in the interactive mode of the input stream, and always restrict such queries to a time range.
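
In a script, the lower bound of the time range can be calculated before the query is assembled, so no shell substitution is needed. A minimal sketch with a hypothetical function name:

```python
import time


def recent_log_query(log_class, seconds, now=None):
    """Build a log query restricted to one class and the last `seconds`."""
    now = int(time.time()) if now is None else int(now)
    return "\n".join([
        "GET log",
        f"Filter: class = {log_class}",
        # Always restrict the time range, as recommended above.
        f"Filter: time >= {now - seconds}",
        "Columns: host_name service_description time state",
    ]) + "\n"


# Notifications (class 3) of the last hour, relative to a fixed sample time.
query = recent_log_query(3, 3600, now=1511343600)
print(query, end="")
```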

14.2. Configuring the monitoring history

You can influence the rotation of the history files and their maximum size. You can additionally specify how many lines of a file are read before Checkmk stops. All of this can affect the performance of your queries, depending on how the instance is set up. The following three parameters are available, which can be found and customized in Setup > General > Global Settings > Monitoring Core:

Name                                                       Description
History log rotation: Regular interval of rotations        Defines the interval after which the history is continued in a new file.
History log rotation: Rotate by size (Limit of the size)   Independently of the interval, defines the maximum size of a file. The size represents a compromise between the possible read rate and the possible IOs.
Maximum number of parsed lines per log file                When the specified number of lines has been read, reading of the file stops. This avoids time-outs if for any reason a file becomes very large.

15. Checking availability

With the statehist table you can query the raw data on the availability of hosts and services, and therefore have access to all of the information as used by the interface’s availability display. Always enter a time range, otherwise all available logs will be searched, which can put a heavy load on the system. The following additional specifics also apply:

  • The time for which a host/service had a particular status can be output as an absolute value in seconds, and also as a relative value, i.e. as a proportion of the queried time range.

  • During times in which a host/service was not monitored the status will be -1.

By logging the initial status, Checkmk makes it possible to check whether, when and for how long a host/service has been monitored. Thus you can not only see which status existed at a specific time, but you can also trace whether the host/service was actually being monitored at that point in time. Important: This logging is also active with a Nagios core, where it can however be deactivated:

~/etc/nagios/nagios.d/logging.cfg
log_initial_states=0

The example below shows what a query for the absolute times and for the relative proportions of particular statuses looks like. A time range of four hours (14400 seconds) has been specified, and the query is restricted to the availability of one service on a particular host:

OMD[mysite]:~$ lq
GET statehist
Columns: host_name service_description
Filter: time >= 1511421739
Filter: time < 1511436139
Filter: host_name = myserver123
Filter: service_description = Memory
Stats: sum duration_ok
Stats: sum duration_warning
Stats: sum duration_critical
Stats: sum duration_part_ok
Stats: sum duration_part_warning
Stats: sum duration_part_critical

myserver123;Memory;893;0;9299;0.0620139;0;0.645764
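
The relative values in such a result can be checked against the absolute ones, because each proportion is the duration divided by the length of the queried time range. The sketch below does this for the result line above and converts the proportions into percentages.

```python
# Result line and time range taken from the example query above.
line = "myserver123;Memory;893;0;9299;0.0620139;0;0.645764"
start, end = 1511421739, 1511436139

fields = line.split(";")
durations = [int(value) for value in fields[2:5]]  # seconds in OK, WARN, CRIT
parts = [float(value) for value in fields[5:8]]    # reported proportions
span = end - start                                 # length of the time range

for label, seconds, part in zip(("OK", "WARN", "CRIT"), durations, parts):
    percent = 100 * seconds / span
    print(f"{label}: {seconds} s, {percent:.2f} % (reported proportion: {part})")
```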

How to retrieve a complete list of the available columns is explained in more detail in the command reference.

16. Variables in Livestatus

At various locations in the Checkmk interface you can use variables to make context-based assignments. Some of this data can also be retrieved via Livestatus. Because these variables also have to be resolved, the affected columns exist twice in a table: once with the literal entry, and once with the variable substituted by the appropriate value. An example is the notes_url column, which outputs a URL containing the variable:

OMD[mysite]:~$ lq "GET hosts\nColumns: name notes_url"
myserver123;https://mymonitoring/heute/wiki/doku.php?id=hosts:$HOSTNAME$

If, however, you instead query the notes_url_expanded column, you will receive the macro’s actual value:

OMD[mysite]:~$ lq "GET hosts\nColumns: name notes_url_expanded"
myserver123;https://mymonitoring/heute/wiki/doku.php?id=hosts:myserver123

17. Using Livestatus via a network

17.1. Connections via TCP/IP

To access Livestatus via the network, you can connect the Unix socket of the Livestatus process to a TCP port. This way you can execute scripts on remote machines and collect the data directly where it is to be processed.

While the site is stopped, access via TCP can be enabled with the omd command:

OMD[mysite]:~$ omd config set LIVESTATUS_TCP on

Once the site has been started, Livestatus via TCP is usually active on the default port 6557. For Checkmk servers with multiple sites using Livestatus via TCP, the next higher unused port is chosen.

All settings such as port and authorized IP addresses can be configured via omd config. Alternatively these settings can be made in the setup. In Checkmk the SSL encryption of Livestatus communication is enabled by default:

Livestatus configuration in Setup

Local SSL connection test

Livestatus uses a certificate that is automatically generated when the site is created. This certificate is located in the var/ssl/ca-certificates.crt file together with all other CA certificates trusted by the site. In order for the command line tool openssl s_client to be able to validate the certificate used by the Livestatus server, this file must be designated as Certificate Authority File.

We have heavily shortened the output of the command here; [...] marks the omissions:

OMD[mysite]:~$ openssl s_client -CAfile var/ssl/ca-certificates.crt -connect localhost:6557
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 CN = Site 'mysite' local CA
verify return:1
depth=0 CN = mysite
verify return:1
---
Certificate chain
 0 s:CN = mysite
   i:CN = Site 'mysite' local CA
 1 s:CN = Site 'mysite' local CA
   i:CN = Site 'mysite' local CA
---
Server certificate
[...]
    Start Time: 1664965470
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK

As soon as there is no further output, you can issue LQL commands interactively, and terminate the interaction with an empty line (press the return key twice). If this works, you can also pipe Livestatus queries, and use the additional -quiet parameter to suppress debugging output:

OMD[mysite]:~$ echo -e "GET hosts\nColumns: name\n\n" | \
    openssl s_client -quiet  -CAfile var/ssl/ca-certificates.crt -connect localhost:6557
Can't use SSL_get_servername
depth=1 CN = Site 'mysite' local CA
verify return:1
depth=0 CN = mysite
verify return:1
myserver23
myserver42
myserver123
myserver124

The output preceding the four host names is written to STDERR by the openssl command. It can be suppressed by appending 2>/dev/null.

Remote access to Livestatus

If you access Livestatus from remote machines, you should not use the entire list of certificates trusted by the Checkmk site on those machines. Instead, export only the site CA's certificate from the Setup.

To do this, go to Global Settings > Site management > Trusted certificate authorities for SSL. Here you can copy and paste the certificate used by the site CA. Copy the complete text of the first certificate under Content of CRT/PEM file into a file — in our example we use /tmp/mysite_ca.pem.

Display of the site CA’s certificate in the Setup

If the remote host has now been enabled for Livestatus access, Livestatus queries via script will be possible with this certificate file:

user@host:~$ echo -e "GET hosts\nColumns: name\n\n" | \
    openssl s_client -quiet  -CAfile /tmp/mysite_ca.pem -connect cmkserver:6557

Note: The certificate file does not provide authentication; it only ensures transport encryption! Access protection is regulated exclusively via the IP addresses that are authorized to access the Livestatus port.

Livestatus with stunnel

If you want to make the encrypted remote Livestatus port available as a local unencrypted port, you can use the program stunnel.

/etc/stunnel/cmk_myremotesite.conf
[pinning client]
client = yes
accept = 0.0.0.0:6557
connect = <myremotesiteip>:6557
verifyPeer = yes
CAfile = /etc/stunnel/myremotesite.pem

After restarting stunnel, unencrypted access to the local port is possible:

user@host:~$ echo -e "GET hosts\nColumns: name\n\n" | nc localhost 6557
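The same unencrypted access can be scripted. The following Python sketch uses only the standard library and assumes that the local stunnel port from the example above is reachable on localhost:6557; the function names exchange and query_local_port are illustrative and not part of Checkmk. It relies on two Livestatus properties: a blank line terminates the query, and without a KeepAlive header the server closes the connection after the response.

```python
import socket


def exchange(sock: socket.socket, query: str) -> str:
    """Send one LQL query on a connected socket and read until the peer closes."""
    # The blank line (two line feeds) marks the end of the query.
    sock.sendall(query.rstrip("\n").encode("utf-8") + b"\n\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: the server has closed the connection
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def query_local_port(query: str, host: str = "localhost", port: int = 6557) -> list:
    """Open a plain TCP connection, run one query, and return the output lines."""
    with socket.create_connection((host, port), timeout=10) as sock:
        return exchange(sock, query).splitlines()
```

A call such as query_local_port("GET hosts\nColumns: name") would then return the host names as a list of strings.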

SSL in scripts

If you want to use scripts to access Livestatus via SSL, avoid using openssl s_client. The primary purpose of this tool is to test connection establishment and to debug certificate chains. To check whether the expected output is complete in the event of connection failures, we recommend evaluating the response header. A well-maintained API that supports SSL and header evaluation is the one for Python, which can be found at share/doc/check_mk/livestatus/api/python. Other suitable APIs are listed in the chapter covering Files and Directories.
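To illustrate what evaluating the response header involves, here is a minimal sketch that uses only the Python standard library. It is not the bundled Python API and does not replace it. The sketch requests the fixed16 response header, whose 16 bytes consist of a three-digit status code, a space, the body length padded to 11 characters, and a line feed. The function names are hypothetical. Because the certificate in the example output above carries the site name as CN, the sketch assumes that host name checking has to be disabled; the certificate chain is still verified against the CA file.

```python
import socket
import ssl


def parse_fixed16_header(header: bytes) -> tuple:
    """Split the 16-byte fixed16 header into (status code, body length)."""
    if len(header) != 16:
        raise ValueError("incomplete response header")
    status = int(header[0:3].decode("ascii"))
    length = int(header[4:15].decode("ascii"))  # int() ignores the padding spaces
    return status, length


def read_exactly(conn, count: int) -> bytes:
    """Read exactly count bytes or fail if the connection ends early."""
    data = b""
    while len(data) < count:
        chunk = conn.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the response was complete")
        data += chunk
    return data


def query_tls(query: str, host: str, port: int, ca_file: str) -> str:
    """Run one LQL query over TLS and return the body if the status is 200."""
    context = ssl.create_default_context(cafile=ca_file)
    # Assumption: the certificate CN is the site name, not the DNS name of the
    # server, so only the chain is verified, not the host name.
    context.check_hostname = False
    request = query.rstrip("\n") + "\nResponseHeader: fixed16\n\n"
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw) as conn:
            conn.sendall(request.encode("utf-8"))
            status, length = parse_fixed16_header(read_exactly(conn, 16))
            body = read_exactly(conn, length).decode("utf-8")
    if status != 200:
        raise RuntimeError(f"Livestatus returned status {status}: {body.strip()}")
    return body
```

With the CA file from the previous section, a call could look like query_tls("GET hosts\nColumns: name", "cmkserver", 6557, "/tmp/mysite_ca.pem"). Reading exactly the announced number of bytes is what lets a script detect a truncated response, which piping through openssl s_client cannot do.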

17.2. Connections via SSH

If access to Livestatus from outside your local network is required, access protection based on IP addresses alone may not be practical. The easiest way to gain authenticated access here is to use the Secure Shell.

With SSH, you have the ability to pass a command that will be executed on the remote server:

user@host:~$ ssh mysite@myserver 'lq "GET hosts\nColumns: name"'
myserver123
myserver234

Alternatively, you can forward the Livestatus port to the host you are currently working on via an SSH tunnel:

user@host:~$ ssh -L 6557:localhost:6557 mysite@myserver

If the connection has been established, in a second console session you can test whether access with openssl s_client is possible:

user@host:~$ openssl s_client -CAfile /tmp/mysite_ca.pem -connect localhost:6557

If this test is successful, any script you have written for direct Livestatus network access can be used on localhost.
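The first variant, passing an lq call to the remote server via SSH, can also be driven from a script. The following Python sketch assumes key-based SSH login for the site user and that lq on the remote side interprets \n as a line break, as in the examples in this article; the function names are illustrative only.

```python
import shlex
import subprocess


def build_ssh_command(site_user: str, server: str, query: str) -> list:
    """Build the argument list for running lq with the given query via SSH."""
    # lq expects line breaks written as the two characters backslash and n.
    escaped = query.replace("\n", "\\n")
    # shlex.quote protects the query from the remote shell.
    return ["ssh", f"{site_user}@{server}", "lq " + shlex.quote(escaped)]


def query_via_ssh(site_user: str, server: str, query: str) -> list:
    """Run the query on the remote site and return the output lines."""
    result = subprocess.run(
        build_ssh_command(site_user, server, query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ssh or lq fails
    )
    return result.stdout.splitlines()
```

A call such as query_via_ssh("mysite", "myserver", "GET hosts\nColumns: name") corresponds to the ssh command line shown at the beginning of this section.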

18. Setting commands

18.1. Overview

Livestatus can be used not only for data queries, but also for issuing commands directly to the core (CMC or Nagios). A correct command always includes a time stamp, which can in principle be any value. However, because the time stamp is also used in the logs to trace when the command was processed, it makes sense to specify the time as precisely as possible. Commands with a missing time stamp are discarded without an error message, leaving only a simple entry in the cmc.log!

So that the time stamp is as precise as possible, it is recommended not to enter the command in the interactive input stream, but to issue it directly on the command line. There you also have access to shell variables, so that the actual current time can be passed:

OMD[mysite]:~$ lq "COMMAND [$(date +%s)] DISABLE_NOTIFICATIONS"

This format works both with the Nagios core in the Checkmk Raw Edition and with the CMC in the commercial editions. However, the commands of the two cores only partly overlap. A complete list of the commands for the Nagios core can be found directly on the Nagios website. The commands available for the CMC can be found in the Command reference.

18.2. Special features in Nagios

In the list of commands on the Nagios website, the syntax is shown in the following form:

#!/bin/sh
# This is a sample shell script showing how you can submit the CHANGE_CUSTOM_HOST_VAR command
# to Nagios.  Adjust variables to fit your environment as necessary.

now=`date +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'

/bin/printf "[%lu] CHANGE_CUSTOM_HOST_VAR;host1;_SOMEVAR;SOMEVALUE\n" $now > $commandfile

As you have learned, Checkmk uses a much simpler format for issuing commands. To make the Nagios format compatible with Checkmk, you simply need the command, the time stamp, and where applicable, the variables:

OMD[mysite]:~$ lq "COMMAND [$(date +%s)] CHANGE_CUSTOM_HOST_VAR;host1;_SOMEVAR;SOMEVALUE"
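If commands are sent from a script rather than from the shell, the same line can be assembled programmatically. The following Python sketch is a minimal illustration of the format described in this chapter: the keyword COMMAND, the time stamp in square brackets, then the command name and its arguments separated by semicolons. The function name build_command is hypothetical.

```python
import time
from typing import Optional


def build_command(name: str, *args: str, timestamp: Optional[int] = None) -> str:
    """Assemble one Livestatus command line in the format shown above."""
    if timestamp is None:
        # Use the current time so that the log entry reflects the real moment.
        timestamp = int(time.time())
    # The command name and its arguments are separated by semicolons.
    parts = ";".join((name,) + args)
    return f"COMMAND [{timestamp}] {parts}\n"


print(build_command("CHANGE_CUSTOM_HOST_VAR", "host1", "_SOMEVAR", "SOMEVALUE"), end="")
```

The resulting string can be written to the Livestatus socket in the same way as a query. The optional timestamp parameter exists only to make the output reproducible, for example in tests.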

19. Files and directories

Path Function

~/tmp/run/live

The Unix-Socket through which queries and commands are submitted.

~/bin/lq

Script command that simplifies sending queries and commands to the Livestatus Unix socket.

~/var/log/cmc.log

The CMC’s log file, in which along with other data the queries/commands are documented.

~/var/check_mk/core/history

The CMC’s log file, in which all changes occurring during the core’s running time are entered – e.g., changes in the state of a host/service.

~/var/check_mk/core/archive/

The history-log files are archived here. These will only be read if required.

~/var/log/nagios.log

The Nagios-Core’s log file, in which along with other data the queries/commands are documented.

~/var/nagios/archive/

The history-log files are archived here. These will only be read if required.

~/share/doc/check_mk/livestatus/LQL-examples/

In this directory a number of examples of Livestatus queries can be found which you can try out. The examples are based on the lq script command – e.g.: lq < 1.lql

~/share/doc/check_mk/livestatus/api/python

The API for Python is in this directory, as well as a number of examples. Also read the README in this directory.

~/share/doc/check_mk/livestatus/api/perl

The API for Perl can be found here. Here as well there is a README. The usage examples are located in the examples sub-directory.

~/share/doc/check_mk/livestatus/api/c++

Example code is also available for the C++ programming language. The code of the API itself is likewise located here in uncompiled form, giving you the best insight into how the API works.
