bella.network Blog

Monitor borgbackup with checkmk local check

Monitor your borgbackup backups with a checkmk local check that reports when the last backup was created
Overview of the Borg repositories in checkmk (Photo by Thomas Bella)

Recently I accidentally deleted a folder in Syncthing. Since I had set up an hourly incremental backup of Syncthing some time ago, I wanted to use it to restore the deleted data.

Unfortunately, I discovered that the automatic backup had not run for more than half a year, because a firewall change was blocking the SSH connection to the backup server.

This prompted me to add an additional check for my backup server (set up as described in the article Backups using Borgbackup) to my checkmk monitoring. The backups are now monitored permanently: checkmk raises a warning after 28 hours without a new backup and a critical alert after 3 days without one.
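Expressed in seconds, the two thresholds are 28 × 3600 = 100800 s for the warning and 3 × 86400 = 259200 s for the critical state. The following minimal sketch classifies the age of the last backup with these fixed values. The function name `backup_state` and its optional "now" argument are assumptions made for illustration; the actual script below derives both limits from GNU `date` expressions instead.

```shell
#!/bin/bash
# Hypothetical helper: map the age of the last backup to a checkmk state.
# Usage: backup_state <epoch of last backup> [<epoch of now>]
backup_state() {
	local last="$1"
	local now="${2:-$(date '+%s')}"
	local warn=$((28 * 3600))   # 28 hours = 100800 seconds
	local crit=$((3 * 86400))   # 3 days   = 259200 seconds
	local age=$((now - last))

	if [ "${age}" -gt "${crit}" ]; then
		echo 2   # CRITICAL
	elif [ "${age}" -gt "${warn}" ]; then
		echo 1   # WARNING
	else
		echo 0   # OK
	fi
}

backup_state 1000000 1090000   # age  90000 s (25 h)      -> 0
backup_state 1000000 1200000   # age 200000 s (~55.6 h)   -> 1
backup_state 1000000 1300000   # age 300000 s (~83.3 h)   -> 2
```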

The backups are monitored directly on the backup server using a checkmk local check. Every 30 minutes the checkmk agent runs a bash script that reports the current status of each configured repository to checkmk.
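A local check is simply an executable whose output checkmk reads line by line. Each line has the form `<state> "<service name>" <metrics> <status text>`, where the state is 0 (OK), 1 (WARN), 2 (CRIT) or 3 (UNKNOWN) and `-` stands for "no metrics". A minimal sketch of such a check, assuming a hypothetical helper `local_check_line` and made-up service names and values:

```shell
#!/bin/bash
# Minimal checkmk local check sketch: print one result line per service.
# Format: <state> "<service name>" <metrics> <status text>

# Hypothetical helper that assembles a single local check line.
local_check_line() {
	local state="$1" service="$2" metrics="$3" text="$4"
	printf '%s "%s" %s %s\n' "${state}" "${service}" "${metrics}" "${text}"
}

local_check_line 0 "Demo" "value=42" "everything fine"
local_check_line 2 "Demo without metrics" "-" "something is broken"
```

The borg script below prints exactly one line of this shape per repository, so each repository shows up as its own service in checkmk.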

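The script is located at /usr/lib/check_mk_agent/local/1800/borg (make the file executable) and has the following content. Because it sits in the subdirectory 1800, the agent runs it asynchronously every 1800 seconds (30 minutes) and reuses the cached result in between. Replace the path /mnt/borgbackup/ with the path to your backup repositories.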

#!/bin/bash
# checkmk Borgbackup check
# Author: Thomas Bella
# Source: https://blog.bella.network/monitor-borgbackup-with-checkmk/
# Version 1.0

set -o nounset

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

debug(){   ([ "${verbose}" -gt 1 ] && echo "$*") || return 0; }
verbose(){ ([ "${verbose}" -gt 0 ] && echo "$*") || return 0; }
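# report a global problem in local check format: <state> "<service>" <metrics> <text>
error(){   echo "${STATE_UNKNOWN} \"BORG\" - $*"; exit "${STATE_UNKNOWN}"; }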

# define warning and critical states of backup age
crit='3 days ago'
warn='28 hours ago'
verbose=0

: "${BORG:=borg}"
command -v "${BORG}" >/dev/null 2>/dev/null \
	|| error "No command '${BORG}' available."

: "${DATE:=date}"
command -v "${DATE}" >/dev/null 2>/dev/null \
	|| error "No command '${DATE}' available."

# convert values to seconds to enable comparison
sec_warn="$(${DATE} --date="${warn}" '+%s')"
sec_crit="$(${DATE} --date="${crit}" '+%s')"

# check warning and critical values
if [ "${sec_crit}" -gt "${sec_warn}" ] ; then
	error "Warning value has to be a more recent timepoint than critical."
fi


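# iterate over every repository definition (*.env) in the backup directory
for ENV in /mnt/borgbackup/*.env
do
	# skip the unexpanded pattern if no .env file exists
	[ -f "${ENV}" ] || continue

	source "${ENV}"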

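	# get the timestamp of the most recent archive
	export BORG_PASSPHRASE BORG_REPO
	if ! last="$(${BORG} list --sort-by timestamp --last 1 --format '{time}')"; then
		# report this repository as UNKNOWN and continue with the next one
		echo "${STATE_UNKNOWN} \"BORG: ${NAME}\" - Cannot list repository archives. Repo locked?"
		continue
	fi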

	# repository size on disk in MB and number of archives
	size="$(du -sm "${BORG_REPO}" | awk '{ print $1 }')"
	num="$(${BORG} list | wc -l)"

	if [ -z "${last}" ]; then
		# report in local check format and keep checking the other repositories
		echo "${STATE_CRITICAL} \"BORG: ${NAME}\" - no archive in repository"
		continue
	fi

	sec_last="$(${DATE} --date="${last}" '+%s')"

	# compare the age of the last backup with the thresholds
	if [ "${sec_crit}" -gt "${sec_last}" ]; then
		state="${STATE_CRITICAL}"
	elif [ "${sec_warn}" -gt "${sec_last}" ]; then
		state="${STATE_WARNING}"
	else
		state="${STATE_OK}"
	fi

	echo "${state} \"BORG: ${NAME}\" size=${size}|number=${num};5:;3: last backup made on ${last}"

	unset BORG_PASSPHRASE

done

The script scans the directory /mnt/borgbackup/ for files with the extension .env. Each of these files defines the variables NAME, BORG_REPO and BORG_PASSPHRASE, which are used to access one backup repository. The script then determines the most recent archive of every repository and reports the result to checkmk as a single line, for example:

0 "BORG: homecontrol.bella.pm" size=25630|number=24;5:;3: last backup made on Sun, 2023-01-01 06:26:24
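Lines in Nagios plugin style such as `CRITICAL - ...` are not valid local check output, so it can be worth verifying what the script prints. The sketch below uses a simplified regular expression of my own (not checkmk's actual parser) in the hypothetical function `is_local_check_line`; it accepts the sample line above and rejects a plugin-style line.

```shell
#!/bin/bash
# Hypothetical, simplified validator for checkmk local check lines:
# <state 0-3 or P> <"quoted service name" or single word> <metrics or -> <text>
is_local_check_line() {
	local re='^[0-3P] ("[^"]+"|[^ "]+) [^ ]+ .+$'
	[[ "$1" =~ ${re} ]]
}

line='0 "BORG: homecontrol.bella.pm" size=25630|number=24;5:;3: last backup made on Sun, 2023-01-01 06:26:24'
if is_local_check_line "${line}"; then
	echo "valid"
else
	echo "invalid"
fi

# a Nagios-style line does not start with a numeric state and is rejected
is_local_check_line 'CRITICAL - no archive in repository' || echo "rejected"
```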

Such an .env file contains the following content:

NAME='homecontrol.bella.pm'
BORG_REPO='/mnt/borgbackup/homecontrol.bella.pm'
BORG_PASSPHRASE='mysecretpassword'
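Because each .env file stores the repository passphrase in plain text, it should only be readable by its owner (typically root, which runs the checkmk agent). The sketch below, with the hypothetical helper name `check_env_file`, checks that a file has mode 600 and loads it in a subshell to verify that NAME, BORG_REPO and BORG_PASSPHRASE are set. It assumes GNU coreutils (`stat -c`).

```shell
#!/bin/bash
# Hypothetical helper: sanity-check one repository .env file.
# Returns 0 if the file has mode 600 and defines all required variables.
check_env_file() {
	local file="$1"

	# the passphrase is stored in plain text, so only the owner may read it
	if [ "$(stat -c '%a' "${file}")" != "600" ]; then
		echo "${file}: permissions should be 600" >&2
		return 1
	fi

	# source in a subshell so the variables do not leak into this shell
	(
		# ignore values that may already exist in the environment
		unset NAME BORG_REPO BORG_PASSPHRASE
		# shellcheck source=/dev/null
		source "${file}"
		for var in NAME BORG_REPO BORG_PASSPHRASE; do
			if [ -z "${!var:-}" ]; then
				echo "${file}: ${var} is not set" >&2
				exit 1
			fi
		done
	)
}

# Example: create a throw-away .env file and check it
tmp="$(mktemp)"   # mktemp creates the file with mode 600
cat > "${tmp}" <<'EOF'
NAME='homecontrol.bella.pm'
BORG_REPO='/mnt/borgbackup/homecontrol.bella.pm'
BORG_PASSPHRASE='mysecretpassword'
EOF
check_env_file "${tmp}" && echo "ok"
rm -f "${tmp}"
```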

This active monitoring shows me in checkmk the number of archives, the disk usage and whether each repository is up to date. If a backup is no longer up to date, I receive a notification and can investigate the problem.

borgbackup overview of the last backups

PS: I was eventually able to recover the deleted Syncthing data by restoring a VM backup. The only drawbacks were that this backup was not hourly and that I had to boot the VM separately, which took a bit more time.