Monitor your Borgbackup backups with a local checkmk plugin that reports when the last backup was created
Overview of the Borg repositories in checkmk (Photo by Thomas Bella)
Just recently I accidentally deleted a folder in Syncthing. Since I had set up an hourly incremental backup of Syncthing some time ago, I wanted to use it to restore the deleted data.
Unfortunately, I discovered that the automatic backup had not been working for more than half a year because a firewall change was blocking the SSH connection to the backup server.
This led me to add my backup server (set up as described in the article Backups using Borgbackup) to my checkmk monitoring. I now have permanent monitoring with a warning after 28 hours without a backup and a critical message after 3 days without a backup.
Monitoring of the backups is set up directly on the backup server as a checkmk local check. Every 30 minutes checkmk calls a bash script, which reports the current status of the configured repositories.
The script is located at /usr/lib/check_mk_agent/local/1800/borg and has the following content. Replace the path /mnt/borgbackup/ with the path to your own backup repositories.
#!/bin/bash
# checkmk Borgbackup check
# Author: Thomas Bella
# Source: https://blog.bella.network/monitor-borgbackup-with-checkmk/
# Version 1.0

set -o nounset
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

debug() { ([ "${verbose}" -gt 1 ] && echo "$*") || return 0; }
verbose() { ([ "${verbose}" -gt 0 ] && echo "$*") || return 0; }
error() { echo "UNKN - $*"; exit "${STATE_UNKNOWN}"; }

# define warning and critical states of backup age
crit='3 days ago'
warn='28 hours ago'
verbose=0

: "${BORG:=borg}"
command -v "${BORG}" >/dev/null 2>/dev/null \
    || error "No command '${BORG}' available."

: "${DATE:=date}"
command -v "${DATE}" >/dev/null 2>/dev/null \
    || error "No command '${DATE}' available."

# convert values to seconds to enable comparison
sec_warn="$(${DATE} --date="${warn}" '+%s')"
sec_crit="$(${DATE} --date="${crit}" '+%s')"

# check warning and critical values
if [ ${sec_crit} -gt ${sec_warn} ]; then
    error "Warning value has to be a more recent timepoint than critical."
fi

for ENV in $(ls -d /mnt/borgbackup/*.env)
do
    source "${ENV}"

    # get unixtime of last backup
    export BORG_PASSPHRASE BORG_REPO
    last="$(${BORG} list --sort timestamp --last 1 --format '{time}')"
    [ "$?" = 0 ] || error "Cannot list repository archives. Repo Locked?"
    size="$(du -sm ${BORG_REPO} | awk '{ print $1 }')"
    num="$(${BORG} list | wc -l)"

    if [ -z "${last}" ]; then
        echo "CRITICAL - no archive in repository"
        exit "${STATE_CRITICAL}"
    fi

    sec_last="$(${DATE} --date="${last}" '+%s')"

    # interpret the amount of fails
    if [ "${sec_crit}" -gt "${sec_last}" ]; then
        state="${STATE_CRITICAL}"
    elif [ "${sec_warn}" -gt "${sec_last}" ]; then
        state="${STATE_WARNING}"
    else
        state="${STATE_OK}"
    fi

    echo "${state} \"BORG: ${NAME}\" size=${size}|number=${num};5:;3: last backup made on ${last}"
    unset BORG_PASSPHRASE
done
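Placing the script in a subdirectory named 1800 is what makes the agent run it only every 30 minutes (1800 seconds) and serve the cached result in between. A minimal sketch of the installation step could look like the following. The helper name install_local_check and the source file name ./borg are assumptions made for illustration and are not part of checkmk.

```shell
#!/bin/bash
# Sketch: install a local check into the 1800 second (30 minute) cache directory.
# install_local_check is a hypothetical helper, not a checkmk command.

install_local_check() {
    # $1 = path to the check script, $2 = local check directory of the agent
    local src="$1" local_dir="$2" interval=1800
    mkdir -p "${local_dir}/${interval}" || return 1
    # copy the script and mark it executable so the agent can run it
    install -m 0755 "${src}" "${local_dir}/${interval}/borg"
}

# Possible call on the backup server (as root), assuming the script was saved as ./borg:
# install_local_check ./borg /usr/lib/check_mk_agent/local
```

After installing, running the script once by hand is a quick way to see whether it prints one line per repository.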
The above script scans the target directory /mnt/borgbackup/ for files with the extension .env. Each of these files defines the variables NAME, BORG_REPO and BORG_PASSPHRASE, which are used to access the backup repository. Every repository is then checked for its last backup, and the result is reported to checkmk as a single line containing the state, the service name "BORG: NAME", the metrics size and number, and the time of the last backup.
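To make the two pieces concrete, here is a small sketch of what an .env file and the resulting output line might look like. The repository name, path, passphrase, size, archive count and timestamp are all made-up placeholders, and format_line is a hypothetical helper that only mirrors the echo statement of the script.

```shell
#!/bin/bash
# Sketch: a hypothetical .env file and the line the script would print for it.
# All values are placeholders, not real measurements.

env_dir="$(mktemp -d)"
cat > "${env_dir}/syncthing.env" <<'EOF'
NAME="syncthing"
BORG_REPO="/mnt/borgbackup/syncthing"
BORG_PASSPHRASE="change-me"
EOF

format_line() {
    # $1 = state (0 OK, 1 WARN, 2 CRIT), $2 = repository name,
    # $3 = size in MiB, $4 = number of archives, $5 = time of last backup
    printf '%s "BORG: %s" size=%s|number=%s;5:;3: last backup made on %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# load NAME, BORG_REPO and BORG_PASSPHRASE from the sample file
. "${env_dir}/syncthing.env"

format_line 0 "${NAME}" 15360 120 "Mon, 2023-01-02 03:00:00"
# prints: 0 "BORG: syncthing" size=15360|number=120;5:;3: last backup made on Mon, 2023-01-02 03:00:00

rm -rf "${env_dir}"
```

Since the .env files contain the repository passphrase, it makes sense to keep them readable only by the user the agent runs as.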
This active monitoring gives me an overview within checkmk of the number of backups, their disk usage and whether they are up to date. If a backup is no longer up to date, I receive a notification and can check whether there is a problem.
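The up-to-date decision itself is just a comparison of Unix timestamps. The following sketch isolates that logic with the same 28 hour and 3 day thresholds. It assumes GNU date (for relative expressions such as "3 days ago"), and classify_backup_age is a hypothetical name introduced only for this example.

```shell
#!/bin/bash
# Sketch: classify a backup timestamp against the 28 hour / 3 day thresholds.
# Assumes GNU date; classify_backup_age is a hypothetical helper.

classify_backup_age() {
    # $1 = time of the last backup in any format GNU date understands
    local sec_last sec_warn sec_crit
    sec_last="$(date --date="$1" '+%s')" || return 3
    sec_warn="$(date --date='28 hours ago' '+%s')"
    sec_crit="$(date --date='3 days ago' '+%s')"

    if [ "${sec_crit}" -gt "${sec_last}" ]; then
        echo 2   # last backup is older than 3 days: CRIT
    elif [ "${sec_warn}" -gt "${sec_last}" ]; then
        echo 1   # last backup is older than 28 hours: WARN
    else
        echo 0   # last backup is recent enough: OK
    fi
}

classify_backup_age '1 hour ago'   # prints 0
classify_backup_age '2 days ago'   # prints 1
classify_backup_age '1 week ago'   # prints 2
```

The 28 hour warning threshold leaves a few hours of slack for a daily backup that starts late or runs long before a warning is raised.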
PS: I was able to restore my deleted data from Syncthing afterwards by restoring a VM backup. The only disadvantage here was that it was not an hourly backup and I had to boot the VM separately which took a bit more time.