Simplemonitor

A Python-based network and host monitor

The types of monitor available are:

All monitor types share the following configuration options:

setting description required default
type One of the types from the above list yes  
runon A hostname (as returned from Python’s socket.gethostname()) on which the monitor should run. All other hosts will ignore this monitor completely. If unset (default) all hosts will run the monitor. no  
depend A comma-separated list of other monitors on which this one depends. If one of the dependencies fails (or is skipped), this monitor will also skip. A skip does not trigger an alert. no  
tolerance The number of times a monitor can fail before it’s actually considered failed (and generates an alert). Handy for things which intermittently fail to poll (the host monitor is guilty of this). This also interacts with the limit option on alerters. no 1 (i.e. on first failure)
urgent Whether this monitor is urgent. Non-urgent monitors cannot trigger urgent alerters (e.g. the SMS alerter). Set to 0 to make a monitor non-urgent. no 1
gap The number of seconds gap between polls for this monitor. Setting this lower than the global interval will have no effect. Use it to make a monitor poll only once an hour, for example. no 0
remote_alert This monitor wants a remote host to handle alerting instead of the local host. Set to 1 to enable. This is a good candidate for putting in defaults if you want to use remote alerting for all your monitors. no 0
recover_command A command to execute once when this monitor fails. It could, for example, restart a service if an HTTP check fails. no  
recovered_command A command to execute once when this monitor succeeds the first time after being failed. no  
group The group the monitor belongs to. Alerters and Loggers will only fire for monitors which appear in their groups. no default
notify Whether the monitor should alert at all. no 1
failure_doc Information to include in alerts on failure (e.g. a URL to a runbook) no  
gps Comma-separated latitude and longitude of this monitor, for the HTML logger’s map. no
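
As a rough illustration of how the shared options combine, the sketch below parses a small pair of monitor definitions with Python’s configparser. It assumes monitors are defined as INI-style sections, one per monitor. The section names (router, web), the addresses and the option values are hypothetical examples, and the code is not SimpleMonitor’s own loader.

```python
import configparser

# Hypothetical monitor definitions; section names and values are examples only.
SAMPLE = """
[router]
type = host
host = 192.0.2.1
tolerance = 3

[web]
type = http
url = http://example.com/
depend = router
gap = 3600
urgent = 0
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

web = parser["web"]
# depend is a comma-separated list of other monitor names.
dependencies = [name.strip() for name in web.get("depend", "").split(",") if name.strip()]
# Options that are left out fall back to the defaults listed in the table.
tolerance = web.getint("tolerance", fallback=1)
gap = web.getint("gap", fallback=0)
urgent = web.getboolean("urgent", fallback=True)

print(dependencies, tolerance, gap, urgent)
```

Here the web monitor would be skipped whenever router fails, polls at most once an hour, and cannot trigger urgent alerters.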

apcupsd

Uses (an existing and correctly configured) apcupsd to check that a UPS is not running from batteries or having some other problem. Multiplatform.

setting description required default
path

The path to the apcaccess binary. You should only need to specify this if you’ve installed apcupsd somewhere exotic.

no

UNIX: $PATH; Windows: C:\apcupsd\bin

arlo_camera

Check Arlo camera battery level

setting description required default
username

Arlo username

yes
password

Arlo password

yes
device_name

Camera device name (e.g. “Front”)

yes
base_station_id

The number of your base station; only required if you have more than one. It’s an array index, but figuring out which one is which is an exercise left to the reader.

no

0

command

Run a command and optionally verify the output. If the command exits non-zero, the monitor fails.

setting description required default
command

The command (and params) to execute

yes
result_regexp

A regular expression against which the output of the command is matched.

no
result_max

A maximum value for the command to output (on stdout)

no
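
The checks described above can be sketched roughly as follows. This is not SimpleMonitor’s implementation: the function name check_command is hypothetical, and it assumes the regular expression is applied with re.search and that result_max is compared against the output parsed as a number.

```python
import re
import subprocess
import sys


def check_command(command, result_regexp=None, result_max=None):
    """Return True if the command passes, False otherwise (illustrative only)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        return False  # a non-zero exit status is always a failure
    output = completed.stdout.strip()
    # Assumption: the pattern may match anywhere in the output.
    if result_regexp is not None and re.search(result_regexp, output) is None:
        return False
    if result_max is not None:
        try:
            value = float(output)
        except ValueError:
            return False  # output is not a number, so it cannot be compared
        if value > result_max:
            return False
    return True


# Use the running Python interpreter as a portable stand-in for a real command.
ok = check_command(
    [sys.executable, "-c", "print(42)"], result_regexp=r"^\d+$", result_max=100
)
print(ok)
```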

compound

Combine (logical-and) multiple failures of other monitors for emergency escalation

setting description required default
monitors

A comma-separated list of other monitors

yes
min_fail

Number of monitors which should fail for this monitor to fail too

no

all
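
The min_fail rule can be expressed in a few lines. This is a sketch of the logic as described, not SimpleMonitor’s code; the function name compound_failed and the monitor names in the example are hypothetical.

```python
def compound_failed(states, min_fail=None):
    """Decide whether a compound monitor fails.

    states maps a monitor name to True if that monitor is currently failing.
    min_fail=None stands for the default of "all".
    """
    failed = sum(1 for is_failing in states.values() if is_failing)
    required = len(states) if min_fail is None else min_fail
    return failed >= required


states = {"wan-primary": True, "wan-backup": False, "dns": True}
print(compound_failed(states))              # default: every monitor must fail
print(compound_failed(states, min_fail=2))  # two failures are enough
```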

diskspace

Checks that the free space on a partition is above a given limit. Multiplatform.

setting description required default
partition

The partition to check for space on. On Windows, this is the drive letter (e.g. C:). On non-Windows, this is the mount point (e.g. /usr).

yes
limit

The minimum amount of free space. Give a number in bytes, or suffix K, M or G for kilobytes, megabytes or gigabytes. Required, no default.

yes
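
To make the limit format concrete, here is a minimal sketch of parsing the suffix and comparing it with the free space reported by Python’s shutil.disk_usage. The helper names are hypothetical, and treating K, M and G as multiples of 1024 is an assumption; the monitor itself may use different multipliers.

```python
import shutil

# Assumption: suffixes are binary multiples (1K = 1024 bytes).
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_limit(limit):
    """Convert a limit such as '500', '10K' or '2G' to a number of bytes."""
    text = limit.strip().upper()
    if text and text[-1] in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[text[-1]]
    return int(text)


def has_enough_space(partition, limit):
    """Return True if the partition has at least `limit` free."""
    return shutil.disk_usage(partition).free >= parse_limit(limit)


print(parse_limit("10K"), has_enough_space(".", "1M"))
```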

dns

Attempts to resolve a DNS record, and optionally checks the result. Requires the DNS utility dig to be in the $PATH.

setting description required default
record

The DNS name to resolve.

yes
record_type

The type of the record.

no

A

desired_val

The expected value for the record to resolve to. For results with newlines (e.g. MX records), you should format them like:

desired_val: 10 a.mx.domain.com
  20 b.mx.domain.com
  30 c.mx.domain.com

Note the leading spaces on the continuation lines.

no
server

The server to send the request to. If absent, the system default is used.

no
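
The continuation-line format for desired_val can be checked with Python’s configparser, which joins indented continuation lines with newlines. The sketch assumes the configuration is read by configparser; the section name mx-check is hypothetical.

```python
import configparser

# Hypothetical monitor definition using the multi-line desired_val format.
SAMPLE = """
[mx-check]
type = dns
record = domain.com
record_type = MX
desired_val: 10 a.mx.domain.com
  20 b.mx.domain.com
  30 c.mx.domain.com
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# The indented lines become part of the same value, one record per line.
expected = parser["mx-check"]["desired_val"].splitlines()
print(expected)
```

Without the leading spaces, each continuation line would be treated as a separate (and invalid) option rather than as part of desired_val.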

fail

This monitor fails 5 times in a row and then succeeds once. Use for testing. Multiplatform.

This monitor has no additional parameters.

filestat

Examine size and age of a file

setting description required default
filename

The path to the file to monitor

yes
maxage

Maximum allowed age of the file in seconds

no

None; age is ignored

minsize

Minimum allowed size of the file in bytes; can be expressed using “KB” etc suffixes

no

None; size is ignored
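
A minimal sketch of the two checks, using os.stat. The function name file_ok is hypothetical, and measuring age from the file’s modification time is an assumption rather than something stated above.

```python
import os
import time


def file_ok(filename, maxage=None, minsize=None):
    """Return True if the file is recent enough and large enough."""
    info = os.stat(filename)  # raises FileNotFoundError if the file is missing
    # Assumption: "age" means seconds since the last modification.
    if maxage is not None and time.time() - info.st_mtime > maxage:
        return False
    if minsize is not None and info.st_size < minsize:
        return False
    return True
```

Leaving maxage or minsize as None skips that check, matching the defaults in the table.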

hass_sensor

Monitor the existence of a home automation sensor

setting description required default
url

The API URL for the monitor

yes
sensor

Name of the sensor

yes
token

API token for the sensor

yes

host

Pings a host (once per iteration) to see if it’s available. Multiplatform, but can break on non-English ping output without additional config. See also the “ping” monitor.

setting description required default
host

The hostname to ping.

yes
ping_regexp

The regexp which matches a successful ping line. You may need to set this if your ping output is not in English

no

auto

time_regexp

The regexp which matches the ping time in the output. It must define a match group named “ms”. You may need to set this if your ping output is not in English.

no

auto
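
As an example of a time_regexp with the required “ms” group, the sketch below matches a line of German-language ping output. Both the pattern and the sample line are illustrative assumptions; check them against the actual output of ping on your system.

```python
import re

# Hypothetical pattern for German-language ping output.
time_regexp = re.compile(r"Zeit=(?P<ms>[0-9.]+) ?ms")

# Illustrative sample line, not captured from a real run.
sample_line = "64 Bytes von 192.0.2.1: icmp_seq=1 ttl=64 Zeit=12.3 ms"

match = time_regexp.search(sample_line)
# The named group "ms" carries the round-trip time.
ping_ms = float(match.group("ms")) if match else None
print(ping_ms)
```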

http

Attempts to fetch a URL and makes sure the HTTP return code is 200 OK. Can also look through the content of the page trying to match a regular expression. Multiplatform.

setting description required default
url

The URL to open.

yes
regexp

The regexp to look for in the page (only if the page loads with status 200 OK). If the regexp does not match, the monitor reports a failure. See Python’s re module for syntax.

no

none

allowed_codes

A list of HTTP codes which are acceptable in addition to 200 OK

no
verify_hostname

If set to false, no SSL hostname verification will be made. Use with the https protocol and self-signed certificates.

no

True

timeout

The timeout for the HTTP request to complete

no

5

headers

JSON map of HTTP header names and values to add to the request

no
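
The headers and allowed_codes options might be interpreted along these lines. This is a sketch, not SimpleMonitor’s code: the helper names are hypothetical, and a comma-separated format for allowed_codes is an assumption, since the table only says “a list”.

```python
import json
import urllib.request


def build_request(url, headers_json=None):
    """Build a request, adding headers from a JSON map of names to values."""
    headers = json.loads(headers_json) if headers_json else {}
    return urllib.request.Request(url, headers=headers)


def status_ok(status, allowed_codes=""):
    """200 is always acceptable; allowed_codes adds further acceptable codes."""
    # Assumption: allowed_codes is written as a comma-separated string.
    extra = {int(code) for code in allowed_codes.split(",") if code.strip()}
    return status == 200 or status in extra


request = build_request(
    "http://example.com/",
    '{"User-Agent": "simplemonitor-example", "Accept": "text/html"}',
)
print(request.header_items(), status_ok(404, "404, 301"))
```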

loadavg

Check the load average on the host.

setting description required default
which

The load average to monitor: 0 = 1min, 1 = 5min, 2 = 15min

no

1

max

The maximum acceptable value for the given load average.

no

1.00
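
The which index maps directly onto the tuple returned by Python’s os.getloadavg(), which is available on Unix-like systems only. A minimal sketch, with a hypothetical function name and a loads parameter added purely so the logic can be tried without reading the real load:

```python
import os


def load_ok(which=1, maximum=1.00, loads=None):
    """Return True if the selected load average is within the maximum."""
    if loads is None:
        loads = os.getloadavg()  # (1min, 5min, 15min); Unix only
    return loads[which] <= maximum


# With the defaults, the 5-minute average (index 1) is compared with 1.00.
print(load_ok(loads=(0.5, 1.2, 0.9)))
```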

memory

Check free memory percentage

setting description required default
percent_free

The minimum percentage of available memory (as per psutil’s definition)

yes

null

Monitor which always passes. Use for testing.

This monitor has no additional parameters.

ping

Pings a host to make sure it’s up. Uses a Python ping module instead of calling out to an external app, but needs to be run as root.

setting description required default
host

The host/IP to ping.

yes
timeout

The timeout for the ping in seconds

no

5

pkgaudit

Fails if pkg audit reports any vulnerable packages installed.

setting description required default
path

The path to the package binary.

no

/usr/local/sbin/pkg

portaudit

Fails if portaudit reports any vulnerable ports installed.

setting description required default
path

The path to the portaudit binary.

no

/usr/local/sbin/portaudit

process

Check for a running process

setting description required default
process_name

The process name to check for

yes
min_count

The minimum number of processes to require

no

1

max_count

The maximum number of processes allowed

no

infinity

username

Limit matches to processes owned by this username

no

blank (any user)

rc

Checks that a FreeBSD-style service is running, by running its rc script (in /usr/local/etc/rc.d) with the status command. May work for other types of rc.d/init.d system. Not for Windows.

setting description required default
service

The name of the service to check. This is the name of the rc.d script in /usr/local/etc/rc.d/. Any trailing “.sh” is optional and will be added if needed.

yes
path

The path of the folder containing the rc script

no

/usr/local/etc/rc.d

return_code

The integer return code required from the script

no

0

ring

Check battery level of Ring Doorbell

setting description required default
device_name

The name of the Ring Doorbell to monitor

yes
minimum_battery

The minimum battery percent allowed

no

25

username

Your Ring username (e.g. email address). Accounts using MFA are not supported. You can create a separate account for API access.

yes
password

Your Ring password

yes

service

Checks a Windows service to make sure it’s running. Windows only.

setting description required default
service

The short name of the service to monitor. This is the “Service name” on the General tab of the service properties (in the Services MMC snap-in).

yes
host

The hostname to check the service on.

no

localhost

svc

Checks that a supervise service is running. Not for Windows.

setting description required default
path

The path to the service’s directory (e.g. /var/service/something).

yes

systemd-unit

Monitor a systemd unit status

setting description required default
name

The name of the unit to monitor

yes
load_states

Comma-separated list of desired load states for the unit

no

loaded

active_states

Comma-separated list of desired active states for the unit

no

active, reloading

sub_states

Comma-separated list of desired sub states for the unit

no

tcp

Checks that a TCP port is open. Doesn’t care what happens after the connection is opened. Multiplatform.

setting description required default
host

The name of the host to connect to.

yes
port

The port to connect to. Integer only (no service names).

yes
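
A check of this kind can be sketched with the standard socket module: open a connection, then close it straight away. The function name port_open and the timeout parameter are hypothetical additions, not options of this monitor.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # The connection is closed as soon as it has been opened.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, unresolvable host, etc.
```

For example, port_open("192.0.2.10", 22) would report whether something is listening on the SSH port of that (example) address.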

unifi_failover

Checks a Unifi Security Gateway for failover WAN status. (The USG must be in your known_hosts file.)

setting description required default
router_address

The address of the USG

yes
router_username

The username to log in as

yes
router_password

The password to log in with (if not using ssh key)

no
ssh_key

The SSH private key to log in with (if not using password)

no
check_interface

The name of the failover interface to check

no

eth2

unifi_watchdog

Checks a Unifi Security Gateway to make sure the WAN failover is healthy. (The USG must be in your known_hosts file.)

setting description required default
router_address

The address of the USG

yes
router_username

The username to log in as

yes
router_password

The password to log in with (if not using ssh key)

no
ssh_key

The SSH private key to log in with (if not using password)

no
primary_interface

The name of the primary interface

no

pppoe0

secondary_interface

The name of the secondary interface

no

eth2

unix_service

Check a generic unix service with the “service” command

setting description required default
service

Name of the service to check

yes
state

The state the service should be in; either running (command exits 0) or stopped (command exits 1)

no

running
