Synopsis

On more than one occasion, I have received a message from a colleague or client along the lines of:

Hi Helder, the website is saying '502 bad gateway' - would you mind taking a look?

Out comes a frustrated sigh, and off goes my brain, racing to work out what could have gone wrong. The joke is on me for neglecting monitoring and blindly deploying servers (Lightsail instances, in this case).

So, clearly, the answer is: set up your dang monitoring, man!

For this application we're using Monit, developed by Tildeslash Ltd. The latest stable release is 5.26.0, released on July 26th 2019, which happened to be my 25th birthday. Right, on to the application.

Application

We're going to start by installing the Monit package. If you're using a Debian-family OS you can probably run:

sudo apt install monit

I like my servers BSD so I'm going with:

pkg install monit

After a successful installation we want to enable Monit at boot, which on FreeBSD is done in rc.conf. Again, if you're running Debian or similar you might want to use:

sudo systemctl enable monit

On FreeBSD I'm running:

sysrc monit_enable="YES"

That was one of the last Debian references, I promise.

Following that, we have to configure Monit using a configuration file. You'll find a file called "monitrc.sample" under "/usr/local/etc/". Within that directory, move or copy the sample to "monitrc" and edit it to your taste; I'll post mine below. It is also handy to create a directory called "monit.d/" in "/usr/local/etc" to hold what you can call configuration fragments (configurations for multiple websites, for example).
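The steps above can be sketched as a small shell function. This is only a sketch under a few assumptions: the function name "setup_monit_config" is made up for illustration, and the "chmod 600" is an extra step, added because Monit refuses to load a control file that other users can access.

```shell
# Sketch only: setup_monit_config is a hypothetical helper, not part of Monit.
# Usage (as root): setup_monit_config /usr/local/etc
setup_monit_config() {
    etc_dir="$1"

    # Copy the sample only if there is no monitrc yet, so an already
    # edited configuration is never overwritten.
    if [ ! -f "$etc_dir/monitrc" ]; then
        cp "$etc_dir/monitrc.sample" "$etc_dir/monitrc" || return 1
    fi

    # Monit rejects a control file that is accessible to other users.
    chmod 600 "$etc_dir/monitrc" || return 1

    # Directory for the configuration fragments loaded by the include line.
    mkdir -p "$etc_dir/monit.d"
}
```

Once monitrc has been edited, "monit -t" checks its syntax and "service monit start" starts the daemon.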

My monitrc (last line is relevant to the point above):

###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All path's MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below you will find examples of some frequently used statements. For
## information about the control file and a complete list of statements and
## options, please have a look in the Monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
#
set daemon  30              # check services at 30 seconds intervals
#   with start delay 240    # optional: delay the first check by 4-minutes (by
#                           # default Monit check immediately after Monit start)
#
#
## Set syslog logging. If you want to log to a standalone log file instead,
## specify the full path to the log file
#
set log syslog

#
#
## Set the location of the Monit lock file which stores the process id of the
## running Monit instance. By default this file is stored in $HOME/.monit.pid
#
# set pidfile /var/run/monit.pid
#
## Set the location of the Monit id file which stores the unique id for the
## Monit instance. The id is generated and stored on first Monit start. By
## default the file is placed in $HOME/.monit.id.
#
# set idfile /var/.monit.id
#
## Set the location of the Monit state file which saves monitoring states
## on each cycle. By default the file is placed in $HOME/.monit.state. If
## the state file is stored on a persistent filesystem, Monit will recover
## the monitoring state across reboots. If it is on temporary filesystem, the
## state will be lost on reboot which may be convenient in some situations.
#
# set statefile /var/.monit.state
#
#

## Set limits for various tests. The following example shows the default values:
##
# set limits {
#     programOutput:     512 B,      # check program's output truncate limit
#     sendExpectBuffer:  256 B,      # limit for send/expect protocol test
#     fileContentBuffer: 512 B,      # limit for file content test
#     httpContentBuffer: 1 MB,       # limit for HTTP content test
#     networkTimeout:    5 seconds   # timeout for network I/O
#     programTimeout:    300 seconds # timeout for check program
#     stopTimeout:       30 seconds  # timeout for service stop
#     startTimeout:      30 seconds  # timeout for service start
#     restartTimeout:    30 seconds  # timeout for service restart
# }

## Set global SSL options (just most common options showed, see manual for
## full list).
#
# set ssl {
#     verify     : enable, # verify SSL certificates (disabled by default but STRONGLY RECOMMENDED)
#     selfsigned : allow   # allow self signed SSL certificates (reject by default)
# }
#
#
## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using a comma separator. If the first mail server fails, Monit
# will use the second mail server in the list and so on. By default Monit uses
# port 25 - it is possible to override this with the PORT option.
#
# set mailserver mail.bar.baz,               # primary mailserver
#                backup.bar.baz port 10025,  # backup mailserver on port 10025
#                localhost                   # fallback relay
#
#
## By default Monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the queue size
## by using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
# set eventqueue
#     basedir /var/monit  # set the base directory where events will be stored
#     slots 100           # optionally limit the queue size
#
#
## Send status and events to M/Monit (for more information about M/Monit
## see https://mmonit.com/). By default Monit registers credentials with
## M/Monit so M/Monit can smoothly communicate back to Monit and you don't
## have to register Monit credentials manually in M/Monit. It is possible to
## disable credential registration using the commented out option below.
## Though, if safety is a concern we recommend instead using https when
## communicating with M/Monit and send credentials encrypted. The password
## should be URL encoded if it contains URL-significant characters like
## ":", "?", "@". Default timeout is 5 seconds, you can customize it by
## adding the timeout option.
#
# set mmonit http://monit:monit@192.168.1.10:8080/collector
#     # with timeout 30 seconds              # Default timeout is 5 seconds
#     # and register without credentials     # Don't register credentials
#
#
## Monit by default uses the following format for alerts if the mail-format
## statement is missing::
## --8<--
set mail-format {
  from:    Monit <monit@email.com>
  subject: monit alert --  $EVENT $SERVICE
  message: $EVENT Service $SERVICE
                Date:        $DATE
                Action:      $ACTION
                Host:        $HOST
                Description: $DESCRIPTION

}
set mailserver example.com port 587
  username user password pass
  using TLSV1 with timeout 30 seconds
set alert email@email.com
## --8<--
##
## You can override this message format or parts of it, such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded at runtime. For example, to override the sender, use:
#
# set mail-format { from: monit@foo.bar }
#
#
## You can set alert recipients whom will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter as in the second example below.
#
#set alert email@email                       # receive all alerts
#
## Do not alert when Monit starts, stops or performs a user initiated action.
## This filter is recommended to avoid getting alerts for trivial cases.
#
# set alert your-name@your.domain not on { instance, action }
#
#
## Monit has an embedded HTTP interface which can be used to view status of
## services monitored and manage services from a web interface. The HTTP
## interface is also required if you want to issue Monit commands from the
## command line, such as 'monit status' or 'monit restart service' The reason
## for this is that the Monit client uses the HTTP interface to send these
## commands to a running Monit daemon. See the Monit Wiki if you want to
## enable SSL for the HTTP interface.
#
set httpd port 2812 
    use address 127.0.0.1  # only accept connection from localhost (drop if you use M/Monit)
    #allow 0.0.0.0/0.0.0.0
    allow 127.0.0.1
    allow username:password      # require these credentials (replace the placeholders)

###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
#  check system $HOST
#    if loadavg (1min) per core > 2 for 5 cycles then alert
#    if loadavg (5min) per core > 1.5 for 10 cycles then alert
#    if cpu usage > 95% for 10 cycles then alert
#    if memory usage > 75% then alert
#    if swap usage > 25% then alert
#
#
## Check that a process is running, in this case nginx. If the process is not
## running, Monit will restart it by default.

check process nginx with pidfile /var/run/nginx.pid
    start program = "/usr/local/etc/rc.d/nginx start" with timeout 60 seconds
    stop program  = "/usr/local/etc/rc.d/nginx stop"
    group personal
#
#
## Check that the MySQL host answers a ping test and accepts connections on
## port 3306, using the MySQL protocol test.
#
check host mysqldb with address 127.0.0.1 
    if failed ping then alert
    if failed port 3306 protocol mysql then alert

## Check mysqld
check process mysqld with pidfile /var/db/mysql/freebsd.pid
    start program = "/usr/local/etc/rc.d/mysql-server start" with timeout 60 seconds
    stop program = "/usr/local/etc/rc.d/mysql-server stop"
    if failed unixsocket /var/run/mysql/mysql.sock then restart
#
#
#
## Check php-fpm
check process php-fpm with pidfile /var/run/php-fpm.pid
    if cpu > 50% for 2 cycles then alert
    if memory > 400 MB then alert

## Check periodic daily log for warnings
check file dailylog with path /var/log/daily.log
    if content != "[\n\r\s]+" then alert

## Check rkhunter daily log for warnings
check file rkhunter with path /var/log/rkhunter.log
    if content = "Warning" then alert

###############################################################################
## Includes
###############################################################################
##
## It is possible to include additional configuration parts from other files or
## directories.
#
include /usr/local/etc/monit.d/*

I suggest you go through the file line by line to get a general grasp of how checks are made and how you can adapt it to your specific situation. The official documentation might also be of help.
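The include line at the end of the file loads everything in "monit.d/", which is where per-website checks can live. As a rough sketch, the helper below writes one such fragment. The function name and the hostname in the usage line are placeholders, not anything from my setup. The check relies on Monit's HTTP protocol test, which fails on a status code of 400 or above, so a '502 bad gateway' would raise an alert.

```shell
# Sketch only: write_site_check is a hypothetical helper, not part of Monit.
# Usage: write_site_check /usr/local/etc/monit.d www.example.com
write_site_check() {
    fragment_dir="$1"
    site="$2"

    # One fragment per site, named after the hostname.
    # The heredoc is unquoted so that $site is expanded by the shell.
    cat > "$fragment_dir/$site" <<EOF
## Alert when $site stops answering HTTPS requests
check host $site with address $site
    if failed port 443 protocol https with timeout 15 seconds then alert
EOF
}
```

After adding a fragment, "monit -t" validates the syntax and "monit reload" makes the running daemon pick it up.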

Use CTRL+F or your browser's search-in-page tool for the next bit.

Some sections of my configuration I would like to emphasize:

  1. set daemon
    Run checks every 30 seconds.
  2. set mailserver example.com
    Adjust this section with your own SMTP details in order to receive notifications via email.
  3. set alert
    The email address which will receive the alerts or notifications.
  4. set httpd
    Configure this section with the IP addresses you want to be allowed to access the web interface (below). Note that "use address 127.0.0.1" binds the interface to localhost only, so change or drop that line if you want to reach it from another machine.

    [Image: the Monit web interface showing the configured checks]

  5. check process/check file
    All occurrences of either of the above. These are the checks displayed in the image above, and you can tailor them to your needs.
  6. start program
    start program and stop program are extremely relevant, as they are what I like to call the server's auto-healing. We can play around with all sorts of conditions here: if the memory consumed crosses a certain threshold then restart that service, etc.
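As a sketch of that auto-healing idea, the helper below writes a php-fpm check that restarts the service when memory stays above a threshold, and gives up if the restarts keep failing. The assumptions here are mine: the helper name, the rc script path "/usr/local/etc/rc.d/php-fpm" (it depends on the PHP package you installed) and the thresholds. Service names must be unique, so this would replace the php-fpm block in monitrc rather than sit alongside it.

```shell
# Sketch only: write_php_fpm_check is a hypothetical helper, not part of Monit.
# Usage: write_php_fpm_check /usr/local/etc/monit.d
write_php_fpm_check() {
    fragment_dir="$1"

    # Quoted 'EOF' so the shell writes the fragment exactly as shown.
    cat > "$fragment_dir/php-fpm" <<'EOF'
## php-fpm with auto-healing (the rc script path is an assumption)
check process php-fpm with pidfile /var/run/php-fpm.pid
    start program = "/usr/local/etc/rc.d/php-fpm start" with timeout 60 seconds
    stop program  = "/usr/local/etc/rc.d/php-fpm stop"
    if cpu > 50% for 2 cycles then alert
    # 3 cycles is 90 seconds with "set daemon 30"
    if memory > 400 MB for 3 cycles then restart
    # stop retrying when restarts do not help
    if 3 restarts within 5 cycles then unmonitor
EOF
}
```

The "for 3 cycles" part keeps a short memory spike from triggering a restart, and the last rule stops Monit from restarting a service in a loop when something is fundamentally broken.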

This is a very powerful piece of software and I highly encourage you to read the official documentation and perhaps delve into the configuration examples provided by your OS package manager. I know Debian ships a lot of ready-made configurations that you can literally symlink from available to enabled.
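On Debian, that symlinking might look like the sketch below. It assumes the package layout with ready-made files in "/etc/monit/conf-available" and active ones in "/etc/monit/conf-enabled", so check those paths on your own system first. The helper name is made up for illustration.

```shell
# Sketch only: enable_monit_conf is a hypothetical helper, not part of Monit.
# Usage: enable_monit_conf nginx            (assumed Debian directories)
#        enable_monit_conf nginx /some/dir  (alternative base directory)
enable_monit_conf() {
    name="$1"
    base="${2:-/etc/monit}"   # assumed Debian layout

    # Refuse to link a configuration that does not exist.
    if [ ! -f "$base/conf-available/$name" ]; then
        echo "no such configuration: $base/conf-available/$name" >&2
        return 1
    fi

    # Link it into conf-enabled; -f replaces an existing link of the same name.
    ln -sf "$base/conf-available/$name" "$base/conf-enabled/$name"
}
```

As with any configuration change, "monit -t" followed by "monit reload" checks the result and applies it.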

We could go into a lot more detail, but in this case the installation steps plus the configuration file will already give you a powerful ally on the mission of maintaining your infrastructure fearlessly.

A big thank you to the developers who have made this possible and free; there are a lot of open source heroes out there.

I hope this serves you well.