Nagios remote resources monitoring using SSH (check_by_ssh)

Recently I have been setting up Nagios, because the growing number of machines, and of services per machine, makes it difficult to tell what is working and what is not, or when a system or service needs closer attention.

Following the Nagios documentation, setting up the monitoring server is pretty straightforward. Monitoring exposed services such as SSH, HTTP, FTP, MySQL and PostgreSQL is straightforward too. Plugins such as check_tcp and check_udp also provide an easy way to see whether a service is actually running. For instance, for a CVS pserver you can use the check_tcp plugin to check whether port 2401 is open. It is not the most thorough way to test a service, since an open port does not prove the service behaves correctly, but it works OK as a basic check.
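
As a rough illustration of the CVS pserver check, a service definition along these lines could be used. It is only a sketch: it assumes the stock check_tcp command from the sample commands.cfg (which takes the port as $ARG1$), the generic-service template from the sample configuration, and a hypothetical host called cvs-server.

```
# Sketch only: "cvs-server" is a hypothetical host name, and "generic-service"
# is assumed to be the template shipped with the sample configuration.
# Assumes check_tcp is defined as:
#   $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
define service{
        use                     generic-service
        host_name               cvs-server
        service_description     CVS pserver
        check_command           check_tcp!2401
        }
```

This only confirms that something accepts connections on port 2401, which is the limitation mentioned above.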

The systems whose local resources I had to monitor were of three types: LCFG Linux, self-managed Linux and self-managed Solaris. This mix adds some complexity, as each type needs a different way of setting up monitoring over SSH, although the principles and techniques are the same. The easiest are the LCFG systems, as a configuration header was created and “included” in every system that needed to be monitored. It looks something like the following:

/** Configuration for monitored remote hosts.
*   This header will allow Nagios server to monitor
*   services on remote system that use this header by
*   running check_by_ssh.
**/

/** Nagios will fail to run remote command if an SSH banner is displayed **/
!openssh.sshdopts       mREMOVE(Banner)

!tcpwrappers.allow_sshd mCONCATQ(" <Nagios_server_hostname_goes_here>")

!auth.extrapasswd       mADD(nagios)
auth.pwent_nagios       nagios:*:007:007:Nagios:/home/nagios:/bin/bash
!auth.extragroup        mADD(nagios)
auth.grpent_nagios      nagios:*:007:apache

/** You may need to add the "nagios" user to the user access list of the machine,
depending on the authentication method **/

/** Public key authentication for 'nagios' user **/
!file.files             mADD(nagiosKey)
file.file_nagiosKey     /localdisk/home/nagios/.ssh/authorized_keys
file.type_nagiosKey     literal
file.mode_nagiosKey     0644
!file.tmpl_nagiosKey    mCONCATQ("<key_goes_here>")

!profile.packages       mEXTRA(+nagios-plugins-1.4.13-4.el5)

/** List of plugins to be installed remotely **/
!profile.packages       mEXTRA(+nagios-plugins-disk-1.4.13-4.el5)
!profile.packages       mEXTRA(+nagios-plugins-load-1.4.13-4.el5)
!profile.packages       mEXTRA(+nagios-plugins-procs-1.4.13-4.el5)
!profile.packages       mEXTRA(+nagios-plugins-swap-1.4.13-4.el5)
!profile.packages       mEXTRA(+nagios-plugins-users-1.4.13-4.el5)

The self-managed systems use either a local or a network “nagios” account with public key authentication, and each remote system needs its own set of required plugins installed manually. A single build of the plugins in the NFS home directory of the network “nagios” account might not work when you have several different *NIX operating systems, since a binary built for one platform will not necessarily run on another.

I configured the Nagios config files for remote services based on this *very* helpful and clear guide: http://wiki.nagios.org/index.php/Howtos:checkbyssh_RedHat

The key point with the remote commands is to define the right commands for Nagios, pointing at the right location of the plugins remotely and passing the correct arguments. So, five remote services have been defined, as can be seen from the RPMs above: check_disk, check_load, check_procs, check_swap, check_users.

To call each remote plugin, new command definitions need to be added to /etc/nagios/commands.cfg:

define command{
        command_name    check_remote_disk
        command_line    $USER1$/check_by_ssh -p $ARG1$ \
        -H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_disk \
        -w $ARG2$ -c $ARG3$ -p $ARG4$'
        }

define command{
        command_name    check_remote_users
        command_line    $USER1$/check_by_ssh -p $ARG1$ \
        -H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_users \
        -w $ARG2$ -c $ARG3$'
        }

define command{
        command_name    check_remote_load
        command_line    $USER1$/check_by_ssh -p $ARG1$ \
        -H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_load \
        -w $ARG2$ -c $ARG3$'
        }

define command{
        command_name    check_remote_procs
        command_line    $USER1$/check_by_ssh -p $ARG1$ \
        -H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_procs \
        -w $ARG2$ -c $ARG3$ -s $ARG4$'
        }

define command{
        command_name    check_remote_swap
        command_line    $USER1$/check_by_ssh -p $ARG1$ \
        -H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_swap \
        -w $ARG2$ -c $ARG3$'
        }
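
The definitions above rely on the SSH defaults of the user that runs Nagios. As a hedged sketch of a variant, check_by_ssh also accepts -l for the remote login name, -i for the identity (private key) file, -4 to force IPv4 and -t for a timeout in seconds. The command name check_remote_disk_key and the key path below are made up for illustration, so adjust them to your own setup.

```
# Hypothetical variant of check_remote_disk with an explicit login user,
# private key, IPv4 and a 30-second timeout. The key path is an assumption.
define command{
        command_name    check_remote_disk_key
        command_line    $USER1$/check_by_ssh -4 -t 30 -p $ARG1$ \
        -l nagios -i /home/nagios/.ssh/id_rsa \
        -H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_disk \
        -w $ARG2$ -c $ARG3$ -p $ARG4$'
        }
```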

Depending on the setup, you might need to change the location of the plugins or use more options, such as the user to log in as, the location of the keys, an IPv4 or IPv6 connection, SSH1 or SSH2, etc. Once the commands are defined, they can be used to define services within the host configuration files.
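
To make that last step concrete, here is a sketch of service definitions that use the commands above. The host name remote-host, the generic-service template (assumed to come from the sample configuration) and all thresholds are assumptions for illustration, not recommended values. Arguments are separated by "!" and map to $ARG1$, $ARG2$ and so on, so the first argument is always the SSH port.

```
# Sketch only: "remote-host" and every threshold below are illustrative.

# Disk: warn below 20% free, critical below 10% free, on the / partition.
define service{
        use                     generic-service
        host_name               remote-host
        service_description     Root Partition
        check_command           check_remote_disk!22!20%!10%!/
        }

# Users: warn at 20 logged-in users, critical at 50.
define service{
        use                     generic-service
        host_name               remote-host
        service_description     Current Users
        check_command           check_remote_users!22!20!50
        }

# Load: thresholds for the 1, 5 and 15 minute load averages.
define service{
        use                     generic-service
        host_name               remote-host
        service_description     Current Load
        check_command           check_remote_load!22!5.0,4.0,3.0!10.0,6.0,4.0
        }

# Processes: warn at 250, critical at 400, counting states R, S, Z, D and T.
define service{
        use                     generic-service
        host_name               remote-host
        service_description     Total Processes
        check_command           check_remote_procs!22!250!400!RSZDT
        }

# Swap: warn below 20% free, critical below 10% free.
define service{
        use                     generic-service
        host_name               remote-host
        service_description     Swap Usage
        check_command           check_remote_swap!22!20%!10%
        }
```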

The main reason I wanted to avoid NRPE is that it means exposing one more service, even if only internally, on systems where you want to expose only what is necessary. NRPE would be useful if Windows servers had to be monitored for their resources.
