Recently I have been setting up Nagios, because a growing number of machines, and of services per machine, makes it difficult to keep track of what is working, what is not, and which system or service needs closer attention.
Setting up the monitoring server is pretty straightforward if you follow the Nagios documentation. Monitoring exposed services such as SSH, HTTP, FTP, MySQL and PostgreSQL is also straightforward. Plugins such as check_tcp and check_udp also provide an easy way to see whether a service is listening. For instance, for a CVS pserver, you can use the check_tcp plugin to check whether port 2401 is open. It is not the most thorough way to test a service, since an open port does not prove the service behaves correctly, but it works well enough as a basic check.
The systems whose local resources I had to monitor were of three types: LCFG Linux, self-managed Linux and self-managed Solaris. This mix adds some complexity, as each type needs a different way of setting up monitoring over SSH, although the principles and techniques are the same. The LCFG systems are the easiest: a configuration header was created and “included” in every system that needed to be monitored. It looks something like the following:
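A port check like this can be wired into Nagios roughly as follows. This is only a sketch: the command name check_cvs_pserver, the host name cvs.example.org and the generic-service template are assumptions for illustration, not taken from my actual setup.

```
# Hypothetical command: OK if something accepts TCP connections on port 2401
define command{
command_name check_cvs_pserver
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 2401
}
# Hypothetical service; host name and template are placeholders
define service{
use generic-service
host_name cvs.example.org
service_description CVS pserver
check_command check_cvs_pserver
}
```

This only confirms that the port accepts connections; it says nothing about whether the pserver can actually serve a repository.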
/** Configuration for monitored remote hosts.
* This header will allow Nagios server to monitor
* services on remote system that use this header by
* running check_by_ssh.
**/
/** Nagios will fail to run remote command if an SSH banner is displayed **/
!openssh.sshdopts mREMOVE(Banner)
!tcpwrappers.allow_sshd mCONCATQ(" <Nagios_server_hostname_goes_here>")
!auth.extrapasswd mADD(nagios)
auth.pwent_nagios nagios:*:007:007:Nagios:/home/nagios:/bin/bash
!auth.extragroup mADD(nagios)
auth.grpent_nagios nagios:*:007:apache
/** You may add "nagios" user to the user access list of the machine depending on the
authentication method **/
/** Public key authentication for 'nagios' user **/
!file.files mADD(nagiosKey)
file.file_nagiosKey /localdisk/home/nagios/.ssh/authorized_keys
file.type_nagiosKey literal
file.mode_nagiosKey 0644
!file.tmpl_nagiosKey mCONCATQ("<key_goes_here>")
!profile.packages mEXTRA(+nagios-plugins-1.4.13-4.el5)
/** List of plugins to be installed remotely **/
!profile.packages mEXTRA(+nagios-plugins-disk-1.4.13-4.el5)
!profile.packages mEXTRA(+nagios-plugins-load-1.4.13-4.el5)
!profile.packages mEXTRA(+nagios-plugins-procs-1.4.13-4.el5)
!profile.packages mEXTRA(+nagios-plugins-swap-1.4.13-4.el5)
!profile.packages mEXTRA(+nagios-plugins-users-1.4.13-4.el5)
The self-managed systems use either a local or a network “nagios” account with public key authentication, and each remote system needs its own set of required plugins installed manually. A single build of the plugins in the NFS home directory of the network “nagios” account might not work when you have several different *NIX operating systems.
I configured the Nagios config files for remote services based on this *very* helpful and clear guide: http://wiki.nagios.org/index.php/Howtos:checkbyssh_RedHat
The key point with the remote commands is to define the right commands for Nagios, pointing at the correct remote location of the plugins and passing the correct arguments. Five remote services have been defined, matching the plugin RPMs above: check_disk, check_load, check_procs, check_swap and check_users.
To call each remote plugin, new command definitions need to be added to /etc/nagios/commands.cfg:
define command{
command_name check_remote_disk
command_line $USER1$/check_by_ssh -p $ARG1$ \
-H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_disk \
-w $ARG2$ -c $ARG3$ -p $ARG4$'
}
define command{
command_name check_remote_users
command_line $USER1$/check_by_ssh -p $ARG1$ \
-H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_users \
-w $ARG2$ -c $ARG3$'
}
define command{
command_name check_remote_load
command_line $USER1$/check_by_ssh -p $ARG1$ \
-H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_load \
-w $ARG2$ -c $ARG3$'
}
define command{
command_name check_remote_procs
command_line $USER1$/check_by_ssh -p $ARG1$ \
-H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_procs \
-w $ARG2$ -c $ARG3$ -s $ARG4$'
}
define command{
command_name check_remote_swap
command_line $USER1$/check_by_ssh -p $ARG1$ \
-H $HOSTADDRESS$ -C '/usr/lib/nagios/plugins/check_swap \
-w $ARG2$ -c $ARG3$'
}
Depending on the setup, you might need to change the location of the plugins or use more options, such as the user to log in as, the location of the keys, IPv4 or IPv6, or SSH protocol 1 or 2. Once the commands are defined, they can be used to define services within host configuration files.
The main reason I wanted to avoid NRPE was that it means exposing one more service, even if only internally, on systems where you want to expose only what is necessary. NRPE would be useful if Windows servers had to be monitored for their resources.
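As a rough illustration of both points, the sketch below shows a variant command with extra check_by_ssh options, followed by two services that use the commands defined above. The command name check_remote_load_key, the key path, the host name remotehost.example.org, the generic-service template and all thresholds are assumptions for the example, so adjust them to your own setup.

```
# Hypothetical variant of check_remote_load: logs in as "nagios" with an
# explicit key over IPv4 (the key path is an assumption)
define command{
command_name check_remote_load_key
command_line $USER1$/check_by_ssh -4 -p $ARG1$ -l nagios \
-i /home/nagios/.ssh/id_rsa -H $HOSTADDRESS$ \
-C '/usr/lib/nagios/plugins/check_load -w $ARG2$ -c $ARG3$'
}
# Hypothetical services; host name, template and thresholds are placeholders
define service{
use generic-service
host_name remotehost.example.org
service_description Root partition
check_command check_remote_disk!22!20%!10%!/
}
define service{
use generic-service
host_name remotehost.example.org
service_description Load
check_command check_remote_load!22!5.0,4.0,3.0!10.0,6.0,4.0
}
```

In each check_command the first argument after the command name is the SSH port ($ARG1$), followed by the warning and critical thresholds ($ARG2$ and $ARG3$) and, for check_remote_disk, the path to check ($ARG4$).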