Introduction
Nagios is a very handy tool for every network or system administrator. It is an open source monitoring application that helps you monitor your organization's infrastructure and automate host alerts. It can be used with Linux as well as other Unix variants; in this tutorial, we are going to create Nagios plugins with Python on CentOS 6.
Initial Set Up
Before creating Nagios plugins, you first have to install the RPMForge repository and NRPE on the client VPS. NRPE stands for Nagios Remote Plugin Executor.
To install the RPMForge repository and NRPE:
rpm -ivh http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm
yum -y install python nagios-nrpe
useradd nrpe && chkconfig nrpe on
Creation of Python Script
Python lets us write scripts quickly and install additional libraries when we need them. Here, we will create a script that checks disk usage on / and raises an alert when usage reaches 85% (WARNING) or exceeds it (CRITICAL).
The script is given below:
#!/usr/bin/python
import os, sys

# Take the "Use%" column for / from df, e.g. "42%".
output = os.popen("df -h / | grep -v Filesystem | awk '{print $5}'").readline().strip()

# Compare as a number: comparing strings would rank "9%" above "85%".
try:
    used_space = int(output.rstrip("%"))
except ValueError:
    print("UNKNOWN - could not read disk usage (got '%s')." % output)
    sys.exit(3)

if used_space < 85:
    print("OK - %d%% of disk space used." % used_space)
    sys.exit(0)
elif used_space == 85:
    print("WARNING - %d%% of disk space used." % used_space)
    sys.exit(1)
else:
    print("CRITICAL - %d%% of disk space used." % used_space)
    sys.exit(2)
Save the script in the same directory as the other Nagios plugins, as /usr/lib64/nagios/plugins/checkdiskspace.py.
Make the script executable:
chmod +x /usr/lib64/nagios/plugins/checkdiskspace.py
If you are comfortable with Python, you can customize the script to apply different logic for triggering alerts and choosing exit codes (return codes), depending on your needs.
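As one illustration of such customization, the sketch below reads disk usage with os.statvfs instead of parsing df output, and uses separate warning and critical thresholds. The function names (disk_used_percent, classify, main) and the 80/90 thresholds are assumptions made for this example, not part of the original script. The percentage may also differ slightly from what df reports, because df treats reserved blocks differently.

```python
import os

# Hypothetical thresholds for illustration; adjust them to your needs.
WARNING_PERCENT = 80
CRITICAL_PERCENT = 90


def disk_used_percent(path="/"):
    """Return the used space of the filesystem holding path, as a percentage."""
    stats = os.statvfs(path)
    total = stats.f_blocks * stats.f_frsize
    free = stats.f_bfree * stats.f_frsize
    if total == 0:
        raise ValueError("filesystem reports zero size")
    return 100.0 * (total - free) / total


def classify(used, warning=WARNING_PERCENT, critical=CRITICAL_PERCENT):
    """Map a usage percentage to an (exit code, status label) pair."""
    if used >= critical:
        return 2, "CRITICAL"
    if used >= warning:
        return 1, "WARNING"
    return 0, "OK"


def main():
    """Print one status line and return the matching Nagios exit code."""
    try:
        used = disk_used_percent("/")
    except (OSError, ValueError) as err:
        print("UNKNOWN - could not read disk usage: %s" % err)
        return 3
    code, label = classify(used)
    print("%s - %.0f%% of disk space used." % (label, used))
    return code
```

A plugin built on this sketch would end by calling sys.exit(main()), so that the return code reaches Nagios as the process exit status.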
The Nagios return codes are as follows:
Exit Codes / Return Codes | Status |
0 | OK |
1 | WARNING |
2 | CRITICAL |
3 | UNKNOWN |
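To try a plugin by hand before wiring it into NRPE, it can help to run it and translate its exit code using the table above. The following is a minimal sketch; run_plugin and STATUS_BY_CODE are hypothetical names, and treating any code outside 0 to 3 as UNKNOWN is a choice made for this example only.

```python
import subprocess

# Nagios return codes, as listed in the table above.
STATUS_BY_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv):
    """Run a plugin command and return (status label, first line of output)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    lines = output.decode("utf-8", "replace").splitlines()
    first_line = lines[0] if lines else ""
    # Assumption for this sketch: codes outside 0-3 are labeled UNKNOWN.
    status = STATUS_BY_CODE.get(proc.returncode, "UNKNOWN")
    return status, first_line
```

For the script from this tutorial, the call would be run_plugin(["/usr/lib64/nagios/plugins/checkdiskspace.py"]).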
Configure NRPE
Once the script is in place, we need to add it to the NRPE configuration file, /etc/nagios/nrpe.cfg, on the client host.
First, replace the contents of the original config file (/etc/nagios/nrpe.cfg) with the following lines:
log_facility=daemon
pid_file=/var/run/nrpe/nrpe.pid
server_port=5666
nrpe_user=nrpe
nrpe_group=nrpe
allowed_hosts=198.211.117.251
dont_blame_nrpe=1
debug=0
command_timeout=60
connection_timeout=300
include_dir=/etc/nrpe.d/
command[checkdiskspace_python]=/usr/lib64/nagios/plugins/checkdiskspace.py
NOTE: Make sure to specify the correct value for allowed_hosts, which should be the IP address of the monitoring server.
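A typo in a command[...] line is easy to miss, so it can be useful to list the commands a configuration actually defines before restarting the service. The sketch below is a simplified reader written for this example, not NRPE's own parser: parse_nrpe_commands is a hypothetical helper, it only looks at the text it is given, and it does not follow include_dir.

```python
import re

# Matches lines such as: command[name]=/path/to/plugin --args
COMMAND_LINE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def parse_nrpe_commands(config_text):
    """Return a dict mapping NRPE command names to their command lines."""
    commands = {}
    for line in config_text.splitlines():
        # Commented-out lines start with "#" and do not match the pattern.
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


# Hypothetical excerpt mirroring the configuration above.
SAMPLE = """\
server_port=5666
#command[disabled_check]=/bin/false
command[checkdiskspace_python]=/usr/lib64/nagios/plugins/checkdiskspace.py
"""

commands = parse_nrpe_commands(SAMPLE)
print(commands["checkdiskspace_python"])
```

On the client host, the same function could be given the contents of /etc/nagios/nrpe.cfg. A stray space inside the plugin path would show up here as an unexpected extra argument after the path.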
Restart the NRPE service:
service nrpe restart
Set Up on Nagios Monitoring Server
In this step, we will define the new check command and service on the Nagios monitoring server.
Open the commands configuration file with your favorite editor and add the command definition given below:
vi /etc/nagios/objects/commands.cfg
define command{
    command_name    checkdiskspace_python
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c checkdiskspace_python
}
check_nrpe will make a TCP connection to port 5666 on the remote host and run the command ‘checkdiskspace_python’, which we have already defined in /etc/nagios/nrpe.cfg on that host.
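If the check fails, a first step is to confirm that port 5666 on the client is reachable from the monitoring server at all. The sketch below only tests TCP reachability; it does not speak the NRPE protocol, so a successful connection does not prove that allowed_hosts or the command definition is correct. The name port_is_open and the 5-second default timeout are assumptions for this example.

```python
import socket


def port_is_open(host, port=5666, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolvable, or timed out.
        return False
    sock.close()
    return True
```

From the monitoring server, this would be called with the client's address, for example port_is_open("client-address-here"), where the address is a placeholder for your own host.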
In this tutorial, we will add ‘CentOS6Droplet’ as the server to monitor and update /etc/nagios/servers/CentOS6Droplet.cfg with the following lines (the host_name value must match the name used in that host's own definition):
define service {
    use                     generic-service
    host_name               CentOSDroplet
    service_description     Custom Disk Checker In Python
    check_command           checkdiskspace_python
}
Once the file has been updated, restart the Nagios service:
service nagios restart
You can verify that the new check is working, as shown in the picture below: