Home | 简体中文 | 繁体中文 | 杂文 | 知乎专栏 | Github | OSChina 博客 | 云社区 | 云栖社区 | Facebook | Linkedin | 视频教程 | 打赏(Donations) | About
知乎专栏多维度架构 微信号 netkiller-ebook | QQ群:128659835 请注明“读者”

6.3. Nagios

homepage: http://www.nagios.org/

6.3.1. Install

6.3.1.1. Nagios core

Nagios 是一种开放源代码监视软件,它可以扫描主机、服务、网络方面存在的问题。Nagios 与其他类似的包之间的主要区别在于,Nagios 将所有的信息简化为“工作(working)”、“可疑的(questionable)”和“故障(failure)”状态,并且 Nagios 支持由插件组成的非常丰富的“生态系统”。这些特性使得用户能够进行有效安装,在此过程中无需过多地关心细节内容,只提供他们所需的信息即可。

install

$ sudo apt-get install nagios3 nagios-nrpe-plugin
		

add user nagiosadmin for nagios

$ sudo htpasswd -c /etc/nagios2/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
		

Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.

$ groupadd nagcmd
$ sudo usermod -a -G nagcmd nagios
$ sudo usermod -a -G nagcmd www-data
$ cat /etc/group
nagcmd:x:1003:nagios,www-data
		

reload apache

$ sudo /etc/init.d/apache2 reload
 * Reloading web server config apache2                    [ OK ]
		

6.3.1.2. Monitor Client nrpe

		
nagios-nrpe-server --------> nagios core (nagios-nrpe-plugin)
		
		

nagios-nrpe-server 的功能是向服务器发送监控数据, 而服务器端通过nagios-nrpe-plugin接收监控数据。

sudo apt-get install nagios-nrpe-server nagios-plugins
		

/etc/nagios/nrpe.cfg

/etc/nagios/nrpe_local.cfg

$ sudo vim /etc/nagios/nrpe_local.cfg
allowed_hosts=172.16.1.2

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -e
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_disk_home]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /home
command[check_sda_iostat]=/usr/lib/nagios/plugins/check_iostat -d sda -w 100 -c 200
command[check_sdb_iostat]=/usr/lib/nagios/plugins/check_iostat -d sdb -w 100 -c 200
# command[check_uri_user]=/usr/lib/nagios/plugins/check_http -I 127.0.0.1 -p 80 -u http://example.com/test/ok.php
# command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -H localhost -u root -ppassword test -P 3306
		

重启后生效

/etc/init.d/nagios-nrpe-server restart
		

6.3.1.3. Monitoring Windows Machines

Nagios 可以监控windows服务器,需要安装下面软件。

NSClient++

http://sourceforge.net/projects/nscplus

6.3.1.4. PNP4Nagios 图表插件

http://www.pnp4nagios.org/

6.3.2. nagios

Install Nagios & Plugins

[root@database ~]# yum -y install nagios nagios-plugins-all nagios-plugins-nrpe
		

Create the default Nagios web access user & set a password

# htpasswd -c /etc/nagios/passwd nagiosadmin
		

Verify default config files

nagios -v /etc/nagios/nagios.cfg		
		

Start Nagios

Start Nagios
		

Configure it to start on boot

chkconfig --levels 345 nagios on
		

http://localhost/nagios/

6.3.3. nrpe node

# yum install nrpe nagios-plugins-all

allowed_hosts=172.16.1.2

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
command[check_http]=/usr/lib64/nagios/plugins/check_http -I 127.0.0.1 -p 80 -u http://www.example.com/index.html
command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w 20% -c 10%
command[check_all_disks]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -e

# chkconfig nrpe on
# service nrpe start
		

其实没有必要安装所有的监控插件

yum install nrpe -y
yum install nagios-plugins-disk  nagios-plugins-load nagios-plugins-ping nagios-plugins-procs nagios-plugins-swap nagios-plugins-users -y		
		

6.3.4. 配置 Nagios

$ sudo vim /etc/nagios3/nagios.cfg

cfg_dir=/etc/nagios3/hosts
cfg_dir=/etc/nagios3/servers
cfg_dir=/etc/nagios3/switches
cfg_dir=/etc/nagios3/routers

admin_email=nagios, neo.chen@example.com
		

6.3.4.1. authorized

add user neo for nagios

$ sudo htpasswd /etc/nagios3/htpasswd.users neo
New password:
Re-type new password:
Adding password for user neo
			

# grep default_user_name cgi.cfg
#default_user_name=guest

# grep authorized cgi.cfg
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
#authorized_for_read_only=user1,user2
			


$ sudo vim /etc/nagios3/cgi.cfg

authorized_for_all_services=nagiosadmin,neo
authorized_for_all_hosts=nagiosadmin,neo

			

6.3.4.2. contacts

$ sudo vim /etc/nagios3/conf.d/contacts_nagios2.cfg

###############################################################################
# contacts.cfg
###############################################################################

define contact{
        contact_name                    neo
        alias                           Neo
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           neo.chen@example.com
        }

###############################################################################
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################
###############################################################################

# We only have one contact in this simple configuration file, so there is
# no need to create more than one contact group.

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 root, neo
        }

			

当服务出现w—报警(warning),u—未知(unkown),c—严重(critical),r—从异常恢复到正常,在这四种情况下通知联系人

当主机出现d-当机(down),u—返回不可达(unreachable),r—从异常情况恢复正常,在这3种情况下通知联系人

确认 contact_groups 已经设置

neo@monitor:/etc/nagios3$ grep admins conf.d/generic-host_nagios2.cfg
                contact_groups                  admins
neo@monitor:/etc/nagios3$ grep admins conf.d/generic-service_nagios2.cfg
                contact_groups                  admins
			

6.3.4.3. hostgroups

$ sudo vim /etc/nagios3/conf.d/hostgroups_nagios2.cfg

define hostgroup {
        hostgroup_name  mysql-servers
                alias           MySQL Servers
                members         *
        }

			

6.3.4.4. generic-service

$ cat /etc/nagios3/conf.d/generic-service_nagios2.cfg
# generic service template definition
define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
                notification_interval           0               ; Only send notifications on status change by default.
                is_volatile                     0
                check_period                    24x7
                normal_check_interval           5
                retry_check_interval            1
                max_check_attempts              4
                notification_period             24x7
                notification_options            w,u,c,r
                contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
			
  • notification_interval 报警发送间隔,单位分钟

  • normal_check_interval 间隔时间

  • retry_check_interval 重试间隔时间

  • max_check_attempts 检查次数,4次失败后报警

6.3.4.5. SOUND OPTIONS

发出警报声

			
$ sudo vim /etc/nagios3/cgi.cfg

# SOUND OPTIONS
# These options allow you to specify an optional audio file
# that should be played in your browser window when there are
# problems on the network.  The audio files are used only in
# the status CGI.  Only the sound for the most critical problem
# will be played.  Order of importance (higher to lower) is as
# follows: unreachable hosts, down hosts, critical services,
# warning services, and unknown services. If there are no
# visible problems, the sound file optionally specified by
# 'normal_sound' variable will be played.
#
#
# <varname>=<sound_file>
#
# Note: All audio files must be placed in the /media subdirectory
# under the HTML path (i.e. /usr/local/nagios/share/media/).

host_unreachable_sound=hostdown.wav
host_down_sound=hostdown.wav
service_critical_sound=critical.wav
service_warning_sound=warning.wav
service_unknown_sound=warning.wav
normal_sound=noproblem.wav
			
			

6.3.4.6. SMS 短信

vim /etc/nagios3/commands.cfg

# 'notify-host-by-sms' command definition
define command{
        command_name    notify-host-by-sms
        command_line    /srv/sms/sms $CONTACTPAGER$ "Host: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n"
        }

# 'notify-service-by-sms' command definition
define command{
        command_name    notify-service-by-sms
        command_line    /srv/sms/sms $CONTACTPAGER$ "Service: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$"
        }
			
sudo vim /etc/nagios3/conf.d/contacts_nagios2.cfg
define contact{
        contact_name                    neo
        alias                           Neo
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-email, notify-service-by-sms
        host_notification_commands      notify-host-by-email, notify-host-by-sms
        email                           neo.chen@example.com
        pager 							13113668899
        }

			

6.3.4.7. nrpe plugins

neo@monitor:/etc/nagios3/hosts$ sudo cat www.example.com.cfg

define host{
        use             generic-host            ; Inherit default values from a template
        host_name       www.example.com             ; The name we're giving to this host
        alias           Some Remote Host        ; A longer name associated with the host
        address         172.16.1.10              ; IP address of the host
        hostgroups      http-servers                    ; Host groups this host is associated with
        }

# NRPE disk check.
define service {
        use                             generic-service
        host_name                       www.example.com
        service_description             nrpe-disk
        check_command                   check_nrpe_1arg!check_all_disks!172.16.1.10
}
define service {
        use                             generic-service
        host_name                       www.example.com
        service_description             nrpe-users
        check_command                   check_nrpe_1arg!check_users!172.16.1.10
}
define service {
        use                             generic-service
        host_name                       www.example.com
        service_description             nrpe-swap
        check_command                   check_nrpe_1arg!check_swap!172.16.1.10
}
define service {
        use                             generic-service
        host_name                       www.example.com
        service_description             nrpe-procs
        check_command                   check_nrpe_1arg!check_total_procs!172.16.1.10
}
define service {
        use                             generic-service
        host_name                       www.example.com
        service_description             nrpe-load
        check_command                   check_nrpe_1arg!check_load!172.16.1.10
}
define service {
        use                             generic-service
        host_name                       www.example.com
        service_description             nrpe-zombie_procs
        check_command                   check_nrpe_1arg!check_zombie_procs!172.16.1.10
}

			

6.3.5. 配置监控设备

6.3.5.1. routers

vim /etc/nagios3/routers/firewall.cfg

define host{

        use             generic-host; Inherit default values from a template

        host_name       firewall         ; The name we're giving to this switch

        alias           Cisco PIX 515E Firewall ; A longer name associated with the switch

        address         172.16.1.254            ; IP address of the switch

        hostgroups      all,networks            ; Host groups this switch is associated with

        }

define service{

        use                     generic-service ; Inherit values from a template

        host_name                       firewall ; The name of the host the service is associated with

        service_description     PING            ; The service description

        check_command           check_ping!200.0,20%!600.0,60%  ; The command used to monitor the service

        normal_check_interval   5       ; Check the service every 5 minutes under normal conditions

        retry_check_interval    1       ; Re-check the service every minute until its final/hard state is determined

        }

define service{

        use                     generic-service ; Inherit values from a template

        host_name                       firewall

        service_description     Uptime

        check_command           check_snmp!-C public -o sysUpTime.0

        }

			

6.3.5.2. host

define service{
    use                             local-service
    host_name                       www.example.com
    service_description             Host Alive
    check_command                   check-host-alive
    }
			

6.3.5.3. service

6.3.5.3.1. http

hosts

$ cat /etc/nagios3/hosts/www.example.com.cfg
define host{

        use             generic-host            ; Inherit default values from a template

        host_name       www.example.com             ; The name we're giving to this host

        alias           Some Remote Host        ; A longer name associated with the host

        address         120.132.14.6           ; IP address of the host

        hostgroups      all,http-servers        ; Host groups this host is associated with

        }

define service{

        use             generic-service         ; Inherit default values from a template

        host_name               www.example.com

        service_description     HTTP

        check_command   check_http

        }

				

HTTP状态

neo@monitor:~$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -s "HTTs"
HTTP CRITICAL: HTTP/1.1 404 Not Found - string not found - 336 bytes in 0.001 second response time |time=0.000733s;;;0.000000 size=336B;;;0

neo@monitor:~$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -e '404'
HTTP OK: Status line output matched "404" - 336 bytes in 0.001 second response time |time=0.000715s;;;0.000000 size=336B;;;0

				
6.3.5.3.2. mysql hosts
$ sudo vim /etc/nagios3/hosts/mysql.cfg


define host{

        use             generic-host            ; Inherit default values from a template

        host_name       mysql-master.example.com            ; The name we're giving to this host

        alias           Some Remote Host        ; A longer name associated with the host

        address         172.16.1.6             ; IP address of the host

        hostgroups      all,mysql-servers       ; Host groups this host is associated with

        }

define service{

        use             generic-service         ; Inherit default values from a template

        host_name               mysql-master.example.com

        service_description     MySQL

        check_command   check_mysql_database!user!passwd!database

        }

				
6.3.5.3.3. check_tcp
define service{
    use                        generic-service     
    host_name                  db.example.com
    service_description        MySQL Master1 Port
    check_command              check_tcp!3306
    }
				

6.3.6. Nagios Plugins

检查命令配置文件 /etc/nagios-plugins/config/

6.3.6.1. check_ping

nagios check_ping命令使用方法

具体如下:
-H    主机地址
-w     WARNING 状态:   响应时间(毫秒),丢包率 (%)   阀值
-c     CRITICAL状态:    响应时间(毫秒),丢包率 (%)   阀值
-p     发送的包数           默认5个包
-t     超时时间             默认10秒
-4|-6                        使用ipv4|ipv6 地址     默认ipv4
		

实例:

/usr/lib64/nagios/plugins/check_ping -H 74.125.71.106 -w 100.0,20% -c 200.0,50%
		

6.3.6.2. check_procs

# /usr/lib64/nagios/plugins/check_procs
PROCS OK: 75 processes

# /usr/lib64/nagios/plugins/check_procs -a mingetty
PROCS OK: 6 processes with args 'mingetty'

# /usr/lib64/nagios/plugins/check_procs -C crond
PROCS OK: 1 process with command name 'crond'
		

6.3.6.3. check_users

监控如果有用户登陆就发出警告

# /usr/lib64/nagios/plugins/check_users -w 0 -c 5
USERS WARNING - 1 users currently logged in |users=1;0;5;0
		

监控用户上线5

# /usr/lib64/nagios/plugins/check_users -w 5 -c 50
USERS OK - 1 users currently logged in |users=1;5;50;0
		

6.3.6.4. check_http

命令定义

define command{
        command_name    check_http_404
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -e '404'
        }

define command{
        command_name    check_http_status
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -e '$ARG1$'
        }

define command{
        command_name    check_http_url
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$'  -u '$ARG1$'
        }
		

默认HTTP健康检查超时时间是10秒,如果你的网站需要更长的时间才能打开可以使用-t参数修改默认Timeout时间

# 'check_http' command definition
define command{
        command_name    check_http
        command_line    /usr/lib/nagios/plugins/check_http -t 30 -H '$HOSTADDRESS$' -I '$HOSTADDRESS$'
        }
		
# /srv/nagios/libexec/check_http -H www.163.com
HTTP OK: HTTP/1.0 200 OK - 657627 bytes in 1.772 second response time |time=1.771681s;;;0.000000 size=657627B;;;0

$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -s "HTTs"
HTTP CRITICAL: HTTP/1.1 404 Not Found - string not found - 336 bytes in 0.001 second response time |time=0.000733s;;;0.000000 size=336B;;;0

$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -e '404'
HTTP OK: Status line output matched "404" - 336 bytes in 0.001 second response time |time=0.000715s;;;0.000000 size=336B;;;0
		

6.3.6.5. check_mysql

命令参数

check_mysql [-d database] [-H host] [-P port] [-s socket]
       [-u user] [-p password] [-S]


/usr/lib64/nagios/plugins/check_mysql -d dbname -H 202.176.120.10 -P 3306 -u test -p password
Uptime: 254264  Threads: 16  Questions: 535110791  Slow queries: 21  Opens: 110  Flush tables: 1  Open tables: 81  Queries per second avg: 2104.547
		
6.3.6.5.1. check_mysql
$ /usr/lib64/nagios/plugins/check_mysql --hostname=172.16.1.5 --port=3306 --username=monitor --password=monitor
Uptime: 27001  Threads: 8  Questions: 25280156  Slow queries: 14941  Opens: 1389932  Flush tables: 3  Open tables: 128  Queries per second avg: 936.267
				
6.3.6.5.2. mysql.cfg check_mysql_replication
				
cat >> /usr/lib64/nagios/plugins/check_mysql_replication <<EOF
#!/bin/bash

declare -a slave_is

slave_is=($(mysql -h$1 -umonitor -pxmNhj -e "show slave status\G"|grep Running |awk '{print $2}'))

if [ "${slave_is[0]}" = "Yes" -a "${slave_is[1]}" = "Yes" ]
     then
     echo "OK - Slave is running"
     exit 0
else
     echo "Critical - Slave is error"
     exit 2
fi
EOF
				
				
sudo chmod +x /usr/lib64/nagios/plugins/check_mysql_replication
/usr/lib64/nagios/plugins/check_mysql_replication 172.16.1.4
Critical - slave is error
				
vim /etc/nagios-plugins/config/mysql.cfg

# 'check_mysql_replication' command definition
define command{
        command_name    check_mysql_replication
        command_line    /usr/lib/nagios/plugins/check_mysql_replication $HOSTADDRESS$
}
define command{
        command_name    check_mysql_replication_host
        command_line    /usr/lib/nagios/plugins/check_mysql_replication '$ARG1$'
}
				

				
6.3.6.5.3. nrpe.cfg check_mysql_replication

nrpe.cfg

				
cat >> /usr/lib64/nagios/plugins/check_mysql_replication <<EOF
#!/bin/bash

declare -a slave_is

slave_is=($(mysql -umonitor -pxmNhj -e "show slave status\G"|grep Running |awk '{print $2}'))

if [ "${slave_is[0]}" = "Yes" -a "${slave_is[1]}" = "Yes" ]
     then
     echo "OK - slave is running"
     exit 0
else
     echo "Critical - slave is error"
     exit 2
fi
EOF

command[check_mysql_slave]=/usr/lib64/nagios/plugins/check_mysql_replication

/usr/local/nagios/libexec/check_nrpe -H 192.168.1.1
/usr/local/nagios/libexec/check_nrpe -H 192.168.1.1 -c check_mysql_replication


define service {
	host_name 192.168.10.232
	service_description check_mysql_replication
	check_period 24x7
	max_check_attempts 5
	normal_check_interval 3
	retry_check_interval 2
	contact_groups mygroup
	notification_interval 5
	notification_period 24x7
	notification_options w,u,c,r
	check_command check_nrpe!check_mysql_replication
}
				
				

6.3.6.6. Disk

6.3.6.6.1. disk.cfg
$ cat /etc/nagios-plugins/config/disk.cfg
# 'check_disk' command definition
define command{
	command_name	check_disk
	command_line	/usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e -p '$ARG3$'
	}

# 'check_all_disks' command definition
define command{
	command_name	check_all_disks
	command_line	/usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e
	}

# 'ssh_disk' command definition
define command{
	command_name	ssh_disk
	command_line	/usr/lib/nagios/plugins/check_by_ssh -H '$HOSTADDRESS$' -C '/usr/lib/nagios/plugins/check_disk -w '\''$ARG1$' -c '\''$ARG2$'\'' -e -p '\''$ARG3$'\'
	}

####
# use these checks, if you want to test IPv4 connectivity on IPv6 enabled systems
####

# 'ssh_disk_4' command definition
define command{
        command_name    ssh_disk_4
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H '$HOSTADDRESS$' -C '/usr/lib/nagios/plugins/check_disk -w '\''$ARG1$'\'' -c '\''$ARG2$'\'' -e -p '\''$ARG3$'\' -4
        }
				
6.3.6.6.2. check_disk

WARNING/CRITICAL 报警阀值

-w 10% -c 5%
-w 100M -c 50M
				

-p, --path=PATH, --partition=PARTITION 参数监控路径,可以一次写多个参数

$ /usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p / -p /opt -p /boot
DISK OK - free space: / 23872 MB (66% inode=92%); /opt 99242 MB (47% inode=93%); /boot 276 MB (63% inode=99%);| /=11767MB;33792;35669;0;37547 /opt=110882MB;199232;210300;0;221369 /boot=160MB;414;437;0;460

$ /usr/lib/nagios/plugins/check_disk -w 100M -c 50M -p / -p /opt -p /boot
DISK OK - free space: / 23872 MB (66% inode=92%); /opt 99242 MB (47% inode=93%); /boot 276 MB (63% inode=99%);| /=11768MB;37447;37497;0;37547 /opt=110882MB;221269;221319;0;221369 /boot=160MB;360;410;0;460
				

-x, --exclude_device=PATH 排除监控路径

/usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -e -x /bak -x /u01
				
6.3.6.6.3. disk-smb.cfg
$ cat disk-smb.cfg
# 'check_disk_smb' command definition
define command{
	command_name	check_disk_smb
	command_line	/usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$'
	}


# 'check_disk_smb_workgroup' command definition
define command{
	command_name	check_disk_smb_workgroup
	command_line	/usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$' -W '$ARG3$'
	}


# 'check_disk_smb_host' command definition
define command{
	command_name	check_disk_smb_host
	command_line	/usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$'
	}


# 'check_disk_smb_workgroup_host' command definition
define command{
	command_name	check_disk_smb_workgroup_host
	command_line	/usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$' -W '$ARG3$'
	}


# 'check_disk_smb_user' command definition
define command{
	command_name	check_disk_smb_user
	command_line	/usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$' -u '$ARG3$' -p '$ARG4$' -w '$ARG5$' -c '$ARG6$'
	}


# 'check_disk_smb_workgroup_user' command definition
define command{
	command_name	check_disk_smb_workgroup_user
	command_line	/usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$' -W '$ARG3$' -u '$ARG4$' -p '$ARG5$'
	}


# 'check_disk_smb_host_user' command definition
define command{
	command_name	check_disk_smb_host_user
	command_line	/usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$' -u '$ARG3$' -p '$ARG4$'
	}


# 'check_disk_smb_workgroup_host_user' command definition
define command{
	command_name	check_disk_smb_workgroup_host_user
	command_line	/usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$' -W '$ARG3$' -u '$ARG4$' -p '$ARG5$'
	}
				

6.3.6.7. check_tcp

6.3.6.7.1. 端口检查
$ /usr/lib/nagios/plugins/check_tcp -H 172.16.1.2 -p 80
TCP OK - 0.000 second response time on port 80|time=0.000369s;;;0.000000;10.000000
				
6.3.6.7.2. Memcache
$ /usr/lib64/nagios/plugins/check_tcp -H localhost -p 11211 -t 5 -E -s 'stats\r\nquit\r\n' -e 'uptime' -M crit
TCP OK - 0.001 second response time on port 11211 [STAT pid 29253
STAT uptime 36088
STAT time 1311100189
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 3.207512
STAT rusage_system 50.596308
STAT curr_connections 10
STAT total_connections 97372
STAT connection_structures 84
STAT cmd_get 84673
STAT cmd_set 273
STAT cmd_flush 0
STAT get_hits 84336
STAT get_misses 337
STAT delete_misses 0
STAT delete_hits 0
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 49280152
STAT bytes_written 46326517326
STAT limit_maxbytes 4294967296
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT bytes 1345
STAT curr_items 14
STAT total_items 241
STAT evictions 0
STAT reclaimed 135
END]|time=0.000658s;;;0.000000;5.000000
				
6.3.6.7.3. Redis
# /usr/lib64/nagios/plugins/check_tcp -H 192.168.2.1 -p 6379 -t 5 -E -s 'info\r\n' -q 'quit\r\n' -e 'uptime_in_days' -M crit
TCP OK - 0.001 second response time on port 6379 [$1043
redis_version:2.4.10
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:21331
uptime_in_seconds:18152153
uptime_in_days:210
lru_clock:1801614
used_cpu_sys:1579.41
used_cpu_user:2279.26
used_cpu_sys_children:54.32
used_cpu_user_children:54.11
connected_clients:2
connected_slaves:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:1158016
used_memory_human:1.10M
used_memory_rss:1560576
used_memory_peak:1289920
used_memory_peak_human:1.23M
mem_fragmentation_ratio:1.35
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:2
bgsave_in_progress:0
last_save_time:1423107828
bgrewriteaof_in_progress:0
total_connections_received:594376
total_commands_processed:1350747
expired_keys:12199
evicted_keys:0
keyspace_hits:511525
keyspace_misses:124116
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:361
vm_enabled:0
role:master
slave0:192.168.6.1,58091,online
db0:keys=1913,expires=7]|time=0.000815s;;;0.000000;5.000000
				

6.3.6.8. check_log

官方的 check_log 有很多缺陷,不能监控大文件。它的监控原理是cat log to oldlog 然后通过diff比较

6.3.6.9. check_traffic

http://exchange.nagios.org/directory/Plugins/Network-Connections,-Stats-and-Bandwidth/check_traffic-2Esh/details

https://github.com/cloved/check_traffic

网卡流量监测

6.3.6.10. Nagios nrpe plugins

nrpe 插件接收来自nagios-nrpe-server数据报告

cat /etc/nagios3/hosts/host.example.org.cfg

define host{

        use             generic-host            ; Inherit default values from a template

        host_name       host.example.org        ; The name we're giving to this host

        alias           Some Remote Host        ; A longer name associated with the host

        address         172.16.1.3              ; IP address of the host

        hostgroups      all                     ; Host groups this host is associated with

        }

# NRPE disk check.
define service {
        use                             generic-service
        host_name                       backup
        service_description             nrpe-disk
        check_command                   check_nrpe_1arg!check_all_disks!172.16.1.3
}
define service {
        use                             generic-service
        host_name                       backup
        service_description             nrpe-users
        check_command                   check_nrpe_1arg!check_users!172.16.1.3
}
define service {
        use                             generic-service
        host_name                       backup
        service_description             nrpe-swap
        check_command                   check_nrpe_1arg!check_swap!172.16.1.3
}
define service {
        use                             generic-service
        host_name                       backup
        service_description             nrpe-procs
        check_command                   check_nrpe_1arg!check_procs!172.16.1.3
}

			

6.3.6.11. check_nt

Define windows services that should be monitored.

# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation

define host{
use             windows-server              ; Inherit default values from a template
host_name   remote-windows-host      ; The name we're giving to this host
alias            Remote Windows Host     ; A longer name associated with the host
address       192.168.1.4                   ; IP address of the remote windows host
}

define service{
use                     generic-service
host_name               remote-windows-host
service_description     NSClient++ Version
check_command           check_nt!CLIENTVERSION
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Uptime
check_command           check_nt!UPTIME
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     CPU Load
check_command           check_nt!CPULOAD!-l 5,80,90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Memory Usage
check_command           check_nt!MEMUSE!-w 80 -c 90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     C:\ Drive Space
check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     W3SVC
check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Explorer
check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
		

Enable Password Protection

define command{
command_name	check_nt
command_line	$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s My2Secure$Password -v $ARG1$ $ARG2$
}
		

6.3.6.12. nsca - Nagios Service Check Acceptor

# yum install nsca
		

6.3.6.13. jmx

nagios plugin to check jmx

https://code.google.com/p/jmxquery/

wget https://jmxquery.googlecode.com/files/jmxquery-1.3-bin.zip
unzip jmxquery-1.3-bin.zip
chmod +x check_jmx		
		
		<![CDATA[
# ./check_jmx -help
Usage: check_jmx [-option...] -U url -O object -A attribute
       (to query an attribute)
   or  check_jmx [-option...] -U url -O object -M method
       (to invoke a zero-argument method)
   or  check_jmx -help
       (to display this help page)

Mandatory parameters are:
 -U     JMX URL, for example: "service:jmx:rmi:///jndi/rmi://localhost:1616/jmxrmi"
 -O     Object name to be checked, for example, "java.lang:type=Memory"
 -A   	Attribute of the object to be checked, for example, "NonHeapMemoryUsage" (not compatible with -M switch)
 -M     Zero-argument method to be invoked (not compatible with -A switch)

Options are:
 -K <key>
        Key for compound data, for example, "used"
 -I <info attribute>
        Attribute of the object containing information for text output
 -J <info attribute key>
        Attribute key for -I attribute compound data, for example, "used"
 -v[v[v[v]]]
	    Verbatim level controlled as a number of v
 -w <limit>
	    Warning long value
 -c <limit>
	    Critical long value
 -default <value>
        Use default value if requested object/attribute/method does not exist
 -username <user name> -password <password>
	    Credentials for JMX

Note that if warning level > critical, system checks object attribute value to be LESS THAN OR EQUAL warning, critical
If warning level < critical, system checks object attribute value to be MORE THAN OR EQUAL warning, critical 		
		
		

例 6.2. 

# ./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:9012/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 731847066 -c 1045495808
JMX OK - HeapMemoryUsage.used=98617544 | HeapMemoryUsage.used=98617544,committed=514850816;init=536870912;max=7635730432;used=98617544			
			
# ./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:9012/jmxrmi -O org:type=Spring,name=BackgroundService -A QueueSize -w 10 -c 20
JMX CRITICAL - org:type=Spring,name=BackgroundService
			

6.3.7. FAQ

6.3.7.1. Macro Name

http://nagios.sourceforge.net/docs/3_0/macrolist.html

6.3.7.2. 插件开发手册

https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT