Nagios Distributed Monitoring Configuration

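We have a partnership with a telecom carrier: they provide the machines, but we administer them. Active checks configured on our side cannot reach into their machines, hence the need for passive monitoring. I got Nagios distributed monitoring working on a dull on-call day, which counts as a small win. I really like the feel of empty streets, yet I can't help worrying about how to make ends meet...
On to the topic.
1. On the distributed machine, install and configure Nagios the same way as on the central monitoring server, and confirm that the monitoring information shows up in the web interface.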

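2. Install the NSCA module
Download URL
http://nchc.dl.sourceforge.net/sourceforge/nagios/nsca-2.7.2.tar.gz
Build steps
tar xzf nsca-2.7.2.tar.gz
cd nsca-2.7.2
./configure && make all
That completes the build. The client-side configuration follows; the commands below are run from the nsca-2.7.2 directory.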
1) Copy the relevant files into the nagios directory, and mind the file permissions
cp sample-config/send_nsca.cfg /usr/local/nagios/etc/
cp src/send_nsca /usr/local/nagios/bin/

2) The password option in /usr/local/nagios/etc/send_nsca.cfg must match the one on the central monitoring server
password=xxx
3) Settings in nagios.cfg (do not reference send_nsca.cfg from this file)
nagios_user=nagios
nagios_group=nagios
# adjust the two lines above to your environment
ocsp_command=submit_check_result
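# syslog logging floods /var/log/messages, which I found annoying, so I turned it off
use_syslog=0
# disable notifications from the Nagios instance on the distributed machine
enable_notifications=0
# obsess over services, so that ocsp_command runs after every service check
obsess_over_services=1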
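Before restarting Nagios it can help to confirm that the four settings above really have the intended values. The following is a minimal sketch: `cfg_value` is a hypothetical helper, not part of Nagios, and the demo reads a temporary file rather than the real /usr/local/nagios/etc/nagios.cfg.

```shell
#!/bin/sh
# cfg_value KEY FILE: print the last value assigned to KEY in a key=value file,
# ignoring comment lines. (Hypothetical helper, not part of Nagios.)
cfg_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                                # skip comment lines
        $1 == key  { val = substr($0, length(key) + 2) }   # text after "KEY="
        END        { if (val != "") print val }
    ' "$2"
}

# Demo against a throwaway fragment holding the settings from this step.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
# sample fragment
ocsp_command=submit_check_result
use_syslog=0
enable_notifications=0
obsess_over_services=1
EOF

for key in ocsp_command use_syslog enable_notifications obsess_over_services; do
    printf '%s=%s\n' "$key" "$(cfg_value "$key" "$demo_cfg")"
done
rm -f "$demo_cfg"
```

Pointing `cfg_value` at the real nagios.cfg instead of the temporary file gives the same kind of report for a live installation.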
4) In /usr/local/nagios/etc/commands.cfg, keep everything identical to the central monitoring server's configuration and add
define command{
command_name submit_check_result
command_line /usr/local/nagios/libexec/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
}

5) Contents of the /usr/local/nagios/libexec/submit_check_result script
#!/bin/sh

# Arguments:
# $1 = host_name (Short name of host that the service is
# associated with)
# $2 = svc_description (Description of the service)
# $3 = state_string (A string representing the status of
# the given service - "OK", "WARNING", "CRITICAL"
# or "UNKNOWN")
# $4 = plugin_output (A text string that should be used
# as the plugin output for the service checks)
#

# Convert the state string to the corresponding return code
return_code=-1

case "$3" in
OK)
return_code=0
;;
WARNING)
return_code=1
;;
CRITICAL)
return_code=2
;;
UNKNOWN)
return_code=3
;;
esac
# pipe the service check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server

/usr/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" | /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg

#########################################
# Note: central_server in the script stands for the central monitoring server's IP
#########################################
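The conversion and the tab-separated record can be tried out without a running NSCA daemon. The sketch below restates the script's logic as two functions; `state_to_code` and `nsca_record` are names chosen for illustration, and it uses 3 for UNKNOWN, which is the standard Nagios plugin return code.

```shell
#!/bin/sh
# state_to_code STATE: print the numeric code for a Nagios service state string.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        UNKNOWN)  echo 3 ;;
        *)        echo 3 ;;   # assumption: treat anything unrecognized as UNKNOWN
    esac
}

# nsca_record HOST SERVICE STATE OUTPUT: print one tab-separated line in the
# field order the script pipes into send_nsca: host, service, code, output.
nsca_record() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Example record for a WARNING result (host and text are made up).
nsca_record rrw-2-1 CPU WARNING 'load average: 5.2'
```

Piping the output of `nsca_record` into the send_nsca command line used in the script would submit that record to the central server.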
6) If you keep the name central_server in the script, add an entry for it to /etc/hosts
xx.xx.xx.xx central_server
When the client is working properly, you can see the send_nsca process change from one moment to the next
[root@TJSJHL241-189 nsca-2.7.2]# ps -ef|grep nsca
nagios   31772 31770 0 11:37 ?        00:00:00 /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
root     31774 17037 0 11:37 pts/0    00:00:00 grep nsca
[root@TJSJHL241-189 nsca-2.7.2]# ps -ef|grep nsca
nagios   31786 31784 0 11:37 ?        00:00:00 /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
root     31788 17037 0 11:37 pts/0    00:00:00 grep nsca
[root@TJSJHL241-189 nsca-2.7.2]# ps -ef|grep nsca
nagios   31792 31790 0 11:37 ?        00:00:00 /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
root     31794 17037 0 11:37 pts/0    00:00:00 grep nsca

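You can also make a test file to check connectivity from the client to the server. For example, with a file named test containing one line (host name, service description, return code, output):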
"rrw-2-1" TestMessage 0 This is a test message.
[root@TJSJHL241-189 etc]# /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg < test
0 data packet(s) sent to host successfully.
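Seeing "sent to host successfully" is enough.
At the time it bugged me that the command succeeded yet reported 0 packets sent. I set the question aside and carried on with the configuration, and later found that the central server and the distributed machine were showing the same content anyway, heh. If anyone knows the reason, please do tell me. Thanks in advance.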
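One possible explanation for the "0 data packet(s)" result, offered as an assumption rather than a diagnosis: send_nsca splits each input line on tab characters by default, so a line whose fields are separated by spaces may not be recognized as a result at all. A sketch that writes the test line with explicit tabs and checks the field count before sending (the temporary file and the `count_fields` helper are illustrative):

```shell
#!/bin/sh
# count_fields FILE: print the number of tab-separated fields on each line.
# A passive service result needs 4: host, service, return code, output.
count_fields() {
    awk -F'\t' '{ print NF }' "$1"
}

# Write one passive service result with real tab characters between fields.
test_file=$(mktemp)
printf '%s\t%s\t%s\t%s\n' 'rrw-2-1' 'TestMessage' '0' 'This is a test message.' > "$test_file"

count_fields "$test_file"   # prints 4
rm -f "$test_file"
```

If the count is 4, the same file can be fed to send_nsca on standard input exactly as in the command shown earlier.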

7) Now take a look at the services.cfg configuration
define service{
        hostgroup_name                  rrw-game,rrw-res
        service_description             CPU
        check_command                   check_nrpe_cpu
        contact_groups                  yunwei
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           10
        retry_check_interval            60
        notifications_enabled           1
        notification_options            u,c,r
        check_freshness                 1
        freshness_threshold             20
        }
These are the two extra options on the distributed machine
        check_freshness                 1
        freshness_threshold             20 ; meant to be 2x the normal check interval

You can also look at nagios.cfg
check_service_freshness=1 is the default there, so service freshness checking is already enabled globally.
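One detail worth checking here: freshness_threshold is given in seconds, while normal_check_interval is counted in units of interval_length (60 seconds unless changed in nagios.cfg), so the two numbers are not directly comparable. A minimal sketch of the arithmetic, assuming the default interval_length; `freshness_seconds` is a hypothetical helper:

```shell
#!/bin/sh
# freshness_seconds INTERVAL [FACTOR] [INTERVAL_LENGTH]
# INTERVAL is normal_check_interval in time units, FACTOR defaults to 2,
# INTERVAL_LENGTH is seconds per time unit and defaults to 60.
freshness_seconds() {
    interval=$1
    factor=${2:-2}
    unit=${3:-60}
    echo $(( interval * factor * unit ))
}

# normal_check_interval of 10 from the service definition above
freshness_seconds 10   # prints 1200
```

With these defaults, twice a 10-unit check interval comes to 1200 seconds rather than 20.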

That is about it for the distributed machine. Start nagios.

3. Central monitoring server configuration
Download the nsca module
1) Build it the same way as above
After make all:
cp sample-config/nsca.cfg /usr/local/nagios/etc/
cp src/nsca /usr/local/nagios/bin/

Edit /usr/local/nagios/etc/nsca.cfg with vi and change the following settings
nsca_user=nagios
nsca_group=nagios
password=xxx
I did not hand this over to xinetd or the like; I just run it directly
/usr/local/nagios/bin/nsca -d -c /usr/local/nagios/etc/nsca.cfg
2) Add it to rc.local so that it starts at boot
/usr/local/nagios/bin/nsca -d -c /usr/local/nagios/etc/nsca.cfg
3) Add to /etc/services
nsca            5667/tcp                        # NSCA
4) Open port 5667 in iptables
iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 5667 -j ACCEPT
service iptables save
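Adding the services entry can be made repeatable, so that running the setup twice does not duplicate the line. A sketch under the assumption that the target file is passed as an argument; the demo uses a temporary file instead of /etc/services, and `add_nsca_service` is a hypothetical helper:

```shell
#!/bin/sh
# add_nsca_service FILE: append the nsca 5667/tcp entry unless one already exists.
add_nsca_service() {
    if ! grep -Eq '^nsca[[:space:]]+5667/tcp' "$1"; then
        printf 'nsca\t\t5667/tcp\t\t\t# NSCA\n' >> "$1"
    fi
}

services_copy=$(mktemp)
add_nsca_service "$services_copy"
add_nsca_service "$services_copy"    # second call is a no-op
grep -c '^nsca' "$services_copy"     # prints 1
rm -f "$services_copy"
```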
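5) Create a new directory for the distributed machine's configuration files. hosts.cfg and hostgroups.cfg are identical to those on the distributed machine.
For services.cfg, start by copying the distributed machine's file. The finished configuration looks like this: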
define service{
        hostgroup_name                  rrw-game,rrw-res
        service_description             CPU
        check_command                   check_nrpe_cpu
        contact_groups                  yunwei
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           10
        retry_check_interval            60
        notifications_enabled           1
        notification_options            u,c,r
        active_checks_enabled           0
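        check_freshness                 0 ; note: with 1 the central server does not wait for the distributed machine and runs check_command itself once the last result is older than freshness_threshold; with 0 it only updates when the distributed machine submits a result
        freshness_threshold             20
        passive_checks_enabled          1 ; We want only passive checking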
        flap_detection_enabled          0
        is_volatile                     0
        }

6) Start the nsca and nagios services
/usr/local/nagios/bin/nsca -d -c /usr/local/nagios/etc/nsca.cfg
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

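4. Keeping the central server and the distributed machine in sync
Whenever a new check is added, it has to be added on both the central server and the distributed machine.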
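Since every new check has to exist on both sides, a quick comparison of the service descriptions in the two services.cfg files can catch a definition that was added on only one machine. A minimal sketch; `list_services` and `diff_services` are hypothetical helpers, and the file paths in the usage note are placeholders.

```shell
#!/bin/sh
# list_services FILE: print the sorted, unique service_description values in FILE.
list_services() {
    awk '$1 == "service_description" {
             sub(/^[ \t]*service_description[ \t]+/, "")
             sub(/[ \t]*(;.*)?$/, "")      # drop trailing blanks and ; comments
             print
         }' "$1" | sort -u
}

# diff_services CENTRAL DISTRIBUTED: print descriptions present in only one
# file, prefixed with the side they were found on. Prints nothing on a match.
diff_services() {
    a=$(mktemp)
    b=$(mktemp)
    list_services "$1" > "$a"
    list_services "$2" > "$b"
    comm -23 "$a" "$b" | sed 's/^/only in central: /'
    comm -13 "$a" "$b" | sed 's/^/only in distributed: /'
    rm -f "$a" "$b"
}
```

For example, `diff_services /path/to/central/services.cfg /path/to/distributed/services.cfg` prints nothing when both files define the same set of service descriptions. It compares descriptions only, not host groups or other options.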

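Note: the documents I followed mention adding the definition below to commands.cfg and calling the command from the services. So far my setup does not seem to use it. I do not know whether that will cause problems after it has run for a while, so it remains to be seen whether this part is needed. See the documents below for details.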
define command{
command_name check_dummy
command_line $USER1$/check_dummy $ARG1$
}
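For context, a check_dummy-style plugin simply reports whatever state it is told to report, which makes it usable as a placeholder check_command on services that are otherwise fed only by passive results. The function below is an imitation written for illustration, not the plugin's source; `dummy_check` is a hypothetical name and its output format is an assumption.

```shell
#!/bin/sh
# dummy_check STATE [TEXT]: print a status line and return STATE as the exit
# status, imitating what a check_dummy-style plugin does.
dummy_check() {
    case "$1" in
        0) label=OK ;;
        1) label=WARNING ;;
        2) label=CRITICAL ;;
        3) label=UNKNOWN ;;
        *) echo "UNKNOWN: state must be 0-3"; return 3 ;;
    esac
    echo "${label}: ${2:-no text given}"
    return "$1"
}

# Report CRITICAL with a message, then show the exit status that was returned.
dummy_check 2 "no passive result received recently" || echo "exit code: $?"
```

Run as is, the last line prints the CRITICAL message followed by "exit code: 2".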

References:
README in the nsca module
http://nagios.sourceforge.net/download/contrib/documentation/misc/NSCA_Setup.pdf
http://www.packtpub.com/article/passive-checks-nsca-nagios-service-check-acceptor