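Our company has a partnership with a carrier: they supply the machines, but we manage them. Because of how their side is configured, our active monitoring cannot get into their machines, hence the need for passive monitoring. On a dull on-call day I got Nagios distributed monitoring working, so the day wasn't a total loss. I love the feeling of the empty streets, yet I can't help worrying about putting food on the table...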
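On to the main topic.
1. On the distributed machine, install Nagios and configure it the same way as the central monitoring server, and make sure the relevant monitoring information shows up in its web interface before going further.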
2. Install the NSCA module
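Download: http://nchc.dl.sourceforge.net/sourceforge/nagios/nsca-2.7.2.tar.gz
Build (inside the unpacked nsca-2.7.2 directory):
./configure && make all
That completes the build. The client-side configuration follows.
cd nsca-2.7.2
1) Copy the relevant files into the Nagios directories, paying attention to permissions:
cp sample-config/send_nsca.cfg /usr/local/nagios/etc/
cp src/send_nsca /usr/local/nagios/bin/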
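The copy step above only warns to mind permissions. As a sketch, assuming the /usr/local/nagios prefix and the nagios user/group used throughout this article, a hypothetical helper install_send_nsca could do the copy and set the modes explicitly. The 755/640 modes are my own assumption, not something the original specifies.

```shell
#!/bin/sh
# Hypothetical helper: copy the send_nsca client files into a Nagios prefix.
# $1 = unpacked nsca source dir (already built with "make all")
# $2 = Nagios prefix (this article uses /usr/local/nagios)
install_send_nsca() {
    src=$1
    prefix=$2
    mkdir -p "$prefix/etc" "$prefix/bin" || return 1
    cp "$src/sample-config/send_nsca.cfg" "$prefix/etc/" || return 1
    cp "$src/src/send_nsca" "$prefix/bin/" || return 1
    # The binary must be executable by the nagios user; the config holds the
    # shared password, so keep it away from other users (assumed modes).
    chmod 755 "$prefix/bin/send_nsca"
    chmod 640 "$prefix/etc/send_nsca.cfg"
    # Only change ownership if a nagios user actually exists on this host.
    if id nagios >/dev/null 2>&1; then
        chown nagios:nagios "$prefix/bin/send_nsca" "$prefix/etc/send_nsca.cfg"
    fi
}
```

Run as root, e.g. `install_send_nsca ~/nsca-2.7.2 /usr/local/nagios`.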
2) Set the password option in /usr/local/nagios/etc/send_nsca.cfg to the same value as on the central monitoring server:
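password=xxx
3) nagios.cfg settings (do not add send_nsca.cfg to nagios.cfg; it is read only by send_nsca):
# adjust the user and group to your environment
nagios_user=nagios
nagios_group=nagios
ocsp_command=submit_check_result
# logging to syslog floods /var/log/messages, which annoyed me, so I turned it off
use_syslog=0
# disable notifications on the distributed Nagios
enable_notifications=0
# obsess over services, so every service result is passed to ocsp_command
obsess_over_services=1
4) /usr/local/nagios/etc/commands.cfg: keep it identical to the original monitoring server's, then add:
define command{
    command_name    submit_check_result
    command_line    /usr/local/nagios/libexec/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
}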
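To double-check step 3 on the distributed host, a hypothetical check_distributed_cfg function (the name and approach are my own, not part of Nagios) can grep nagios.cfg for the three settings that make the forwarding work. It assumes each setting sits on its own line with no trailing comment.

```shell
#!/bin/sh
# Hypothetical helper: confirm that nagios.cfg contains the settings the
# distributed (NSCA client) host needs. Prints each missing setting and
# returns non-zero if anything is missing.
# $1 = path to nagios.cfg (this article uses /usr/local/nagios/etc/nagios.cfg)
check_distributed_cfg() {
    cfg=$1
    missing=0
    for setting in \
        obsess_over_services=1 \
        ocsp_command=submit_check_result \
        enable_notifications=0
    do
        # Match the whole line, tolerating surrounding whitespace only.
        if ! grep -Eq "^[[:space:]]*${setting}[[:space:]]*$" "$cfg"; then
            echo "missing: $setting"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_distributed_cfg /usr/local/nagios/etc/nagios.cfg && echo ok`.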
5) Contents of the /usr/local/nagios/libexec/submit_check_result script:
#!/bin/sh
# Arguments:
#   $1 = host_name (short name of the host the service is associated with)
#   $2 = svc_description (description of the service)
#   $3 = state_string (string representing the status of the given service:
#        "OK", "WARNING", "CRITICAL" or "UNKNOWN")
#   $4 = plugin_output (text to use as the plugin output for the service check)

# Convert the state string to the corresponding return code.
# Nagios plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN;
# anything unrecognized is reported as UNKNOWN.
return_code=3
case "$3" in
    OK)       return_code=0 ;;
    WARNING)  return_code=1 ;;
    CRITICAL) return_code=2 ;;
    UNKNOWN)  return_code=3 ;;
esac

# Pipe the service check info into send_nsca, which in turn transmits it to
# the nsca daemon on the central monitoring server.
# Fields are tab-separated: host, service, return code, plugin output.
/usr/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" | /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
Note: central_server in the script stands for the IP of the central monitoring server.
6) If you keep that name, map it in /etc/hosts:
xx.xx.xx.xx central_server
When the client is working normally, you can see the send_nsca process keep changing (a new PID each time a result is submitted):
[root@TJSJHL241-189 nsca-2.7.2]# ps -ef|grep nsca
nagios 31772 31770 0 11:37 ? 00:00:00 /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
root 31774 17037 0 11:37 pts/0 00:00:00 grep nsca
[root@TJSJHL241-189 nsca-2.7.2]# ps -ef|grep nsca
nagios 31786 31784 0 11:37 ? 00:00:00 /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
root 31788 17037 0 11:37 pts/0 00:00:00 grep nsca
[root@TJSJHL241-189 nsca-2.7.2]# ps -ef|grep nsca
nagios 31792 31790 0 11:37 ? 00:00:00 /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg
root 31794 17037 0 11:37 pts/0 00:00:00 grep nsca
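If you keep the central_server name, the /etc/hosts entry can be added idempotently. A sketch with a hypothetical ensure_hosts_entry helper: it appends only when no uncommented line already lists the name, and it does not correct an existing entry that points at a different IP. The 192.0.2.10 address in the usage line is a documentation placeholder, not the real server.

```shell
#!/bin/sh
# Hypothetical helper: make sure a hosts file maps a name (e.g. central_server)
# to an IP, appending the entry only once.
# $1 = hosts file (normally /etc/hosts), $2 = IP, $3 = host name
# Note: an existing entry with a different IP is left untouched.
ensure_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # awk exits 0 if some non-comment line already lists the name as an alias.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

Usage (as root): `ensure_hosts_entry /etc/hosts 192.0.2.10 central_server`.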
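You can also write a test file to check connectivity from the client to the server; for example, with this content:
"rrw-2-1" TestMessage 0 This is a test message.
[root@TJSJHL241-189 etc]# /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg < test
0 data packet(s) sent to host successfully.
All you need to see is "sent to host successfully". At the time I was stuck on why it reported success yet sent 0 packets. In the end I set the question aside and went on with the configuration, and later found that the central server and the distributed machine showed the same results, heh. If anyone knows the reason, please tell me. Thanks in advance.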
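One possible explanation for the "0 data packet(s)" result, offered as my assumption rather than a confirmed diagnosis: as far as I understand, send_nsca splits each input line on a delimiter that defaults to a tab (changeable with its -d option) and skips lines it cannot split, so a test file typed with spaces sends nothing even though the connection itself succeeded. Quotation marks that were literally in the file would also become part of the host name. The hypothetical helpers below write the test line with printf, so the separators are real tabs, and flag lines that lack the four fields a service result needs.

```shell
#!/bin/sh
# Hypothetical helpers for testing send_nsca by hand.

# Write one service-check line: host<TAB>service<TAB>return code<TAB>output.
# $1 = output file, $2 = host, $3 = service, $4 = return code, $5 = text
write_nsca_test() {
    printf '%s\t%s\t%s\t%s\n' "$2" "$3" "$4" "$5" > "$1"
}

# Report every line that does not have exactly four tab-separated fields
# (a service result; host results use three). Returns non-zero if any do.
check_nsca_input() {
    awk 'BEGIN { FS = "\t" }
         NF != 4 { printf "line %d: %d field(s)\n", NR, NF; bad = 1 }
         END { exit bad }' "$1"
}
```

Usage:
`write_nsca_test test rrw-2-1 TestMessage 0 "This is a test message."`
`check_nsca_input test && /usr/local/nagios/bin/send_nsca central_server -c /usr/local/nagios/etc/send_nsca.cfg < test`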
7) Now look at the services.cfg configuration:
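define service{
    hostgroup_name          rrw-game,rrw-res
    service_description     CPU
    check_command           check_nrpe_cpu
    contact_groups          yunwei
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   10
    retry_check_interval    60
    notifications_enabled   1
    notification_options    u,c,r
    check_freshness         1
    freshness_threshold     20
}
These are the two options added on the distributed machine:
check_freshness 1
freshness_threshold 20    ; intended as 2x the normal check interval
Note that freshness_threshold is in seconds, whereas normal_check_interval is counted in interval_length units (60 seconds by default), so twice a 10-minute interval would be 1200 rather than 20.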
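The unit mismatch is easy to get wrong, so here is a small worked conversion. A sketch with hypothetical helper names: it reads interval_length from nagios.cfg (falling back to the Nagios default of 60 seconds) and multiplies it out.

```shell
#!/bin/sh
# Hypothetical helpers: turn "N times the normal check interval" into the
# seconds that freshness_threshold expects.

# Print interval_length from nagios.cfg, or the default of 60 if it is unset.
interval_length_of() {
    v=$(sed -n 's/^[[:space:]]*interval_length=\([0-9][0-9]*\).*/\1/p' "$1" 2>/dev/null | tail -n 1)
    echo "${v:-60}"
}

# $1 = normal_check_interval (interval_length units, as in services.cfg)
# $2 = interval_length in seconds
# $3 = multiplier (the article aims for 2x)
freshness_seconds() {
    echo $(( $1 * $2 * $3 ))
}
```

With the values above and the default interval_length, `freshness_seconds 10 60 2` gives 10 × 60 × 2 = 1200 seconds.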
Also take a look at nagios.cfg: its global freshness switch, check_service_freshness=1, is enabled by default.
That's about it for the distributed machine; just start Nagios.
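Before starting, Nagios can check its own configuration with -v. A small hypothetical wrapper, verify_and_start, runs that check and only starts the daemon with -d when it passes. The wrapper is my own sketch; only the -v and -d flags come from Nagios itself.

```shell
#!/bin/sh
# Hypothetical wrapper: run Nagios' pre-flight check (-v) and start the
# daemon (-d) only if the configuration verifies cleanly.
# $1 = nagios binary, $2 = nagios.cfg
verify_and_start() {
    if "$1" -v "$2"; then
        "$1" -d "$2"
    else
        echo "nagios -v reported errors in $2; not starting" >&2
        return 1
    fi
}
```

Usage: `verify_and_start /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg`.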
3. Central monitoring server configuration
Download the NSCA module.
1) Build it the same way as above; after make all:
cp sample-config/nsca.cfg /usr/local/nagios/etc/
cp src/nsca /usr/local/nagios/bin/
Edit /usr/local/nagios/etc/nsca.cfg (vi /usr/local/nagios/etc/nsca.cfg) and change the following settings:
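nsca_user=nagios
nsca_group=nagios
password=xxx
I don't bother handing nsca over to xinetd or the like; I just run it directly:
/usr/local/nagios/bin/nsca -d -c /usr/local/nagios/etc/nsca.cfg
2) Add the same command to rc.local so it starts at boot:
/usr/local/nagios/bin/nsca -d -c /usr/local/nagios/etc/nsca.cfg
3) Add to /etc/services:
nsca 5667/tcp # NSCA
4) Open port 5667 in iptables:
iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 5667 -j ACCEPT
service iptables save
5) Create a new directory for the distributed machines' configuration files. hosts.cfg and hostgroups.cfg are identical to the distributed machine's; for services.cfg, start from a copy of the distributed machine's file. The finished configuration looks like this:
define service{
    hostgroup_name          rrw-game,rrw-res
    service_description     CPU
    check_command           check_nrpe_cpu
    contact_groups          yunwei
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   10
    retry_check_interval    60
    notifications_enabled   1
    notification_options    u,c,r
    active_checks_enabled   0
    check_freshness         0   ; see the note below
    freshness_threshold     20
    passive_checks_enabled  1   ; we want only passive checking
    flap_detection_enabled  0
    is_volatile             0
}
Note on check_freshness: if it is set to 1, the central server refreshes the result on its own (by running the service's check_command) whenever no result has arrived from the distributed machine within freshness_threshold; with 0 it simply waits for the distributed machine to submit its results.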
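The two sides only talk to each other if the password in nsca.cfg here matches the one in the distributed machine's send_nsca.cfg. A minimal sketch with hypothetical helper names; it assumes you have copied one of the files over so both are local, and it reads the last uncommented password= line of each.

```shell
#!/bin/sh
# Hypothetical helpers: compare the password= values of send_nsca.cfg
# (client) and nsca.cfg (central server).

# Print the value of the last uncommented password= line.
nsca_password() {
    sed -n 's/^[[:space:]]*password=//p' "$1" | tail -n 1
}

# Succeed only when both files set a password and the values are identical.
same_nsca_password() {
    a=$(nsca_password "$1")
    b=$(nsca_password "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `same_nsca_password /tmp/send_nsca.cfg /usr/local/nagios/etc/nsca.cfg || echo "passwords differ"`.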
6) Start the services, nsca and nagios:
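/usr/local/nagios/bin/nsca -d -c /usr/local/nagios/etc/nsca.cfg
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
4. Central server and distributed machines
Whenever you add a new monitored item, you must add it on both the central server and the distributed machine.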
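Since every new monitored item has to be added on both machines, it helps to check that the two services.cfg files have not drifted apart. A rough sketch under my own assumptions: the hypothetical services_in_sync function strips comments and whitespace differences, drops the directives that intentionally differ between the two hosts in this article (active/passive checks, freshness, flap detection, volatility), and diffs the rest. Copy the distributed machine's file over first.

```shell
#!/bin/sh
# Hypothetical helpers: compare two services.cfg files, ignoring the
# directives that differ between the central and distributed hosts by design.
# The directive list mirrors the central services.cfg in this article.
normalize_services() {
    # Drop full-line '#' comments and ';' comments, squeeze whitespace,
    # then remove the host-specific directives and blank lines.
    sed -e '/^[[:space:]]*#/d' -e 's/;.*//' -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' -e 's/ $//' "$1" |
        grep -Ev '^(active_checks_enabled|passive_checks_enabled|check_freshness|freshness_threshold|flap_detection_enabled|is_volatile)( |$)' |
        grep -v '^$'
}

# $1 = distributed copy, $2 = central copy. Prints a diff; returns 0 if in sync.
services_in_sync() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_services "$1" > "$a"
    normalize_services "$2" > "$b"
    diff -u "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Usage: `services_in_sync /tmp/services.cfg.remote /usr/local/nagios/etc/<your new dir>/services.cfg`.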
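Note: the documentation I followed says to add the following to commands.cfg and to call this command from the services. So far I haven't needed it; whether problems show up after it has run for a while remains to be seen, heh. For whether it is really necessary, please refer to that documentation. (In that setup the central server turns freshness checking on and uses check_dummy as the service's check_command, so a service whose passive results stop arriving is flagged as stale instead of Nagios trying to run the real check itself.)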
define command{
    command_name    check_dummy
    command_line    $USER1$/check_dummy $ARG1$
}