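On the MC (monitored client), plugins from nagios-plugins collect the monitoring data and send it to the MS. A daemon running on the MS (commonly nsca, nrdp, or mod_gearman) receives the data and passes it to Nagios in a predefined format. The Nagios core process then handles the data (front-end display, alerts).
Advantage: compared with active mode, passive mode greatly reduces the load on the Nagios server.
Disadvantage: as the number of monitored hosts grows further, throughput is limited by I/O on the "external command file" (event broker mode excepted).
Note: this article uses Ubuntu 14.04 as the base environment for a passive-mode configuration. Below, MS and MC are the abbreviations defined in the terminology section.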
/etc/nagios3/nagios.cfg
<pre>
# enable the commands file: allow check results to be written to the external command file
check_external_commands=1
# check the external command file as often as possible
command_check_interval=-1
</pre>
Add a template: edit the configuration file template.cfg and add the following:
<pre>
define service{
    name                    passive_service
    use                     generic-service
    max_check_attempts      1
    active_checks_enabled   0       ; disable active checks
    passive_checks_enabled  1       ; enable passive checks
    normal_check_interval   5
    retry_check_interval    1
    check_freshness         1       ; enable freshness checking
    notifications_enabled   1
    notification_interval   5
    notification_period     24x7
    contact_groups          admins
    register                0       ; required
}
</pre>
/etc/nagios3/commands.cfg
<pre>
define command {
    command_name    check_dummy
    command_line    /usr/lib/nagios/plugins/check_dummy $ARG1$ $ARG2$
}
</pre>
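The check_dummy command does not actually check anything. It takes two arguments, a status code and an output string, and always returns exactly those two values.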
<pre>
# /usr/lib/nagios/plugins/check_dummy 0 successful
OK: successful
# /usr/lib/nagios/plugins/check_dummy 1 failed
WARNING: failed
# /usr/lib/nagios/plugins/check_dummy 2 failed
CRITICAL: failed
# /usr/lib/nagios/plugins/check_dummy 3 failed
UNKNOWN: failed
</pre>
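The behaviour shown above can be mimicked in a few lines of Python, which is handy when testing a passive setup on a machine without nagios-plugins installed. This is only a sketch that reproduces the four outputs listed above; the function name `dummy_check` is hypothetical and not part of any Nagios package.

```python
# Plugin exit codes and the labels that check_dummy prints for them,
# as seen in the example session above.
STATE_LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def dummy_check(state, text):
    """Return (exit_code, output_line) for a given state and message.

    Hypothetical helper: it does no checking at all, it only echoes
    back the state and the text, like check_dummy does.
    """
    if state not in STATE_LABELS:
        raise ValueError("state must be 0, 1, 2 or 3")
    return state, "%s: %s" % (STATE_LABELS[state], text)
```

For example, `dummy_check(2, "failed")` gives the exit code 2 together with the line `CRITICAL: failed`.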
<pre>
define service {
    use                     passive_service
    host_name               localhost
    service_description     check_disk_passive
    freshness_threshold     86400       ; freshness threshold enforced by the MS, in seconds
    check_command           check_dummy!1!"Check failed No return data for 24 hours"
}
</pre>
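The idea behind freshness_threshold can be illustrated with a deliberately simplified sketch: if no passive result has arrived within the threshold, the MS falls back to running check_command, which here is check_dummy with status 1. The real Nagios freshness logic takes more factors into account. The function `is_stale` and its parameters are hypothetical names used only for illustration.

```python
def is_stale(last_result_time, now, freshness_threshold=86400):
    """Return True when the last passive result is older than the threshold.

    Simplified assumption: staleness is just (now - last result) compared
    against freshness_threshold, all in seconds.
    """
    return (now - last_result_time) > freshness_threshold


def fallback_result(last_result_time, now, freshness_threshold=86400):
    """Return the (status, message) the service definition above would
    produce once the result is stale, or None while it is still fresh."""
    if is_stale(last_result_time, now, freshness_threshold):
        return 1, "Check failed No return data for 24 hours"
    return None
```

With the 86400-second threshold from the service definition, a result that is 24 hours and one second old counts as stale, and the WARNING from check_dummy is reported in its place.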
Host check format: [<timestamp>] PROCESS_HOST_CHECK_RESULT;<host_name>;<host_status>;<plugin_output>
<pre>
timestamp:                  Unix timestamp
PROCESS_HOST_CHECK_RESULT:  the external command
host_name:                  name of the monitored host (as defined in the Nagios server configuration)
host_status:                state of the host (0 = UP, 1 = DOWN, 2 = UNREACHABLE)
plugin_output:              text output of the host check
</pre>
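Service check format: [<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>
<pre>
timestamp:                     Unix timestamp
PROCESS_SERVICE_CHECK_RESULT:  the external command
host_name:                     name of the monitored host (as defined in the Nagios server configuration)
svc_description:               service description (must match the definition in the Nagios server configuration exactly)
return_code:                   state of the service (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)
plugin_output:                 text output of the service check
</pre>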
External applications can submit check results by writing them to the external command file:
<pre>
# echo "[`date +%s`] PROCESS_HOST_CHECK_RESULT;mc_hostname;0;ping is ok" >> /var/lib/nagios3/rw/nagios.cmd
# echo "[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;mc_hostname;check_ssh_passive;0;test is ok" >> /var/lib/nagios3/rw/nagios.cmd
</pre>
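The same submission can be done from a script. Below is a minimal Python 3 sketch that builds the two command lines in the formats described above and writes them to the command file. The function names are hypothetical, and the default path is simply the one used in the shell example.

```python
import time


def host_result(host_name, host_status, plugin_output, timestamp=None):
    """Build a PROCESS_HOST_CHECK_RESULT line for the external command file."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "[%d] PROCESS_HOST_CHECK_RESULT;%s;%d;%s\n" % (
        ts, host_name, host_status, plugin_output)


def service_result(host_name, svc_description, return_code, plugin_output,
                   timestamp=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        ts, host_name, svc_description, return_code, plugin_output)


def submit(line, command_file="/var/lib/nagios3/rw/nagios.cmd"):
    """Write one command line to the external command file.

    The command file is a named pipe read by Nagios, so opening it for
    writing does not overwrite earlier commands. The open() call blocks
    if Nagios is not running and nothing is reading the pipe.
    """
    with open(command_file, "w") as cmd:
        cmd.write(line)
```

Usage: `submit(service_result("mc_hostname", "check_ssh_passive", 0, "test is ok"))`. The calling user needs write permission on the command file.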
External applications can also write files containing check results into the Nagios core spool directory (the Ubuntu 14.04 Nagios default is "/var/lib/nagios3/spool/checkresults"):
Host check result file: cpjCd4i
<pre>
### NRDP Check ###
start_time=1415871546.0
# Time: Thu, 13 Nov 2014 09:39:06 +0000
host_name=cdn-gc-dongguan1
check_type=1
early_timeout=1
exited_ok=1
return_code=0
output=Everything looks okay!|perfdata\n
</pre>
Service check result file: clNVC3S
<pre>
### NRDP Check ###
start_time=1415871546.0
# Time: Thu, 13 Nov 2014 09:39:06 +0000
host_name=cdn-gc-dongguan1
service_description=SSH
check_type=1
early_timeout=1
exited_ok=1
return_code=1
output=WARNING: Danger Will Robinson!|perfdata\n
</pre>
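A result file like the two above can be produced with a short Python 3 script. The sketch below rests on several assumptions that are not stated in this article: that the file name should be `c` plus six random characters (mirroring the 7-character names cpjCd4i and clNVC3S), and that Nagios only picks the file up once an empty companion file with the suffix `.ok` exists. The function name and the header comment are hypothetical. The field values (check_type, early_timeout, exited_ok) are copied from the examples.

```python
import os
import random
import string
import time


def write_check_result(spool_dir, host_name, return_code, output,
                       service_description=None, start_time=None):
    """Write one check-result file into spool_dir and return its path."""
    start = time.time() if start_time is None else float(start_time)
    lines = ["### Passive Check ###",          # free-form comment header
             "start_time=%.1f" % start,
             "host_name=%s" % host_name]
    if service_description is not None:        # omit for a host check
        lines.append("service_description=%s" % service_description)
    lines += ["check_type=1",
              "early_timeout=1",
              "exited_ok=1",
              "return_code=%d" % return_code,
              "output=%s\\n" % output]         # literal backslash-n, as in the examples

    alphabet = string.ascii_letters + string.digits
    while True:
        # Assumption: 'c' + 6 random characters, like cpjCd4i above.
        name = "c" + "".join(random.choice(alphabet) for _ in range(6))
        path = os.path.join(spool_dir, name)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            break
        except FileExistsError:
            continue                           # name collision, try again

    with os.fdopen(fd, "w") as result:
        result.write("\n".join(lines) + "\n")

    # Assumption: an empty ".ok" file signals that the result is complete.
    open(path + ".ok", "w").close()
    return path
```

Writing the `.ok` marker last means a half-written result file is never picked up, provided the assumption about the marker holds for the installed Nagios version.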
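Once Nagios has loaded broker_module=/usr/lib/mod_gearman/mod_gearman.o, a job queue named check_results is created in gearman-job-server. Clients request jobs, workers process them and finally write the results to the check_results queue. mod_gearman.o then fetches the results and passes them to the Nagios core process, which completes one round of job distribution and processing.
Taking the python_gearman programming interface as an example (it seems not to have been updated for a long time, and it does not support encrypted transport):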
<pre>
#!/usr/bin/env python
import gearman

# connect to the gearman job server
gm_client = gearman.GearmanClient(['localhost:4730'])

# one key=value field per line, submitted to the check_results queue
completed_job_request = gm_client.submit_job(
    "check_results",
    "type=passive\n"
    "host_name=cdn-tx-wuhan-ctc3\n"
    "core_start_time=1415807730.0\n"
    "start_time=1415807730.19546\n"
    "finish_time=1415807730.105364\n"
    "return_code=2\n"
    "exited_ok=0\n"
    "output=PING_OK\n")
</pre>
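To avoid hand-writing the payload for every result, the key=value body can be assembled by a small helper. The following is a sketch only: the function name is hypothetical, the field list is taken from the example above, and `service_description` as the key for service results is an assumption. Whether the receiving mod_gearman instance additionally expects the payload to be encoded or encrypted depends on its configuration and is not handled here.

```python
def build_passive_result(host_name, return_code, output,
                         core_start_time, start_time, finish_time,
                         service_description=None, exited_ok=0):
    """Assemble the newline-separated key=value payload for check_results."""
    fields = [("type", "passive"),
              ("host_name", host_name)]
    if service_description is not None:
        # Assumption: service results carry this extra key.
        fields.append(("service_description", service_description))
    fields += [("core_start_time", core_start_time),
               ("start_time", start_time),
               ("finish_time", finish_time),
               ("return_code", return_code),
               ("exited_ok", exited_ok),
               ("output", output)]
    # One "key=value" pair per line, with a trailing newline after output.
    return "".join("%s=%s\n" % (key, value) for key, value in fields)
```

The returned string can be passed as the second argument to `gm_client.submit_job("check_results", ...)`.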
mod_gearman C source excerpt
gearman_util.c
The main part that submits the result:
<pre>
add_job_to_queue(...) {
    task = gearman_client_add_task_low_background( client, NULL, NULL, queue, uniq, ( void * )crypted_data, ( size_t )size, &ret1 );
    gearman_task_give_workload(task,crypted_data,size);
}
</pre>