nagios

Date: 2019-12-08

Tags: nagios


RPM package sources (EPEL)

CentOS 5

32-bit EPEL repository package: www.lishiming.net/data/attachment/forum/epel-release-5-4_32.noarch.rpm

64-bit: www.lishiming.net/data/attachment/forum/epel-release-5-4_64.noarch.rpm

CentOS 6

32-bit EPEL yum repository package: www.lishiming.net/data/attachment/forum/epel-release-6-8_32.noarch.rpm

64-bit: www.lishiming.net/data/attachment/forum/epel-release-6-8_64.noarch.rpm
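The four packages differ only by CentOS release and architecture, so you can script the choice. The sketch below is a hypothetical bash helper (`epel_rpm_url` is not part of any package). It assumes the file names listed above and assumes `uname -m` reports `x86_64` on 64-bit systems and an `i?86` value on 32-bit ones. It also adds the `http://` scheme, which `rpm -ivh` needs in order to download a remote file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the EPEL release RPM URL listed above
# for a CentOS major release ($1) and a machine architecture ($2).
epel_rpm_url() {
    local release="$1" arch="$2" bits pkg
    case "$arch" in
        x86_64) bits=64 ;;
        i386|i486|i586|i686) bits=32 ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    case "$release" in
        5) pkg="epel-release-5-4_${bits}.noarch.rpm" ;;
        6) pkg="epel-release-6-8_${bits}.noarch.rpm" ;;
        *) echo "unsupported CentOS release: $release" >&2; return 1 ;;
    esac
    # rpm only downloads when the argument carries an explicit scheme
    echo "http://www.lishiming.net/data/attachment/forum/${pkg}"
}
```

For example, on a CentOS 6 machine: rpm -ivh "$(epel_rpm_url 6 "$(uname -m)")". You pass the release number by hand here; the helper does not detect it.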

1. Installing Nagios - server (192.168.0.11)

The default CentOS 6 yum repositories contain no Nagios RPMs, but we can install the EPEL extra repository:

rpm -ivh http://www.lishiming.net/data/attachment/forum/epel-release-6-8_64.noarch.rpm

yum install -y httpd nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

Set the user name and password for logging in to the Nagios web interface: htpasswd -c /etc/nagios/passwd nagiosadmin

vim /etc/nagios/nagios.cfg

Check the configuration file: nagios -v /etc/nagios/nagios.cfg

Start the services: service httpd start; service nagios start

Open in a browser: http://ip/nagios

2. Installing Nagios - client (192.168.0.12)

On the client machine: rpm -ivh http://www.aminglinux.com/bbs/data/attachment/forum/month_1211/epel-release-6-7.noarch.rpm

yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

vim /etc/nagios/nrpe.cfg: change "allowed_hosts=127.0.0.1" to "allowed_hosts=127.0.0.1,192.168.0.11" (the added IP is the server's). Then change "dont_blame_nrpe=0" to "dont_blame_nrpe=1", which lets the server pass arguments to NRPE commands.
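You can also apply these two edits non-interactively. This is a minimal sketch. It assumes GNU sed (for `sed -i`) and an nrpe.cfg that contains the stock `allowed_hosts=` and `dont_blame_nrpe=0` lines. `configure_nrpe` is a hypothetical helper name. The helper appends the server IP only if it is not already listed, so running it twice is harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two nrpe.cfg edits described above.
# Assumes GNU sed and the stock allowed_hosts= / dont_blame_nrpe= lines.
configure_nrpe() {
    local cfg="$1" server_ip="$2" current
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg" | head -n 1)
    # Compare whole comma-separated entries so 192.168.0.1 != 192.168.0.11
    case ",${current}," in
        *",${server_ip},"*) ;;   # already allowed, nothing to do
        *) sed -i "s/^allowed_hosts=.*/&,${server_ip}/" "$cfg" ;;
    esac
    # Allow the server to pass arguments to NRPE commands
    sed -i 's/^dont_blame_nrpe=0$/dont_blame_nrpe=1/' "$cfg"
}
```

On the client, as root, run configure_nrpe /etc/nagios/nrpe.cfg 192.168.0.11, then start or restart nrpe.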

Start NRPE on the client: /etc/init.d/nrpe start

3. Adding the monitored host (192.168.0.12) to the monitoring server (192.168.0.11)

cd /etc/nagios/conf.d/

vim 192.168.0.12.cfg  // add:

define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name 192.168.0.12
alias 0.12
address 192.168.0.12
}
define service{
use generic-service
host_name 192.168.0.12
service_description check_ping
check_command check_ping!100.0,20%!200.0,50%
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.0.12
service_description check_ssh
check_command check_ssh
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.0.12
service_description check_http
check_command check_http
max_check_attempts 5
normal_check_interval 1
}
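When several clients need the same three checks, you can generate the file instead of typing it. The following is a hypothetical bash sketch (`gen_host_cfg` is an illustrative name). It prints the same definitions as above for any IP. Unlike the hand-written file, it also uses the IP as the alias.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host definition and the ping/ssh/http
# services shown above for one IP address ($1).
gen_host_cfg() {
    local ip="$1" svc cmd
    cat <<EOF
define host{
use linux-server
host_name ${ip}
alias ${ip}
address ${ip}
}
EOF
    for svc in check_ping check_ssh check_http; do
        case "$svc" in
            # check_ping takes warning!critical thresholds as arguments
            check_ping) cmd='check_ping!100.0,20%!200.0,50%' ;;
            *) cmd="$svc" ;;
        esac
        cat <<EOF
define service{
use generic-service
host_name ${ip}
service_description ${svc}
check_command ${cmd}
max_check_attempts 5
normal_check_interval 1
}
EOF
    done
}
```

Usage: gen_host_cfg 192.168.0.12 > /etc/nagios/conf.d/192.168.0.12.cfg, then check the result with nagios -v /etc/nagios/nagios.cfg.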

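4. A brief explanation of the configuration file

The file above monitors three services: ssh, ping and http. The Nagios plugins on the server run these checks and connect to the remote machine directly, so the checks work even if the client has neither nagios-plugins nor nrpe installed. Other services, such as load and disk usage, require the server to fetch the information from the remote host through NRPE. For those, the remote host needs the nrpe service and the corresponding check scripts (nagios-plugins).

max_check_attempts 5  # when Nagios detects a problem, it raises an alert only after 5 consecutive checks have all failed; with a value of 1 it alerts as soon as a problem is detected

normal_check_interval 1  # interval between checks, in minutes; the default is 3 minutes

notification_interval 60  # if a problem stays unresolved, how long Nagios waits before notifying the user again, in minutes. If one notification per event is enough, set this to 0.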
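The following is a deliberately simplified model of notification_interval. The first notification goes out when the problem is confirmed (time 0), and another goes out every notification_interval minutes while the problem lasts. A value of 0 means a single notification. The model ignores check scheduling, the max_check_attempts delay and notification_period, so treat it as arithmetic rather than as Nagios's exact scheduler. `notification_count` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Simplified model (hypothetical helper): number of notifications for a
# problem lasting $1 minutes with notification_interval=$2 minutes.
# Ignores check timing, retries and notification_period.
notification_count() {
    local duration="$1" interval="$2"
    if [ "$interval" -eq 0 ]; then
        echo 1                               # 0: notify only once
    else
        echo $(( duration / interval + 1 ))  # at 0, interval, 2*interval, ...
    fi
}
```

With notification_interval 60, a problem that lasts 150 minutes yields notifications at 0, 60 and 120 minutes, so notification_count 150 60 prints 3. With an interval of 0 it prints 1.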

5. Adding more services

On the server: vim /etc/nagios/objects/commands.cfg

Add:

define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
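Nagios fills in the macros in command_line from the service and host definitions. The text after the `!` in check_command becomes $ARG1$ (further `!`-separated fields would become $ARG2$ and so on; this sketch handles a single argument). $HOSTADDRESS$ comes from the host's address. The bash sketch below imitates that substitution for the check_nrpe command only. It assumes $USER1$ (normally defined in resource.cfg) is /usr/lib64/nagios/plugins, as on a 64-bit EPEL install. `expand_check` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustration only: roughly how check_nrpe!check_load is expanded for a host.
# USER1 is an assumed value; the real one is set in Nagios's resource.cfg.
expand_check() {
    local check_command="$1" host_address="$2"
    local user1="/usr/lib64/nagios/plugins"
    local name="${check_command%%!*}"   # text before the first "!"
    local arg1="${check_command#*!}"    # text after the first "!"
    [ "$name" = check_nrpe ] || { echo "only check_nrpe is modelled" >&2; return 1; }
    local line='$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$'
    line="${line//\$USER1\$/$user1}"
    line="${line//\$HOSTADDRESS\$/$host_address}"
    line="${line//\$ARG1\$/$arg1}"
    echo "$line"
}
```

expand_check 'check_nrpe!check_load' 192.168.0.12 prints the command line the server effectively runs. Running that line by hand on the server is a quick way to test the NRPE connection.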

Edit the host file again: vim /etc/nagios/conf.d/192.168.0.12.cfg

Add the following:

define service{
use generic-service
host_name 192.168.0.12
service_description check_load
check_command check_nrpe!check_load
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.0.12
service_description check_disk_hda1
check_command check_nrpe!check_hda1
max_check_attempts 5
normal_check_interval 1
}
define service{
use generic-service
host_name 192.168.0.12
service_description check_disk_hda2
check_command check_nrpe!check_hda2
max_check_attempts 5
normal_check_interval 1
}

check_nrpe!check_load: check_nrpe is the command just defined in commands.cfg. check_load is passed to it as $ARG1$ and names a check script defined on the remote host.

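On the remote host, open /etc/nagios/nrpe.cfg with vim and search for check_load. That line is the script NRPE runs on the client when the server requests check_load, and you can also run it by hand.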
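To run a check by hand on the client, you can pull its command line out of nrpe.cfg. This is a minimal sketch. It assumes entries of the form command[name]=command line at the start of a line; commented-out entries are skipped. `nrpe_command_line` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command line nrpe.cfg ($1) maps to an
# NRPE command name ($2), e.g. check_load. Commented lines never match.
nrpe_command_line() {
    local cfg="$1" name="$2"
    awk -v key="command[${name}]" '
        # The line must start with "command[name]="; print what follows "="
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
    ' "$cfg"
}
```

To run the check, use sh -c "$(nrpe_command_line /etc/nagios/nrpe.cfg check_load)"; echo $?. By plugin convention the exit code is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.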

Change check_hda1: replace /dev/hda1 with /dev/sda1.

Add one more line: command[check_hda2]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda2
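You can script both nrpe.cfg changes as well. This sketch assumes GNU sed and the plugin path used in the line above. On a 64-bit EPEL install the plugins may live under /usr/lib64/nagios/plugins instead, so check the existing check_hda1 line first. `fix_disk_checks` is hypothetical. It appends the check_hda2 line only once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two disk-check changes above to nrpe.cfg ($1).
# The plugin path is the one used in this post; adjust it if yours differs.
fix_disk_checks() {
    local cfg="$1"
    # On the check_hda1 line only, point the check at /dev/sda1
    sed -i '/^command\[check_hda1\]=/s#/dev/hda1#/dev/sda1#' "$cfg"
    # Add check_hda2 unless it is already defined
    if ! grep -q '^command\[check_hda2\]=' "$cfg"; then
        echo 'command[check_hda2]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda2' >> "$cfg"
    fi
}
```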

Restart nrpe on the client: service nrpe restart

Restart nagios on the server as well: service nagios restart

6. Setting up graphs with pnp4nagios

(1) Installation

yum install pnp4nagios rrdtool

(2) Edit the main configuration file

vim /etc/nagios/nagios.cfg  // set the following options:

process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
enable_environment_macros=1
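You can also apply the four settings with a small helper. This is a hypothetical bash sketch that assumes GNU sed. `set_nagios_option` rewrites an existing key=value line, including a commented-out `#key=...` one, and appends the line if the key is missing. `enable_pnp_perfdata` applies the four values listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in nagios.cfg ($1), uncommenting or
# replacing an existing line for that key, or appending one if absent.
set_nagios_option() {
    local cfg="$1" key="$2" value="$3"
    if grep -Eq "^#?[[:space:]]*${key}=" "$cfg"; then
        sed -i -E "s|^#?[[:space:]]*${key}=.*|${key}=${value}|" "$cfg"
    else
        echo "${key}=${value}" >> "$cfg"
    fi
}

# The four settings listed above, applied in one go
enable_pnp_perfdata() {
    local cfg="${1:-/etc/nagios/nagios.cfg}"
    set_nagios_option "$cfg" process_performance_data 1
    set_nagios_option "$cfg" host_perfdata_command process-host-perfdata
    set_nagios_option "$cfg" service_perfdata_command process-service-perfdata
    set_nagios_option "$cfg" enable_environment_macros 1
}
```

Usage: enable_pnp_perfdata /etc/nagios/nagios.cfg, followed by nagios -v /etc/nagios/nagios.cfg.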

(3) Edit commands.cfg

vim /etc/nagios/objects/commands.cfg  // comment out the existing process-host-perfdata and process-service-perfdata definitions and redefine them:

define command {
command_name process-service-perfdata
command_line /usr/bin/perl /usr/libexec/pnp4nagios/process_perfdata.pl
}
define command {
command_name process-host-perfdata
command_line /usr/bin/perl /usr/libexec/pnp4nagios/process_perfdata.pl -d HOSTPERFDATA
}

(4) Edit templates.cfg

vim /etc/nagios/objects/templates.cfg  // add:

define host {
name hosts-pnp
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
process_perf_data 1
register 0 ; template only, not a real host
}

define service {
name srv-pnp
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
process_perf_data 1
register 0 ; template only, not a real service
}
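With these templates, the graph icon for a host links to srv=_HOST_, and the icon for a service links to srv=<service description>. Below is a hypothetical helper (`pnp_graph_url`) that builds the same link by hand, for example to bookmark a graph. It assumes pnp4nagios is served at /pnp4nagios on the Nagios server. It does no URL encoding, which is fine for the space-free names used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the pnp4nagios graph link produced by the
# hosts-pnp / srv-pnp action_url templates. No URL encoding is done.
pnp_graph_url() {
    local server="$1" host="$2" service="${3:-_HOST_}"   # _HOST_ = host graphs
    echo "http://${server}/pnp4nagios/index.php/graph?host=${host}&srv=${service}"
}
```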

(5) Update the host and service definitions

vim /etc/nagios/conf.d/192.168.0.12.cfg

Change

define host{
use linux-server

to:

define host{
use linux-server,hosts-pnp

Update the corresponding services as well. For example, change

define service{
use generic-service
host_name 192.168.0.12
service_description check_disk_hda1
check_command check_nrpe!check_hda1
max_check_attempts 5
normal_check_interval 1
}

to:

define service{
use generic-service,srv-pnp
host_name 192.168.0.12
service_description check_disk_hda1
check_command check_nrpe!check_hda1
max_check_attempts 5
normal_check_interval 1
}
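For a per-host file like 192.168.0.12.cfg, you can make the template changes with sed. This sketch assumes GNU sed; `add_pnp_templates` is a hypothetical name. It adds srv-pnp to every service in the file that uses generic-service, which is broader than changing only selected services. Lines that already carry the extra template are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append hosts-pnp / srv-pnp to the "use" lines of one
# per-host config file ($1). Already-converted lines no longer match.
add_pnp_templates() {
    local cfg="$1"
    sed -i -E \
        -e 's/^([[:space:]]*use[[:space:]]+linux-server)([[:space:]]|$)/\1,hosts-pnp\2/' \
        -e 's/^([[:space:]]*use[[:space:]]+generic-service)([[:space:]]|$)/\1,srv-pnp\2/' \
        "$cfg"
}
```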

(6) Restart and start the services:

service nagios restart

service httpd restart

service npcd start

(7) Test access

There are two pages to visit:

http://ip/nagios/

http://ip/pnp4nagios/

7. Configuring e-mail alerts

vim /etc/nagios/objects/contacts.cfg  // add:

define contact{
contact_name 123
use generic-contact
alias aming
email lishiming2009@139.com
}
define contact{
contact_name 456
use generic-contact
alias aaa
email aminglinux@139.com
}
define contactgroup{
contactgroup_name common
alias common
members 123,456
}
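A misspelled member in a contactgroup means nobody receives mail for it. The awk sketch below lists contactgroup members that have no matching contact_name in one file. It assumes all contacts are defined in that same file, and the helper name `missing_contacts` is hypothetical. nagios -v /etc/nagios/nagios.cfg remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print contactgroup members with no matching
# contact_name in the given file ($1). Output order is unspecified.
missing_contacts() {
    awk '
        $1 == "contact_name" { defined[$2] = 1 }
        $1 == "members" {
            joined = ""
            for (j = 2; j <= NF; j++) joined = joined $j   # tolerate "123, 456"
            n = split(joined, list, ",")
            for (i = 1; i <= n; i++) if (list[i] != "") wanted[list[i]] = 1
        }
        END { for (m in wanted) if (!(m in defined)) print m }
    ' "$1"
}
```

missing_contacts /etc/nagios/objects/contacts.cfg prints nothing when every member is defined.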

Then add the contact group (contact_groups) to every service that should send alerts:

define service{
use generic-service
host_name 192.168.0.12
service_description check_load
check_command check_nrpe!check_load
max_check_attempts 5
normal_check_interval 1
contact_groups common
}

8. Notes on a few important parameters

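notifications_enabled: whether notifications are enabled for this host or service; 1 enables them, 0 disables them. A global switch for the same purpose, enable_notifications, is usually set in the main configuration file (nagios.cfg).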

notification_interval: introduced above. It is the minimum interval between repeated notifications, and the default is 60 minutes. If set to 0, no repeat notifications are sent.

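notification_period: the time period during which notifications may be sent. I use 7×24 for very important hosts (services) and office hours for ordinary ones. Outside the defined time period, no notification is sent, whatever happens.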

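notification_options: which events trigger a notification. For hosts: d = DOWN, u = UNREACHABLE, r = recovered to OK (UP), f = flapping, n = send no notifications. For services the letters are w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovered to OK, f = flapping, n = none.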
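A small lookup table makes the letters easier to remember. This hypothetical bash helper decodes a notification_options value for a host or a service. It covers the host letters d, u, r, f and n, the service letters w, u, c, r, f and n, and s (scheduled downtime start/end), which both accept. A letter that is invalid for the given object type is reported as an error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: spell out a notification_options value ($2)
# for a "host" or a "service" ($1).
decode_notification_options() {
    local kind="$1" opts="$2" o
    for o in $(echo "$opts" | tr ',' ' '); do
        case "$kind:$o" in
            host:d)    echo "d: host DOWN" ;;
            host:u)    echo "u: host UNREACHABLE" ;;
            service:w) echo "w: service WARNING" ;;
            service:u) echo "u: service UNKNOWN" ;;
            service:c) echo "c: service CRITICAL" ;;
            *:r)       echo "r: recovery" ;;
            *:f)       echo "f: flapping started/stopped" ;;
            *:s)       echo "s: scheduled downtime started/ended" ;;
            *:n)       echo "n: no notifications" ;;
            *)         echo "$o: not valid for a $kind" >&2; return 1 ;;
        esac
    done
}
```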
