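## Goal

Set up a distributed icinga2 monitoring system. Distributed monitoring fits environments where servers are spread across several regions and one master is needed for centralized management.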
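## Environment

### Servers

OS: Ubuntu 15.04 / Ubuntu 14.04

icinga2 master node: 192.168.19.77, distributes configuration and presents all monitoring results in one place.

icinga2 child node 1: 192.168.19.45, monitors every server in the OpenStack RegionOne region.

icinga2 child node 2: 192.168.19.30, monitors every server in the OpenStack RegionTwo region.

To make full use of existing Nagios plugins, servers are monitored through nrpe.

### Topology

[Figure: topology diagram]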
## Install and configure icinga2

Note: unless otherwise stated, every step is run on all three servers: 192.168.19.77, 192.168.19.45 and 192.168.19.30.
### Set up the package source

    # add-apt-repository ppa:formorer/icinga
    # apt-get update
### Install icinga2

    # apt-get install icinga2
## Install the Classic UI

Run on 192.168.19.77:

    apt-get install icinga2-classicui -y
## Configure distributed monitoring

### Set up SSL

Run on 192.168.19.77.

#### Generate the CA certificate

    # icinga2 pki new-ca
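#### Generate the key and crt each node needs

##### Set the node name

The key and crt file names must match the node name, which defaults to the hostname. To use a custom node name, edit `/etc/icinga2/constants.conf` and change the following setting:

    const NodeName = "node-master"

Here `node-master` is the new node name.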
##### Generate the key and crt

    # cd /tmp
    ## 192.168.19.77
    # icinga2 pki new-cert --cn node-master --key node-master.key --csr node-master.csr
    # icinga2 pki sign-csr --csr node-master.csr --cert node-master.crt
    ## 192.168.19.45
    # icinga2 pki new-cert --cn node-45 --key node-45.key --csr node-45.csr
    # icinga2 pki sign-csr --csr node-45.csr --cert node-45.crt
    ## 192.168.19.30
    # icinga2 pki new-cert --cn node-30 --key node-30.key --csr node-30.csr
    # icinga2 pki sign-csr --csr node-30.csr --cert node-30.crt
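The three command pairs above differ only in the node name, so they can be produced by a loop. The following is a minimal dry-run sketch that only prints the commands instead of running them; `print_pki_commands` is a hypothetical helper name chosen for illustration, not an icinga2 command.

```shell
# Print (do not run) the new-cert / sign-csr pair for every node name given.
# print_pki_commands is a hypothetical helper, not part of icinga2.
print_pki_commands() {
    for node in "$@"; do
        printf 'icinga2 pki new-cert --cn %s --key %s.key --csr %s.csr\n' "$node" "$node" "$node"
        printf 'icinga2 pki sign-csr --csr %s.csr --cert %s.crt\n' "$node" "$node"
    done
}

# Node names taken from the section above.
print_pki_commands node-master node-45 node-30
```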
Copy `ca.crt`, `<node-name>.crt` and `<node-name>.key` to the `/etc/icinga2/pki/` directory of each of the 3 servers, using the files that belong to that node. The pki directory on 192.168.19.77 looks like this:

    # ll /etc/icinga2/pki/
    total 20
    drwxr-xr-x 2 root   root   4096 May 11 17:19 ./
    drwxr-x--- 9 nagios nagios 4096 May 13 12:09 ../
    -rw-rw-rw- 1 root   root   1688 May 11 15:25 ca.crt
    -rw-rw-rw- 1 root   root   1663 May 11 15:28 node-master.crt
    -rw-rw-rw- 1 root   root   3243 May 11 15:26 node-master.key
##### Enable the api feature

    # icinga2 feature enable api
Add the `accept_config = true` and `accept_commands = true` parameters:

    # vim /etc/icinga2/features-enabled/api.conf
    /**
     * The API listener is used for distributed monitoring setups.
     */
    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"
      ticket_salt = TicketSalt
      accept_config = true
      accept_commands = true
    }

    # service icinga2 restart
### Configure endpoints and zones

Each Endpoint name must match the NodeName of the node it describes.

    # vim /etc/icinga2/zones.conf
    object Endpoint "node-master" {
      host = "192.168.19.77"
    }
    object Endpoint "node-45" {
      host = "192.168.19.45"
    }
    object Endpoint "node-30" {
      host = "192.168.19.30"
    }
    object Zone "zone-master" {
      endpoints = [ "node-master" ]
    }
    object Zone "zone-45" {
      parent = "zone-master"
      endpoints = [ "node-45" ]
    }
    object Zone "zone-30" {
      parent = "zone-master"
      endpoints = [ "node-30" ]
    }
    object Zone "global-templates" {
      global = true
    }

This defines four zones: zone-master, zone-45, zone-30 and global-templates. Configuration placed in the global-templates zone is distributed to every zone.
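Each child node in the zones.conf above contributes one Endpoint and one Zone that follow the same pattern. As a rough sketch, the pair could be generated like this; `emit_child_zone` and its argument order are assumptions made for illustration, and the output is meant to be reviewed before it goes into `/etc/icinga2/zones.conf`.

```shell
# Emit the Endpoint and Zone objects for one child node.
# emit_child_zone is a hypothetical helper; usage: emit_child_zone <suffix> <ip> <parent-zone>
emit_child_zone() {
    printf 'object Endpoint "node-%s" {\n  host = "%s"\n}\n' "$1" "$2"
    printf 'object Zone "zone-%s" {\n  parent = "%s"\n  endpoints = [ "node-%s" ]\n}\n' "$1" "$3" "$1"
}

# The two child nodes used in this setup.
emit_child_zone 45 192.168.19.45 zone-master
emit_child_zone 30 192.168.19.30 zone-master
```

Adding a third region would then be one more call with a new suffix and address.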
### Configuration file management

#### Create the configuration directories; each directory name must match a zone name

Run on 192.168.19.77:

    # mkdir /etc/icinga2/zones.d/global-templates/
    # mkdir /etc/icinga2/zones.d/zone-30/
    # mkdir /etc/icinga2/zones.d/zone-45/
    # mkdir /etc/icinga2/zones.d/zone-master/
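Because the directory names have to match the zone names exactly, a quick cross-check can catch typos. The sketch below assumes that every `object Zone "..."` declaration sits on its own line in zones.conf; `missing_zone_dirs` is a hypothetical helper name.

```shell
# Print every zone declared in a zones.conf that has no matching directory.
# missing_zone_dirs is a hypothetical helper; usage: missing_zone_dirs <zones.conf> <zones.d>
# Assumes one 'object Zone "name"' declaration per line.
missing_zone_dirs() {
    sed -n 's/.*object Zone "\([^"]*\)".*/\1/p' "$1" | while read -r zone; do
        [ -d "$2/$zone" ] || printf '%s\n' "$zone"
    done
}

# Example (paths from this guide):
#   missing_zone_dirs /etc/icinga2/zones.conf /etc/icinga2/zones.d
```

No output means every declared zone has a directory.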
#### Copy the files in /etc/icinga2/conf.d to /etc/icinga2/zones.d/global-templates/

    # cp -rf /etc/icinga2/conf.d/* /etc/icinga2/zones.d/global-templates/
#### Comment out the conf.d include on all 3 servers

    # vim /etc/icinga2/icinga2.conf
    #include_recursive "conf.d"
#### Configure the master to monitor the two child nodes

    object Host NodeName {
      import "generic-host"
      address = "127.0.0.1"
      vars.os = "Linux"
      vars.disks["disk"] = { }
      vars.notification["mail"] = {
        groups = [ "icingaadmins" ]
      }
      /*NIC dell2950 */
      vars.interfaces["eth0"] = {
        interface_speed = 100
      }
    }

    object Host "node-45" {
      import "generic-host"
      address = "192.168.19.45"
      vars.os = "Linux"
      vars.disks["disk"] = { }
      vars.notification["mail"] = {
        groups = [ "icingaadmins" ]
      }
      /*NIC*/
      vars.interfaces["em1"] = {
        interface_speed = 1000
      }
      /* openstack */
      vars.openstack_controller_listen = "192.168.19.45"
      vars.openstack["keystone"] = "controller"
      vars.openstack["cinder"] = "controller"
      vars.openstack["glance"] = "controller"
      vars.openstack["heat"] = "controller"
      vars.openstack["nova"] = "controller"
    }

    object Host "node-30" {
      import "generic-host"
      address = "192.168.19.30"
      vars.os = "Linux"
      vars.disks["disk"] = { }
      vars.notification["mail"] = {
        groups = [ "icingaadmins" ]
      }
      /*NIC*/
      vars.interfaces["em1"] = {
        interface_speed = 1000
      }
    }
#### Clear the cache

    # rm -rf /var/lib/icinga2/api/zones/*

#### Restart icinga2 and check the result

    # service icinga2 restart
Open http://192.168.19.77/icinga2-classicui/

If everything is configured correctly, monitoring data for the 77, 45 and 30 servers should be visible.
## Monitor the network with SNMP

### Install the monitoring plugin on the icinga2 server

Reference: http://www.tontonitch.com/tiki/tiki-index.php?page=Nagios+plugins+-+interfacetable_v3t+-+documentation+-+0.05+-+Installation&structure=Nagios+plugins+-+interfacetable_v3t+-+documentation+-+0.05

Install the dependencies:

    apt-get install snmpd sudo perl libconfig-general-perl libdata-dumper-simple-perl libsort-naturally-perl libexception-class-perl libencode-perl

The documentation leaves out the following packages:

    apt-get install libdata-dumper-simple-perl libsys-statistics-linux-perl libnet-snmp-perl
Build and install the SNMP network check program. Download the latest package from http://www.tontonitch.com/tiki/tiki-index.php?page=Nagios+plugins+-+interfacetable_v3t, unpack it, enter the directory and run:

    ./configure
    make fullinstall

Test:

    /usr/local/nagios/libexec/check_interface_table_v3t.pl -V
Enable the icinga2 check definition:

    cp /usr/share/icinga2/include/plugins-contrib.d/network-components.conf /etc/icinga2/zones.d/global-templates/

Note:

Change `command` in that file to the correct script path.

Change `-C` to the correct value; the default is `public`.
Apply the check command to hosts:

    # vim services.conf
    apply Service "network" {
      import "generic-service"
      display_name = "check network"
      check_command = "interfacetable"
      assign where host.vars.os == "Linux"
    }
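### Install and configure the SNMP service on monitored hosts

    apt-get install snmpd snmp

Change the listen address and the community string, and allow connections only from the icinga2 server:

    # vim /etc/snmp/snmpd.conf
    agentAddress udp:10.33.10.66:161
    rocommunity NmQwZmVmZT 10.33.10.84
    # comment out this line
    #rocommunity public default -V systemonly
    # comment out this line; it makes snmpd listen on a random port
    #trapsink localhost public

Restart the snmpd service:

    service snmpd restart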
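After editing snmpd.conf it is easy to leave the default `public` community active by accident. The following sketch checks a file for an uncommented `rocommunity`/`rwcommunity` line that grants `public`; the helper name `public_community_disabled` and its return codes are assumptions made for illustration.

```shell
# Return 0 if no active (uncommented) line grants the "public" community,
# 1 if such a line exists, 2 if the file cannot be read.
# public_community_disabled is a hypothetical helper name.
public_community_disabled() {
    [ -r "$1" ] || return 2
    if grep -Eq '^[[:space:]]*r[ow]community6?[[:space:]]+public([[:space:]]|$)' "$1"; then
        return 1
    fi
    return 0
}

# Example (path from this guide):
#   public_community_disabled /etc/snmp/snmpd.conf && echo "public community is off"
```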
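## Monitor servers with nrpe

### Install from source

apt is not used because the nrpe package it installs does not accept arguments passed along with check commands.

#### Create the nagios user

    useradd nagios -M -s /bin/false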
#### Build and install

    apt-get install build-essential libssl-dev libssl0.9.8 unzip make -y
    cd /usr/local/src
    wget http://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz/download -O nrpe-2.15.tar.gz
    tar -xf nrpe-2.15.tar.gz
    cd nrpe-2.15
    ./configure --enable-command-args --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/x86_64-linux-gnu
    make all
    make install-daemon

#### Install the common plugins

    apt-get install nagios-plugins -y
#### Add the configuration file

    mkdir /etc/nagios
    vim /etc/nagios/nrpe.cfg
    log_facility=daemon
    pid_file=/var/run/nrpe.pid
    server_address=172.16.240.30
    server_port=5666
    nrpe_user=nagios
    nrpe_group=nagios
    allowed_hosts=127.0.0.1,172.16.240.30
    dont_blame_nrpe=1
    debug=0
    command_timeout=60
    connection_timeout=300
    command[check_load]=/usr/lib/nagios/libexec/check_load -w 20,15,10 -c 50,40,30
    command[check_disk]=/usr/lib/nagios/libexec/check_disk -w 10% -c 5% -W 10% -K 5% -A
    command[check_mem]=/usr/lib/nagios/libexec/check_mem -u -C -w 85 -c 90
    command[check_proc_num]=/usr/lib/nagios/libexec/check_procs -m PROCS -w 1500:1000 -c 300000:1500
    command[check_zombie_procs]=/usr/lib/nagios/libexec/check_procs -w 5 -c 10 -s Z
    command[check_swap]=/usr/lib/nagios/libexec/check_swap -a -w 30% -c 15%
    command[check_local_port]=/usr/lib/nagios/libexec/check_tcp -H localhost -p $ARG1$ -w 2 -c 5
    command[check_linux_raid]=/usr/lib/nagios/libexec/check_linux_raid
    command[check_md_raid]=/usr/lib/nagios/libexec/check_md_raid
    command[check_icmp]=/usr/lib/nagios/libexec/check_icmp $ARG1$
    command[check_lvs]=/usr/lib/nagios/libexec/check_ipvsadm
    command[check_backup]=/usr/lib/nagios/libexec/check_backup
    command[check_dns]=/usr/lib/nagios/libexec/check_dns -H www.baidu.com
    command[check_http]=/usr/lib/nagios/libexec/check_http
    command[check_rsyncd]=/usr/lib/nagios/libexec/check_rsyncd
    command[check_monitorwebbackup]=/usr/lib/nagios/libexec/check_monitorwebbackup
    command[check_monitormysqlbackup]=/usr/lib/nagios/libexec/check_monitormysqlbackup
    command[check_diskhealth]=/usr/lib/nagios/libexec/check_openmanage --check storage -b ctrl_fw=all/ctrl_driver=all/ctrl_stdr=all/bat_charge=all/encl=all/ps=all -t 30
    command[check_safe]=/usr/lib/nagios/libexec/check_safe $ARG1$
    command[check_tcptraffic]=/usr/lib/nagios/plugins/contrib/check_tcptraffic -w $ARG1$ -c $ARG2$ -s $ARG3$ -i $ARG4$
    command[check_iostat_io]=/usr/lib/nagios/libexec/check_iostat -d $ARG1$ -i -p
    command[check_iostat_waittime]=/usr/lib/nagios/libexec/check_iostat -d $ARG1$ -W -p
    command[check_keystone_api]=/usr/lib/nagios/libexec/check_keystone --auth_url "http://$ARG1$:35357/v2.0" --username $ARG2$ --tenant $ARG3$ --password $ARG4$
    command[check_keystone_proc]=/usr/lib/nagios/plugins/check_procs -c 1: -u keystone
    command[check_cinder_api_proc]=/usr/lib/nagios/libexec/check_service.sh -o linux -s cinder-api
    command[check_cinder_scheduler_proc]=/usr/lib/nagios/libexec/check_service.sh -o linux -s cinder-scheduler
    command[check_cinder_api]=/usr/lib/nagios/libexec/check_cinder_api --auth_url "http://$ARG1$:35357/v2.0" --username $ARG2$ --tenant $ARG3$ --password $ARG4$
    command[check_cinder_scheduler_connectivity]=/usr/lib/nagios/libexec/check_cinder-scheduler.sh
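The nrpe.cfg above refers to plugins in several directories (`/usr/lib/nagios/libexec`, `/usr/lib/nagios/plugins` and `/usr/lib/nagios/plugins/contrib`), so a command can silently point at a file that is not there. A minimal sketch for spotting such entries follows; `missing_nrpe_plugins` is a hypothetical helper name, and it only looks at the first word after `command[...]=`.

```shell
# List the plugin path of every command[...] entry that is not an executable file.
# missing_nrpe_plugins is a hypothetical helper; usage: missing_nrpe_plugins <nrpe.cfg>
missing_nrpe_plugins() {
    sed -n 's/^command\[[^]]*]=\([^[:space:]]*\).*/\1/p' "$1" | sort -u | while read -r plugin; do
        [ -x "$plugin" ] || printf '%s\n' "$plugin"
    done
}

# Example (path from this guide):
#   missing_nrpe_plugins /etc/nagios/nrpe.cfg
```

An empty result means every referenced plugin exists and is executable on that host.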
#### Add the start command to rc.local

    # vim /etc/rc.local
    /usr/local/nagios/bin/nrpe -c /etc/nagios/nrpe.cfg -d
#### Copy the libexec folder from another server

For example: 221.228.84.84 (Wuxi, tri-line)

Folder location: `/usr/lib/nagios/libexec`

Note: the `check_icmp` plugin must be owned by root and have the setuid bit set, otherwise it lacks the permission to run:

    chown root:root /usr/lib/nagios/libexec/check_icmp
    chmod u+s /usr/lib/nagios/libexec/check_icmp
    chmod g+s /usr/lib/nagios/libexec/check_icmp
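Whether the setuid bit actually stuck can be confirmed with the shell's `-u` file test. This is only a small sketch; `has_setuid` is a hypothetical helper name, and it does not check the file's owner.

```shell
# Succeed only if the file exists and has its setuid bit set.
# has_setuid is a hypothetical helper name; ownership is not checked here.
has_setuid() {
    [ -u "$1" ]
}

if has_setuid /usr/lib/nagios/libexec/check_icmp; then
    echo "check_icmp: setuid bit is set"
else
    echo "check_icmp: setuid bit is missing (or the file does not exist)"
fi
```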
#### Install the check_tcptraffic plugin

    apt-get install libnagios-plugin-perl libreadonly-xs-perl -y

Unpack the archive on a Mac, repackage it with zip and upload that to Ubuntu; otherwise extraction fails.

Enter the source directory and run:

    perl Makefile.PL
    make
    make install

Test:

    /usr/lib/nagios/plugins/contrib/check_tcptraffic -w 10485760 -c 10485760 -s 1000 -i em1
    /usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_tcptraffic -p 5666 -a 10485760 20971520 1000 em1
#### Install Dell disk monitoring

Set up the package source:

    echo 'deb http://linux.dell.com/repo/community/ubuntu trusty openmanage' | sudo tee -a /etc/apt/sources.list.d/linux.dell.com.sources.list
    gpg --keyserver pool.sks-keyservers.net --recv-key 1285491434D8786F
    gpg -a --export 1285491434D8786F | sudo apt-key add -

Install:

    apt-get update
    apt-get install srvadmin-all

Start the service:

    service dataeng start
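#### Install iostat monitoring

Install iostat:

    apt-get install sysstat

(Skip this step if the `check_iostat` script is already present.) Download the monitoring script to `/usr/lib/nagios/libexec/check_iostat`, make it executable, and change its owner and group to nagios.

Note: download the script posted in the comments; the original script has problems.

Remember to add the following to the host configuration in icinga2:

    #check disk io and waittime
    vars.disk_device = "sdb"

Change `sdb` to the disk that should be monitored.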
#### Start nrpe

    # /usr/local/nagios/bin/nrpe -c /etc/nagios/nrpe.cfg -d
## Enable nrpe in icinga2

### Configure the LibexecDir constant

Uncomment the line that matches your system:

    # vim /etc/icinga2/constants.conf
    //ubuntu
    //const LibexecDir = "/usr/lib/nagios/libexec"
    //youfu centos
    //const LibexecDir = "/usr/local/nagios/libexec"
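Which of the two values applies depends on where `check_nrpe` is installed on the icinga2 host. A small sketch that probes the candidate directories is shown here; `find_libexec_dir` is a hypothetical helper name, and the two paths are the ones listed above.

```shell
# Print the first candidate directory that contains an executable check_nrpe.
# find_libexec_dir is a hypothetical helper, not part of icinga2.
find_libexec_dir() {
    for dir in "$@"; do
        if [ -x "$dir/check_nrpe" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate paths taken from constants.conf above.
find_libexec_dir /usr/lib/nagios/libexec /usr/local/nagios/libexec \
    || echo "check_nrpe not found in either directory"
```

The printed directory is the value to use for `LibexecDir`.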
### Configure the CheckCommand template

    # vim /etc/icinga2/zones.d/global-templates/nrpe_base.conf
    template CheckCommand "nrpe-common" {
      import "plugin-check-command"
      command = [ LibexecDir + "/check_nrpe" ]
      arguments = {
        "-H" = "$nrpe_address$"
        "-t" = "$nrpe_timeout$"
        "-p" = "$nrpe_port$"
        "-c" = "$nrpe_command$"
        "-a" = {
          value = "$nrpe_args$"
          repeat_key = false
          order = 1
        }
      }
      vars.nrpe_address = "$address$"
      vars.nrpe_port = 5666
      vars.nrpe_timeout = 60
    }
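To see what the template amounts to, it helps to write out the `check_nrpe` command line it is expected to build. The sketch below reproduces that by hand with the template's defaults (timeout 60, port 5666); `build_check_nrpe` is a hypothetical helper, the `LibexecDir` prefix is left out, and the relative order of `-H`, `-t`, `-p` and `-c` is illustrative. Only `-a` is pinned to the end by `order = 1`.

```shell
# Print the check_nrpe command line that "nrpe-common" is expected to produce.
# build_check_nrpe is a hypothetical helper; usage: build_check_nrpe <address> <nrpe_command> [args...]
build_check_nrpe() {
    address=$1
    nrpe_command=$2
    shift 2
    line="check_nrpe -H $address -t 60 -p 5666 -c $nrpe_command"
    if [ "$#" -gt 0 ]; then
        # repeat_key = false: "-a" appears once, followed by all arguments
        line="$line -a $*"
    fi
    printf '%s\n' "$line"
}

build_check_nrpe 192.168.19.45 check_load
build_check_nrpe 192.168.19.45 check_tcptraffic 10485760 20971520 1000 em1
```

Running the printed command by hand on the icinga2 host is a quick way to separate nrpe problems from icinga2 configuration problems.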
### Configure the common Linux check commands

    # vim /etc/icinga2/zones.d/global-templates/nrpe_linux.conf
    object CheckCommand "nrpe-disk" {
      import "nrpe-common"
      #vars.nrpe_args = [ "$disk_wfree$%", "$disk_cfree$%"]
      vars.nrpe_command = "check_disk"
      #vars.disk_wfree = 20
      #vars.disk_cfree = 10
    }
    object CheckCommand "nrpe-diskhealth" {
      import "nrpe-common"
      vars.nrpe_command = "check_diskhealth"
    }
    object CheckCommand "nrpe-tcptraffic" {
      import "nrpe-common"
      vars.nrpe_args = [ "$tcptraffic_wbytes$", "$tcptraffic_cbytes$", "$interface_speed$", "$interface_name$" ]
      vars.nrpe_command = "check_tcptraffic"
      vars.tcptraffic_wbytes = 10485760 /*10M=10*1024*1024*/
      vars.tcptraffic_cbytes = 20971520 /*20M=20*1024*1024*/
      #vars.tcptraffic_wbytes = 1 /*10M=10*1024*1024*/
      #vars.tcptraffic_cbytes = 2 /*20M=20*1024*1024*/
    }
    object CheckCommand "nrpe-load" {
      import "nrpe-common"
      vars.nrpe_command = "check_load"
    }
    object CheckCommand "nrpe-mem" {
      import "nrpe-common"
      vars.nrpe_command = "check_mem"
    }
    object CheckCommand "nrpe-proc_num" {
      import "nrpe-common"
      vars.nrpe_command = "check_proc_num"
    }
    object CheckCommand "nrpe-zombie_procs" {
      import "nrpe-common"
      vars.nrpe_command = "check_zombie_procs"
    }
    object CheckCommand "nrpe-swap" {
      import "nrpe-common"
      vars.nrpe_command = "check_swap"
    }
    object CheckCommand "nrpe-dns" {
      import "nrpe-common"
      vars.nrpe_command = "check_dns"
    }
    object CheckCommand "nrpe-safe" {
      import "nrpe-common"
      vars.nrpe_command = "check_safe"
    }
    object CheckCommand "nrpe-iostat_io" {
      import "nrpe-common"
      vars.nrpe_command = "check_iostat_io"
      vars.nrpe_args = [ "$disk_device$" ]
    }
    object CheckCommand "nrpe-iostat_waittime" {
      import "nrpe-common"
      vars.nrpe_command = "check_iostat_waittime"
      vars.nrpe_args = [ "$disk_device$" ]
    }

    apply Service "check_nrpe:" for (disk_nrpe_linux => config in host.vars.disks) {
      import "generic-service"
      display_name = "Check Nrpe:" + disk_nrpe_linux
      check_command = "nrpe-disk"
      vars += config
      assign where host.vars.os == "Linux"
    }
    #Storage Error! No controllers found on ubuntu15.04 dell r720
    apply Service "check_nrpe:diskhealth" {
      import "generic-service"
      display_name = "Check Nrpe: Diskhealth"
      check_command = "nrpe-diskhealth"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:tcptraffic" for (interface_name => interface_info in host.vars.interfaces) {
      import "generic-service"
      display_name = "Check Nrpe: Tcptraffic " + interface_name
      check_command = "nrpe-tcptraffic"
      vars.interface_name = interface_name
      vars += interface_info
      assign where host.vars.interfaces && host.vars.os == "Linux"
    }
    apply Service "check_nrpe:load" {
      import "generic-service"
      display_name = "Check Nrpe: Load"
      check_command = "nrpe-load"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:mem" {
      import "generic-service"
      display_name = "Check Nrpe: Mem"
      check_command = "nrpe-mem"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:proc_num" {
      import "generic-service"
      display_name = "Check Nrpe: proc_num"
      check_command = "nrpe-proc_num"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:zombie_procs" {
      import "generic-service"
      display_name = "Check Nrpe: zombie_procs"
      check_command = "nrpe-zombie_procs"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:swap" {
      import "generic-service"
      display_name = "Check Nrpe: swap"
      check_command = "nrpe-swap"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:dns" {
      import "generic-service"
      display_name = "Check Nrpe: dns"
      check_command = "nrpe-dns"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:safe" {
      import "generic-service"
      display_name = "Check Nrpe: safe"
      check_command = "nrpe-safe"
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:iostat_io" {
      import "generic-service"
      display_name = "Check Nrpe: iostat_io"
      check_command = "nrpe-iostat_io"
      vars.disk_device = host.vars.disk_device
      assign where host.vars.os == "Linux"
    }
    apply Service "check_nrpe:iostat_waittime" {
      import "generic-service"
      display_name = "Check Nrpe: iostat_waittime"
      check_command = "nrpe-iostat_waittime"
      vars.disk_device = host.vars.disk_device
      assign where host.vars.os == "Linux"
    }
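The tcptraffic thresholds in the configuration above are byte counts, with the comments giving them as 10M and 20M. The arithmetic can be reproduced in the shell when other limits are needed; `mib_to_bytes` is a hypothetical helper name.

```shell
# Convert a size in MiB to bytes (MiB * 1024 * 1024).
# mib_to_bytes is a hypothetical helper name.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 10   # warning threshold  -> 10485760
mib_to_bytes 20   # critical threshold -> 20971520
```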
### Configure the common Windows check commands

    # vim /etc/icinga2/zones.d/global-templates/nrpe_windows.conf
    object CheckCommand "windows-nrpe-cpu" {
      import "nrpe-common"
      vars.nrpe_args = []
      vars.nrpe_command = "alias_cpu"
    }
    object CheckCommand "windows-nrpe-disk" {
      import "nrpe-common"
      vars.nrpe_args = []
      vars.nrpe_command = "alias_disk"
    }
    object CheckCommand "windows-nrpe-uptime" {
      import "nrpe-common"
      vars.nrpe_args = []
      vars.nrpe_command = "uptime"
    }
    object CheckCommand "windows-nrpe-mem" {
      import "nrpe-common"
      vars.nrpe_args = []
      vars.nrpe_command = "alias_mem"
    }

    apply Service "windows_check_nrpe:cpu" {
      import "generic-service"
      display_name = "Windows-Check Nrpe: CPU"
      check_command = "windows-nrpe-cpu"
      assign where host.vars.os == "Windows"
    }
    apply Service "windows_check_nrpe:disk" {
      import "generic-service"
      display_name = "Windows-Check Nrpe: Disk"
      check_command = "windows-nrpe-disk"
      assign where host.vars.os == "Windows"
    }
    apply Service "windows_check_nrpe:uptime" {
      import "generic-service"
      display_name = "Windows-Check Nrpe: Uptime"
      check_command = "windows-nrpe-uptime"
      assign where host.vars.os == "Windows"
    }
    apply Service "windows_check_nrpe:Mem" {
      import "generic-service"
      display_name = "Windows-Check Nrpe: Mem"
      check_command = "windows-nrpe-mem"
      assign where host.vars.os == "Windows"
    }
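### Reload icinga2

After the reload, the master automatically distributes the configuration to the two child nodes, so configuration is maintained and distributed from one place.

    # service icinga2 reload

url: /icinga2-classicui/

Default user: icingaadmin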
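### Troubleshooting

If monitoring does not work after the distributed nodes are configured, or the global-templates configuration does not take effect, delete the files under `/var/lib/icinga2/api/zones` and restart icinga2.

`CHECK_NRPE: Error - Could not complete SSL handshake.`: check that the `allowed_hosts` setting in `nrpe.cfg` is correct.

`check_iostat_*` returns no data (`OK - I/O stats: Transfers/Sec= Read Requests/Sec= Write Requests/Sec= KBytes Read/Sec= KBytes_Written/Sec=`): the `sysstat` package is not installed.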
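The empty `check_iostat_*` result can be ruled out early by checking on the monitored host that an `iostat` binary is reachable. This is a minimal sketch; `iostat_available` is a hypothetical helper name.

```shell
# Succeed if an iostat binary is reachable through PATH.
# iostat_available is a hypothetical helper name.
iostat_available() {
    command -v iostat >/dev/null 2>&1
}

if iostat_available; then
    echo "iostat found: $(command -v iostat)"
else
    echo "iostat missing: install the sysstat package"
fi
```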