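I have recently been looking into cloud monitoring tools. I wrote up the installation steps for Ganglia earlier; this time I am recording the installation steps for Nagios.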
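This article does not explain the underlying principles; for those, please consult other material.
Goal of this article: even if you have never touched Nagios before, you should be able to follow the steps here and build your own Nagios monitoring cluster.
@Author duangr
@Website http://my.oschina.net/duangr/blog/183160
Nagios is an open-source monitoring system that runs on Linux/Unix platforms and can monitor system status and network information. It monitors the local or remote hosts and services you specify and provides alert notification. When a system or service enters an abnormal state, it sends email or SMS alerts so that the site's operations staff are notified immediately, and it sends a recovery notification by email or SMS once the state returns to normal.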
Host Name | IP | OS | Arch |
duangr-1 | 192.168.56.10 | CentOS 6.4 | x86_64 |
duangr-2 | 192.168.56.11 | CentOS 6.4 | x86_64 |
duangr-3 | 192.168.56.12 | CentOS 6.4 | x86_64 |
Item | Value |
Monitoring master node (Master) | duangr-1 |
Monitored slave nodes (Slave) | duangr-2, duangr-3 |
The Nagios master node needs the following installed:
nagios
nagios-plugin
nrpe
php
apache
The Nagios slave nodes need the following installed:
nagios-plugin
nrpe
Installation path plan
Item | Value |
nagios install path | /usr/local/nagios |
php install path | /usr/local/php |
apache install path | /usr/local/apache2 |
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
package gd-devel is not installed
package xinetd is not installed
openssl-devel-1.0.0-27.el6.x86_64
If any packages are missing, install them first. The packages can be downloaded from the following mirror sites:
http://rpm.pbone.net/
http://mirrors.163.com/centos/6.4/os/x86_64/Packages/
http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/
After installing, check again:
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
gd-devel-2.0.35-11.el6.x86_64
xinetd-2.3.14-38.el6.x86_64
openssl-devel-1.0.0-27.el6.x86_64
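The dependency check above comes down to spotting the "is not installed" lines in the rpm -q output. Below is a minimal sketch of pulling out just the missing package names. The function name missing_pkgs is made up for this illustration, and the yum call in the usage comment assumes the host can reach a package repository (the article itself downloads the RPMs from mirrors by hand).

```shell
# Print the names of packages that `rpm -q` reports as missing.
# Reads rpm -q output on stdin, so it can be tried without rpm present.
# (missing_pkgs is a hypothetical helper, not part of rpm.)
missing_pkgs() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Usage on the target host (assumption: a yum repository is reachable):
#   rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel \
#       | missing_pkgs | xargs -r yum install -y
```

Reading from stdin keeps the filter independent of rpm itself, so the same function also works on saved output.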
Create the nagios user:
useradd nagios -d /usr/local/nagios
passwd nagios    (choose your own password)
Build and install Nagios:
tar -zxf nagios-4.0.2.tar.gz
cd nagios-4.0.2
./configure --prefix=/usr/local/nagios
make all
make install && make install-init && make install-commandmode && make install-config
Register nagios as a service:
chkconfig --add nagios
chkconfig nagios off
chkconfig --level 35 nagios on
chkconfig --list nagios
nagios 0:off 1:off 2:off 3:on 4:off 5:on 6:off
Build and install nagios-plugins:
tar -zxf nagios-plugins-1.5.tar.gz
cd nagios-plugins-1.5
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
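If MySQL-related compile errors occur, the cause is that MySQL is not installed in its default path. Point --with-mysql at the actual location and run make again:
./configure --prefix=/usr/local/nagios --with-mysql=/usr/local/mysql
make && make install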
Build and install NRPE:
tar -zxf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure --enable-command-args
make all
make install-plugin
The following step only needs to be run on the monitored nodes:
make install-daemon && make install-daemon-config && make install-xinetd
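On a monitored node, NRPE must be configured to run as a daemon (through xinetd).
1. Edit /etc/xinetd.d/nrpe to allow connections from the Nagios master server
vi /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.56.10
2. Append the following to the end of /etc/services: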
nrpe 5666/tcp # NRPE
3. Enable support for command arguments
vi /usr/local/nagios/etc/nrpe.cfg
dont_blame_nrpe=1
4. Restart xinetd
service xinetd restart
5. Verify that nrpe is listening
netstat -at | grep nrpe
6. Test that nrpe is working
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.15
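Steps 1 and 2 above are manual edits with vi. When several monitored nodes need the same change, a scripted form can help. The sketch below assumes a POSIX shell and is only one possible approach; the helper names set_only_from and ensure_line are invented for this example.

```shell
# Replace the value of only_from in an xinetd service file.
# $1 = file, $2 = space-separated list of allowed addresses
# (set_only_from is a hypothetical helper name.)
set_only_from() {
    file=$1
    hosts=$2
    tmp=$(mktemp) || return 1
    # Keep the original indentation and "only_from =", swap only the value.
    sed "s|^\([[:space:]]*only_from[[:space:]]*=\).*|\1 $hosts|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Append a line to a file unless an identical line is already present.
# $1 = file, $2 = line   (ensure_line is a hypothetical helper name.)
ensure_line() {
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage as root on a monitored node, with the values from the steps above:
#   set_only_from /etc/xinetd.d/nrpe "127.0.0.1 192.168.56.10"
#   ensure_line /etc/services "nrpe 5666/tcp # NRPE"
```

Both helpers can be run repeatedly without changing the result, so the script is safe to re-run after a partial failure.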
On the monitoring master node, once NRPE has been configured on all monitored nodes, check each of them in turn:
/usr/local/nagios/libexec/check_nrpe -H 192.168.56.11
NRPE v2.15
/usr/local/nagios/libexec/check_nrpe -H 192.168.56.12
NRPE v2.15
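The per-node checks from the master can be wrapped in a loop. This is a minimal sketch assuming a POSIX shell; check_slaves and the CHECK_NRPE variable are names made up here, and the default path is the one used in this article.

```shell
# Path to the check_nrpe plugin; can be overridden through the environment.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Run check_nrpe against every address given as an argument.
# Prints one line per host and returns non-zero if any host failed.
# (check_slaves is a hypothetical helper name.)
check_slaves() {
    failed=0
    for host in "$@"; do
        if out=$("$CHECK_NRPE" -H "$host" 2>&1); then
            echo "$host OK: $out"
        else
            echo "$host FAILED: $out"
            failed=1
        fi
    done
    return $failed
}

# Usage on the master node:
#   check_slaves 192.168.56.11 192.168.56.12
```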
Build and install Apache:
tar -zxf httpd-2.2.23.tar.gz
cd httpd-2.2.23
./configure --prefix=/usr/local/apache2
make && make install
Build and install PHP:
cd /export/home/tools/soft/php
tar -zxf php-5.4.10.tar.gz
cd php-5.4.10
./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
make && make install
vi /usr/local/apache2/conf/httpd.conf
....
Listen 80
....
<IfModule dir_module>
    DirectoryIndex index.html index.php
    AddType application/x-httpd-php .php
</IfModule>
....
#setting for nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    AuthType Basic
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    AuthType Basic
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios Access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>
Add a user name and password for web access (the user name here is admin; choose your own if you prefer):
/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd admin
Start Apache:
/usr/local/apache2/bin/apachectl start
Visit the page:
# su - nagios
$ vi /usr/local/nagios/etc/nrpe.cfg
Change the command definitions to the following:
command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_procs_args]=/usr/local/nagios/libexec/check_procs $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$
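What these monitoring commands do:
check_users monitors the number of logged-in users
check_load monitors CPU load
check_disk monitors disk usage
check_procs monitors the number of processes; the process states to count are given as a combination of R, S, Z, D and T
check_swap monitors swap usage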
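In the command definitions above, $ARG1$, $ARG2$ and so on are placeholders filled in, in order, from the values the caller passes after check_nrpe -a. The sketch below only imitates that substitution so the mapping is easier to see. It is not NRPE's own code, and expand_args is a name invented for this illustration.

```shell
# Imitate how positional values replace $ARG1$, $ARG2$, ... in a
# command template. $1 = template, remaining arguments = values.
# Assumption: values contain no "|", "&" or backslash, because they are
# pasted into a sed replacement.
# (expand_args is a hypothetical helper, not part of NRPE.)
expand_args() {
    line=$1
    shift
    i=1
    for arg in "$@"; do
        line=$(printf '%s\n' "$line" | sed "s|\\\$ARG$i\\\$|$arg|g")
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

# Usage: show what check_disk would be run as for "-a 20% 10% /"
#   expand_args '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' 20% 10% /
```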
After configuring the commands above, restart the xinetd service:
service xinetd restart
Check that the monitoring commands are configured correctly:
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk -a 20% 10% /
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap -a 20% 10%
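Besides the text each check prints, Nagios plugins report their result through the exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A small sketch for turning that status into a readable name when testing by hand; nagios_state is a hypothetical helper name.

```shell
# Map a Nagios plugin exit status to its state name.
# (nagios_state is a hypothetical helper name.)
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "OUT-OF-RANGE" ;;
    esac
}

# Usage after any of the checks above:
#   /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
#   nagios_state $?
```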
(as the nagios user)
vi /usr/local/nagios/etc/cgi.cfg
Change the following settings to grant permissions to the admin user:
default_user_name=admin
authorized_for_system_information=nagiosadmin,admin
authorized_for_configuration_information=nagiosadmin,admin
authorized_for_system_commands=nagiosadmin,admin
authorized_for_all_services=nagiosadmin,admin
authorized_for_all_hosts=nagiosadmin,admin
authorized_for_all_service_commands=nagiosadmin,admin
authorized_for_all_host_commands=nagiosadmin,admin
(as the nagios user)
vi /usr/local/nagios/etc/nagios.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg    (comment this line out)
cfg_dir=/usr/local/nagios/etc/servers
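The main configuration file now declares ./servers as the directory that holds the monitoring definitions. This directory does not exist by default and must be created by hand.
Nagios reads every file with the .cfg suffix under the servers directory as a configuration file.
cd /usr/local/nagios/etc
mkdir servers
cd servers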
Declare a monitored host group and add all three hosts from the host environment to it:
vi /usr/local/nagios/etc/servers/group.cfg
This is a new file with the following content:
define hostgroup{
    hostgroup_name   duangr-server
    alias            duangr Server
    members          duangr-1,duangr-2,duangr-3
}
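The settings above mean the following:
hostgroup_name: the name of the host group; any name may be used
alias: an alias for the host group; any value may be used
members: the members of the host group. Separate multiple host names with commas. Each host name must match the host_name in a define host block.
Host definitions are covered below.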
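Because every name in members must match a host_name in some define host block, a quick cross-check of the servers directory can catch typos before Nagios is started. The sketch below is a rough check under stated assumptions, not a replacement for Nagios's own verification: it assumes one directive per line, "define host{" on its own line, and comma-separated members without spaces, as in this article's files. The function name undefined_members is made up.

```shell
# Print every hostgroup member found under directory $1 that has no
# matching "define host" block in the same directory.
# (undefined_members is a hypothetical helper name.)
undefined_members() {
    dir=$1
    # All names listed on "members" lines, one per line.
    members=$(cat "$dir"/*.cfg | awk '$1 == "members" { print $2 }' |
        tr ',' '\n' | sort -u)
    # host_name values that appear inside "define host{ ... }" blocks only.
    hosts=$(cat "$dir"/*.cfg | awk '
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1 }
        in_host && $1 == "host_name"      { print $2 }
        /[}]/                             { in_host = 0 }
    ')
    for m in $members; do
        printf '%s\n' "$hosts" | grep -qxF -- "$m" || printf '%s\n' "$m"
    done
}

# Usage:
#   undefined_members /usr/local/nagios/etc/servers
```

No output means every member has a host definition.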
Next, define the individual hosts.
Start with the local host, duangr-1:
vi /usr/local/nagios/etc/servers/duangr-1.cfg
This is a new file with the following content:
define host{
    use         linux-server
    host_name   duangr-1
    alias       duangr-1
    address     192.168.56.10
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Host Alive
    check_command         check-host-alive
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Users
    check_command         check_local_users!20!50
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   CPU
    check_command         check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Disk Root
    check_command         check_local_disk!20%!10%!/
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Disk Home
    check_command         check_local_disk!20%!10%!/export/home
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Zombie Procs
    check_command         check_local_procs!5!10!Z
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Total Procs
    check_command         check_local_procs!250!400!RSZDT
}
define service{
    use                   local-service
    host_name             duangr-1
    service_description   Swap Usage
    check_command         check_local_swap!20!10
}
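Note: because this host is also the one running the monitoring master, the check_local_* commands can be used to monitor it.
This file already includes the commonly used checks.
Next, define the remote hosts duangr-2 and duangr-3.
Before defining the checks for remote hosts, the check_nrpe command has to be defined: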
vi /usr/local/nagios/etc/objects/commands.cfg
Append the following to the end of the file:
# 'check_nrpe' command definition
define command{
    command_name   check_nrpe
    command_line   $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
define command{
    command_name   check_nrpe_args
    command_line   $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
}
Define the monitoring configuration for host duangr-2:
$ vi /usr/local/nagios/etc/servers/duangr-2.cfg
This is a new file with the following content:
define host{
    use         linux-server
    host_name   duangr-2
    alias       duangr-2
    address     192.168.56.11
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Host Alive
    check_command         check-host-alive
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Users
    check_command         check_nrpe_args!check_users!5 10
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   CPU
    check_command         check_nrpe_args!check_load!15,10,5 30,25,20
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Disk Root
    check_command         check_nrpe_args!check_disk!20% 10% /
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Disk /export/home
    check_command         check_nrpe_args!check_disk!20% 10% /export/home
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Procs Zombie
    check_command         check_nrpe_args!check_procs!5 10 Z
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Procs Total
    check_command         check_nrpe_args!check_procs_args!"-w400 -c600"
}
define service{
    use                   local-service
    host_name             duangr-2
    service_description   Swap Usage
    check_command         check_nrpe_args!check_swap!20% 10%
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Checks for some commonly used processes, mainly cloud platform processes
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Monitor the crond process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: crond
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
}
;; Monitor the ZooKeeper process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: QuorumPeerMain
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.quorum.QuorumPeerMain"
}
;; Monitor the Storm slave-node process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: supervisor
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
}
;; Monitor the Storm master-node process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: nimbus
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
}
;; Monitor the MetaQ process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: MetaQ
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
}
;; Monitor the Redis process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: redis-server
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
}
;; Monitor the Hadoop master-node NameNode process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: NameNode
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
}
;; Monitor the Hadoop master-node SecondaryNameNode process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: SecondaryNameNode
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
}
;; Monitor the Hadoop master-node ResourceManager process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: ResourceManager
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
}
;; Monitor the Hadoop slave-node DataNode process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: DataNode
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
}
;; Monitor the Hadoop slave-node NodeManager process
define service{
    use                   local-service
    host_name             duangr-2
    service_description   PS: NodeManager
    check_command         check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
}
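Note: because duangr-2 is a remote host, it is monitored with the check_nrpe_args command.
This file already includes the commonly used checks, plus process checks for Hadoop, Storm, ZooKeeper, MetaQ and Redis. The main idea behind the process checks is to test whether the process exists.
Define the monitoring configuration for host duangr-3: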
vi duangr-3.cfg
The content is similar to duangr-2.cfg; only host_name, alias and address need to change.
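Since duangr-3.cfg differs from duangr-2.cfg only in the host name, alias and address, it can be derived from the existing file rather than retyped. This is a minimal sketch assuming a POSIX shell; clone_host_cfg is a hypothetical helper name, and the substitution is plain text replacement, so it assumes the old host name does not occur as part of some longer name in the file.

```shell
# Write a copy of a host configuration to stdout with the host name
# (which also covers alias) and address replaced.
# $1 = source file, $2 = old name, $3 = old address,
# $4 = new name,    $5 = new address
# (clone_host_cfg is a hypothetical helper name.)
clone_host_cfg() {
    src=$1
    old_name=$2
    old_addr=$3
    new_name=$4
    new_addr=$5
    # Escape the dots so sed matches the address literally.
    old_addr_re=$(printf '%s\n' "$old_addr" | sed 's/\./\\./g')
    sed -e "s/$old_name/$new_name/g" -e "s/$old_addr_re/$new_addr/g" "$src"
}

# Usage in the servers directory:
#   clone_host_cfg duangr-2.cfg duangr-2 192.168.56.11 \
#       duangr-3 192.168.56.12 > duangr-3.cfg
```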
Define the email address of the monitoring contact:
vi /usr/local/nagios/etc/objects/contacts.cfg
define contact{
    contact_name   nagiosadmin           ; Short name of user
    use            generic-contact       ; Inherit default values from generic-contact template (defined above)
    alias          Nagios Admin          ; Full name of user
    email          yourname@domain.com   ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
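Besides configuring the recipient of the monitoring mail, also make sure that:
this host can reach the mail server
Sendmail on this host can send mail through an external SMTP service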
Verify the configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Start Nagios as a daemon:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Because nagios has already been registered as a service, the following also works:
service nagios start/stop/restart/status
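When the configuration is edited later, it is safer to run the -v verification first and restart only if it passes. This is a minimal sketch of that habit; safe_restart and the NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD variables are names made up for this example, with defaults taken from the paths used in this article.

```shell
# Defaults follow the install paths used in this article; all three
# variables are hypothetical knobs and can be overridden.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Verify the configuration and restart the service only if it is valid.
# (safe_restart is a hypothetical helper name.)
safe_restart() {
    if out=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1); then
        echo "configuration OK, restarting nagios"
        ${RESTART_CMD:-service nagios restart}
    else
        # Show the verifier's output so the error can be located.
        printf '%s\n' "$out" >&2
        echo "configuration check failed, nagios not restarted" >&2
        return 1
    fi
}

# Usage as root on the master node:
#   safe_restart
```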