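Cloud Monitoring: Nagios Installation Steps

Preface

I have recently been studying cloud monitoring tools. Having already written up the installation steps for Ganglia, this time I am recording the installation steps for Nagios.

This article does not explain the underlying principles; please consult other material for that.

Goal of this article: even if you have never touched Nagios before, you can follow the steps here to build your own Nagios monitoring cluster.

@Author  duangr

@Website http://my.oschina.net/duangr/blog/183160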

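1. Introduction to Nagios

Nagios is an open-source monitoring system that runs on Linux/Unix platforms and can monitor system status and network information. It monitors specified local or remote hosts and services and provides notification of abnormal conditions: when a system or service enters an abnormal state, it sends an email or SMS alert to notify the site operations staff immediately, and when the state recovers it sends a recovery notification by email or SMS.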

2. Environment

Host Name   IP              OS           Arch
duangr-1    192.168.56.10   CentOS 6.4   x86_64
duangr-2    192.168.56.11   CentOS 6.4   x86_64
duangr-3    192.168.56.12   CentOS 6.4   x86_64

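3. Deployment Plan

Monitoring master node (Master)   duangr-1
Monitored slave nodes (Slave)     duangr-2, duangr-3

The Nagios master node needs the following installed:

  • nagios

  • nagios-plugin

  • nrpe

  • php

  • apache

The Nagios slave nodes need the following installed:

  • nagios-plugin

  • nrpe

Installation paths

nagios install path   /usr/local/nagios
php install path      /usr/local/php
apache install path   /usr/local/apache2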

4. Obtaining the Source

5. Prerequisites

5.1 Host environment check (all nodes)

# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
package gd-devel is not installed
package xinetd is not installed
openssl-devel-1.0.0-27.el6.x86_64

If any packages are missing, install them first. The packages can be downloaded from the following mirror sites:

  • http://rpm.pbone.net/

  • http://mirrors.163.com/centos/6.4/os/x86_64/Packages/

  • http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/

After installing, check again:

# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-3.el6.x86_64
glibc-2.14.1-6.x86_64
glibc-common-2.14.1-6.x86_64
gd-2.0.35-11.el6.x86_64
gd-devel-2.0.35-11.el6.x86_64
xinetd-2.3.14-38.el6.x86_64
openssl-devel-1.0.0-27.el6.x86_64
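
The check above can be scripted so that only the missing packages are reported. The following is a minimal sketch, assuming the English rpm message format shown in the listing ("package NAME is not installed"); the function name missing_packages is illustrative and not part of any tool.

```shell
#!/bin/sh
# Print the names of packages that `rpm -q` reports as not installed.
# Reads `rpm -q` output on stdin and relies on the English message
# "package NAME is not installed" seen in the listing above.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Typical use on a node (commented out so the sketch has no side effects):
# rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel | missing_packages
```

An empty result means every listed package is present.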

6. Build and Install

6.1 Create the nagios user (all nodes)

useradd nagios -d /usr/local/nagios
passwd nagios   # choose your own password

6.2 Install the Nagios core (master node only)

tar -zxf nagios-4.0.2.tar.gz
cd nagios-4.0.2
./configure --prefix=/usr/local/nagios     
make all
make install && make install-init && make install-commandmode && make install-config

Register nagios as a service:

chkconfig --add nagios 
chkconfig nagios off
chkconfig --level 35 nagios on
chkconfig --list nagios    
nagios          0:off  1:off  2:off  3:on  4:off  5:on  6:off

6.3 Install the Nagios plugins (all nodes)

tar -zxf nagios-plugins-1.5.tar.gz
cd nagios-plugins-1.5
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios      
make && make install

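If MySQL-related compile errors appear, the cause is that MySQL is not installed in its default location. Point --with-mysql at the actual path and run make again: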

./configure --prefix=/usr/local/nagios  --with-mysql=/usr/local/mysql
make && make install

6.4 Install NRPE (all nodes)

tar -zxf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure --enable-command-args
make all
make install-plugin

The following step is only needed on the monitored nodes:

make install-daemon && make install-daemon-config && make install-xinetd

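6.4.1 Configuring the monitored nodes

On a monitored node, NRPE must be configured to run as a daemon (here it is run through xinetd).

1. Edit /etc/xinetd.d/nrpe to allow connections from the Nagios master node: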

vi /etc/xinetd.d/nrpe
only_from       = 127.0.0.1 192.168.56.10

2. Append the following to the end of /etc/services:

nrpe      5666/tcp       # NRPE

3. Enable support for command arguments:

vi /usr/local/nagios/etc/nrpe.cfg
dont_blame_nrpe=1

4. Restart xinetd:

service xinetd restart

5. Verify that nrpe is listening:

netstat -at | grep nrpe

6. Test that nrpe is running correctly:

/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.15
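
Steps 1 to 3 above are manual edits, which become tedious with several monitored nodes. The sketch below applies the same three changes idempotently. It assumes GNU sed (for `sed -i`), an xinetd service file that already contains an `only_from` line, and an nrpe.cfg that already contains a `dont_blame_nrpe=` line; the function names are hypothetical helpers, not part of NRPE.

```shell
#!/bin/sh
# Append an IP to the only_from line of an xinetd service file,
# unless that IP is already listed.
allow_nrpe_client() {
    file=$1 ip=$2
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')   # match the dots literally
    if grep -Eq "^[[:space:]]*only_from[[:space:]]*=(.*[[:space:]])?${ip_re}([[:space:]]|\$)" "$file"; then
        return 0
    fi
    sed -i "/^[[:space:]]*only_from[[:space:]]*=/ s/[[:space:]]*\$/ ${ip}/" "$file"
}

# Add the nrpe port to a services file unless an nrpe entry already exists.
add_nrpe_service() {
    file=$1
    grep -Eq '^nrpe[[:space:]]' "$file" ||
        printf 'nrpe      5666/tcp       # NRPE\n' >> "$file"
}

# Set dont_blame_nrpe=1 (assumes the setting is already present in the file).
enable_nrpe_args() {
    sed -i 's/^dont_blame_nrpe=.*/dont_blame_nrpe=1/' "$1"
}

# Typical use on a monitored node, as root (commented out on purpose):
# allow_nrpe_client /etc/xinetd.d/nrpe 192.168.56.10
# add_nrpe_service  /etc/services
# enable_nrpe_args  /usr/local/nagios/etc/nrpe.cfg
# service xinetd restart
```

Because each helper checks before it writes, running the script twice leaves the files unchanged the second time.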

6.4.2 Configuring the master node

On the monitoring master node, once NRPE is configured on all monitored nodes, check each of them in turn:

/usr/local/nagios/libexec/check_nrpe -H 192.168.56.11
NRPE v2.15
/usr/local/nagios/libexec/check_nrpe -H 192.168.56.12
NRPE v2.15
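
With more than a couple of monitored nodes, the same check can be run in a loop. This is a small sketch, assuming check_nrpe lives at the path used above; the CHECK_NRPE variable and the check_slaves function are illustrative names introduced here.

```shell
#!/bin/sh
# Path to check_nrpe; can be overridden from the environment.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Run check_nrpe against every host given as an argument.
# Prints one OK/FAIL line per host and returns non-zero if any host failed.
check_slaves() {
    failed=0
    for host in "$@"; do
        if out=$("$CHECK_NRPE" -H "$host" 2>&1); then
            echo "OK   $host: $out"
        else
            echo "FAIL $host: $out"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Typical use on the master node (commented out on purpose):
# check_slaves 192.168.56.11 192.168.56.12
```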

6.5 Install Apache (master node only)

tar -zxf httpd-2.2.23.tar.gz
cd httpd-2.2.23
./configure --prefix=/usr/local/apache2
make && make install

6.6 Install PHP (master node only)

cd /export/home/tools/soft/php
tar -zxf php-5.4.10.tar.gz
cd php-5.4.10
./configure --prefix=/usr/local/php  --with-apxs2=/usr/local/apache2/bin/apxs
make  && make install

6.7 Serve the PHP web interface through Apache

vi /usr/local/apache2/conf/httpd.conf

....
Listen 80
....
<IfModule dir_module>
    DirectoryIndex index.html index.php
    AddType application/x-httpd-php .php
</IfModule>
....
#setting for nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
     AuthType Basic
     Options ExecCGI
     AllowOverride None
     Order allow,deny
     Allow from all
     AuthName "Nagios Access"
     AuthUserFile /usr/local/nagios/etc/htpasswd
     Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
     AuthType Basic
     Options None
     AllowOverride None
     Order allow,deny
     Allow from all
     AuthName "nagios Access"
     AuthUserFile /usr/local/nagios/etc/htpasswd
     Require valid-user
</Directory>

Add a user name and password for web access (the user name here is admin; choose your own if you prefer):

/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd admin

Start Apache:

/usr/local/apache2/bin/apachectl start

Open the page:

   http://192.168.56.10/nagios/

7. Configuring Nagios

7.1 Configuring the remote monitored nodes

7.1.1 Edit the configuration file

# su - nagios
$ vi /usr/local/nagios/etc/nrpe.cfg

Change the command definitions to the following:

command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_procs_args]=/usr/local/nagios/libexec/check_procs  $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$

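What these monitoring commands do:

  • check_users                    monitors the number of logged-in users

  • check_load                     monitors the CPU load

  • check_disk                     monitors disk usage

  • check_procs                    monitors the number of processes; states include RSZDT

  • check_swap                     monitors swap usage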

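7.1.2 Restart the xinetd service

After configuring the commands above, restart the xinetd service:

service xinetd restart

7.1.3 Verify the configuration

Check that the monitoring commands are configured correctly: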

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users  -a 5 10
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load   -a 15,10,5 30,25,20
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk    -a 20% 10% /
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap  -a 20% 10%
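
It helps to see how the values after `-a` end up in the plugin command line: NRPE replaces `$ARG1$`, `$ARG2$`, ... in the matching `command[...]` definition with the arguments in order. The sketch below only imitates that substitution for illustration; expand_nrpe_command is a hypothetical helper, and it assumes the arguments contain no `|`, `&` or backslash characters.

```shell
#!/bin/sh
# Imitate NRPE's $ARGn$ substitution: the first parameter is the command
# template from nrpe.cfg, the remaining parameters are the -a arguments.
expand_nrpe_command() {
    line=$1
    shift
    i=1
    for arg in "$@"; do
        # Replace every literal $ARGi$ with the i-th argument.
        line=$(printf '%s\n' "$line" | sed "s|\\\$ARG${i}\\\$|${arg}|g")
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

# Example: what "check_nrpe -c check_disk -a 20% 10% /" runs on the node.
expand_nrpe_command \
    '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
    '20%' '10%' '/'
```

For the example call, the template expands to `/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /`.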

7.2 Configuring the monitoring master node

7.2.1 cgi.cfg (controls CGI access)

(as the nagios user)

vi /usr/local/nagios/etc/cgi.cfg

Change the following settings to grant permissions to the admin user:

default_user_name=admin
authorized_for_system_information=nagiosadmin,admin
authorized_for_configuration_information=nagiosadmin,admin
authorized_for_system_commands=nagiosadmin,admin
authorized_for_all_services=nagiosadmin,admin
authorized_for_all_hosts=nagiosadmin,admin
authorized_for_all_service_commands=nagiosadmin,admin
authorized_for_all_host_commands=nagiosadmin,admin
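
Editing each `authorized_for_*` line by hand is easy to get wrong. As a rough sketch, the helper below appends a user to every such line that does not list it yet; grant_cgi_user is a hypothetical name, and the sketch assumes the settings are written as `key=value` without spaces around `=`, as in the lines above.

```shell
#!/bin/sh
# Append USER to every uncommented authorized_for_* line in FILE
# that does not already list that user.
grant_cgi_user() {
    file=$1 user=$2
    tmp=$(mktemp) || return 1
    awk -F= -v user="$user" '
        /^authorized_for_[a-z_]+=/ {
            n = split($2, names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == user) found = 1
            if (!found) $0 = ($2 == "" ? $1 "=" user : $0 "," user)
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Typical use (commented out on purpose):
# grant_cgi_user /usr/local/nagios/etc/cgi.cfg admin
```

Writing the result back with `cat` rather than `mv` keeps the owner and permissions of the original file.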

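7.2.2 nagios.cfg (the main Nagios configuration file)

(as the nagios user)

vi /usr/local/nagios/etc/nagios.cfg

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg      (comment this line out)
cfg_dir=/usr/local/nagios/etc/servers

The main configuration file now declares ./servers as the directory that holds the monitoring definitions. This directory does not exist by default and must be created by hand.

Nagios reads every file with a .cfg suffix under the servers directory as a configuration file.

cd /usr/local/nagios/etc
mkdir servers
cd servers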

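7.2.3 Define the monitored host group

Declare a host group and add all three hosts listed in the environment section to it.

vi /usr/local/nagios/etc/servers/group.cfg

New file, with the following content: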

define hostgroup{
   hostgroup_name      duangr-server
   alias               duangr Server
   members             duangr-1,duangr-2,duangr-3
}

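Notes on the configuration above:

  • hostgroup_name:    name of the host group; choose any name

  • alias:             alias of the host group; choose any name

  • members:           members of the host group, with multiple host names separated by commas. Each host name must match the host_name in the corresponding define host block.

The host definitions are covered below.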
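
Since every entry in members must match a host_name from a define host block, a quick cross-check can catch typos before Nagios is started. The following is a minimal sketch, assuming one directive per line and blocks laid out like the ones in this article; missing_hostgroup_members is a hypothetical helper and not a replacement for Nagios's own configuration check.

```shell
#!/bin/sh
# Print hostgroup members that have no matching "define host" block
# in the .cfg files directly under the given directory.
missing_hostgroup_members() {
    dir=$1
    # Host names declared inside "define host{ ... }" blocks.
    defined=$(awk '
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1; next }
        /^[ \t]*[}]/                       { in_host = 0 }
        in_host && $1 == "host_name"       { print $2 }
    ' "$dir"/*.cfg)
    # Members of every hostgroup, one per line, checked against the list.
    awk '
        /^[ \t]*define[ \t]+hostgroup[ \t]*[{]/ { in_group = 1; next }
        /^[ \t]*[}]/                            { in_group = 0 }
        in_group && $1 == "members" {
            $1 = ""
            gsub(/[ \t]/, "")
            n = split($0, m, ",")
            for (i = 1; i <= n; i++) if (m[i] != "") print m[i]
        }
    ' "$dir"/*.cfg |
    while read -r member; do
        printf '%s\n' "$defined" | grep -qxF "$member" || printf '%s\n' "$member"
    done
}

# Typical use (commented out on purpose):
# missing_hostgroup_members /usr/local/nagios/etc/servers
```

No output means every member has a host definition.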

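7.2.4 Define the monitored hosts

Now define the individual hosts.

7.2.4.1 Monitoring configuration for the local host

First define the local host, duangr-1:

vi /usr/local/nagios/etc/servers/duangr-1.cfg

New file, with the following content: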

define host{
       use                          linux-server
       host_name                    duangr-1
       alias                        duangr-1
       address                      192.168.56.10
       }

define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Host Alive
       check_command                   check-host-alive
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Users
       check_command                   check_local_users!20!50
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             CPU
       check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Disk Root
       check_command                   check_local_disk!20%!10%!/
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Disk Home
       check_command                   check_local_disk!20%!10%!/export/home
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Zombie Procs
       check_command                   check_local_procs!5!10!Z
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Total Procs
       check_command                   check_local_procs!250!400!RSZDT
       }
define service{
       use                             local-service
       host_name                       duangr-1
       service_description             Swap Usage
       check_command                   check_local_swap!20!10
       }

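Note that because this host is also where the monitoring master runs, the check_local_* commands can be used to monitor it.

This file already includes the commonly used checks.

7.2.4.2 Monitoring configuration for remote hosts

Next define the remote hosts duangr-2 and duangr-3.

Before defining the checks for remote hosts, the check_nrpe command must be defined:

vi /usr/local/nagios/etc/objects/commands.cfg

Append the following to the end of the file: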

# 'check_nrpe' command definition
define command{
       command_name    check_nrpe
       command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
       }
define command{
       command_name    check_nrpe_args
       command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
       }
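
In a service definition, the check_command value is split on `!`: the first field names the command and the remaining fields become `$ARG1$`, `$ARG2$`, and so on in its command_line. The sketch below only illustrates that split; split_check_command is a hypothetical helper, not a Nagios tool.

```shell
#!/bin/sh
# Show how a check_command value breaks down into the command name
# and the $ARGn$ macros. Runs in a subshell so IFS changes stay local.
split_check_command() (
    IFS='!'
    set -f            # no filename expansion while splitting
    set -- $1         # split the value on "!"
    echo "command_name=$1"
    shift
    i=1
    for arg in "$@"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
)

# Example: a remote users check routed through check_nrpe_args.
split_check_command 'check_nrpe_args!check_users!5 10'
```

With check_nrpe_args as defined above, ARG1 is the NRPE command name passed to `-c` and ARG2 is the argument string passed to `-a`.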

Define the monitoring configuration for host duangr-2:

$ vi /usr/local/nagios/etc/servers/duangr-2.cfg

New file, with the following content:

define host{
       use                     linux-server
       host_name               duangr-2
       alias                   duangr-2
       address                 192.168.56.11
       }

define service{
       use                             local-service
       host_name                       duangr-2
       service_description             Host Alive
       check_command                   check-host-alive
       }
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             Users
       check_command                   check_nrpe_args!check_users!5 10
       }
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             CPU
       check_command                   check_nrpe_args!check_load!15,10,5 30,25,20
       }
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             Disk Root
       check_command                   check_nrpe_args!check_disk!20% 10% /
       }
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             Disk /export/home
       check_command                   check_nrpe_args!check_disk!20% 10% /export/home
       }
define service{
      use                             local-service
      host_name                       duangr-2
      service_description             Procs Zombie
      check_command                   check_nrpe_args!check_procs!5 10 Z
      }
define service{
      use                             local-service
      host_name                       duangr-2
      service_description             Procs Total
      check_command                   check_nrpe_args!check_procs_args!"-w400 -c600"
      }
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             Swap Usage
       check_command                   check_nrpe_args!check_swap!20% 10%
       }

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;  Checks for some common processes follow, mainly cloud-platform processes
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Monitor the crond process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: crond
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
       }
;; Monitor the ZooKeeper process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: QuorumPeerMain
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.quorum.QuorumPeerMain"
       }
;; Monitor the Storm slave-node (supervisor) process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: supervisor
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
       }
;; Monitor the Storm master-node (nimbus) process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: nimbus
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
       }
;; Monitor the MetaQ process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: MetaQ
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
       }
;; Monitor the Redis process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: redis-server
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
       }
;; Monitor the Hadoop master-node NameNode process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: NameNode 
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
       }
;; Monitor the Hadoop master-node SecondaryNameNode process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: SecondaryNameNode
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
       }
;; Monitor the Hadoop master-node ResourceManager process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: ResourceManager
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
       }
;; Monitor the Hadoop slave-node DataNode process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: DataNode
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
       }
;; Monitor the Hadoop slave-node NodeManager process
define service{
       use                             local-service
       host_name                       duangr-2
       service_description             PS: NodeManager
       check_command                   check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
       }

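Note that because duangr-2 is a remote host, the check_nrpe_args command is used to monitor it.

This file already includes the commonly used checks, plus process checks for Hadoop, Storm, ZooKeeper, MetaQ and Redis. The basic approach for these is to test whether the process exists.

Define the monitoring configuration for host duangr-3: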

vi duangr-3.cfg 

The content is similar to duangr-2.cfg; only host_name, alias and address need to change.
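
Rather than editing a copy by hand, the new file can be derived from duangr-2.cfg with sed. This is a rough sketch: clone_host_cfg is a hypothetical helper, and it replaces every occurrence of the old host name and address, so it assumes the old name is not a prefix of another host name in the file (for example duangr-20).

```shell
#!/bin/sh
# Print a copy of a host config file with the host name and address replaced.
# Arguments: source file, old name, old address, new name, new address.
clone_host_cfg() {
    src=$1 old_name=$2 old_addr=$3 new_name=$4 new_addr=$5
    # Escape dots so the old address is matched literally by sed.
    old_addr_re=$(printf '%s' "$old_addr" | sed 's/\./\\./g')
    sed -e "s/${old_name}/${new_name}/g" \
        -e "s/${old_addr_re}/${new_addr}/g" "$src"
}

# Typical use in /usr/local/nagios/etc/servers (commented out on purpose):
# clone_host_cfg duangr-2.cfg duangr-2 192.168.56.11 duangr-3 192.168.56.12 > duangr-3.cfg
```

Any process checks that do not apply to the new host still need to be removed by hand afterwards.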

7.2.4.3 Email notification

Define the email address of the monitoring contact:

vi /usr/local/nagios/etc/objects/contacts.cfg

define contact{
       contact_name                    nagiosadmin             ; Short name of user
       use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
       alias                           Nagios Admin            ; Full name of user
       email                           yourname@domain.com 
                                                               ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
       }

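Besides configuring the recipient of the notification emails, also make sure that:

  • this host can reach the mail server

  • Sendmail on this host can send mail through an external SMTP service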

7.2.4.4 Verify the configuration

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

7.2.4.5 Start Nagios

/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Because nagios is already registered as a service, the following also works:

service nagios start/stop/restart/status
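
Verification and restart can be combined so that a broken configuration never replaces a running one. The sketch below is one way to do it, assuming the paths used in this article; NAGIOS_BIN, NAGIOS_CFG, RESTART_CMD and safe_restart are names introduced here for illustration.

```shell
#!/bin/sh
# Defaults follow the install paths used in this article.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RESTART_CMD=${RESTART_CMD:-service nagios restart}

# Restart Nagios only if the configuration check (nagios -v) succeeds.
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        $RESTART_CMD    # intentionally unquoted so the command is word-split
    else
        echo "configuration check failed; nagios was not restarted" >&2
        return 1
    fi
}

# Typical use on the master node, as root (commented out on purpose):
# safe_restart
```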

8. Monitoring page

http://192.168.56.10/nagios

9. Related links
