Monitoring Windows and Linux hosts with Nagios

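1. NSClient++ and NRPE

Nagios can monitor Windows hosts in two main ways: through NSClient++, or through NRPE.

The biggest difference between NSClient++ and NRPE is this:

First, with NRPE the monitored host runs the NRPE daemon together with a set of plugins, and it is the plugins that do the actual checking. When the monitoring server sends a check request to NRPE, NRPE calls the matching plugin to carry it out.

Second, NSClient++ works differently: only NSClient++ itself is installed on the monitored host, with no plugins. When the monitoring server sends a check request to NSClient++, NSClient++ handles it directly, so every check is performed by NSClient++ itself.

This also points to a major weakness of NSClient++ when it is used this way: it is inflexible and not extensible. It can only run the checks built into it, and plugins cannot add new ones. Fortunately NSClient++ already covers a lot, and its built-in checks are basically enough for our monitoring needs.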

NSClient++ architecture diagram

wKiom1PvKYqT3zN_AACJ0zqers8519.jpg

2. Deployment

2.1 Install NSClient++ on Windows

(1) Click Next through the installer

wKiom1PvKuiCrk9AAAAjAmNM0Rg841.png

(2) Set the Nagios server IP address

wKiom1PvKv2ATA65AACCPbwcXss954.png

(3) Check that the NSClient++ port is open

wKioL1PvLIChR2F7AAAZ7iDZyZk706.png

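    If the service is not running: Win+R --> services.msc --> start the NSClient++ service.

(4) Open TCP port 12489 in the firewall

2.2 Configure the Nagios server

(1) Check from the Nagios server that check_nt can query the Windows host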

[root@cacti libexec]#  ./check_nt -H 192.168.200.15 -p 12489 -s dianyi123 -v UPTIME
System Uptime - 3 day(s) 12 hour(s) 32 minute(s)
[root@cacti libexec]# 
[root@cacti libexec]# ./check_nt -H 192.168.200.15 -p 12489 -s dianyi123 -v CPULOAD -w 80 -c 90 -l 5,80,90
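CPU Load 0% (5 min average) |   '5 min avg Load'=0%;80;90;0;100                     # -w is the warning threshold and -c the critical threshold; -l (lowercase L) 5,80,90 means the average over the past 5 minutes, warning at 80%, critical at 90%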
[root@cacti libexec]# 
[root@cacti libexec]# ./check_nt -H 192.168.200.15 -p 12489 -s dianyi123 -v USEDDISKSPACE -w 80 -c 90 -l C 
C:\ - total: 100.83 Gb - used: 13.71 Gb (14%) - free 87.12 Gb (86%) | 'C:\ Used Space'=13.71Gb;80.66;90.74;0.00;100.83
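
The part of each output line after the pipe is performance data in the form 'label'=value;warn;crit;min;max. As a rough illustration of how to read it, here is a small bash sketch; perf_fields is a name invented for this example, and it assumes a single performance-data item whose label contains no equals sign.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "value warn crit" from one line of plugin output.
perf_fields() {
    local line=$1 perf value warn crit rest
    perf=${line#*|}      # drop the human-readable text before the pipe
    perf=${perf#*=}      # drop the quoted label up to the equals sign
    IFS=';' read -r value warn crit rest <<< "$perf"
    echo "$value $warn $crit"
}
```

For the USEDDISKSPACE line above, the warn and crit fields come back as 80.66 and 90.74, which are the 80% and 90% thresholds converted to Gb of the 100.83 Gb disk.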

(2) Define the command, host and services

    (a) Define the command

[root@cacti ~]# vim /usr/local/nagios/etc/objects/commands.cfg
define command{
        command_name    check_win
        command_line    $USER1$/check_nt -H "$HOSTADDRESS$" -p 12489 -s dianyi123 -v $ARG1$ $ARG2$
}
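
To make the role of $ARG1$ and $ARG2$ concrete, here is a rough bash sketch of how a check_command value such as check_win!CPULOAD!-l 5,80,90 turns into a check_nt command line. The function name expand_check_win is made up for illustration, the splitting on "!" is a simplification of what Nagios does internally, and the libexec path is the one used in this article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mimic the expansion of "check_win!ARG1!ARG2" for one host.
expand_check_win() {
    local check_command=$1 host_address=$2
    local libexec=/usr/local/nagios/libexec   # what $USER1$ points to in this setup
    local name arg1 arg2
    # Fields are separated by "!": command name, then $ARG1$, then $ARG2$.
    IFS='!' read -r name arg1 arg2 <<< "$check_command"
    printf '%s/check_nt -H "%s" -p 12489 -s dianyi123 -v %s %s\n' \
        "$libexec" "$host_address" "$arg1" "$arg2"
}
```

Calling expand_check_win 'check_win!CPULOAD!-l 5,80,90' 192.168.200.15 prints the same kind of command that was run by hand in step (1). Use single quotes around the argument in an interactive shell so that "!" is not treated as history expansion.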

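    (b) Define the host and services

    For convenience, the host and its services are defined in a single configuration file.

    First create a directory named servers under /usr/local/nagios/etc to hold the per-server configuration files, then name each file after the server's IP address.

    For this to work, nagios.cfg must enable the servers directory with cfg_dir.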

[root@cacti etc]# pwd
/usr/local/nagios/etc
[root@cacti etc]# 
[root@cacti etc]# ls
cgi.cfg  htpasswd.users  nagios.cfg  objects  resource.cfg  servers
[root@cacti etc]# 
[root@cacti etc]# vim nagios.cfg
cfg_dir=/usr/local/nagios/etc/servers
[root@cacti etc]# 
[root@cacti etc]# vim servers/192.168.200.15.cfg
define host{
        use                     windows-server            ; Name of host template to use
        host_name               192.168.200.15
        alias                   my computer
        address                 192.168.200.15
        }

#define hostgroup{
#       hostgroup_name  windows-servers ; The name of the hostgroup
#       alias           Windows Servers ; Long name of the group
#       }

define service{
        use                             generic-service
        host_name                       192.168.200.15
        service_description             NSClient++ Version
        check_command                   check_win!CLIENTVERSION
         }

define service{
        use                             generic-service
        host_name                       192.168.200.15
        service_description             Uptime
        check_command                   check_win!UPTIME
        }

define service{
        use                             generic-service
        host_name                       192.168.200.15
        service_description             CPU Load
        check_command                   check_win!CPULOAD!-l 5,80,90
}

define service{
       use                     generic-service
       host_name               192.168.200.15
       service_description     Memory Usage
       check_command           check_win!MEMUSE!-w 80 -c 90
       }

define service{
       use                     generic-service
       host_name               192.168.200.15
       service_description     C:\ Drive Space
       check_command           check_win!USEDDISKSPACE!-l c -w 80 -c 90
       }

define service{
       use                     generic-service
       host_name               192.168.200.15
       service_description     D:\ Drive Space
       check_command           check_win!USEDDISKSPACE!-l d -w 80 -c 90
       }

define service{
       use                     generic-service
       host_name               192.168.200.15
       service_description     E:\ Drive Space
       check_command           check_win!USEDDISKSPACE!-l e -w 80 -c 90
       }

#define service{
#       use                     generic-service
#       host_name               192.168.200.15
#       service_description     W3SVC
#       check_command           check_win!SERVICESTATE!-d SHOWALL -l W3SVC
#       }

define service{
       use                     generic-service
       host_name               192.168.200.15
       service_description     Explorer
       check_command           check_win!PROCSTATE!-d SHOWALL -l Explorer.exe
       }
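
The three drive-space services above differ only in the drive letter, so they could also be generated rather than typed. A minimal sketch, assuming bash; drive_services is a hypothetical helper, not part of Nagios, and its output would be appended to the host's file under servers/.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one USEDDISKSPACE service block per drive letter.
drive_services() {
    local host=$1 drive upper
    shift
    for drive in "$@"; do
        upper=$(printf '%s' "$drive" | tr '[:lower:]' '[:upper:]')
        # In an unquoted here-document, \\ produces a single backslash.
        cat <<EOF
define service{
       use                     generic-service
       host_name               $host
       service_description     ${upper}:\\ Drive Space
       check_command           check_win!USEDDISKSPACE!-l $drive -w 80 -c 90
       }

EOF
    done
}
```

For example, drive_services 192.168.200.15 c d e prints three blocks matching the hand-written ones.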

(3) Check the configuration files for errors

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If the check reports no errors, that is the best news, and the next step is to restart the Nagios service.

(4) Restart the Nagios service

[root@cacti ~]# service nagios restart
Stopping nagios:                                           [  OK  ]
Starting nagios:                                           [  OK  ]
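
Steps (3) and (4) can be tied together so that Nagios is only restarted when the configuration passes verification. The following is a sketch under the assumption that nagios -v exits non-zero on a broken configuration, which is what the stock init script relies on; verify_then_restart and the two variables are names chosen for this example.

```bash
#!/usr/bin/env bash
# Paths from this article; both can be overridden from the environment.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Hypothetical helper: restart Nagios only if the configuration check passes.
verify_then_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        service nagios restart
    else
        echo "nagios.cfg failed verification; not restarting" >&2
        return 1
    fi
}
```

Run verify_then_restart as root after editing any file under etc/; on failure, rerun the nagios -v command by hand to see the actual error messages.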

3. Viewing hosts and services in the Nagios web interface

3.1 Host status

wKioL1PvOGOCMIo0AACZgKI45rU930.png

3.2 Service status

wKiom1PvN2nQ0ruDAABa77H1RVc898.png

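4. Troubleshooting

I ran into two main problems while setting up Nagios monitoring of the Windows host.

4.1 The host status is DOWN instead of the normal UP

  Cause: this usually means the server does not answer ping. The monitoring server uses ping to decide whether a monitored host is online; once ICMP echo requests were allowed on the Windows server, the host check succeeded.

  Fix (Windows Server 2008): Server Manager > Configuration > Windows Firewall with Advanced Security > Inbound Rules, find "File and Printer Sharing (Echo Request - ICMPv4-In)", right-click it and choose "Enable Rule".

4.2 could not fetch information from server

  After the first problem was fixed the host status went to UP, but every single service showed "could not fetch information from server".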

wKioL1PvOneR3f1lAABUd5w7O6k288.png

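  Cause: the Nagios server is not getting any data back from the monitored host. The direct cause was that the Allowed hosts entry in the NSClient++ configuration file did not contain the right IP.

  Fix: in the NSClient++ configuration file, set Allowed hosts = <Nagios server IP>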

wKiom1PvOveg0Z7bAABSyeAbrf0246.png

  When I installed NSClient++ I entered Allowed hosts = 192.168.200.105, which was correct, so I do not know why the value had turned into 15.



5. Monitoring Linux hosts with Nagios

5.1 Define the host and services on the server

define host{
        use                     linux-server
        host_name               192.168.200.111
        alias                   linux
        address                 192.168.200.111
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             root_/
        check_command                   check_nrpe!check_xvda!5%!10%
         }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             /dev/xvdb2
        check_command                   check_nrpe!check_xvdb2!5%!10%
         }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             Check Swap
        check_command                   check_nrpe!check_swap
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             total
        check_command                   check_nrpe!check_total_procs
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_load
        check_command                   check_nrpe!check_load
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_tcp_3306
        check_command                   check_tcp!3306
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_users
        check_command                   check_nrpe!check_users
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_mem
        check_command                   check_nrpe!check_mem
        }

define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_mysql
        check_command                   check_nrpe!check_mysql
        }
define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_mysql_slave
        check_command                   check_nrpe!check_mysql_slave
        }
   
define service{
        use                             generic-service
        host_name                       192.168.200.111
        service_description             check_http  192.168.200.111/test.html
        check_command                   check_http!'-u /test.html'     ; checks the page's HTTP status (e.g. 200); commands.cfg ships a check_http command, which can also be pointed at a domain name
        }

5.2 On the client, edit nrpe.cfg: vim /usr/local/nagios/etc/nrpe.cfg

command[check_users]=/usr/local/nagios/libexec/check_users -w 3 -c 5
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_xvda]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/xvda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200 
command[check_xvdb2]=/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /dev/xvdb2   # Alibaba Cloud
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%                 # the /dev/xvdb1 partition is used as swap
command[check_mem]=/usr/bin/sudo /usr/local/nagios/libexec/check_mem -w 20 -c 10 
command[check_mysql]=/usr/local/nagios/libexec/check_mysql -H 192.168.200.111 -unagios -dnagios_monitor -p dianyi123
command[check_mysql_slave]=/usr/local/nagios/libexec/check_mysql_slave
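
The name inside the brackets is what the server later passes to check_nrpe with -c, so it helps to be able to list exactly which names a given nrpe.cfg defines. A small sketch, assuming the command[name]=... syntax shown above; list_nrpe_commands is a hypothetical helper, not an NRPE tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command names defined in nrpe.cfg.
# Reads the files given as arguments, or standard input if there are none.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$@"
}
```

On the client, list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg should print check_users, check_load, check_xvda and so on; a name that is not in this list produces "NRPE: Command '...' not defined" on the server side.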

5.3 Allow the Nagios server IP in nrpe.cfg

[root@localhost ~]# vim /usr/local/nagios/etc/nrpe.cfg 
allowed_hosts=127.0.0.1,192.168.200.105
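
Because 192.168.200.15 and 192.168.200.105 are easy to confuse (see the Allowed hosts mix-up in section 4), the entry is worth checking with an exact match rather than by eye. A minimal bash sketch, assuming a plain comma-separated list of addresses with no spaces or network ranges; ip_allowed is a name invented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the IP appears as a whole entry in the list.
ip_allowed() {
    local list=$1 ip=$2 entry
    local IFS=','            # split the list on commas inside this function only
    for entry in $list; do
        [ "$entry" = "$ip" ] && return 0
    done
    return 1
}
```

For example, ip_allowed "127.0.0.1,192.168.200.105" 192.168.200.105 succeeds, while the same call with 192.168.200.15 fails even though the strings look alike.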

5.4 Start nrpe on the client as a standalone daemon

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

5.5 Add the check_nrpe command definition on the Nagios server

[root@monitor ~]# vim /usr/local/nagios/etc/objects/commands.cfg # add the following definition
define command {
      command_name  check_nrpe
      command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Otherwise restarting Nagios fails with an error like:

Error: Service check command 'check_nrpe!check_total_procs' specified in service 'total' for host '192.168.200.105' not defined anywhere!

5.6 Test from the server:

/usr/local/nagios/libexec/check_nrpe -H 192.168.200.111 -c check_xvda


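6. Supplementary notes

6.1 Monitoring Windows ports with Nagios

    Almost any program built on sockets listens on a TCP port, so monitoring that port is effectively monitoring the program. FTP on 21, POP on 110 and SMTP on 25 are common examples, and Nagios generally already ships check commands for such common ports; for any other port you need to define a check command for the program's TCP port yourself.

    Before monitoring, confirm that the port is actually open, for example by telnetting to it from CMD: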

C:\Users\Administrator>telnet 192.168.200.15 3389

(1) Define the command

[root@cacti objects]# vim /usr/local/nagios/etc/objects/commands.cfg
define command{
       command_name    tcp3389
       command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p 3389
}

(2) Define the service

    The host is already defined; the host and its services live in the same configuration file.

[root@cacti servers]# vim /usr/local/nagios/etc/servers/192.168.200.15.cfg
define service{
        use                     generic-service
        host_name               192.168.200.15
        service_description     port3389
        check_command           tcp3389
}

(3) Restart the Nagios service

(4) Verify in the web interface

wKiom1P0jpHhVLokAABsKpw7J9w674.png

6.2 Monitoring Linux ports with Nagios

[root@cacti servers]# pwd
/usr/local/nagios/etc/servers
[root@cacti servers]# 
[root@cacti servers]# vim 192.168.200.18.cfg
define service{
        use                             generic-service
        host_name                       192.168.200.18
        service_description             check_tcp_3306
        check_command                   check_tcp!3306
        }

define service{
        use                             generic-service
        host_name                       192.168.200.18
        service_description             check_tcp_873
        check_command                   check_tcp!873
        }
#
[root@cacti ~]# service nagios restart

############# If the port is bound to a specific address like the first line below, rather than to * as in *:5666 ###############
tcp    LISTEN     0      50                             61.138.78.59:7003                                     *:*     
tcp    LISTEN     0      5                                         *:5666                                     *:*  
then replace $HOSTADDRESS$ in the command definition with 61.138.78.59, give the command a new command_name, and define the service with it.
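
To see which address a port is bound to without reading the ss output by eye, something like the following could be used. It is only a sketch: listen_addr is a hypothetical helper, and it assumes the column layout shown above (protocol, state, two queue columns, local address, peer address), which is what ss prints when both TCP and UDP are requested.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read ss output on stdin and print the local address
# that the given port is listening on ("*" means all IPv4 addresses).
listen_addr() {
    awk -v port="$1" '
        $2 == "LISTEN" {
            n = split($5, parts, ":")          # the port is the last ":" field
            if (parts[n] == port) {
                print substr($5, 1, length($5) - length(parts[n]) - 1)
            }
        }'
}
```

For example, ss -antul | listen_addr 7003 would print 61.138.78.59 for the listing above, which is the address check_tcp then has to be pointed at.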

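6.3 Monitoring MySQL master-slave replication with Nagios

   Whether replication is healthy comes down to two threads, the Slave_IO thread and the Slave_SQL thread: if Slave_IO_Running and Slave_SQL_Running are both Yes, replication is working.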

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.17
                  Master_User: doteyplay
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000008
          Read_Master_Log_Pos: 1277
               Relay_Log_File: relay-bin.000025
                Relay_Log_Pos: 1486
        Relay_Master_Log_File: master-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Part 1: Client-side configuration

(1) Add a user on the monitored slave server

MariaDB [(none)]> grant Replication client on *.* to nagios@localhost identified by 'nagios';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> 
MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)

(2) Verify that the command runs

[root@DBSlave ~]# mysql -unagios -pnagios -e "show slave status\G;"  
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.200.17
                  Master_User: doteyplay
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000008
          Read_Master_Log_Pos: 1277
               Relay_Log_File: relay-bin.000025
                Relay_Log_Pos: 1486
        Relay_Master_Log_File: master-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

(3) Write the script /usr/local/nagios/libexec/check_mysql_slave (this is the core of the check)

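#!/bin/bash
# Arrays are a bash feature, so the interpreter is bash rather than plain sh.
declare -a    slave_is
# Collect the values of Slave_IO_Running and Slave_SQL_Running, in that order.
slave_is=($(/usr/local/mysql/bin/mysql -unagios -pnagios    -e "show slave status\G"|grep -E 'Slave_(IO|SQL)_Running:' |awk '{print $2}'))
if [ "${slave_is[0]}" = "Yes" ] && [ "${slave_is[1]}" = "Yes" ]
     then
     echo "OK C2-slave is running"
     exit 0    # exit code 0 = OK
else
     echo "Critical C2-slave is error"
     exit 2    # exit code 2 = CRITICAL
fi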
#
[root@DBSlave libexec]# chmod +x check_mysql_slave   # make it executable
[root@DBSlave libexec]# chown  nagios.nagios check_mysql_slave
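
The decision logic of the script can be tried out without a live replica by feeding it captured "show slave status\G" output. The following is a sketch of that idea, not the script used in this article: slave_state is a hypothetical function that reads the status text on standard input and follows the plugin convention of exit code 0 for OK and 2 for CRITICAL.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide replication health from "show slave status\G" text.
slave_state() {
    local status io sql
    status=$(cat)
    # Match the field names exactly so similar fields are not picked up.
    io=$(printf '%s\n' "$status" | awk '$1 == "Slave_IO_Running:" {print $2}')
    sql=$(printf '%s\n' "$status" | awk '$1 == "Slave_SQL_Running:" {print $2}')
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "OK slave is running"
        return 0
    fi
    echo "CRITICAL slave is not replicating (IO=${io:-unknown}, SQL=${sql:-unknown})"
    return 2
}
```

On the slave it could be fed with mysql -unagios -pnagios -e "show slave status\G" | slave_state. An empty result, for instance when the login fails, is reported as CRITICAL with both values shown as unknown.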

(4) Install nrpe on the slave server, then add one line to nrpe.cfg

[root@DBSlave ~]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_mysql_slave]=/usr/local/nagios/libexec/check_mysql_slave

(5) Run the script by hand and check its output

[root@DBSlave libexec]# sh check_mysql_slave 
OK C2-slave is running

(6) Check port 5666 on the monitored host

[root@DBSlave libexec]# ss -antulp | grep 5666
tcp    LISTEN     0      5                     :::5666                 :::*      users:(("nrpe",26512,5))
tcp    LISTEN     0      5                      *:5666                  *:*      users:(("nrpe",26512,4))
[root@DBSlave libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Part 2: Server-side configuration

(1) On the monitoring server, check that the monitored host can be queried

[root@cacti ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.200.18 -c check_mysql_slave
NRPE: Command 'check_mysql_slave' not defined     # ran into a problem

Troubleshooting: NRPE: Command 'check_mysql_slave' not defined

[root@cacti ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.200.18 
NRPE v2.15

 This shows that NRPE on the monitored host is working and that the monitoring server can talk to it over SSL.

[root@DBSlave libexec]# ps -ef | grep nrpe
root     10287  9703  0 12:01 pts/1    00:00:00 vim /usr/local/nagios/etc/nrpe.cfg
root     10522  9639  0 12:30 pts/0    00:00:00 grep nrpe
nagios   26512     1  0 Aug15 ?        00:01:09 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d   # nrpe runs here as a standalone daemon (-d) that was started before nrpe.cfg was edited; kill it first
[root@DBSlave libexec]# 
[root@DBSlave libexec]# kill -9 26512   # kill the nrpe process
[root@DBSlave libexec]# 
[root@DBSlave libexec]# ps -ef | grep nrpe
root     10287  9703  0 12:01 pts/1    00:00:00 vim /usr/local/nagios/etc/nrpe.cfg
root     10524  9639  0 12:31 pts/0    00:00:00 grep nrpe       # kill succeeded
[root@DBSlave libexec]# 
[root@DBSlave libexec]# 
[root@DBSlave libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d   # start nrpe again
[root@DBSlave libexec]# 
[root@DBSlave libexec]# ps -ef | grep nrpe
root     10287  9703  0 12:01 pts/1    00:00:00 vim /usr/local/nagios/etc/nrpe.cfg
nagios   10526     1  0 12:31 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root     10528  9639  0 12:31 pts/0    00:00:00 grep nrpe

Test again from the monitoring server

[root@cacti ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.200.18 -c check_mysql_slave
OK C2-slave is running   # finally works; the old nrpe process simply had not picked up the new command
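
In the ps listings above, the vim and grep lines also contain "nrpe", so picking out the daemon's PID by eye is error-prone. A small sketch of a stricter filter, assuming the ps -ef column layout shown above where the eighth column is the executable; nrpe_pids is a made-up helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "ps -ef" output on stdin and print the PIDs of
# processes whose executable is nrpe itself (not vim or grep mentioning it).
nrpe_pids() {
    awk '$8 ~ /\/nrpe$/ {print $2}'
}
```

For example, ps -ef | nrpe_pids prints only the daemon's PID, which can then be passed to kill before nrpe is started again with the updated nrpe.cfg.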

(2) Define the host and service

[root@cacti servers]# pwd
/usr/local/nagios/etc/servers
[root@cacti servers]# vim 192.168.200.18.cfg
define host{
        use                     linux-server
        host_name               192.168.200.18
        alias                   linux
        address                 192.168.200.18
        }
       
define service{
        use                             generic-service
        host_name                       192.168.200.18
        service_description             check_mysql_slave
        check_command                   check_nrpe!check_mysql_slave
        }

(3) Restart the Nagios service

(4) Check the monitoring status

wKioL1P1eVvjb_cnAAAzfT1y6aY028.png

6.4 Error when changing a service through the Nagios web interface

For example, the web pages let you schedule a check for a specific time or disable notifications for a service, but such actions often fail with the following error:

Could not open command file '/usr/local/nagios/var/rw/nagios.cmd' for update! The permissions on the external command file and/or directory may be incorrect. Read the FAQs on how to setup proper permissions. An error occurred while attempting to commit your command for processing.

 (1) Fix the owner and group

[root@monitor ~]# chown -R nagios.nagios /usr/local/nagios/var/rw/

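(2) Add the apache user to the nagios group (-a appends, so apache keeps its other supplementary groups)

[root@monitor ~]# usermod -a -G nagios apache

(3) Restart the services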

[root@monitor ~]# service nagios restart
[root@monitor ~]# service httpd restart
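
After these steps it is worth confirming that apache really is in the nagios group before retrying the web command. A minimal sketch: user_in_group is a hypothetical helper that looks for an exact group name in a space-separated list such as the one printed by id -nG.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the wanted group appears in the group list.
user_in_group() {
    local groups_line=$1 wanted=$2 g
    for g in $groups_line; do      # unquoted on purpose: split on whitespace
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}
```

On the monitoring server the check would be user_in_group "$(id -nG apache)" nagios && echo "apache is in the nagios group".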