Setting Up and Configuring the Nagios Monitoring System

Date: 2019-11-07

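System environment: Red Hat 4.8 (64-bit); all software packages were selected when the operating system was installed.

Software: nagios-3.2.1, nagios-plugins-1.4.15, nrpe-2.12 (your versions may differ from mine)

Monitoring host: 192.168.5.58 (all packages were installed with the operating system; Apache is the one shipped with the system)

Monitored host: 192.168.3.64 (an arbitrarily chosen server)

Note: the monitoring host needs nagios, nagios-plugins, and nrpe (nrpe is used to monitor CPU load, process count, and disk space usage). If you only want to monitor ping or port 80 on the local machine, you do not need nrpe. Likewise, checking whether a monitored host is alive or whether its port 80 is open does not require nrpe. Monitoring a monitored host's CPU load, process count, or disk space usage does require nrpe to be installed on that host.

Download the following software on the monitoring host: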

wget http://prodownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.1.tar.gz

wget http://prodownloads.sourceforge.net/sourceforge/nagios/nagios-plugins-1.4.15.tar.gz

wget http://prodownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz

Steps on the monitoring host:

Before installing Nagios, create a system user named nagios with the following commands:

groupadd nagios

useradd -g nagios nagios

mkdir /usr/local/nagios

chown -R nagios:nagios /usr/local/nagios

Install Nagios: unpack, compile, and install

tar -zxvf nagios-3.2.1.tar.gz

cd nagios-3.2.1

./configure --prefix=/usr/local/nagios

make all

make install

make install-init

make install-commandmode

make install-config

Install the Nagios plugins: unpack, compile, and install

tar -zxvf nagios-plugins-1.4.15.tar.gz

cd nagios-plugins-1.4.15

./configure --prefix=/usr/local/nagios

make

make install

Edit the Apache configuration file and append the following at the end:

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"

<Directory "/usr/local/nagios/sbin">

Options ExecCGI

AllowOverride None

Order allow,deny

Allow from all

AuthName "Nagios Access"

AuthType Basic

AuthUserFile /usr/local/nagios/etc/htpasswd.users

Require valid-user

</Directory>

Alias /nagios "/usr/local/nagios/share"

<Directory "/usr/local/nagios/share">

Options None

AllowOverride None

Order allow,deny

Allow from all

AuthName "Nagios Access"

AuthType Basic

AuthUserFile /usr/local/nagios/etc/htpasswd.users

Require valid-user

</Directory>

Uncomment the following line and change it as shown:

#ServerName new.host.name:80

After the change:

ServerName 192.168.5.58:80

Add a user for authentication with the following command:

User: admin, password: abc#123

/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users admin

New password: (enter the password)

Re-type new password: (enter the password again)

Adding password for user admin

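The Nagios installation is now complete. Start the Apache and Nagios services and look at the home page; if it loads, the basic test has passed. Installation is very simple; configuration is the more tedious part.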

/etc/init.d/httpd start

/etc/init.d/nagios start

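Quick test: http://192.168.5.58/nagios

Log in with the user name admin and its password.

Note that Apache may print warnings like the ones below when it starts. I am not sure of the cause, although the messages say that the Nagios ScriptAlias and Alias overlap earlier ones in httpd.conf. The web pages work fine regardless.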

Starting httpd: [Thu Aug 30 13:34:45 2012] [warn] The ScriptAlias directive in /etc/httpd/conf/httpd.conf at line 1024 will probably never match because it overlaps an earlier ScriptAlias.

[Thu Aug 30 13:34:45 2012] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1035 will probably never match because it overlaps an earlier Alias.

[ OK ]
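One way to look into warnings like these is to check whether any URL path is mapped more than once. The sketch below is a hypothetical helper (the name find_duplicate_aliases is my own, not an Apache tool). It only catches exact repeats of the same path, not every kind of overlap Apache may warn about.

```shell
# List Alias/ScriptAlias URL paths that appear more than once in an
# Apache configuration file. Commented-out lines are ignored because
# their first field is not exactly "Alias" or "ScriptAlias".
find_duplicate_aliases() {
    awk '$1 == "Alias" || $1 == "ScriptAlias" { print $2 }' "$1" |
        sort | uniq -d
}
```

For example, find_duplicate_aliases /etc/httpd/conf/httpd.conf prints each repeated path once; empty output means no exact duplicates were found.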

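If the web page works, we can start configuring Nagios. For now I define the monitoring host as a group of its own, with itself as the only member, and monitor only whether it is alive and its port 80.

The main Nagios configuration file is nagios.cfg. It references the other configuration files, such as those for check commands, monitored services, contacts, and contact groups.

Open the main configuration file nagios.cfg:

vi /usr/local/nagios/etc/nagios.cfg

Find the following section and change it to match: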

# You can specify individual object config files as shown below:

# Command definitions
cfg_file=/usr/local/nagios/etc/objects/commands.cfg

# Contact definitions
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

# Time period definitions
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

# Template definitions
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Monitored host definitions
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg

# Host group definitions
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg

# Service (monitored item) definitions
cfg_file=/usr/local/nagios/etc/objects/services.cfg

# Contact group definitions
cfg_file=/usr/local/nagios/etc/objects/contactgroup.cfg

# Definitions for monitoring the local (Linux) host

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
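Since several of these files do not exist yet, it can help to list which cfg_file entries point at missing files before running Nagios. The following is a minimal sketch; the function name list_missing_cfg_files is my own and is not part of Nagios.

```shell
# Print every path referenced by an active (uncommented) cfg_file= line
# in the given nagios.cfg that does not exist on disk.
list_missing_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1" | while IFS= read -r path; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}
```

Running list_missing_cfg_files /usr/local/nagios/etc/nagios.cfg should print nothing once all the object files described in this article have been created.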

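There are two more parameters worth understanding, only one of which needs changing:

check_external_commands=1 allows actions such as restarting Nagios or stopping host/service checks from the web interface. It is already 1 by default, so no change is needed.

command_check_interval=-1 is the interval at which Nagios checks for external commands (-1 means as often as possible). Set it to suit your own situation. I check every 5 seconds, so I changed it to: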

command_check_interval=5s

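Edit cgi.cfg, the file that controls the CGI scripts

One parameter is worth knowing about:

use_authentication=1 turns on authentication for the CGI scripts

Then find the following lines: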

authorized_for_system_information=nagiosadmin

authorized_for_configuration_information=nagiosadmin

authorized_for_system_commands=nagiosadmin

authorized_for_all_services=nagiosadmin

authorized_for_all_hosts=nagiosadmin

authorized_for_all_service_commands=nagiosadmin

authorized_for_all_host_commands=nagiosadmin

Append the user admin (the user created above) to each of these lines:

authorized_for_system_information=nagiosadmin,admin

authorized_for_configuration_information=nagiosadmin,admin

authorized_for_system_commands=nagiosadmin,admin

authorized_for_all_services=nagiosadmin,admin

authorized_for_all_hosts=nagiosadmin,admin

authorized_for_all_service_commands=nagiosadmin,admin

authorized_for_all_host_commands=nagiosadmin,admin
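Instead of editing each line by hand, the same change can be sketched with awk. The function name add_cgi_user is hypothetical. It prints the modified file to standard output rather than editing in place, so the result can be reviewed first.

```shell
# Print a cgi.cfg-style file with USER appended to every active
# authorized_for_* line that does not already list that user.
# Usage: add_cgi_user FILE USER
add_cgi_user() {
    awk -v user="$2" '
        /^authorized_for_[a-z_]+=/ {
            split($0, kv, "=")
            n = split(kv[2], names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == user) found = 1
            if (kv[2] == "") $0 = $0 user
            else if (!found) $0 = $0 "," user
        }
        { print }
    ' "$1"
}
```

A possible use is add_cgi_user /usr/local/nagios/etc/cgi.cfg admin > /tmp/cgi.cfg.new, followed by comparing the two files before replacing the original.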

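Now let's create the configuration files referenced in nagios.cfg above. Some of them already exist, so I back those up first. Nagios configuration is actually quite flexible: as long as the paths line up and the format is correct, it works.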

cd /usr/local/nagios/etc/objects

mv timeperiods.cfg timeperiods.cfg.bak

Define the monitoring time period. Create the file with the following content:

vi timeperiods.cfg

define timeperiod{

timeperiod_name 24x7

alias 24 Hours A Day, 7 Days A Week

sunday 00:00-24:00

monday 00:00-24:00

tuesday 00:00-24:00

wednesday 00:00-24:00

thursday 00:00-24:00

friday 00:00-24:00

saturday 00:00-24:00

}

This defines a time period named 24x7 that covers all 24 hours of every day. Note that the name uses a lowercase letter x, not an asterisk (*).

mv contacts.cfg contacts.cfg.bak

Define the contact. Create the file with the following content:

vi contacts.cfg

define contact{

contact_name admin ; contact name

alias sys admin ; alias

service_notification_period 24x7

host_notification_period 24x7

service_notification_options w,u,c,r

host_notification_options d,u,r

service_notification_commands notify-service-by-email

host_notification_commands notify-host-by-email

email vfast_zengzz@yahoo.cn

pager 13601298217

}

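For services, w is warning, u is unknown, c is critical, and r is recovery. For hosts, d is down and u is unreachable.

Define the contact group. Create the file with the following content:

vi contactgroup.cfg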

define contactgroup{

contactgroup_name sagroup

alias System Administrator

members admin

}

Define the monitored host. Create the file with the following content:

vi hosts.cfg

define host {

host_name 192.168.5.58

alias 192.168.5.58

address 127.0.0.1

check_command check-host-alive

max_check_attempts 5

check_period 24x7

contact_groups sagroup

notification_period 24x7

notification_options d,u,r

}

Define the host group. Create the file with the following content:

vi hostgroups.cfg

define hostgroup{

hostgroup_name nagios-server

alias nagios-server

members 192.168.5.58

}

The most important file defines the monitored items (services). Create the file with the following content:

vi services.cfg

define service{

host_name 192.168.5.58

service_description check-host-alive

check_command check-host-alive ; host alive check

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.5.58

service_description check-http

check_command check_http ; port 80 on the host

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

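That covers the basic configuration for the local host monitoring two items on itself: whether it is alive, and port 80.

Next, check the nagios.cfg configuration for problems by running:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, the output ends with: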

Total Warnings: 0

Total Errors: 0

Restart the Nagios service:

/etc/init.d/nagios restart
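The check and the restart can be combined so that a broken configuration never reaches the running service. This is a rough sketch that assumes nagios -v exits with a non-zero status when it reports errors. The function name safe_restart and the two environment variables are my own inventions.

```shell
# Commands used for the check and the restart; both can be overridden
# through the environment (for example, to test with stub commands).
NAGIOS_VERIFY=${NAGIOS_VERIFY:-"/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg"}
NAGIOS_RESTART=${NAGIOS_RESTART:-"/etc/init.d/nagios restart"}

# Restart Nagios only when the configuration check succeeds.
safe_restart() {
    if $NAGIOS_VERIFY >/dev/null; then
        $NAGIOS_RESTART
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

When the check fails, running the nagios -v command by hand shows the full error list.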

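After logging in to the web page, click the Host Groups link to see the group status. My view shows 5 OK because I had already added monitoring for CPU load, process count, and disk space usage. If nothing is misconfigured, yours should show 2 items in PENDING state, which turn into 2 OK after a few minutes.

Next, we configure the local host to monitor its own CPU load, process count, and disk space usage, and to monitor all items on the monitored host 192.168.3.64. As mentioned above, monitoring these three metrics requires the nrpe plugin, so both the local host and 3.64 need nrpe installed.

For installing nrpe, see the following link:

http://blog.chinaunix.net/uid-23916356-id-3062081.html

Remember that the local host already has nagios-plugins installed, so it only needs nrpe. (Installation steps omitted.)

The monitored host needs both nagios-plugins and nrpe. (Installation steps omitted.)

Because we will use nrpe, add the following to commands.cfg: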

define command{

command_name check_nrpe

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

}
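To see what this definition turns into at run time, the sketch below substitutes the three macros by hand. It assumes $USER1$ points to /usr/local/nagios/libexec, which is where the plugins land with the prefix used in this article. The function name expand_check_nrpe is hypothetical.

```shell
# Print the command line Nagios would build for "check_nrpe!<arg>".
# $1 stands in for $HOSTADDRESS$, $2 for $ARG1$, and the optional $3
# for $USER1$ (assumed default: /usr/local/nagios/libexec).
expand_check_nrpe() {
    host_address="$1"
    arg1="$2"
    plugin_dir="${3:-/usr/local/nagios/libexec}"
    printf '%s/check_nrpe -H %s -c %s\n' "$plugin_dir" "$host_address" "$arg1"
}
```

For instance, expand_check_nrpe 192.168.3.64 check_load prints the command for the CPU load check. Running that printed command by hand on the monitoring host is a quick way to confirm that nrpe on the remote side answers before the service is added.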

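Most of the configuration files above already exist, so they only need editing. For example, I want to put the newly added monitored host in a group of its own. Edit hostgroups.cfg and add the following: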

define hostgroup{

hostgroup_name ceshi-hadoop

alias ceshi-hadoop

members 192.168.3.64 ; group members; separate multiple members with commas

}

Add the new monitored host to the hosts file. Edit hosts.cfg and add the following:

define host {

host_name 192.168.3.64

alias 192.168.3.64

address 192.168.3.64

check_command check-host-alive

max_check_attempts 5

check_period 24x7

contact_groups sagroup

notification_period 24x7

notification_options d,u,r

}

Because the requirements above add new monitored items, the services file has to change too. Finally, edit services.cfg and add the following:

define service{

host_name 192.168.5.58

service_description check-local-load

check_command check_nrpe!check_load ; CPU load

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.5.58

service_description check-local-procs

check_command check_nrpe!check_total_procs ; process count

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.5.58

service_description check-local-disk

check_command check_nrpe!check_df ; disk usage

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64 ; host alive check

service_description check-host-alive

check_command check-host-alive

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-local-disk

check_command check_nrpe!check_df ; disk usage

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-http

check_command check_http ; port 80

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-tcp-1099

check_command check_tcp!1099 ; port 1099

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-tcp-2222

check_command check_tcp!2222 ; port 2222

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-tcp-60030

check_command check_tcp!60030 ; port 60030

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-tcp-50010

check_command check_tcp!50010 ; port 50010

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-local-load

check_command check_nrpe!check_load ; CPU load

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name 192.168.3.64

service_description check-total-procs

check_command check_nrpe!check_total_procs ; process count

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}
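The service blocks above differ only in host, description, and check command, so they can also be generated rather than typed. The following is a sketch with hypothetical function names (emit_service, emit_port_checks). The fixed values are copied from the definitions in this article.

```shell
# Print one service definition using the check and notification settings
# shared by every service in this article.
# Usage: emit_service HOST DESCRIPTION CHECK_COMMAND
emit_service() {
    cat <<EOF
define service{
    host_name               $1
    service_description     $2
    check_command           $3
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          sagroup
}
EOF
}

# Example: the four TCP port checks for 192.168.3.64.
emit_port_checks() {
    for port in 1099 2222 60030 50010; do
        emit_service 192.168.3.64 "check-tcp-$port" "check_tcp!$port"
    done
}
```

The output of emit_port_checks could be appended to services.cfg. Nagios also offers templates (the use directive) for sharing common settings, which may be the tidier option in a larger setup.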

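That completes the configuration. Before restarting the Nagios service, check the configuration files:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no problems, restart Nagios.

After the restart, clicking the Host Groups link should show 2 groups, each containing one monitored machine: 5 items for the local host and 9 items for 3.64.

Configuration summary:

To monitor a new machine, you need to edit three configuration files: hosts, host groups, and services.

If a machine is already monitored and you want to monitor one more port or similar, you only need to edit the services file. In short, if the Nagios plugins and nrpe are not deployed on a monitored host, you can only monitor whether it is alive, whether its ports are open, and the like. You cannot monitor metrics such as CPU load, process count, or disk usage. Finally, I am a beginner myself; if you find a problem, contact me on QQ: 316189480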

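Configuring email alerts

Preparation: you need a valid email address. Any provider will do, for example Google, Yahoo, 163, 126, or 139. Pick one and test it.

Here I test with the sendmail that ships with the monitoring host.

Make sure sendmail is running: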

[root@slave3 etc]# /etc/init.d/sendmail status

sendmail (pid 5291 5282) is running...

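Run the following command:

echo "test" | mail your-email-address

After a few minutes, confirm that your mailbox has received the message containing "test".

Then edit contacts.cfg so that it reads as follows: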

define contact{

contact_name zengzhunzhun

alias zengzhunzhun

service_notification_period 24x7

host_notification_period 24x7

service_notification_options w,u,c,r

host_notification_options d,u,r

service_notification_commands notify-service-by-email

host_notification_commands notify-host-by-email

email vfast_zengzz@yahoo.cn,zengzhunzhun@gmail.com,zengzhun@126.com,13601298217@139.com

pager 13601298217

}

Edit contactgroup.cfg so that it reads as follows:

define contactgroup{

contactgroup_name sagroup

alias System Administrator

members zengzhunzhun

}

Then restart the Nagios service:

/etc/init.d/nagios restart

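Our mailbox should now start receiving alerts. There may be some delay, because the local sendmail we are using is not a properly registered mail server; this is only a simple test. If no mail arrives, check /var/log/maillog to see whether it was sent. If you run into an error like the one below, edit nagios.cfg and change notification_timeout=30 to 120 (the unit is seconds), then restart Nagios. A production environment would also need a proper mail server, but that is another topic.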

[1346316317] Warning: Contact 'admin' service notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: 192.168.2.161\nState: UP\nAddress: 192.168.2.161\nInfo: PING OK - Packet loss = 0%, RTA = 5.50 ms\n\nDate/Time: Thu Aug 30 16:44:46 CST 2012\n" | /bin/mail -s "** PROBLEM Host Alert: 192.168.2.161 is UP **" vfast_zengzz@yahoo.cn' timed out after 30 seconds
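The timeout change can be sketched as a small helper that rewrites one key=value line. The function name set_cfg_option is my own. It prints the result instead of editing nagios.cfg in place, so the output should be reviewed before it replaces the original.

```shell
# Print a nagios.cfg-style file with KEY set to VALUE. Active "KEY=" lines
# are rewritten; if none exists, the setting is appended at the end.
# Usage: set_cfg_option FILE KEY VALUE
set_cfg_option() {
    awk -v key="$2" -v value="$3" '
        index($0, key "=") == 1 { print key "=" value; seen = 1; next }
        { print }
        END { if (!seen) print key "=" value }
    ' "$1"
}
```

For example, set_cfg_option /usr/local/nagios/etc/nagios.cfg notification_timeout 120 > /tmp/nagios.cfg.new writes the adjusted file to a temporary location.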

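Configuring SMS alerts

China Mobile's 139 mailbox can forward mail as SMS: when the mailbox receives a message, an SMS is also sent to the phone bound to it. So you only need to change the email address in contacts.cfg to your China Mobile 139 address. In my test, SMS notifications were delivered.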