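System environment: Red Hat 4.8 (64-bit), with every software package selected during OS installation.
Software: nagios-3.2.1, nagios-plugins-1.4.15, nrpe-2.12 (your versions may differ from mine)
Monitoring host: 192.168.5.58 (all packages were selected when the OS was installed; Apache is the one shipped with the system)
Monitored host: 192.168.3.64 (any one of the servers in the diagram above)
Note: the monitoring host needs nagios, nagios-plugins, and nrpe (nrpe is what collects CPU load, process count, and disk usage). If you only want to check ping or port 80 on the monitoring host itself, you do not need nrpe. Likewise, checking whether a monitored host is alive or whether its port 80 is open needs no nrpe. Monitoring its CPU load, process count, or disk usage does require nrpe to be installed on that monitored host.
Download the following on the monitoring host:
wget http://prodownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.1.tar.gz
wget http://prodownloads.sourceforge.net/sourceforge/nagios/nagios-plugins-1.4.15.tar.gz
wget http://prodownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
Steps on the monitoring host:
Before installing nagios, create a system user named nagios with the following commands: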
groupadd nagios
useradd -g nagios nagios
mkdir /usr/local/nagios
chown -R nagios:nagios /usr/local/nagios
Install nagios: unpack, compile, and install
tar -zxvf nagios-3.2.1.tar.gz
cd nagios-3.2.1
./configure --prefix=/usr/local/nagios
make all
make install
make install-init
make install-commandmode
make install-config
Install the nagios plugins: unpack, compile, and install
tar -zxvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
./configure --prefix=/usr/local/nagios
make
make install
Edit the apache configuration file and append the following at the very end:
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
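Uncomment the following line and change it as shown:
#ServerName new.host.name:80
After the change:
ServerName 192.168.5.58:80
Add an authenticated user with the following command (the file name must match the AuthUserFile path in the apache configuration above):
User: admin, password: abc#123
/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users admin
New password: (enter the password)
Re-type new password: (enter the password again)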
Adding password for user admin
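That completes the nagios installation. You can now start the apache and nagios services and look at the home page. If it loads, the basic test has passed. Installation is very simple; configuration is the tedious part.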
/etc/init.d/httpd start
/etc/init.d/nagios start
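Quick test: http://192.168.5.58/nagios
Log in with the user name admin and its password.
Note that apache may print warnings at startup. I am not sure of the exact cause, but the messages themselves say that the ScriptAlias and Alias directives overlap ones defined earlier in the configuration. In any case the web pages display without any problem.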
Starting httpd: [Thu Aug 30 13:34:45 2012] [warn] The ScriptAlias directive in /etc/httpd/conf/httpd.conf at line 1024 will probably never match because it overlaps an earlier ScriptAlias.
[Thu Aug 30 13:34:45 2012] [warn] The Alias directive in /etc/httpd/conf/httpd.conf at line 1035 will probably never match because it overlaps an earlier Alias.
[ OK ]
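The warnings above point at overlapping alias definitions. A minimal sketch for locating them follows. It assumes the apache configuration lives under /etc/httpd, as the paths in the warnings suggest, and the function name find_nagios_aliases is made up for illustration. One plausible cause, which this post does not confirm, is that another file included by apache already defines the same /nagios aliases, or that the block was pasted twice.

```shell
#!/bin/sh
# List every active Alias/ScriptAlias line that maps a /nagios URL,
# searching all files below the given apache configuration directory.
# find_nagios_aliases is a hypothetical helper name, not part of apache or nagios.
find_nagios_aliases() {
    conf_root="$1"
    # -r: recurse, -n: show line numbers, -E: extended regular expressions.
    # Lines starting with "#" are skipped because the pattern is anchored at
    # the start of the line and only allows leading whitespace.
    grep -rnE '^[[:space:]]*(Script)?Alias[[:space:]]+/nagios' "$conf_root"
}
```

Calling find_nagios_aliases /etc/httpd prints file name, line number, and directive for each match. If the same URL path shows up more than once, removing or commenting out one of the definitions should make the warning go away.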
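If the web page looks fine, we can start configuring nagios. For now I define the monitoring host as a group whose only member is itself, and monitor only whether it is alive and whether port 80 is open.
The main nagios configuration file is nagios.cfg. It references the other configuration files, such as those for check commands, monitored services, contacts, and contact groups.
Open the main configuration file nagios.cfg:
vi /usr/local/nagios/etc/nagios.cfg
Find the following section and edit it to match mine: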
# You can specify individual object config files as shown below:
# command definitions
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
# contacts
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
# monitoring time periods
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
# templates
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
# monitored hosts
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
# monitored host groups
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
# monitored services
cfg_file=/usr/local/nagios/etc/objects/services.cfg
# contact groups
cfg_file=/usr/local/nagios/etc/objects/contactgroup.cfg
# Definitions for monitoring the local (Linux) host
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
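Two more parameters are worth knowing; only one of them needs changing:
check_external_commands=1 allows actions such as restarting nagios or stopping host/service checks from the web interface. The default is already 1, so leave it as is.
command_check_interval=-1 sets how often nagios checks for external commands (-1 means as often as possible). Choose a value that suits you. I check every 5 seconds, so I changed it to: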
command_check_interval=5s
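Edit cgi.cfg, the file that controls the CGI scripts.
One parameter is worth knowing:
use_authentication=1 turns on authentication for the CGI scripts
Then find the following lines: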
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
Append the user admin (the user created above) to every one of these lines:
authorized_for_system_information=nagiosadmin,admin
authorized_for_configuration_information=nagiosadmin,admin
authorized_for_system_commands=nagiosadmin,admin
authorized_for_all_services=nagiosadmin,admin
authorized_for_all_hosts=nagiosadmin,admin
authorized_for_all_service_commands=nagiosadmin,admin
authorized_for_all_host_commands=nagiosadmin,admin
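Editing seven lines by hand is easy to get wrong, so the same change can be scripted. The sketch below is only an illustration: add_cgi_user is a made-up helper name, and it assumes cgi.cfg uses plain name=value lines with comma-separated user names, as in the listing above.

```shell
#!/bin/sh
# Append a user to every active authorized_for_* line of a cgi.cfg-style file.
# A copy of the original is kept as <file>.bak, in the same spirit as the
# manual .bak backups used elsewhere in this post.
# add_cgi_user is a hypothetical helper name.
add_cgi_user() {
    cfg="$1"
    user="$2"
    cp -p "$cfg" "$cfg.bak" || return 1
    awk -v u="$user" '
        /^authorized_for_[a-z_]+=/ {
            val = substr($0, index($0, "=") + 1)   # text after the first "="
            n = split(val, names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == u) found = 1
            sep = (val == "") ? "" : ","
            if (!found) $0 = $0 sep u              # skip lines that already list the user
        }
        { print }
    ' "$cfg.bak" > "$cfg"
}
```

Usage would be add_cgi_user /usr/local/nagios/etc/cgi.cfg admin. Commented-out lines are left alone, and running it twice does not add the user twice.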
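Now let's create the configuration files referenced in nagios.cfg above. Some of them already exist, so I back those up first. nagios configuration is actually quite flexible: as long as the paths match and the syntax is correct, it works.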
cd /usr/local/nagios/etc/objects
mv timeperiods.cfg timeperiods.cfg.bak
Define the monitoring time period by creating the file with the following content:
vi timeperiods.cfg
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
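This defines a time period named 24x7 that covers all 24 hours of every day. Note that the character in the name is the lowercase letter x, not an asterisk (*).
mv contacts.cfg contacts.cfg.bak
Define the contact by creating the file with the following content: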
vi contacts.cfg
define contact{
contact_name admin ; contact name
alias sys admin ; alias
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
email vfast_zengzz@yahoo.cn
pager 13601298217
}
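For services, w is warning, u is unknown, c is critical, and r is recovery. For hosts, d is down, u is unreachable, and r is recovery.
Define the contact group by creating the file with the following content:
vi contactgroup.cfg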
define contactgroup{
contactgroup_name sagroup
alias System Administrator
members admin
}
Define the monitored host by creating the file with the following content:
vi hosts.cfg
define host {
host_name 192.168.5.58
alias 192.168.5.58
address 127.0.0.1
check_command check-host-alive
max_check_attempts 5
check_period 24x7
contact_groups sagroup
notification_period 24x7
notification_options d,u,r
}
Define the host group by creating the file with the following content:
vi hostgroups.cfg
define hostgroup{
hostgroup_name nagios-server
alias nagios-server
members 192.168.5.58
}
The most important file defines the monitored services. Create it with the following content:
vi services.cfg
define service{
host_name 192.168.5.58
service_description check-host-alive
check_command check-host-alive ; host liveness check
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.5.58
service_description check-http
check_command check_http ; port 80 on the host
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
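That is essentially all the configuration needed for the monitoring host to check its own liveness and port 80.
Next, verify that nagios.cfg and the files it references are valid by running:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, the output ends with: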
Total Warnings: 0
Total Errors: 0
Restart the nagios service:
/etc/init.d/nagios restart
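The verify-then-restart sequence above can be wrapped in a small function so the restart only happens when verification succeeds. This is a sketch under assumptions: the default paths are the ones used in this post, safe_restart and the three NAGIOS_* variables are names made up for illustration, and the stock init script may already run a similar check on restart. The difference here is that the verification output is shown when it fails.

```shell
#!/bin/sh
# Defaults follow the --prefix used in this post; override them through
# the environment if your layout differs. The variable names are hypothetical.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Restart nagios only if "nagios -v" exits successfully.
safe_restart() {
    if output=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1); then
        "$NAGIOS_INIT" restart
    else
        # Print the verification output so the offending file and line are visible.
        printf '%s\n' "$output" >&2
        return 1
    fi
}
```

After editing any of the object files, calling safe_restart replaces the two separate commands and leaves the running service untouched when the configuration has errors.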
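After logging in to the web page, click the Host Groups link and you should see something like the figure below. My five OKs are there because I had already added checks for CPU load, process count, and disk usage. If nothing is misconfigured, yours should show 2 checks in PENDING state, which turn into 2 OKs after a few minutes:
Next we configure the monitoring host to check its own CPU load, process count, and disk usage, and to monitor every item on the monitored host 192.168.3.64. As mentioned earlier, these three metrics require the nrpe plugin, so nrpe must be installed on both this machine and 3.64.
To install nrpe, see the following link: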
http://blog.chinaunix.net/uid-23916356-id-3062081.html
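Remember that this machine already has nagios-plugins installed, so it only needs nrpe. (Installation steps omitted.)
The monitored host needs both nagios-plugins and nrpe. (Installation steps omitted.)
Because the checks will go through nrpe, add the following command definition to commands.cfg: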
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
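To see what this definition does, it helps to expand the macros by hand. The sketch below is a simplified illustration, not how nagios itself is implemented: expand_check_nrpe is a made-up helper, it handles only $USER1$, $HOSTADDRESS$, and $ARG1$, and it assumes $USER1$ points at /usr/local/nagios/libexec, which is where the plugins land with the --prefix used in this post.

```shell
#!/bin/sh
# Print the command line that a check_command such as "check_nrpe!check_load"
# turns into for a given host address.
# expand_check_nrpe is a hypothetical helper name.
expand_check_nrpe() {
    host_address="$1"                         # value of $HOSTADDRESS$
    check_command="$2"                        # e.g. check_nrpe!check_load
    user1="${3:-/usr/local/nagios/libexec}"   # assumed value of $USER1$
    arg1="${check_command#*!}"                # drop the command name and the first "!"
    arg1="${arg1%%!*}"                        # keep only the first argument ($ARG1$)
    printf '%s/check_nrpe -H %s -c %s\n' "$user1" "$host_address" "$arg1"
}
```

For example, expand_check_nrpe 192.168.3.64 'check_nrpe!check_load' prints the check_nrpe call for that host. Running the printed command by hand on the monitoring host is a quick way to confirm that nrpe on the target answers. The name after the ! has to match a command[...] entry in nrpe.cfg on the target host.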
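Most of the configuration files above already exist and only need editing. For example, I want to put the newly added monitored host in a group of its own. Edit hostgroups.cfg and add the following: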
define hostgroup{
hostgroup_name ceshi-hadoop
alias ceshi-hadoop
members 192.168.3.64 ; group members; separate multiple members with commas
}
Add the new monitored host to the host file. Edit hosts.cfg and add the following:
define host {
host_name 192.168.3.64
alias 192.168.3.64
address 192.168.3.64
check_command check-host-alive
max_check_attempts 5
check_period 24x7
contact_groups sagroup
notification_period 24x7
notification_options d,u,r
}
Since new checks are required, the services file has to change as well. Finally, edit services.cfg and add the following:
define service{
host_name 192.168.5.58
service_description check-local-load
check_command check_nrpe!check_load ; CPU load
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.5.58
service_description check-local-procs
check_command check_nrpe!check_total_procs ; process count
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.5.58
service_description check-local-disk
check_command check_nrpe!check_df ; disk usage
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64 ; host liveness check
service_description check-host-alive
check_command check-host-alive
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-local-disk
check_command check_nrpe!check_df ; disk usage
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-http
check_command check_http ; port 80
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-tcp-1099
check_command check_tcp!1099 ; port 1099
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-tcp-2222
check_command check_tcp!2222 ; port 2222
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-tcp-60030
check_command check_tcp!60030 ; port 60030
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-tcp-50010
check_command check_tcp!50010 ; port 50010
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-local-load
check_command check_nrpe!check_load ; CPU load
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
define service{
host_name 192.168.3.64
service_description check-total-procs
check_command check_nrpe!check_total_procs ; process count
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
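The port checks above differ only in the port number, so they can be generated instead of typed. The following is a minimal sketch: gen_tcp_services is a made-up helper name, and the field values are copied from the service blocks in this post.

```shell
#!/bin/sh
# Print one check_tcp service definition per port for the given host,
# with the same intervals and contact group as the hand-written blocks.
# gen_tcp_services is a hypothetical helper name.
gen_tcp_services() {
    host="$1"
    shift
    for port in "$@"; do
        cat <<EOF
define service{
host_name $host
service_description check-tcp-$port
check_command check_tcp!$port
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
EOF
    done
}
```

For example, gen_tcp_services 192.168.3.64 1099 2222 60030 50010 prints the four port checks shown above. Review the output before appending it to services.cfg, then verify the configuration and restart nagios as usual.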
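That completes the configuration. Check the configuration files before restarting the nagios service:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no problems, restart nagios.
After the restart, click Host Groups and you should see 2 groups with one machine each: 5 checks on this machine and 9 checks on 3.64.
If you have a new machine to monitor, you need to edit three configuration files: hosts, host groups, and services.
If a machine is already monitored and you want to watch one more port or the like, you only need to edit the services file. In short, without nrpe and the nagios plugins deployed on a monitored host, you can only check whether it is alive, whether its ports are open, and so on. You cannot monitor CPU load, process count, disk usage, and similar metrics. Finally, I am a beginner myself; if you spot a problem, please contact me on QQ: 316189480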
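Configuring email alerts
Preparation: you need a valid email address. Many providers will do, for example Google, yahoo, 163, 126, or 139 mail. Pick any one of them for testing.
Here I test with the sendmail that ships on the monitoring host.
Make sure sendmail is running: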
[root@slave3 etc]# /etc/init.d/sendmail status
sendmail (pid 5291 5282) is running...
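Run the following command:
echo "test" | mail your-email-address
After a few minutes, confirm that your mailbox received the message whose body is test.
Then edit contacts.cfg so that it reads as follows: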
define contact{
contact_name zengzhunzhun
alias zengzhunzhun
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
email vfast_zengzz@yahoo.cn,zengzhunzhun@gmail.com,zengzhun@126.com,13601298217@139.com
pager 13601298217
}
Edit contactgroup.cfg so that it reads as follows:
define contactgroup{
contactgroup_name sagroup
alias System Administrator
members zengzhunzhun
}
Then restart the nagios service:
/etc/init.d/nagios restart
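Our mailbox should now start receiving alerts. There may be some delay, because the sendmail on this machine is not a properly registered mail server; this is only a simple test. If nothing arrives, check /var/log/maillog to see whether the message was sent. If you run into an error like the one below, edit nagios.cfg and change notification_timeout=30 to 120 (the unit is seconds), then restart nagios. A production environment would also need a properly configured mail server, but that is another topic.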
[1346316317] Warning: Contact 'admin' service notification command '/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: 192.168.2.161\nState: UP\nAddress: 192.168.2.161\nInfo: PING OK - Packet loss = 0%, RTA = 5.50 ms\n\nDate/Time: Thu Aug 30 16:44:46 CST 2012\n" | /bin/mail -s "** PROBLEM Host Alert: 192.168.2.161 is UP **" vfast_zengzz@yahoo.cn' timed out after 30 seconds
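The timeout change described above can also be made with a short function rather than by hand. This is a sketch only: set_notification_timeout is a made-up helper name, and it assumes nagios.cfg already contains an active notification_timeout= line, as the post implies.

```shell
#!/bin/sh
# Set notification_timeout in a nagios.cfg-style file and keep the
# original as <file>.bak.
# set_notification_timeout is a hypothetical helper name.
set_notification_timeout() {
    cfg="$1"
    seconds="$2"
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite only the active notification_timeout line; all other lines pass through unchanged.
    sed "s/^notification_timeout=.*/notification_timeout=$seconds/" "$cfg.bak" > "$cfg"
}
```

Usage would be set_notification_timeout /usr/local/nagios/etc/nagios.cfg 120, followed by the usual configuration check and a restart of nagios.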
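Configuring SMS alerts
As is well known, China Mobile's 139 mailbox can forward mail as SMS: when the mailbox receives an email, the bound mobile phone also gets a text message. So you only need to change the email address in contacts.cfg to your China Mobile 139 address. In my tests the SMS notifications were delivered.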