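Most companies have probably deployed a monitoring system such as Zabbix to track server resource usage and the running state of each service. But is that kind of monitoring enough? Have you ever seen the monitoring system report that everything is fine while the project was in fact unable to serve users? This article looks at a simple way to use Nagios to monitor the state of the business itself.
In this article, "business" means the products users actually touch: the web pages they visit, the APIs exposed to the outside, mobile apps, and so on.
Typically we deploy a monitoring system in the data center that hosts the project. It watches our servers and shared services such as MySQL, applies alerting policies, and notifies us by email or SMS when something goes wrong so that we can deal with it promptly.
This kind of monitoring has two main points of focus:
It also has two main problems:
So how do we solve these two problems?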
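Business status monitoring should show, as directly as possible, whether the business is currently healthy or failing. How do we monitor that? Take a web project as an example. The first step is to check the status returned by specific URLs: 200 OK, 404 Not Found, and so on. The second is to consider whether the page content is correct. What the user finally receives is an HTML page assembled from static resources and backend API data, and verifying that this content is truly correct is hard. As a compromise, we treat the page as healthy when all static resources and backend APIs return a normal status, which is much easier to monitor.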
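The "check the status returned by a URL" idea can be sketched as a small plugin-style script. This is only an illustration, not what check_http does internally: the function names and the policy (the expected code is OK, other 2xx/3xx codes are WARNING, everything else is CRITICAL) are assumptions made for this example. The exit codes 0 to 3 follow the usual Nagios plugin convention.

```python
import urllib.error
import urllib.request

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_status(status, expected=200):
    """Map an HTTP status code to an (exit_code, label) pair.

    Policy assumed for this sketch: the expected code is OK, other
    2xx/3xx codes are WARNING, everything else is CRITICAL.
    """
    if status == expected:
        return OK, "OK"
    if 200 <= status < 400:
        return WARNING, "WARNING"
    return CRITICAL, "CRITICAL"


def probe(url, timeout=10):
    """Fetch the URL and return its HTTP status code, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError but still carry a code.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def main(url):
    """Print a one-line plugin-style result and return the exit code."""
    status = probe(url)
    if status is None:
        print(f"CRITICAL - {url} is unreachable")
        return CRITICAL
    code, label = classify_status(status)
    print(f"{label} - {url} returned HTTP {status}")
    return code
```

To use it as a standalone check, pass the result of `main("https://ops-coffee.cn/")` to `sys.exit` so that the monitoring system can read the exit code.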
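Static resources can be served directly by nginx, which handles high concurrency well and is generally not a performance bottleneck; for static resources we can rely on ELK alongside the monitoring. Backend APIs have far less headroom, so business status monitoring is mostly about monitoring backend API status. Do we need to monitor every API? That is tedious to implement and, in my view, unnecessary: monitoring a few representative APIs is enough. For example, in all of our projects we ask the developers to add a dedicated health check endpoint. This endpoint connects to every service the project depends on and performs a real operation against it, such as querying MySQL to confirm that MySQL is serving requests, and running get and set against Redis to confirm that Redis is healthy. Monitoring this single endpoint covers the services along the whole request path.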
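A minimal sketch of what such a health check endpoint might do on the server side, assuming each dependency is wrapped in a function that raises on failure. The names `run_health_checks`, `check_mysql` and `check_redis` are hypothetical, and the two checks are stand-ins rather than real database calls.

```python
def run_health_checks(checks):
    """Run each named check and collect its outcome.

    `checks` maps a dependency name to a zero-argument callable that
    raises an exception on failure. Returns (http_status, report).
    """
    report = {}
    for name, check in checks.items():
        try:
            check()
            report[name] = "ok"
        except Exception as exc:  # any failure marks the dependency as down
            report[name] = f"failed: {exc}"
    healthy = all(result == "ok" for result in report.values())
    return (200 if healthy else 503), report


# Hypothetical stand-ins for real dependency checks. In a real project
# these would run e.g. a SELECT against MySQL and a set/get against Redis.
def check_mysql():
    pass  # pretend the query succeeded


def check_redis():
    raise ConnectionError("redis unreachable")  # simulate an outage


status, report = run_health_checks({"mysql": check_mysql, "redis": check_redis})
print(status, report)
# 503 {'mysql': 'ok', 'redis': 'failed: redis unreachable'}
```

Returning a non-200 status when any dependency fails is what lets a simple status-code check on the monitoring side cover the whole chain.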
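As for the problem caused by the monitoring server and the business servers sitting in the same data center (the second problem mentioned above), we can solve it by deploying a separate status monitor in a different network environment, for example a Nagios instance in the office network. Monitoring from a different network is also closer to the network conditions real users experience. This status monitor is distinct from the resource monitoring deployed in the data center: its job is to monitor business status, that is, the URL and API status described above.
Could we instead move the existing monitoring out of the data center, for example to the office or another data center, and save ourselves a second system? That is not a good option, because cross-network monitoring performs poorly. First, latency between networks is much higher than within a single data center. Second, the frequent data transfer of a large number of monitored items puts considerable pressure on bandwidth.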
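We use Nagios for business status monitoring. It is simple to deploy and flexible to configure, which makes it a good fit for this scenario.
1. Install the base environment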
    # apt-get update
    # apt-get install -y build-essential libgd2-xpm-dev autoconf gcc libc6 make wget
    # apt-get install -y nginx php5-fpm spawn-fcgi fcgiwrap
2. Download, unpack and build Nagios

    # wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.0.8.tar.gz
    # tar -zxvf nagios-4.0.8.tar.gz
    # cd nagios-4.0.8
    # ./configure && make all
    # make install-groups-users
    # usermod -a -G nagios www-data
    # make install
    # make install-init
    # make install-config
    # make install-commandmode
    # cd ..
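If a check reports the following error:

    (No output on stdout) stderr: execvp(/usr/local/nagios/libexec/check_ping, ...) failed. errno is 2: No such file or directory

it is because we have only installed Nagios Core and not the Nagios plugins. Core needs the plugins to do its work.
3. Install nagios-plugins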
    # wget https://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
    # tar -zxvf nagios-plugins-2.2.1.tar.gz
    # cd nagios-plugins-2.2.1
    # ./configure
    # make
    # make install
    # cd ..
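Once installed, the plugins are located under /usr/local/nagios/libexec/. We can use them to monitor our HTTP APIs as well as host and service status.
4. Create the account and password for Nagios web access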
    # vi /usr/local/bin/htpasswd.pl
    #!/usr/bin/perl
    use strict;
    if ( @ARGV != 2 ){
        print "usage: /usr/local/bin/htpasswd.pl <username> <password>\n";
    } else {
        print $ARGV[0].":".crypt($ARGV[1],$ARGV[1])."\n";
    }

    # chmod +x /usr/local/bin/htpasswd.pl
    #Use the Perl script to write the account and password to the htpasswd.users file
    # /usr/local/bin/htpasswd.pl nagiosadmin nagios@ops-coffee > /usr/local/nagios/htpasswd.users
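The account used here, nagiosadmin, is the name that the stock /usr/local/nagios/etc/cgi.cfg authorizes. If you use a different account name, the authorized_for_* entries in that file need to be changed to match.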
5. Add an nginx server block so that the pages can be reached from a browser
    server {
        listen 80;
        server_name ngs.domain.com;
        access_log /var/log/nginx/nagios.access.log;
        error_log /var/log/nginx/nagios.error.log;

        auth_basic "Private";
        auth_basic_user_file /usr/local/nagios/htpasswd.users;

        root /usr/local/nagios/share;
        index index.php index.html;

        location / {
            try_files $uri $uri/ index.php /nagios;
        }

        location /nagios {
            alias /usr/local/nagios/share;
        }

        location ~ \.php$ {
            include /etc/nginx/fastcgi_params;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }

        location ~ ^/nagios/(.*\.php)$ {
            alias /usr/local/nagios/share/$1;
            include /etc/nginx/fastcgi_params;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
        }

        location ~ \.cgi$ {
            root /usr/local/nagios/sbin/;
            rewrite ^/nagios/cgi-bin/(.*)\.cgi /$1.cgi break;
            fastcgi_param AUTH_USER $remote_user;
            fastcgi_param REMOTE_USER $remote_user;
            include /etc/nginx/fastcgi_params;
            fastcgi_pass unix:/var/run/fcgiwrap.socket;
        }
    }
6. Check the configuration and start the services

    #Check the configuration file for syntax errors
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    #Start the nagios service
    # /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
    #Start the fcgiwrap and php5-fpm services
    # service fcgiwrap restart
    # service php5-fpm restart
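7. Open the server's IP address or domain name in a browser and you will see the Nagios pages. Monitoring data for the local machine is shown by default; if you do not need it, remove it from the localhost.cfg configuration file.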
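The main Nagios configuration file is /usr/local/nagios/etc/nagios.cfg. It already lists several configuration files by default: each cfg_file= entry points to a file that Nagios reads at startup. We can add a new file dedicated to monitoring HTTP APIs:

    cfg_file=/usr/local/nagios/etc/objects/check_api.cfg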
check_api.cfg then defines the service checks:

    define service{
        use                     generic-service
        host_name               localhost
        service_description     web_project_01
        check_command           check_http!ops-coffee.cn -S
    }

    define service{
        use                     generic-service
        host_name               localhost
        service_description     web_project_02
        check_command           check_http!ops-coffee.cn -S -u / -e 200
    }

    define service{
        use                     generic-service
        host_name               localhost
        service_description     web_project_03
        check_command           check_http!ops-coffee.cn -S -u /action/health -k "sign:e5dhn"
    }
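The generic-service template referenced by use is defined in /usr/local/nagios/etc/objects/templates.cfg.
The check_http command referenced by check_command is defined in /usr/local/nagios/etc/objects/commands.cfg.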
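If a check fails with the error SSL is not available, you need to install the libssl-dev package first, then recompile (./configure --with-openssl=/usr/bin/openssl) and reinstall nagios-plugins so that the plugins are built with SSL support.

The check_http command definition:

    define command {
        command_name    check_http
        command_line    $USER1$/check_http -H $ARG1$
    }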
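To make the link between check_command and command_line concrete: Nagios splits check_command on `!`, uses the first part as the command name, and substitutes the remaining parts for `$ARG1$`, `$ARG2$` and so on. The sketch below is a simplified model of that substitution, not Nagios code; `expand_check_command` is a hypothetical helper, and the value of `$USER1$` is assumed to be the usual source-install default from resource.cfg.

```python
def expand_check_command(check_command, commands, user1):
    """Expand a service's check_command into the command line Nagios runs.

    Simplified model: only the $USER1$ and $ARGn$ macros are handled.
    """
    name, *args = check_command.split("!")
    line = commands[name].replace("$USER1$", user1)
    for i, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{i}$", arg)
    return line


# command_name -> command_line, as in the check_http definition
commands = {"check_http": "$USER1$/check_http -H $ARG1$"}
# Assumed value of $USER1$ (the plugin directory of a source install)
user1 = "/usr/local/nagios/libexec"

print(expand_check_command("check_http!ops-coffee.cn -S -u / -e 200", commands, user1))
# /usr/local/nagios/libexec/check_http -H ops-coffee.cn -S -u / -e 200
```

This is why everything after the `!` in the service definitions (the host name plus the -S, -u, -e and -k options) ends up as ordinary arguments to the check_http plugin.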
check_http is the plugin we got by installing nagios-plugins; it lives under /usr/local/nagios/libexec/. Its detailed usage can be viewed with check_http -h, and it supports a wide range of options.

The generic-service template is defined as follows:

    define service {
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1               ; Active service checks are enabled
        passive_checks_enabled          1               ; Passive service checks are enabled/accepted
        parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1               ; We should obsess over this service (if necessary)
        check_freshness                 0               ; Default is to NOT check service 'freshness'
        notifications_enabled           1               ; Service notifications are enabled
        event_handler_enabled           1               ; Service event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        is_volatile                     0               ; The service is not volatile
        check_period                    24x7            ; The service can be checked at any time of the day
        max_check_attempts              2               ; Check the service up to 2 times in order to determine its final (hard) state
        check_interval                  1               ; Check the service every minute under normal conditions
        retry_interval                  1               ; Re-check the service every minute until a hard state can be determined
        contact_groups                  admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60              ; Re-notify about service problems every hour
        notification_period             24x7            ; Notifications can be sent out at any time
        register                        0               ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
    }
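There are too many options to explain one by one, and the English comments should make most of them clear. Here are a few of the important ones: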
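The three timing options interact: after the first failed check the service is in a soft state and is re-checked every retry_interval until max_check_attempts is reached, at which point the state becomes hard and notifications go out. The back-of-the-envelope sketch below estimates how long that takes. It assumes the default interval_length of 60 seconds (so intervals are in minutes) and ignores check execution time and scheduling latency; `minutes_to_hard_state` is a hypothetical helper, and the second call uses 3, 10 and 2 purely as comparison values.

```python
def minutes_to_hard_state(max_check_attempts, check_interval, retry_interval):
    """Return (best, worst) minutes from outage start to a HARD state."""
    # Re-checks needed after the first failed check.
    retries = (max_check_attempts - 1) * retry_interval
    best = retries                    # outage starts just before a scheduled check
    worst = check_interval + retries  # outage starts just after a scheduled check
    return best, worst


print(minutes_to_hard_state(2, 1, 1))   # (1, 2)
print(minutes_to_hard_state(3, 10, 2))  # (4, 14)
```

With the values used in the template (2, 1, 1), a failure is confirmed and notified within roughly one to two minutes, which is the point of tightening these settings for business status checks.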
Contacts and contact groups are configured in /usr/local/nagios/etc/objects/contacts.cfg:

    define contact{
        contact_name                    sa                  ; Short name of user
        use                             generic-contact     ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin        ; Full name of user
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        host_notification_commands      notify-host-by-email,notify-host-by-sms
        service_notification_commands   notify-service-by-email,notify-service-by-sms
        email                           ops-coffee@domain.com
        pager                           15821212121,15822222222
    }

    define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 sa
    }
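The notification commands referenced above need to be defined in /usr/local/nagios/etc/objects/commands.cfg.
Once all of the configuration is in place, restart the Nagios service and the monitoring will be up and running.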
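Finally, a tool that works very well alongside Nagios: Nagstamon. Nagstamon is a desktop status monitor for Nagios (it now also works with Zabbix and other systems). Once started it sits in the system tray, and when a Nagios check changes state it pops up and plays a sound alert, so you learn about changes in business status sooner.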