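Monitoring requirements
A project needs CPU and memory utilization monitoring on its application servers. Zabbix records the CPU and memory usage of the processes running on each server and displays it as real-time graphs, which makes it easier to analyze performance bottlenecks.
Monitoring approach
We use Zabbix's automatic (low-level) discovery. A shell script first finds the processes with the highest CPU and memory usage on the server and prints them as JSON; Zabbix then monitors the CPU and memory usage of those processes. (This article monitors the 10 processes with the highest resource usage on a Linux server.)
Limitations
This approach is not suitable for monitoring a fixed set of processes, because the discovered list follows whichever processes currently use the most resources.
First, use the top command to view process status, then extract each process's %CPU (the share of CPU time the process has used since the last update, expressed as a percentage of a single CPU) and %MEM values.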
hmracdb2:~ # top
top - 13:57:01 up 32 days,  5:21,  2 users,  load average: 0.14, 0.26, 0.34
Tasks: 206 total,   1 running, 205 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.7%us,  2.7%sy,  0.0%ni, 87.2%id,  6.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3926096k total,  3651612k used,   274484k free,   788120k buffers
Swap:  4193276k total,  1369968k used,  2823308k free,  1443884k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 2365 root      20   0  854m 315m  12m S    3  8.2   1252:49  ohasd.bin
 5307 oracle    20   0 1783m  22m  22m S    3  0.6   1106:03  oracle
 4532 root      20   0  676m  31m  13m S    2  0.8 853:35.32  crsd.bin
 4272 grid      RT   0  437m 282m  52m S    2  7.4   1006:47  ocssd.bin
 5279 oracle    20   0 1771m  60m  48m S    2  1.6 477:11.19  oracle
 5122 oracle    20   0  654m  15m  12m S    1  0.4 537:40.85  oraagent.bin
Because top is an interactive command, we run it once in batch mode and write the result to a file:
hmracdb2:~ # top -b -n 1 > /tmp/.top.txt
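The scripts in this article all rely on the same step: skip the 7-line header of the snapshot, then sum one column per command name. A minimal sketch of that step follows. The function name `sum_by_command` and the sample data are hypothetical, and the column positions (9 for %CPU, 10 for %MEM, command name last) assume the top layout shown above.

```shell
#!/bin/bash
# Sum one numeric column of a batch-mode top snapshot per command name.
# Assumes a 7-line header and the command name in the last column.
# usage: sum_by_command <column-number> < snapshot
sum_by_command() {
    tail -n +8 | awk -v col="$1" 'NF { sum[$NF] += $col }
        END { for (name in sum) print sum[name], name }' | sort -rn
}

# Hypothetical sample standing in for /tmp/.top.txt
sample=$(cat <<'EOF'
top - 13:57:01 up 32 days,  5:21,  2 users,  load average: 0.14, 0.26, 0.34
Tasks: 206 total,   1 running, 205 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.7%us,  2.7%sy,  0.0%ni, 87.2%id,  6.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3926096k total,  3651612k used,   274484k free,   788120k buffers
Swap:  4193276k total,  1369968k used,  2823308k free,  1443884k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 2365 root      20   0  854m 315m  12m S    3  8.2   1252:49  ohasd.bin
 5307 oracle    20   0 1783m  22m  22m S    3  0.6   1106:03  oracle
 5279 oracle    20   0 1771m  60m  48m S    2  1.6 477:11.19  oracle
EOF
)

# %MEM per command name, largest first
printf '%s\n' "$sample" | sum_by_command 10
```

Both oracle rows are folded into a single entry, which is why one discovered name can stand for several processes.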
The first script finds the 10 processes with the highest memory usage and prints them as JSON, which Zabbix uses for automatic process discovery.
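# cat discovery_process.sh
#!/bin/bash
# System process discovery script
# Take one snapshot of top; the zabbix user must be able to read the file
top -b -n 1 > /tmp/.top.txt && chown zabbix: /tmp/.top.txt
# Sum %MEM (column 10) per command name (last column) and keep the top 10 names
proc_array=($(tail -n +8 /tmp/.top.txt | awk '{a[$NF]+=$10}END{for(k in a)print a[k],k}' | sort -gr | head -10 | cut -d" " -f2))
length=${#proc_array[@]}
printf "{\n"
printf '\t"data":['
for ((i=0; i<length; i++))
do
    printf '\n\t\t{'
    # Pass the name as an argument so it is never interpreted as a format string
    printf '"{#PROCESS_NAME}":"%s"}' "${proc_array[$i]}"
    # Every entry except the last is followed by a comma
    if [ "$i" -lt $((length-1)) ]; then
        printf ","
    fi
done
printf "\n\t]\n"
printf "}\n"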
Alternatively:
# cat discovery_process2.sh
#!/bin/bash
# System process discovery script
top -b -n 1 > /tmp/.top.txt && chown zabbix: /tmp/.top.txt
# Sum %MEM (column 10) per command name (last column) and keep the top 10 names
proc_array=$(tail -n +8 /tmp/.top.txt | awk '{a[$NF]+=$10}END{for(k in a)print a[k],k}' | sort -gr | head -10 | cut -d" " -f2)
length=$(echo "${proc_array}" | wc -l)
count=0
echo '{'
echo -e '\t"data":['
echo "$proc_array" | while read -r line
do
    echo -en '\t\t{"{#PROCESS_NAME}":"'"$line"'"}'
    count=$(( count + 1 ))
    # Every entry except the last is followed by a comma
    if [ "$count" -lt "$length" ]; then
        echo ','
    fi
done
echo -e '\n\t]'
echo '}'
The output looks like this:
[root@Zabbix_19F ~]# ./discovery_process.sh
{
	"data":[
		{"{#PROCESS_NAME}":"mysqld"},
		{"{#PROCESS_NAME}":"php-fpm"},
		{"{#PROCESS_NAME}":"zabbix_server"},
		{"{#PROCESS_NAME}":"nginx"},
		{"{#PROCESS_NAME}":"sshd"},
		{"{#PROCESS_NAME}":"bash"},
		{"{#PROCESS_NAME}":"zabbix_agentd"},
		{"{#PROCESS_NAME}":"qmgr"},
		{"{#PROCESS_NAME}":"pickup"},
		{"{#PROCESS_NAME}":"master"}
	]
}
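The comma handling in the discovery scripts can also be done in a single awk pass by printing the separator before every entry except the first. This is only a sketch of that alternative, not the article's script: the function name `names_to_lld_json` is hypothetical, and it assumes process names contain no double quotes or backslashes, since no JSON escaping is done.

```shell
#!/bin/bash
# Turn process names (one per line on stdin) into Zabbix discovery JSON.
# The separator is empty for the first entry and "," afterwards,
# so the number of entries does not need to be known in advance.
# Assumption: names contain no double quotes or backslashes.
names_to_lld_json() {
    awk 'BEGIN { printf "{\n\t\"data\":[" }
         NF    { printf "%s\n\t\t{\"{#PROCESS_NAME}\":\"%s\"}", sep, $1
                 sep = "," }
         END   { printf "\n\t]\n}\n" }'
}

printf '%s\n' mysqld php-fpm nginx | names_to_lld_json
```

An empty input still yields a well-formed document with an empty "data" array, which the counting approach does not guarantee.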
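The second script implements the key used by the actual Zabbix items. It reports the CPU and memory usage, both as an absolute figure and as a rate, for the processes discovered by the first script.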
#!/bin/bash
# System process CPU & MEM usage information
# mail: mail@huangming.org
mode=$1
name=$2
process=$3
# Total memory in MB, and total CPU capacity in percent (100 per logical CPU)
mem_total=$(awk '/MemTotal/{printf "%.f",$2/1024}' /proc/meminfo)
cpu_total=$(( $(grep -c '^processor' /proc/cpuinfo) * 100 ))
# Each function sums a top column over the rows whose command name equals $process.
# An exact match on the last column avoids the partial matches that grep "\b...\b"
# produces for names such as watchdog/0.
function mempre {
    # Summed %MEM (column 10)
    tail -n +8 /tmp/.top.txt | awk -v p="$process" '$NF==p{s+=$10}END{print s+0}'
}
function memuse {
    # Memory used in bytes: %MEM / 100 * total MB, converted to bytes
    tail -n +8 /tmp/.top.txt | awk -v p="$process" -v t="$mem_total" '$NF==p{s+=$10}END{printf "%.f",s/100*t*1024*1024}'
}
function cpuuse {
    # Summed %CPU (column 9), relative to a single CPU
    tail -n +8 /tmp/.top.txt | awk -v p="$process" '$NF==p{s+=$9}END{print s+0}'
}
function cpupre {
    # Summed %CPU divided by the total capacity of all CPUs
    tail -n +8 /tmp/.top.txt | awk -v p="$process" -v t="$cpu_total" '$NF==p{s+=$9}END{print s/t}'
}
case $name in
    mem)
        if [ "$mode" = "pre" ]; then
            mempre
        elif [ "$mode" = "avg" ]; then
            memuse
        fi
        ;;
    cpu)
        if [ "$mode" = "pre" ]; then
            cpupre
        elif [ "$mode" = "avg" ]; then
            cpuuse
        fi
        ;;
    *)
        echo -e "Usage: $0 [mode : pre|avg] [mem|cpu] [process]"
        ;;
esac
First, check how much memory and how many CPUs the current system has:
-- Memory (MB)
[root@Zabbix_19F ~]# cat /proc/meminfo | grep "MemTotal" | awk '{printf "%.f",$2/1024}'
3832
-- CPU (logical processors)
[root@Zabbix_19F ~]# cat /proc/cpuinfo | grep "processor" | wc -l
8
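Running the script returns the values for the item keys. Note that the sample run below shows the memory figure in MB, whereas the script as listed multiplies it by 1024*1024 and prints bytes.
[root@Zabbix_19F ~]# ./process_check.sh avg mem mysqld   # memory used by mysqld (formula: 3832*18.5/100)
708.92
[root@Zabbix_19F ~]# ./process_check.sh pre mem mysqld   # memory usage rate of mysqld (%MEM)
18.5
[root@Zabbix_19F ~]# ./process_check.sh avg cpu mysqld   # CPU usage of mysqld relative to a single CPU (%CPU)
3.9
[root@Zabbix_19F ~]# ./process_check.sh pre cpu mysqld   # CPU usage of mysqld relative to all CPUs (3.9/800)
0.004875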
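The two derived values in that run can be checked by hand from the figures shown earlier (3832 MB of memory, 8 CPUs). The sketch below only reproduces the arithmetic; the function names `mem_used_mb` and `cpu_share` are hypothetical and are not part of process_check.sh.

```shell
#!/bin/bash
# Memory used in MB: total memory (MB) * summed %MEM / 100
mem_used_mb() {
    awk -v total="$1" -v pct="$2" 'BEGIN { print total * pct / 100 }'
}

# Share of total CPU capacity: summed %CPU / (number of CPUs * 100)
cpu_share() {
    awk -v cpus="$1" -v pct="$2" 'BEGIN { print pct / (cpus * 100) }'
}

mem_used_mb 3832 18.5   # prints 708.92
cpu_share 8 3.9         # prints 0.004875
```

The CPU result is a fraction between 0 and 1 rather than a percentage; multiply it by 100 if the Zabbix item should use % as its unit.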
Configure zabbix_agentd. On the agent side, add UserParameter entries to the configuration under etc/zabbix_agentd.conf: one key for process discovery and one key for the per-process resource checks.
hmracdb2:/opt/zabbix # vim etc/zabbix_agentd.conf.d/userparameter_script.conf
UserParameter=discovery.process,/opt/zabbix/scripts/discovery_process.sh
UserParameter=process.check[*],/opt/zabbix/scripts/process_check.sh $1 $2 $3
After changing the configuration, restart the agentd service:
hmracdb2:/opt/zabbix # service zabbix_agentd restart
Shutting down zabbix_agentd                done
Starting zabbix_agentd                     done
On the Zabbix server, fetch the item key values manually to verify the setup:
[root@Zabbix_19F ~]# zabbix_get -p10050 -k 'discovery.process' -s 10.xxx.xxx.xxx
{
	"data":[
		{"{#PROCESS_NAME}":"ohasd.bin"},
		{"{#PROCESS_NAME}":"ocssd.bin"},
		{"{#PROCESS_NAME}":"oracle"},
		{"{#PROCESS_NAME}":"oraagent.bin"},
		{"{#PROCESS_NAME}":"crsd.bin"},
		{"{#PROCESS_NAME}":"orarootagent.bi"},
		{"{#PROCESS_NAME}":"watchdog/3"},
		{"{#PROCESS_NAME}":"watchdog/2"},
		{"{#PROCESS_NAME}":"watchdog/1"},
		{"{#PROCESS_NAME}":"watchdog/0"}
	]
}
[root@Zabbix_19F ~]# zabbix_get -p10050 -k 'process.check[pre,mem,oracle]' -s 10.xxx.xxx.xxx
2.9
[root@Zabbix_19F ~]# zabbix_get -p10050 -k 'process.check[avg,mem,oracle]' -s 10.xxx.xxx.xxx
111.186
[root@Zabbix_19F ~]# zabbix_get -p10050 -k 'process.check[avg,cpu,oracle]' -s 10.xxx.xxx.xxx
4
[root@Zabbix_19F ~]# zabbix_get -p10050 -k 'process.check[pre,cpu,oracle]' -s 10.xxx.xxx.xxx
0.01
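With agentd configured, set up the template and the items in the Zabbix web interface.
Configuration --> Templates --> Create template
After creating the template, add the discovery rule and its item prototypes:
Discovery rules --> Create discovery rule
Item prototypes --> Create item prototype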
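The exact item prototype settings are not listed here, but given the UserParameter definition process.check[*] and the {#PROCESS_NAME} macro returned by discovery, the prototype keys would plausibly take the form below. This is a sketch under that assumption; the helper name `item_keys` is hypothetical.

```shell
#!/bin/bash
# Print the four process.check keys for one process name or discovery macro.
# Argument order follows the UserParameter: mode, resource, process.
item_keys() {
    local target=$1 mode name
    for name in mem cpu; do
        for mode in pre avg; do
            printf 'process.check[%s,%s,%s]\n' "$mode" "$name" "$target"
        done
    done
}

# Keys as they would be entered in the item prototypes
item_keys '{#PROCESS_NAME}'
```

Passing a concrete name such as oracle instead of the macro gives the same keys that were queried with zabbix_get, which is a quick way to test a key before using it in a prototype.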
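You can go on to add the hosts to monitor and any other items you need. Once they are added, the collected history data can be viewed.
Add an item for the CPU usage of a process
View the history data
The actual amount of memory used by a process can be collected in the same way
With that, automatic discovery and real-time monitoring of per-process memory and CPU usage in Zabbix is fully configured.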