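We deploy front-end load balancing and high availability for our game servers with HAProxy + Keepalived, so we need to monitor HAProxy's status in real time.
This article uses HAProxy version 1.4.24.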
References:
Section "9. Statistics and monitoring" of the official documentation, http://cbonte.github.io/haproxy-dconv/configuration-1.4.html
https://github.com/olindata/tribily-zabbix-templates/tree/master/App_HAProxy
https://github.com/jlyheden/zabbix_scripts/tree/master/haproxy
1. How the monitoring works
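HAProxy exposes its status information through an HTTP page and through a stats Unix socket, and both can export the data in CSV format.
The HTTP page can be viewed at a URL such as http://10.10.41.100/status;csv
The Unix socket can be queried with: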
echo "show info;show stat" | sudo socat stdio unix-connect:/tmp/haproxy
This article mainly uses the second method to obtain HAProxy's status information.
Configure the stats socket in the haproxy.cfg configuration file:
stats socket /tmp/haproxy level admin
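level can be followed by one of the levels user, operator or admin:
user is the lowest privilege level and can only see non-sensitive information
operator can see all information but can only change non-sensitive settings
admin can see and change everything, so use it with caution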
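Once the socket is configured, it can also be queried without socat. The following is a minimal Python sketch, not one of the scripts used in this article: the function name query_haproxy is made up for illustration, and it assumes the socket lives at /tmp/haproxy and that HAProxy closes the connection after answering (its behavior in non-interactive mode).

```python
import socket


def query_haproxy(command, socket_path="/tmp/haproxy"):
    """Send one command to the HAProxy stats socket and return the reply as text.

    query_haproxy is a hypothetical helper; the default path matches the
    "stats socket /tmp/haproxy" line in haproxy.cfg.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        # Commands are newline-terminated, just like the echo | socat pipeline.
        sock.sendall((command + "\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling query_haproxy("show info") should return the same text as the socat command, provided the calling user has permission to open the socket.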
$ echo "show help" | sudo socat stdio unix-connect:/tmp/haproxy
Unknown command. Please enter one of the following commands only :
clear counters : clear max statistics counters (add 'all' for all counters)
help : this message
prompt : toggle interactive mode with prompt
quit : disconnect
show info : report information about the running process
show stat : report counters for each proxy and server
show errors : report last request and response errors for each proxy
show sess [id] : report the list of current sessions or dump this session
get weight : report a server's current weight
set weight : change a server's weight
set timeout : change a timeout setting
disable server : set a server in maintenance mode
enable server : re-enable a server that was previously in maintenance mode
show info reports information about the running HAProxy process:
Name: HAProxy
Version: 1.4.24
Release_date: 2013/06/17
Nbproc: 1
Process_num: 1
Pid: 7020
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
Memmax_MB: 0
Ulimit-n: 131101
Maxsock: 131101
Maxconn: 65536
Maxpipes: 0
CurrConns: 14
PipesUsed: 0
PipesFree: 0
Tasks: 26
Run_queue: 1
node: master_loadbalance1
description: lb1
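The "Key: value" layout of this output is easy to turn into a dictionary. Below is a minimal Python sketch (not part of the original scripts; the name parse_info is made up for illustration). It splits on the first ": " only, so values that contain spaces, such as Uptime, stay intact.

```python
def parse_info(text):
    """Parse HAProxy 'show info' output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first ": " only, so values containing spaces survive.
        key, sep, value = line.partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info


# A few lines taken from the sample output.
sample = """Name: HAProxy
Version: 1.4.24
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
"""
info = parse_info(sample)
print(info["Version"])  # 1.4.24
print(info["Uptime"])   # 110d 16h25m55s
```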
show stat displays the counters for each proxy and server:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,195071390,0,1619236,28338,2034,,93,611,196721000,,,
login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902,0,9558963,0,,1,2,1,,8329209,,2,1,,199,L7OK,200,1,20,7967292,0,361648,7,0,0,,,,136,0,
login_pool,web2_80,0,0,0,63,2000,8333998,2358035705,2826639220,,0,,1,6,2281,13,UP,30,1,0,861,0,9558963
The fields are:
0. pxname: proxy name
1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name for server)
2. qcur: current queued requests
3. qmax: max queued requests
4. scur: current sessions
5. smax: max sessions
6. slim: sessions limit
7. stot: total sessions
8. bin: bytes in
9. bout: bytes out
10. dreq: denied requests
11. dresp: denied responses
12. ereq: request errors
13. econ: connection errors
14. eresp: response errors (among which srv_abrt)
15. wretr: retries (warning)
16. wredis: redispatches (warning)
17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
18. weight: server weight (server), total weight (backend)
19. act: server is active (server), number of active servers (backend)
20. bck: server is backup (server), number of backup servers (backend)
21. chkfail: number of failed checks
22. chkdown: number of UP->DOWN transitions
23. lastchg: last status change (in seconds)
24. downtime: total downtime (in seconds)
25. qlimit: queue limit
26. pid: process id (0 for first instance, 1 for second, ...)
27. iid: unique proxy id
28. sid: service id (unique inside a proxy)
29. throttle: warm up status
30. lbtot: total number of times a server was selected
31. tracked: id of proxy/server if tracking is enabled
32. type (0=frontend, 1=backend, 2=server, 3=socket)
33. rate: number of sessions per second over last elapsed second
34. rate_lim: limit on new sessions per second
35. rate_max: max number of new sessions per second
36. check_status: status of last health check, one of:
    UNK -> unknown
    INI -> initializing
    SOCKERR -> socket error
    L4OK -> check passed on layer 4, no upper layers testing enabled
    L4TMOUT -> layer 1-4 timeout
    L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp)
    L6OK -> check passed on layer 6
    L6TOUT -> layer 6 (SSL) timeout
    L6RSP -> layer 6 invalid response - protocol error
    L7OK -> check passed on layer 7
    L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404
    L7TOUT -> layer 7 (HTTP/SMTP) timeout
    L7RSP -> layer 7 invalid response - protocol error
    L7STS -> layer 7 response error, for example HTTP 5xx
37. check_code: layer5-7 code, if available
38. check_duration: time in ms took to finish last health check
39. hrsp_1xx: http responses with 1xx code
40. hrsp_2xx: http responses with 2xx code
41. hrsp_3xx: http responses with 3xx code
42. hrsp_4xx: http responses with 4xx code
43. hrsp_5xx: http responses with 5xx code
44. hrsp_other: http responses with other codes (protocol error)
45. hanafail: failed health checks details
46. req_rate: HTTP requests per second over last elapsed second
47. req_rate_max: max number of HTTP requests per second observed
48. req_tot: total number of HTTP requests received
49. cli_abrt: number of data transfers aborted by the client
50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)
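Note that if HAProxy is started in multi-process mode, that is, with nbproc set to a value other than 1, every process can report its own status through the socket, so the status information you see switches between the processes.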
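Because the first line of the show stat output names every column, the CSV can be parsed by column name instead of by position. The sketch below is a Python illustration rather than one of this article's scripts; parse_stat is a made-up name, and the sample uses a shortened header (real output contains all the columns listed above).

```python
import csv
import io


def parse_stat(text):
    """Parse HAProxy 'show stat' CSV output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # The header line starts with "# "; strip it so the first key is "pxname".
    lines[0] = lines[0].lstrip("# ")
    rows = []
    for row in csv.DictReader(io.StringIO("\n".join(lines))):
        # Every line ends with a trailing comma, which yields one empty-named column.
        row.pop("", None)
        rows.append(row)
    return rows


# Shortened sample: only five of the real columns, values taken from the output above.
sample = (
    "# pxname,svname,scur,smax,status,\n"
    "login_game_pool,FRONTEND,24,868,OPEN,\n"
    "login_pool,web1_80,0,38,UP,\n"
)
for row in parse_stat(sample):
    print(row["pxname"], row["svname"], row["status"])
```

Looking fields up by name keeps a script working even if a later HAProxy version appends new columns.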
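2. Writing the monitoring scripts
There are three monitoring scripts:
haproxy_info.sh collects basic information about HAProxy.
haproxy_pool_discovery.py is used by Zabbix's LLD (low-level discovery) feature to discover each pool pair, such as login_pool:BACKEND and login_pool:web1_80. With low-level discovery, the state of each backend host can be monitored dynamically, based on the backend hosts configured in the configuration file.
haproxy_stat.sh collects the value of each counter by sending the show stat command to the stats socket. The script splits each line on commas and checks the value of the second field, because some fields exist only for FRONTEND or BACKEND rows, while others exist for every row except FRONTEND and BACKEND.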
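In the CSV, a counter that does not apply to a row type simply appears as an empty field. The Python sketch below illustrates a different approach from haproxy_stat.sh: instead of hard-coding which row types carry which field, it falls back to a default whenever the field is empty. The name get_metric and the "0" default are assumptions made for this example, and the sample header is shortened.

```python
import csv
import io


def get_metric(stat_text, pool, server, metric, default="0"):
    """Return one 'show stat' counter for pool:server, or default if the field is empty.

    Returns None when no row matches pool:server.
    """
    lines = [line for line in stat_text.splitlines() if line.strip()]
    if not lines:
        return None
    header = lines[0].lstrip("# ").split(",")
    if metric not in header:
        raise KeyError("unknown metric: %s" % metric)
    for fields in csv.reader(io.StringIO("\n".join(lines[1:]))):
        row = dict(zip(header, fields))
        if row.get("pxname") == pool and row.get("svname") == server:
            value = row[metric]
            # An empty field means the counter does not apply to this row type.
            return value if value != "" else default
    return None


# Shortened sample: qcur is empty for the FRONTEND row, as in real output.
sample = (
    "# pxname,svname,qcur,scur,status,\n"
    "login_game_pool,FRONTEND,,24,OPEN,\n"
    "login_pool,web1_80,0,0,UP,\n"
)
print(get_metric(sample, "login_game_pool", "FRONTEND", "qcur"))  # 0
print(get_metric(sample, "login_game_pool", "FRONTEND", "scur"))  # 24
```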
haproxy_info.sh
#!/bin/bash
# Print one field of HAProxy's "show info" output, such as Version, Uptime_sec or Nbproc.
metric=$1
stats_socket=/tmp/haproxy
info_file=/tmp/haproxy_info.csv
echo "show info" | /usr/bin/sudo /usr/bin/socat unix-connect:"$stats_socket" stdio > "$info_file"
# Match the key exactly, so that "Uptime" does not also match "Uptime_sec".
awk -v key="$metric:" '$1 == key {print $2}' "$info_file"
haproxy_pool_discovery.py
socat must be installed, and the Zabbix agent user must be allowed to run socat through sudo.
Run visudo and make the following changes:
#
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
# You have to run "ssh -t hostname sudo <cmd>".
#
Defaults !requiretty
zabbixagent ALL=(root) NOPASSWD:/usr/bin/socat
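#!/usr/bin/python
# Discover HAProxy pools (pxname:svname pairs) for Zabbix low-level discovery.
import json
import subprocess

# List every "pxname:svname" pair from "show stat", skipping the header and blank lines.
args = '''echo "show stat"|sudo socat stdio unix-connect:/tmp/haproxy|egrep -v '^#|^$'|awk -F',' '{print $1":"$2}' '''
# universal_newlines=True makes communicate() return text instead of bytes on Python 3.
t = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE, universal_newlines=True).communicate()[0]
pools = []
for pool in t.split('\n'):
    if len(pool) != 0:
        pools.append({'{#POOL_NAME}': pool})
# The parenthesized print works on both Python 2 and Python 3.
print(json.dumps({'data': pools}, indent=4, separators=(',', ':')))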
Output:
{
    "data":[
        {
            "{#POOL_NAME}":"login_game_pool:FRONTEND"
        },
        {
            "{#POOL_NAME}":"login_pool:web1_80"
        },
        {
            "{#POOL_NAME}":"login_pool:web2_80"
        },
        {
            "{#POOL_NAME}":"login_pool:BACKEND"
        }
    ]
}
haproxy_stat.sh
#!/bin/bash
# Print one "show stat" counter for a pool.
# Usage: haproxy_stat.sh login_game_pool:FRONTEND scur
pool_name=$(echo "$1" | awk -F':' '{print $1}')
server_name=$(echo "$1" | awk -F':' '{print $2}')
metric=$2
stat_socket=/tmp/haproxy
stat_file=/tmp/haproxy_stat.csv
echo "show stat" | sudo socat stdio unix-connect:"$stat_socket" > "$stat_file"

# Print CSV column $1 of the row that matches the requested pool and server.
print_field() {
    awk -F"," -v pool="$pool_name" -v server="$server_name" -v col="$1" \
        '$1 == pool && $2 == server {print $col}' "$stat_file"
}

case $metric in
    qcur) # current queued requests; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then print_field 3; else echo 0; fi ;;
    qmax) # max queued requests; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then print_field 4; else echo 0; fi ;;
    scur) # current sessions
        print_field 5 ;;
    smax) # max sessions
        print_field 6 ;;
    slim) # sessions limit
        print_field 7 ;;
    stot|stol) # total sessions (the misspelled key "stol" is also accepted)
        print_field 8 ;;
    bin) # bytes in
        print_field 9 ;;
    bout) # bytes out
        print_field 10 ;;
    dreq) # denied requests; only FRONTEND and BACKEND have this field
        if [ "$server_name" == "FRONTEND" ] || [ "$server_name" == "BACKEND" ]; then print_field 11; else echo 0; fi ;;
    dresp) # denied responses
        print_field 12 ;;
    ereq) # request errors; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then print_field 13; else echo 0; fi ;;
    econ) # connection errors; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then print_field 14; else echo 0; fi ;;
    eresp) # response errors; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then print_field 15; else echo 0; fi ;;
    status) # status
        print_field 18 ;;
    chkfail) # number of failed checks; FRONTEND and BACKEND do not have this field
        if [ "$server_name" == "FRONTEND" ] || [ "$server_name" == "BACKEND" ]; then echo 0; else print_field 22; fi ;;
    chkdown) # number of UP->DOWN transitions; returns 0 for FRONTEND
        if [ "$server_name" != "FRONTEND" ]; then print_field 23; else echo 0; fi ;;
    lastchg) # last status change in seconds; returns 0 for FRONTEND
        if [ "$server_name" != "FRONTEND" ]; then print_field 24; else echo 0; fi ;;
    downtime) # total downtime in seconds; returns 0 for FRONTEND
        if [ "$server_name" != "FRONTEND" ]; then print_field 25; else echo 0; fi ;;
    lbtot) # total number of times a server was selected; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then print_field 31; else echo 0; fi ;;
    rate) # number of sessions per second over last elapsed second
        print_field 34 ;;
    rate_lim|rate_limit) # limit on new sessions per second; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then print_field 35; else echo 0; fi ;;
    rate_max) # max number of new sessions per second
        print_field 36 ;;
    check_status) # status of last health check; not available for FRONTEND and BACKEND
        if [ "$server_name" == "FRONTEND" ] || [ "$server_name" == "BACKEND" ]; then echo "NULL"; else print_field 37; fi ;;
    hrsp_1xx) # http responses with 1xx code
        print_field 40 ;;
    hrsp_2xx) # http responses with 2xx code
        print_field 41 ;;
    hrsp_3xx) # http responses with 3xx code
        print_field 42 ;;
    hrsp_4xx) # http responses with 4xx code
        print_field 43 ;;
    hrsp_5xx) # http responses with 5xx code
        print_field 44 ;;
    req_rate) # HTTP requests per second over last elapsed second; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then print_field 47; else echo 0; fi ;;
    req_rate_max) # max number of HTTP requests per second observed; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then print_field 48; else echo 0; fi ;;
    req_tot) # total number of HTTP requests received; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then print_field 49; else echo 0; fi ;;
    *)
        echo "please input the correct argument" ;;
esac
3. Changing the Zabbix configuration file
Add haproxy_status.conf:
### Option: UserParameter
# User-defined parameter to monitor. There can be several user-defined parameters.
# Format: UserParameter=<key>,<shell command>
# See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
UserParameter=haproxy.info[*],/usr/local/zabbix/bin/haproxy_info.sh $1
UserParameter=haproxy.discovery,/usr/bin/python /usr/local/zabbix/bin/haproxy_pool_discovery.py
UserParameter=haproxy.stat[*],/usr/local/zabbix/bin/haproxy_stat.sh $1 $2
4. Adding the Zabbix template
See the attachment for the full template.