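I tidied up some notes I put together earlier and am reposting them here, to cover everyone's various odd requests.
1. Use case
Our company's current business has few scenarios that use ZooKeeper as a coordination service. However, we are going to use Codis as the cluster deployment solution for Redis, and Codis depends on ZooKeeper to store its configuration. So monitoring ZooKeeper properly matters too.
2. Key points for monitoring ZooKeeper
System monitoring
Memory usage: ZooKeeper should run entirely in memory and must not touch swap. The Java heap size must not exceed the available memory.
Swap usage: Using swap degrades ZooKeeper's performance; set vm.swappiness = 0.
Network bandwidth usage: If ZooKeeper's performance drops, check bandwidth usage and packet loss. ZooKeeper traffic is typically about 20% writes and 80% reads.
Disk usage: Keep an eye on how full the ZooKeeper data directory is.
Disk I/O: ZooKeeper writes to disk asynchronously, so it does not normally issue heavy I/O requests. If ZooKeeper shares a disk with other I/O-intensive services, watch the disk I/O.
ZooKeeper monitoring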
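zk_avg/min/max_latency: Time taken to respond to a client request. Alerting when this exceeds 10 ticks is recommended.
zk_outstanding_requests: Number of queued requests. This grows when ZooKeeper is pushed beyond its processing capacity. An alert threshold of 10 is recommended.
zk_packets_received: Number of packets received from client requests.
zk_packets_sent: Number of packets sent to clients, mainly responses and notifications.
zk_max_file_descriptor_count: Maximum number of files allowed to be open, controlled by ulimit.
zk_open_file_descriptor_count: Number of open files. Alert when this exceeds 85% of the allowed maximum.
Mode: The role the server is running in. It is standalone if the server has not joined a cluster, and follower or leader if it has.
zk_followers: Only reported by the leader. The number of followers in the ensemble. The normal value is the number of ensemble members minus 1.
zk_pending_syncs: Only reported by the leader. The number of pending syncs.
zk_znode_count: Number of znodes.
zk_watch_count: Number of watches.
Java Heap Size: Heap size of the ZooKeeper Java process.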
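The two numeric thresholds above (85% of the file descriptor limit, 10 outstanding requests) can be sketched as a small check. This is only an illustration, written for Python 3 rather than the Python 2 used by the scripts later in this post. The function name check_thresholds and the constant names are made up for the example, and the latency rule is left out because it depends on the tickTime of your own deployment.

```python
# Hypothetical helper: names and structure are illustrative, not from any library.
FD_RATIO_LIMIT = 0.85      # alert when open fds exceed 85% of the allowed maximum
OUTSTANDING_LIMIT = 10     # suggested alert threshold for queued requests


def check_thresholds(stats):
    """Return a list of alert messages for a dict of mntr-style metrics."""
    alerts = []
    open_fds = stats.get("zk_open_file_descriptor_count")
    max_fds = stats.get("zk_max_file_descriptor_count")
    if open_fds is not None and max_fds:
        ratio = open_fds / max_fds
        if ratio > FD_RATIO_LIMIT:
            alerts.append("open file descriptors at {:.0%} of limit".format(ratio))
    if stats.get("zk_outstanding_requests", 0) > OUTSTANDING_LIMIT:
        alerts.append("outstanding requests above {}".format(OUTSTANDING_LIMIT))
    return alerts


sample = {
    "zk_open_file_descriptor_count": 29,
    "zk_max_file_descriptor_count": 102400,
    "zk_outstanding_requests": 0,
}
print(check_thresholds(sample))  # prints [] because nothing is over a limit
```

In a real setup these rules would normally live in Zabbix triggers; the sketch only shows the arithmetic behind them.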
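3. First, you need to know how to get ZooKeeper's status. The details are as follows:
Check whether a node was elected follower or leader: echo stat|nc 127.0.0.1 2181
Test whether the server is running; a reply of imok means it is up: echo ruok|nc 127.0.0.1 2181
List outstanding sessions and ephemeral nodes: echo dump| nc 127.0.0.1 2181
Shut down the server: echo kill | nc 127.0.0.1 2181
Print details of the serving configuration: echo conf | nc 127.0.0.1 2181
List full connection/session details for all clients connected to the server: echo cons | nc 127.0.0.1 2181
Print details of the serving environment (as distinct from the conf command): echo envi |nc 127.0.0.1 2181
List outstanding requests: echo reqs | nc 127.0.0.1 2181
List summary information about the server's watches: echo wchs | nc 127.0.0.1 2181
List detailed information about the server's watches by session; the output is a list of sessions with their associated watches: echo wchc | nc 127.0.0.1 2181
List detailed information about the server's watches by path; the output is a list of paths with their associated sessions: echo wchp | nc 127.0.0.1 2181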
#echo ruok|nc 127.0.0.1 2181
imok
#echo mntr|nc 127.0.0.1 2181
zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 11
zk_packets_sent 10
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 17159
zk_watch_count 0
zk_ephemerals_count 1
zk_approximate_data_size 6666471
zk_open_file_descriptor_count 29
zk_max_file_descriptor_count 102400
zk_followers 2
zk_synced_followers 2
zk_pending_syncs 0
#echo srvr|nc 127.0.0.1 2181
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Latency min/avg/max: 0/0/0
Received: 26
Sent: 25
Connections: 1
Outstanding: 0
Zxid: 0x500000000
Mode: leader
Node count: 17159
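The echo ...|nc commands above can also be issued from Python with a plain socket. The following is a minimal sketch, assuming Python 3 and a server that closes the connection after replying, which is how the four-letter commands behave. The function name four_letter_word is made up for this example. It reads until the connection closes instead of doing a single recv(2048), so long replies are not cut off.

```python
import socket


def four_letter_word(cmd, host="127.0.0.1", port=2181, timeout=1.0):
    """Send a four-letter command and return the server's full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:      # server closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call, assuming a ZooKeeper server is listening locally on port 2181:
# print(four_letter_word("ruok"))
```

The host, port and timeout defaults simply mirror the values used in the commands above; adjust them for your own environment.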
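4. Writing the script and configuration file for monitoring ZooKeeper with Zabbix
I referred to an article whose author monitors ZooKeeper with zabbix_sender, and I will introduce his method first. I then modified his script and template to monitor through zabbix_agent instead, and I share those at the end.
(1) From the other author's article: there are two ways to have Zabbix collect this monitoring data. One is to fetch each item individually through the zabbix agent, in either active or passive mode. The other is to send all of the data to Zabbix in one go with zabbix_sender. Here we choose the second approach. A script that sends everything at once with zabbix_sender cannot be written the same way as one that fetches items one by one through the zabbix agent.
First, gather the monitored items into a dictionary, then iterate over it and send each key:value pair using the -k and -o options of zabbix_sender.
echo mntr|nc 127.0.0.1 2181
This command can be called through Python's subprocess module, or you can use the socket module to connect to port 2181 and send the command yourself. Once you have the mntr output, it still has to be converted into a dictionary.
That is, data in this form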
zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 91
zk_packets_sent 90
zk_num_alive_connections 1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 17159
zk_watch_count 0
zk_ephemerals_count 1
zk_approximate_data_size 6666471
zk_open_file_descriptor_count 27
zk_max_file_descriptor_count 102400
has to be converted into data like this
{'zk_followers': 2, 'zk_outstanding_requests': 0, 'zk_approximate_data_size': 6666471, 'zk_packets_sent': 2089, 'zk_pending_syncs': 0, 'zk_avg_latency': 0, 'zk_version': '3.4.6-1569965, built on 02/20/2014 09:09 GMT', 'zk_watch_count': 2, 'zk_packets_received': 2090, 'zk_open_file_descriptor_count': 30, 'zk_server_ruok': 'imok', 'zk_server_state': 'leader', 'zk_synced_followers': 2, 'zk_max_latency': 28, 'zk_num_alive_connections': 2, 'zk_min_latency': 0, 'zk_ephemerals_count': 1, 'zk_znode_count': 17159, 'zk_max_file_descriptor_count': 102400}
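The conversion from mntr text to a dictionary can be sketched roughly as follows. This is an illustrative Python 3 version, not the original author's code (his Python 2 scripts appear further down). The function name parse_mntr is an assumption for the example. It relies on the real mntr output separating key and value with a tab, as the scripts below do.

```python
def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict, casting integers."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue               # skip blank or malformed lines
        try:
            result[key] = int(value)
        except ValueError:
            result[key] = value    # keep non-numeric values such as zk_version as text
    return result


sample = (
    "zk_version\t3.4.6-1569965, built on 02/20/2014 09:09 GMT\n"
    "zk_avg_latency\t0\n"
    "zk_server_state\tfollower\n"
    "zk_znode_count\t17159\n"
)
print(parse_mntr(sample))
```

Casting numeric values to int is what makes the dictionary above show 17159 rather than '17159'.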
In the end, the data sent with zabbix_sender has to look like this
Here zookeeper.status[zk_version] is the name of a key
zookeeper.status[zk_outstanding_requests]:0
zookeeper.status[zk_approximate_data_size]:6666471
zookeeper.status[zk_packets_sent]:48
zookeeper.status[zk_avg_latency]:0
zookeeper.status[zk_version]:3.4.6-1569965, built on 02/20/2014 09:09 GMT
zookeeper.status[zk_watch_count]:0
zookeeper.status[zk_packets_received]:49
zookeeper.status[zk_open_file_descriptor_count]:27
zookeeper.status[zk_server_ruok]:imok
zookeeper.status[zk_server_state]:follower
zookeeper.status[zk_max_latency]:0
zookeeper.status[zk_num_alive_connections]:1
zookeeper.status[zk_min_latency]:0
zookeeper.status[zk_ephemerals_count]:1
zookeeper.status[zk_znode_count]:17159
zookeeper.status[zk_max_file_descriptor_count]:102400
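Building those keys and the matching zabbix_sender calls from a dictionary could look roughly like this. It is a sketch in Python 3 that only constructs the argument lists and does not run anything. The function name build_sender_commands is hypothetical, and the default sender and conf values are placeholders to replace with your real paths. The -c, -k and -o options are the same ones the full script below passes to zabbix_sender.

```python
def build_sender_commands(stats, sender="zabbix_sender", conf="zabbix_agentd.conf"):
    """Return one zabbix_sender argument list per metric in stats."""
    commands = []
    for metric, value in sorted(stats.items()):
        key = "zookeeper.status[{}]".format(metric)
        # -c: agent config file, -k: item key, -o: item value
        commands.append([sender, "-c", conf, "-k", key, "-o", str(value)])
    return commands


for cmd in build_sender_commands({"zk_znode_count": 17159, "zk_avg_latency": 0}):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.call as is, which avoids going through a shell.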
The condensed code is as follows:
#!/usr/bin/python
import socket
#from StringIO import StringIO
from cStringIO import StringIO

s=socket.socket()
s.connect(('localhost',2181))
s.send('mntr')
data_mntr=s.recv(2048)
s.close()
#print data_mntr

h=StringIO(data_mntr)
result={}
zresult={}
for line in h.readlines():
    key,value=map(str.strip,line.split('\t'))
    zkey='zookeeper.status' + '[' + key + ']'
    zvalue=value
    result[key]=value
    zresult[zkey]=zvalue
print result
print '\n\n'
print zresult
#python test.py
{'zk_outstanding_requests': '0', 'zk_approximate_data_size': '6666471', 'zk_max_latency': '0', 'zk_avg_latency': '0', 'zk_version': '3.4.6-1569965, built on 02/20/2014 09:09 GMT', 'zk_watch_count': '0', 'zk_num_alive_connections': '1', 'zk_open_file_descriptor_count': '27', 'zk_server_state': 'follower', 'zk_packets_sent': '542', 'zk_packets_received': '543', 'zk_min_latency': '0', 'zk_ephemerals_count': '1', 'zk_znode_count': '17159', 'zk_max_file_descriptor_count': '102400'}
{'zookeeper.status[zk_watch_count]': '0', 'zookeeper.status[zk_avg_latency]': '0', 'zookeeper.status[zk_max_latency]': '0', 'zookeeper.status[zk_approximate_data_size]': '6666471', 'zookeeper.status[zk_server_state]': 'follower', 'zookeeper.status[zk_num_alive_connections]': '1', 'zookeeper.status[zk_min_latency]': '0', 'zookeeper.status[zk_outstanding_requests]': '0', 'zookeeper.status[zk_packets_received]': '543', 'zookeeper.status[zk_ephemerals_count]': '1', 'zookeeper.status[zk_znode_count]': '17159', 'zookeeper.status[zk_packets_sent]': '542', 'zookeeper.status[zk_open_file_descriptor_count]': '27', 'zookeeper.status[zk_max_file_descriptor_count]': '102400', 'zookeeper.status[zk_version]': '3.4.6-1569965, built on 02/20/2014 09:09 GMT'}
The full code is as follows:
#!/usr/bin/python
"""
Check Zookeeper Cluster

zookeeper version should be newer than 3.4.x

#echo mntr|nc 127.0.0.1 2181
zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency 0
zk_max_latency 4
zk_min_latency 0
zk_packets_received 84467
zk_packets_sent 84466
zk_num_alive_connections 3
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 17159
zk_watch_count 2
zk_ephemerals_count 1
zk_approximate_data_size 6666471
zk_open_file_descriptor_count 29
zk_max_file_descriptor_count 102400

#echo ruok|nc 127.0.0.1 2181
imok
"""

import sys
import socket
import re
import subprocess
from StringIO import StringIO
import os

zabbix_sender = '/opt/app/zabbix/sbin/zabbix_sender'
zabbix_conf = '/opt/app/zabbix/conf/zabbix_agentd.conf'
send_to_zabbix = 1

############# get zookeeper server status
class ZooKeeperServer(object):

    def __init__(self, host='localhost', port='2181', timeout=1):
        self._address = (host, int(port))
        self._timeout = timeout
        self._result = {}

    def _create_socket(self):
        return socket.socket()

    def _send_cmd(self, cmd):
        """ Send a 4letter word command to the server """
        s = self._create_socket()
        s.settimeout(self._timeout)
        s.connect(self._address)
        s.send(cmd)
        data = s.recv(2048)
        s.close()
        return data

    def get_stats(self):
        """ Get ZooKeeper server stats as a map """
        data_mntr = self._send_cmd('mntr')
        data_ruok = self._send_cmd('ruok')
        if data_mntr:
            result_mntr = self._parse(data_mntr)
        if data_ruok:
            result_ruok = self._parse_ruok(data_ruok)

        self._result = dict(result_mntr.items() + result_ruok.items())

        if not self._result.has_key('zk_followers') and not self._result.has_key('zk_synced_followers') and not self._result.has_key('zk_pending_syncs'):
            ##### these three metrics are only exposed by the leader, so set them to 0 on followers
            leader_only = {'zk_followers':0,'zk_synced_followers':0,'zk_pending_syncs':0}
            self._result = dict(result_mntr.items() + result_ruok.items() + leader_only.items() )

        return self._result

    def _parse(self, data):
        """ Parse the output from the 'mntr' 4letter word command """
        h = StringIO(data)
        result = {}
        for line in h.readlines():
            try:
                key, value = self._parse_line(line)
                result[key] = value
            except ValueError:
                pass # ignore broken lines
        return result

    def _parse_ruok(self, data):
        """ Parse the output from the 'ruok' 4letter word command """
        h = StringIO(data)
        result = {}
        ruok = h.readline()
        if ruok:
            result['zk_server_ruok'] = ruok
        return result

    def _parse_line(self, line):
        try:
            key, value = map(str.strip, line.split('\t'))
        except ValueError:
            raise ValueError('Found invalid line: %s' % line)

        if not key:
            raise ValueError('The key is mandatory and should not be empty')

        try:
            value = int(value)
        except (TypeError, ValueError):
            pass

        return key, value

    def get_pid(self):
        # ps -ef|grep java|grep zookeeper|awk '{print $2}'
        pidarg = '''ps -ef|grep java|grep zookeeper|grep -v grep|awk '{print $2}' '''
        pidout = subprocess.Popen(pidarg,shell=True,stdout=subprocess.PIPE)
        pid = pidout.stdout.readline().strip('\n')
        return pid

    def send_to_zabbix(self, metric):
        key = "zookeeper.status[" + metric + "]"

        if send_to_zabbix > 0:
            #print key + ":" + str(self._result[metric])
            try:
                # FNULL is the module-level handle opened below before this method is called
                subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", str(self._result[metric]) ], stdout=FNULL, stderr=FNULL, shell=False)
            except OSError, detail:
                print "Something went wrong while executing zabbix_sender : ", detail
        else:
            print "Simulation: the following command would be executed :\n", zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", self._result[metric], "\n"


def usage():
    """Display program usage"""
    print "\nUsage : ", sys.argv[0], " alive|all"
    print "Modes : \n\talive : Return pid of running zookeeper\n\tall : Send zookeeper stats as well"
    sys.exit(1)


accepted_modes = ['alive', 'all']

if len(sys.argv) == 2 and sys.argv[1] in accepted_modes:
    mode = sys.argv[1]
else:
    usage()

zk = ZooKeeperServer()
# print zk.get_stats()
pid = zk.get_pid()

if pid != "" and mode == 'all':
    zk.get_stats()
    # print zk._result
    FNULL = open(os.devnull, 'w')
    for key in zk._result:
        zk.send_to_zabbix(key)
    FNULL.close()
    print pid

elif pid != "" and mode == "alive":
    print pid
else:
    print 0
The Zabbix configuration file check_zookeeper.conf:
UserParameter=zookeeper.status[*],/usr/bin/python /opt/app/zabbix/sbin/check_zookeeper.py $1
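Restart agentd and the monitoring is in place.
5. Note that the method above is not the one I recommend you follow. Want to know how to do it through agentd?
I will not repeat the theory. Here is the script: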
#!/usr/bin/python
#Author:Lin hu chong chong chong
"""
Check Zookeeper Cluster

zookeeper version should be newer than 3.4.x

# echo mntr|nc 127.0.0.1 2181
zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency 0
zk_max_latency 4
zk_min_latency 0
zk_packets_received 84467
zk_packets_sent 84466
zk_num_alive_connections 3
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 17159
zk_watch_count 2
zk_ephemerals_count 1
zk_approximate_data_size 6666471
zk_open_file_descriptor_count 29
zk_max_file_descriptor_count 102400

# echo ruok|nc 127.0.0.1 2181
imok
"""

import sys
import socket
import re
import subprocess
from StringIO import StringIO
import os

zabbix_sender = '/data/zabbix/bin/zabbix_sender'
zabbix_conf = '/data/zabbix/etc/zabbix_agentd.conf'
send_to_zabbix = 1

############# get zookeeper server status
class ZooKeeperServer(object):

    def __init__(self, host='localhost', port='2181', timeout=1):
        self._address = (host, int(port))
        self._timeout = timeout
        self._result = {}

    def _create_socket(self):
        return socket.socket()

    def _send_cmd(self, cmd):
        """ Send a 4letter word command to the server """
        s = self._create_socket()
        s.settimeout(self._timeout)
        s.connect(self._address)
        s.send(cmd)
        data = s.recv(2048)
        s.close()
        return data

    def get_stats(self):
        """ Get ZooKeeper server stats as a map """
        data_mntr = self._send_cmd('mntr')
        data_ruok = self._send_cmd('ruok')
        if data_mntr:
            result_mntr = self._parse(data_mntr)
        if data_ruok:
            result_ruok = self._parse_ruok(data_ruok)

        self._result = dict(result_mntr.items() + result_ruok.items())

        if not self._result.has_key('zk_followers') and not self._result.has_key('zk_synced_followers') and not self._result.has_key('zk_pending_syncs'):
            ##### these three metrics are only exposed by the leader, so set them to 0 on followers
            leader_only = {'zk_followers':0,'zk_synced_followers':0,'zk_pending_syncs':0}
            self._result = dict(result_mntr.items() + result_ruok.items() + leader_only.items() )

        return self._result

    def _parse(self, data):
        """ Parse the output from the 'mntr' 4letter word command """
        h = StringIO(data)
        result = {}
        for line in h.readlines():
            try:
                key, value = self._parse_line(line)
                result[key] = value
            except ValueError:
                pass # ignore broken lines
        return result

    def _parse_ruok(self, data):
        """ Parse the output from the 'ruok' 4letter word command """
        h = StringIO(data)
        result = {}
        ruok = h.readline()
        if ruok:
            result['zk_server_ruok'] = ruok
        return result

    def _parse_line(self, line):
        try:
            key, value = map(str.strip, line.split('\t'))
        except ValueError:
            raise ValueError('Found invalid line: %s' % line)

        if not key:
            raise ValueError('The key is mandatory and should not be empty')

        try:
            value = int(value)
        except (TypeError, ValueError):
            pass

        return key, value

    def get_pid(self):
        # Despite the name, this collects every item into one dict keyed by item name
        arg_dict ={}

        # mntr: tab-separated key/value pairs; zk_version is cut at the first '-'
        pidarg = '''echo mntr | nc 127.0.0.1 2181 '''
        pidout = subprocess.Popen(pidarg,shell=True,stdout=subprocess.PIPE)
        line = pidout.stdout.readline().strip('\n')
        while line:
            al = line.split("\t")
            if al[0] == 'zk_version':
                value = al[1][:al[1].find('-')]
                arg_dict[al[0]] = value
            else:
                arg_dict[al[0]] = al[1]
            line = pidout.stdout.readline().strip('\n')

        # srvr: "Name: value" lines
        pidarg = '''echo srvr | nc 127.0.0.1 2181 '''
        pidout = subprocess.Popen(pidarg, shell=True, stdout=subprocess.PIPE)
        line = pidout.stdout.readline().strip('\n')
        while line:
            al = line.split(":")
            arg_dict[al[0].strip(" ")] = al[1].strip(" ")
            line = pidout.stdout.readline().strip('\n')

        # ruok: a single line, imok when the server is healthy
        pidarg = '''echo ruok | nc 127.0.0.1 2181 '''
        pidout = subprocess.Popen(pidarg, shell=True, stdout=subprocess.PIPE)
        line = pidout.stdout.readline().strip('\n')
        arg_dict['ruok'] = line

        # pid of the ZooKeeper Java process, returned for both 'all' and 'alive'
        pidarg = '''ps -ef|grep java|grep zookeeper|grep -v grep|awk '{print $2}' '''
        pidout = subprocess.Popen(pidarg, shell=True, stdout=subprocess.PIPE)
        line = pidout.stdout.readline().strip('\n')
        arg_dict['all'] = line
        arg_dict['alive'] = line
        return arg_dict

    def send_to_zabbix(self, metric):
        key = "zookeeper.status[" + metric + "]"

        if send_to_zabbix > 0:
            #print key + ":" + str(self._result[metric])
            try:
                subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", str(self._result[metric]) ], stdout=FNULL, stderr=FNULL, shell=False)
            except OSError, detail:
                print "Something went wrong while executing zabbix_sender : ", detail
        else:
            print "Simulation: the following command would be executed :\n", zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", self._result[metric], "\n"


def usage():
    """Display program usage"""
    print "\nUsage : ", sys.argv[0], " alive|all"
    print "Modes : \n\talive : Return pid of running zookeeper\n\tall : Send zookeeper stats as well"
    sys.exit(1)


if len(sys.argv) == 2:
    mode = sys.argv[1]
else:
    usage()

zk = ZooKeeperServer()
pid = zk.get_pid()

# Print only the item requested on the command line
if pid and mode in pid.keys():
    print pid.get(mode)
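With the script in place, add the key:
UserParameter=zookeeper.status[*],/usr/bin/python /etc/zabbix/scripts/check_zookeeper.py $1
Restart agentd, then run zabbix_get from the Zabbix server to confirm that it returns values normally.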
6. Add the template in the web UI to complete the monitoring
The modified template is available on my Baidu Netdisk:
https://pan.baidu.com/s/1eI-A74h4egXEqgbvO-JiiA Password: bxw0