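Open-Falcon is an enterprise-grade Internet monitoring solution open-sourced by Xiaomi's operations team. For installation and usage instructions, see the official site: http://open-falcon.org/. It is a fairly complete monitoring system and provides a range of APIs: you only need to submit data in the prescribed format to get graphs, alerting, cluster support, and more.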
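Every script in this post hands its data to the agent's `/v1/push` HTTP endpoint as a JSON list of data points. The following is a minimal sketch of building and pushing one point. It assumes an agent listening locally on port 1988 (the address is hypothetical, the post's scripts use `http://<agent-ip>:1988/v1/push`), uses the field names as described in the Open-Falcon data-push docs, and uses the standard library instead of `requests`. The helper names `build_item` and `push` are illustrative only.

```python
import json
import time
import urllib.request

# Hypothetical agent address; replace with the agent that should receive the data
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_item(endpoint, metric, value, counter_type="GAUGE", step=60, tags="", timestamp=None):
    """Build one data point in the JSON shape accepted by /v1/push."""
    if counter_type not in ("GAUGE", "COUNTER"):
        raise ValueError("counter_type must be GAUGE or COUNTER")
    return {
        "endpoint": endpoint,
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "step": step,
        "value": value,
        "counterType": counter_type,
        "tags": tags,
    }


def push(items, url=AGENT_PUSH_URL):
    """POST a JSON list of data points to the agent and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(items).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Usage: `push([build_item("mysql-21:3306", "Connections", 12345, "COUNTER")])`. One request can carry many points, which is why the scripts collect a whole list per instance before posting it.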
1) MySQL collection script (mysql_monitor.py)
```python
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import division, print_function

import datetime
import fileinput
import json
import re
import time

import MySQLdb
import requests

WRITE_COMMANDS = ('Com_insert', 'Com_update', 'Com_delete', 'Com_replace')


def get_int(status, name):
    """Read a status variable as an int, defaulting to 0 when it is missing."""
    return int(status.get(name, 0))


def log_error(e):
    print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(e)


class MySQLMonitorInfo(object):
    def __init__(self, host, port, user, password):
        self.host = host
        self.port = port
        self.user = user
        self.password = password

    def _connect(self):
        return MySQLdb.connect(host=self.host, user=self.user, passwd=self.password,
                               port=self.port, charset='utf8')

    def stat_info(self):
        """Return SHOW GLOBAL STATUS as a {Variable_name: Value} dict."""
        try:
            m = self._connect()
            cursor = m.cursor()
            cursor.execute("SHOW GLOBAL STATUS")
            status_dict = dict(cursor.fetchall())
            cursor.close()
            m.close()
            return status_dict
        except Exception as e:
            log_error(e)
            return {}

    def engine_info(self):
        """Return {'History list length': 'N'} parsed from SHOW ENGINE INNODB STATUS."""
        engine_regex = re.compile(r'(History list length) ([0-9]+\.?[0-9]*)\n')
        try:
            m = self._connect()
            cursor = m.cursor()
            cursor.execute("SHOW ENGINE INNODB STATUS")
            # The single result row is (Type, Name, Status); the report text is in Status
            _type, _name, status = cursor.fetchone()
            cursor.close()
            m.close()
            return dict(engine_regex.findall(status))
        except Exception as e:
            log_error(e)
            return dict(History_list_length=0)


if __name__ == '__main__':
    open_falcon_api = 'http://192.168.200.86:1988/v1/push'
    # One instance per non-empty line: host,port,user,password,endpoint
    db_list = [line.strip() for line in fileinput.input() if line.strip()]

    monitor_keys = [
        ('Com_select', 'COUNTER'), ('Qcache_hits', 'COUNTER'),
        ('Com_insert', 'COUNTER'), ('Com_update', 'COUNTER'),
        ('Com_delete', 'COUNTER'), ('Com_replace', 'COUNTER'),
        ('MySQL_QPS', 'COUNTER'), ('MySQL_TPS', 'COUNTER'),
        ('ReadWrite_ratio', 'GAUGE'),
        ('Innodb_buffer_pool_read_requests', 'COUNTER'),
        ('Innodb_buffer_pool_reads', 'COUNTER'),
        ('Innodb_buffer_read_hit_ratio', 'GAUGE'),
        ('Innodb_buffer_pool_pages_flushed', 'COUNTER'),
        ('Innodb_buffer_pool_pages_free', 'GAUGE'),
        ('Innodb_buffer_pool_pages_dirty', 'GAUGE'),
        ('Innodb_buffer_pool_pages_data', 'GAUGE'),
        ('Bytes_received', 'COUNTER'), ('Bytes_sent', 'COUNTER'),
        ('Innodb_rows_deleted', 'COUNTER'), ('Innodb_rows_inserted', 'COUNTER'),
        ('Innodb_rows_read', 'COUNTER'), ('Innodb_rows_updated', 'COUNTER'),
        ('Innodb_os_log_fsyncs', 'COUNTER'), ('Innodb_os_log_written', 'COUNTER'),
        ('Created_tmp_disk_tables', 'COUNTER'), ('Created_tmp_tables', 'COUNTER'),
        ('Connections', 'COUNTER'), ('Innodb_log_waits', 'COUNTER'),
        ('Slow_queries', 'COUNTER'), ('Binlog_cache_disk_use', 'COUNTER'),
    ]

    for db_info in db_list:
        host, port, user, password, endpoint = db_info.split(',')
        timestamp = int(time.time())
        step = 60
        tags = ""  # e.g. "port=%s" % port
        conn = MySQLMonitorInfo(host, int(port), user, password)
        stat_info = conn.stat_info()
        engine_info = conn.engine_info()
        mysql_stat_list = []

        for _key, falcon_type in monitor_keys:
            if _key == 'MySQL_QPS':
                _value = get_int(stat_info, 'Com_select') + get_int(stat_info, 'Qcache_hits')
            elif _key == 'MySQL_TPS':
                _value = sum(get_int(stat_info, k) for k in WRITE_COMMANDS)
            elif _key == 'Innodb_buffer_read_hit_ratio':
                # Share of buffer pool read requests served without a disk read, in %
                read_requests = get_int(stat_info, 'Innodb_buffer_pool_read_requests')
                disk_reads = get_int(stat_info, 'Innodb_buffer_pool_reads')
                _value = round((read_requests - disk_reads) / read_requests * 100, 3) if read_requests else 0
            elif _key == 'ReadWrite_ratio':
                reads = get_int(stat_info, 'Com_select') + get_int(stat_info, 'Qcache_hits')
                writes = sum(get_int(stat_info, k) for k in WRITE_COMMANDS)
                _value = round(reads / writes, 2) if writes else 0
            else:
                _value = get_int(stat_info, _key)
            mysql_stat_list.append({
                'metric': _key,
                'endpoint': endpoint,
                'timestamp': timestamp,
                'step': step,
                'value': _value,
                'counterType': falcon_type,
                'tags': tags,
            })

        # engine_info holds the History list length; report it as Undo_Log_Length
        for _value in engine_info.values():
            mysql_stat_list.append({
                'metric': 'Undo_Log_Length',
                'endpoint': endpoint,
                'timestamp': timestamp,
                'step': step,
                'value': int(_value),
                'counterType': 'GAUGE',
                'tags': tags,
            })

        print(json.dumps(mysql_stat_list, sort_keys=True, indent=4))
        requests.post(open_falcon_api, data=json.dumps(mysql_stat_list))
```
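The Undo_Log_Length metric comes from matching one line in the text report of `SHOW ENGINE INNODB STATUS`. Below is a standalone sketch of just that parsing step, so it can be checked without a database. The sample text is a made-up excerpt in the shape of the TRANSACTIONS section, and `undo_log_length` is a hypothetical helper name.

```python
import re

# Same pattern as the script: "History list length <number>" at the end of a line
HISTORY_RE = re.compile(r'(History list length) ([0-9]+\.?[0-9]*)\n')

# Made-up excerpt of the TRANSACTIONS section of SHOW ENGINE INNODB STATUS
SAMPLE = (
    "------------\n"
    "TRANSACTIONS\n"
    "------------\n"
    "Trx id counter 1234567\n"
    "History list length 42\n"
)


def undo_log_length(innodb_status_text):
    """Return the history list length as an int, or None if the line is absent."""
    match = HISTORY_RE.search(innodb_status_text)
    return int(float(match.group(2))) if match else None


print(undo_log_length(SAMPLE))  # 42
```

Returning `None` instead of 0 when the line is missing makes it possible to skip the point rather than graph a misleading zero; the original script instead reports 0 only when the query itself fails.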
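Metric notes: in the collected metrics, COUNTER means the value is displayed as a per-second rate, while GAUGE means the value is displayed as-is.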
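Metric | Type | Description |
---|---|---|
Undo_Log_Length | GAUGE | Undo transactions not yet purged (InnoDB History list length) |
Com_select | COUNTER | SELECTs per second |
Qcache_hits | COUNTER | Query cache hits per second |
Com_insert | COUNTER | INSERTs per second |
Com_update | COUNTER | UPDATEs per second |
Com_delete | COUNTER | DELETEs per second |
Com_replace | COUNTER | REPLACEs per second |
MySQL_QPS | COUNTER | QPS, computed as Com_select + Qcache_hits |
MySQL_TPS | COUNTER | TPS, computed as Com_insert + Com_update + Com_delete + Com_replace |
ReadWrite_ratio | GAUGE | Read/write ratio, (Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) |
Innodb_buffer_pool_read_requests | COUNTER | InnoDB buffer pool read requests per second |
Innodb_buffer_pool_reads | COUNTER | Reads that had to go to disk, per second |
Innodb_buffer_read_hit_ratio | GAUGE | InnoDB buffer pool hit ratio (%) |
Innodb_buffer_pool_pages_flushed | COUNTER | Buffer pool pages flushed to disk per second |
Innodb_buffer_pool_pages_free | GAUGE | Number of free pages in the buffer pool |
Innodb_buffer_pool_pages_dirty | GAUGE | Number of dirty pages in the buffer pool |
Innodb_buffer_pool_pages_data | GAUGE | Number of pages containing data in the buffer pool |
Bytes_received | COUNTER | Bytes received per second |
Bytes_sent | COUNTER | Bytes sent per second |
Innodb_rows_deleted | COUNTER | Rows deleted from InnoDB tables per second |
Innodb_rows_inserted | COUNTER | Rows inserted into InnoDB tables per second |
Innodb_rows_read | COUNTER | Rows read from InnoDB tables per second |
Innodb_rows_updated | COUNTER | Rows updated in InnoDB tables per second |
Innodb_os_log_fsyncs | COUNTER | Redo log fsync calls per second |
Innodb_os_log_written | COUNTER | Bytes written to the redo log per second |
Created_tmp_disk_tables | COUNTER | On-disk temporary tables created per second |
Created_tmp_tables | COUNTER | Internal temporary tables created per second (in memory and on disk) |
Connections | COUNTER | Connection attempts per second |
Innodb_log_waits | COUNTER | Waits caused by an undersized InnoDB log buffer, per second |
Slow_queries | COUNTER | Slow queries per second |
Binlog_cache_disk_use | COUNTER | Transactions that overflowed the binlog cache to disk, per second |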
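The COUNTER items above are pushed as the raw cumulative values from `SHOW GLOBAL STATUS`; the per-second figure comes from comparing consecutive samples. A sketch of that arithmetic follows. It assumes a sample whose value went down (for example after a MySQL restart resets the status counters) is simply dropped; how the Open-Falcon backend itself handles such resets is not reproduced here, and `counter_rate` is a hypothetical name.

```python
def counter_rate(prev, curr):
    """Per-second rate between two (timestamp, value) samples of a COUNTER.

    Returns None when the time delta is not positive or the counter went
    backwards (e.g. after a restart), so the point can be skipped.
    """
    (t0, v0), (t1, v1) = prev, curr
    dt = t1 - t0
    dv = v1 - v0
    if dt <= 0 or dv < 0:
        return None
    return dv / dt


# Com_select was 1000 at t=0 and 4000 one step (60 s) later
print(counter_rate((0, 1000), (60, 4000)))  # 50.0
```

This is also why GAUGE is used for the ratio metrics: a ratio computed by the script is already the value to graph, and taking its rate would be meaningless.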
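Usage: the script reads the list of databases from a configuration file and collects metrics from each one in turn. The configuration file (mysqldb_list.txt) has the following format: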
```
IP,Port,User,Password,endpoint
```

For example:

```
192.168.2.21,3306,root,123,mysql-21:3306
192.168.2.88,3306,root,123,mysql-88:3306
```
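The script splits each line on commas and unpacks five fields, so a malformed line aborts the whole run. A slightly stricter sketch of that parsing, using a hypothetical `parse_mysqldb_list` helper that skips blank lines and reports which line is wrong. Note that a password containing a comma cannot be expressed in this file format at all.

```python
from collections import namedtuple

# Hypothetical record type for one line of mysqldb_list.txt
MySQLTarget = namedtuple("MySQLTarget", "host port user password endpoint")


def parse_mysqldb_list(lines):
    """Parse 'IP,Port,User,Password,endpoint' lines, skipping blank ones."""
    targets = []
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            raise ValueError("line %d: expected 5 fields, got %d" % (lineno, len(fields)))
        host, port, user, password, endpoint = fields
        targets.append(MySQLTarget(host, int(port), user, password, endpoint))
    return targets
```

Usage: `with open("mysqldb_list.txt") as f: targets = parse_mysqldb_list(f)`.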
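Then run:

```
python mysql_monitor.py mysqldb_list.txt
```

Each run pushes one sample per instance, so schedule it every 60 seconds (matching `step = 60`), for example from cron.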
2) Redis collection script (redis_monitor.py)
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import division, print_function

import datetime
import fileinput
import json
import re
import time

import redis
import requests

DB_KEY = re.compile(r'^db\d+$')  # keyspace entries: db0, db1, ...


def log_error(e):
    print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(e)


class RedisMonitorInfo(object):
    def __init__(self, host, port, password):
        self.host = host
        self.port = port
        self.password = password

    def _client(self):
        return redis.Redis(host=self.host, port=self.port, password=self.password)

    def stat_info(self):
        """Return the output of INFO as a dict."""
        try:
            return self._client().info()
        except Exception as e:
            log_error(e)
            return {}

    def cmdstat_info(self):
        """Return INFO commandstats, e.g. {'cmdstat_get': {'calls': 21, ...}}."""
        try:
            return self._client().info('commandstats')
        except Exception as e:
            log_error(e)
            return {}


if __name__ == '__main__':
    open_falcon_api = 'http://192.168.200.86:1988/v1/push'
    # One instance per non-empty line: host,port,password,endpoint
    db_list = [line.strip() for line in fileinput.input() if line.strip()]

    # Redis status items; add or remove entries as needed
    monitor_keys = [
        ('connected_clients', 'GAUGE'), ('blocked_clients', 'GAUGE'),
        ('used_memory', 'GAUGE'), ('used_memory_rss', 'GAUGE'),
        ('mem_fragmentation_ratio', 'GAUGE'),
        ('total_commands_processed', 'COUNTER'), ('rejected_connections', 'COUNTER'),
        ('expired_keys', 'COUNTER'), ('evicted_keys', 'COUNTER'),
        ('keyspace_hits', 'COUNTER'), ('keyspace_misses', 'COUNTER'),
        ('keyspace_hit_ratio', 'GAUGE'), ('keys_num', 'GAUGE'),
    ]

    for db_info in db_list:
        host, port, password, endpoint = db_info.split(',')
        timestamp = int(time.time())
        step = 60
        tags = ""  # e.g. "port=%s" % port
        conn = RedisMonitorInfo(host, int(port), password)

        # Per-command call counts (cmdstat_get, cmdstat_set, ...), reported as COUNTER
        redis_cmdstat_list = []
        for cmdkey, cmdstat in conn.cmdstat_info().items():
            redis_cmdstat_list.append({
                'metric': cmdkey,
                'endpoint': endpoint,
                'timestamp': timestamp,
                'step': step,
                'value': int(cmdstat['calls']),
                'counterType': 'COUNTER',
                'tags': tags,
            })

        redis_stat_list = []
        stat_info = conn.stat_info()
        for _key, falcon_type in monitor_keys:
            if _key == 'keyspace_hit_ratio':
                # Hit ratio in percent since the server started
                hits = int(stat_info.get('keyspace_hits', 0))
                misses = int(stat_info.get('keyspace_misses', 0))
                _value = round(hits / (hits + misses) * 100, 2) if hits + misses else 0
            elif _key == 'mem_fragmentation_ratio':
                # The fragmentation ratio is a float
                _value = float(stat_info.get(_key, 0))
            elif _key == 'keys_num':
                # Total keys over every dbN entry in the keyspace section
                _value = sum(int(v.get('keys', 0)) for k, v in stat_info.items() if DB_KEY.match(k))
            else:
                # Everything else is an integer; skip items this Redis version lacks
                try:
                    _value = int(stat_info[_key])
                except (KeyError, TypeError, ValueError):
                    continue
            redis_stat_list.append({
                'metric': _key,
                'endpoint': endpoint,
                'timestamp': timestamp,
                'step': step,
                'value': _value,
                'counterType': falcon_type,
                'tags': tags,
            })

        load_data = redis_stat_list + redis_cmdstat_list
        print(json.dumps(load_data, sort_keys=True, indent=4))
        requests.post(open_falcon_api, data=json.dumps(load_data))
```
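The script's keyspace_hit_ratio is computed from cumulative counters, so on a long-running instance it reflects the whole uptime and barely moves when the cache suddenly starts missing. A sketch of the alternative: computing the ratio over one interval from two INFO snapshots. It assumes the previous snapshot is kept somewhere between runs (not shown), and `interval_hit_ratio` is a hypothetical name.

```python
def interval_hit_ratio(prev_info, curr_info):
    """Keyspace hit ratio (%) for lookups made between two INFO snapshots.

    Returns None if there were no lookups in the interval or the counters
    went backwards (e.g. after a restart or CONFIG RESETSTAT).
    """
    hits = int(curr_info.get("keyspace_hits", 0)) - int(prev_info.get("keyspace_hits", 0))
    misses = int(curr_info.get("keyspace_misses", 0)) - int(prev_info.get("keyspace_misses", 0))
    if hits < 0 or misses < 0 or hits + misses == 0:
        return None
    return round(100.0 * hits / (hits + misses), 2)


prev = {"keyspace_hits": 100, "keyspace_misses": 100}
curr = {"keyspace_hits": 190, "keyspace_misses": 110}
print(interval_hit_ratio(prev, curr))  # 90.0
```

With the same two snapshots the cumulative formula gives 190 / 300, about 63.33%, even though 90% of the lookups in the last interval were hits.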
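Metric notes: in the collected metrics, COUNTER means the value is displayed as a per-second rate, while GAUGE means the value is displayed as-is.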
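Metric | Type | Description |
---|---|---|
connected_clients | GAUGE | Number of connected clients |
blocked_clients | GAUGE | Number of blocked clients |
used_memory | GAUGE | Total memory allocated by Redis |
used_memory_rss | GAUGE | Total memory allocated by the OS (resident set size) |
mem_fragmentation_ratio | GAUGE | Memory fragmentation ratio, used_memory_rss / used_memory |
total_commands_processed | COUNTER | Commands processed per second; a fairly accurate QPS |
rejected_connections | COUNTER | Rejected connections per second |
expired_keys | COUNTER | Keys expired per second |
evicted_keys | COUNTER | Keys evicted per second |
keyspace_hits | COUNTER | Key lookups that hit, per second |
keyspace_misses | COUNTER | Key lookups that missed, per second |
keyspace_hit_ratio | GAUGE | Key hit ratio (%), keyspace_hits / (keyspace_hits + keyspace_misses) |
keys_num | GAUGE | Total number of keys across all databases |
cmdstat_* | COUNTER | Calls per second of each command (e.g. cmdstat_get) |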
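redis-py already returns `INFO commandstats` as a nested dict, which is why the script can read `cmdstat_info[cmdkey]['calls']`. With a client that returns the raw text instead, each line looks like `cmdstat_get:calls=21,usec=175,usec_per_call=8.33`. A minimal parser for that line format, as a sketch (`parse_cmdstat_line` is a hypothetical name; newer Redis versions append extra fields such as rejected_calls and failed_calls, which it handles the same way):

```python
def parse_cmdstat_line(line):
    """Parse one raw 'INFO commandstats' line into (name, fields).

    Integer fields become int; others (such as usec_per_call) become float.
    """
    name, _, rest = line.strip().partition(":")
    fields = {}
    for pair in rest.split(","):
        key, _, value = pair.partition("=")
        fields[key] = int(value) if value.isdigit() else float(value)
    return name, fields


name, fields = parse_cmdstat_line("cmdstat_get:calls=21,usec=175,usec_per_call=8.33")
print(name, fields["calls"])  # cmdstat_get 21
```

Only `calls` is pushed by the script; `usec_per_call` could be pushed as a GAUGE if per-command latency is also of interest.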
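Usage: the script reads the list of instances from a configuration file and collects metrics from each one in turn. The configuration file (redisdb_list.txt) has the following format: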
```
IP,Port,Password,endpoint
```

For example:

```
192.168.1.56,7021,zhoujy,redis-56:7021
192.168.1.55,7021,zhoujy,redis-55:7021
```
Then run:

```
python redis_monitor.py redisdb_list.txt
```
3) MongoDB collection script (mongodb_monitor.py)
...to be added later.
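4) Other related monitoring (requires the Open-Falcon agent to be installed on the host), for example alarms on the following metrics: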
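Alarm item | Trigger condition | Notes |
---|---|---|
load.1min | all(#3)>10 | Redis server is overloaded; processing capacity drops |
cpu.idle | all(#3)<10 | CPU idle is too low; processing capacity drops |
df.bytes.free.percent | all(#3)<20 | Free disk space below 20%, affecting RDB and AOF persistence on replicas |
mem.memfree.percent | all(#3)<15 | Free memory below 15%; Redis risks the OOM killer and swapping |
mem.swapfree.percent | all(#3)<80 | More than 20% of swap in use; Redis performance degrades, with OOM risk |
net.if.out.bytes | all(#3)>94371840 | Outbound traffic above 90 MB/s (94371840 = 90 × 1024 × 1024 bytes), affecting Redis response time |
net.if.in.bytes | all(#3)>94371840 | Inbound traffic above 90 MB/s (94371840 = 90 × 1024 × 1024 bytes), affecting Redis response time |
disk.io.util | all(#3)>90 | Disk IO utilization above 90%; the disk may be saturated, affecting persistence on replicas and blocking writes |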
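In these trigger conditions, `all(#3)>10` means the alarm fires only when the three most recent data points all satisfy the comparison, which filters out one-off spikes. A sketch of that evaluation; the function name `all_last_n` and the choice that fewer than three points never trigger are assumptions for illustration, not Open-Falcon's own code.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq}


def all_last_n(values, n, op, threshold):
    """Emulate an all(#n) condition: True only if the n most recent values
    all satisfy `value op threshold`. Fewer than n points never trigger."""
    if n <= 0 or len(values) < n:
        return False
    compare = OPS[op]
    return all(compare(v, threshold) for v in values[-n:])


# load.1min samples, oldest first; the last three are all above 10
print(all_last_n([5, 12, 11, 13], 3, ">", 10))  # True
```

A single sample of 9 among the last three is enough to keep the alarm quiet, which is the intended trade-off between noise and detection delay (about three steps, i.e. three minutes with a 60-second step).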
https://github.com/iambocai/falcon-monit-scripts(redis monitor)
https://github.com/ZhuoRoger/redismon(redis monitor)