For installation, see http://www.javashuo.com/article/p-bfwlieic-do.html . Use the matching component versions where possible; otherwise behavior may differ.
(1) prometheus + grafana + alertmanager: configuring host monitoring
(2) prometheus + grafana + alertmanager: configuring MySQL monitoring
(3) prometheus + grafana + alertmanager: configuring Redis monitoring
(4) prometheus + grafana + alertmanager: configuring Kafka monitoring
(5) prometheus + grafana + alertmanager: configuring ES monitoring
(2) prometheus + grafana + alertmanager: configuring MySQL monitoring
1. Installing and configuring mysqld_exporter
A. mysqld is installed on each Linux server
Download mysqld_exporter to each mysqld server (download link: https://pan.baidu.com/s/1pW7RptzXa3LqFlO5zxJXPw ) and extract it into /data/monitor/.
Install the Go environment: yum install go -y
Connect to the local MySQL as root and grant privileges to the monitoring user:
mysql> GRANT REPLICATION CLIENT,PROCESS ON *.* TO 'mysql_monitor'@'localhost' identified by 'Jvsa09OodhvS0VKQ';
mysql> FLUSH PRIVILEGES;
cd into /data/monitor/mysqld_exporter and create the .my.cnf file: vim .my.cnf
[client]
host=10.8.4.126
port=3306
user=mysql_monitor
password=Jvsa09OodhvS0VKQ
Start mysqld_exporter, pointing it at the .my.cnf created above: /data/monitor/mysqld_exporter/bin/mysqld_exporter -config.my-cnf="/data/monitor/mysqld_exporter/.my.cnf" &
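B. MySQL is a cloud provider's managed database (we use UCloud's UDB; the steps below follow that setup, and other providers work much the same way)
Log in to the prometheus server (prometheus, grafana and alertmanager run on the same server), download mysqld_exporter there (download link: https://pan.baidu.com/s/1MNPbhoZEvVV4lf1bVXWJ1g ) and extract it into /data/monitor/.
If the Go environment is not installed: yum install go -y
Connect to the MySQL instance as root and grant privileges to the monitoring user:
mysql> GRANT REPLICATION CLIENT,PROCESS ON *.* TO 'mysql_monitor'@'%' identified by 'Jvsa09OodhvS0VKQ';
mysql> FLUSH PRIVILEGES;
cd into /data/monitor/mysqld_exporter and create a .my.cnf directory, then create one connection config file per database inside it. Below is one example; create the others the same way.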
cat /data/monitor/mysqld_exporter/.my.cnf/.ba_master_10.8.4.126_3306_15049.cnf
[client]
host=10.8.4.126
port=3306
user=mysql_monitor
password=Jvsa09OodhvS0VKQ
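Since every exporter depends on its own connection file, it can help to check each file before starting anything. The following is a minimal sketch, assuming the files use only the four keys shown above; `load_client_section` and `REQUIRED_KEYS` are hypothetical names introduced for illustration, not part of mysqld_exporter.

```python
import configparser

# Keys the example .cnf files in this article contain (assumed to be required).
REQUIRED_KEYS = ("host", "port", "user", "password")


def load_client_section(cnf_text):
    """Parse .my.cnf-style text and return the [client] section as a dict."""
    # interpolation=None keeps a literal '%' in passwords from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("client"):
        raise ValueError("missing [client] section")
    client = dict(parser["client"])
    missing = [key for key in REQUIRED_KEYS if key not in client]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return client


SAMPLE = """\
[client]
host=10.8.4.126
port=3306
user=mysql_monitor
password=Jvsa09OodhvS0VKQ
"""

client = load_client_section(SAMPLE)
print(client["host"], client["port"], client["user"])
```

To check a real file, read its text with `open(path).read()` and pass it to `load_client_section`.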
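Then cd into /data/monitor/mysqld_exporter/scripts and create a startup script for each mysqld_exporter instance. Below is the startup script for one MySQL database; create the others the same way, making sure each one listens on a different port and points at its own config file under .my.cnf.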
cat /data/monitor/mysqld_exporter/scripts/ba_master_10.8.4.126_3306_15049.sh
nohup /data/monitor/mysqld_exporter/bin/mysqld_exporter -web.listen-address=':15049' -config.my-cnf=/data/monitor/mysqld_exporter/.my.cnf/.ba_master_10.8.4.126_3306_15049.cnf -collect.info_schema.tables=false >> /data/monitor/mysqld_exporter/log/15049_10.8.4.126_3306.log 2>&1 &
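The config file and the startup script above follow a regular naming pattern (label, MySQL host, MySQL port, exporter port), so they can be generated rather than written by hand. This is only a sketch under that assumption; `BASE`, `instance_name`, `render_cnf` and `render_start_script` are hypothetical helpers, and the command line is copied from the example script above.

```python
from pathlib import PurePosixPath

# Installation directory used throughout this article.
BASE = PurePosixPath("/data/monitor/mysqld_exporter")


def instance_name(label, host, port, listen_port):
    """Build the shared file stem, e.g. ba_master_10.8.4.126_3306_15049."""
    return f"{label}_{host}_{port}_{listen_port}"


def render_cnf(host, port, user, password):
    """Return the text of one per-database connection file."""
    return f"[client]\nhost={host}\nport={port}\nuser={user}\npassword={password}\n"


def render_start_script(label, host, port, listen_port):
    """Return the text of one startup script, mirroring the example above."""
    name = instance_name(label, host, port, listen_port)
    cnf = BASE / ".my.cnf" / f".{name}.cnf"
    log = BASE / "log" / f"{listen_port}_{host}_{port}.log"
    return (
        f"nohup {BASE}/bin/mysqld_exporter "
        f"-web.listen-address=':{listen_port}' "
        f"-config.my-cnf={cnf} "
        f"-collect.info_schema.tables=false "
        f">> {log} 2>&1 &\n"
    )


print(render_start_script("ba_master", "10.8.4.126", 3306, 15049))
```

Writing the returned strings to `.my.cnf/.<name>.cnf` and `scripts/<name>.sh` is left out on purpose, so the sketch has no side effects.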
Because /data/monitor/mysqld_exporter/scripts/ holds many mysqld_exporter startup scripts, cd into /data/monitor/mysqld_exporter and run sh start.sh to start them all, then check that every port is listening.
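The port check can be scripted. Below is a minimal sketch that tries a TCP connection to each exporter port; `is_listening` and `report` are hypothetical helper names, and the port list in the usage note is taken from the targets used later in this article.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def report(ports, host="localhost"):
    """Map each port to whether something is accepting connections on it."""
    return {port: is_listening(host, port) for port in ports}
```

For example, `report([15049, 15050, 15051, 15052, 15053, 15060, 15221])` returns a dict in which any `False` value marks an exporter that did not start. A successful connection only shows that the port is open; it does not prove that the exporter can reach its database.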
2. Configuring prometheus
A. Add the mysqld_exporter jobs to prometheus.yml: vim /data/monitor/prometheus/conf/prometheus.yml
global:
  # Interval at which the server scrapes targets
  scrape_interval: 1m
  # Interval at which alerting rules are evaluated
  evaluation_interval: 1m
  # Timeout for a single scrape
  scrape_timeout: 20s
  # Add global labels
  #external_labels:
  #  monitor: "hk"
# Connect to alertmanager
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
# Alerting rules
rule_files:
  - /data/monitor/prometheus/conf/rule/*.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # Monitor the prometheus server itself
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
    - targets: ['10.8.53.218:9090']
  # Monitor the specified hosts
  - job_name: 'node_resources'
    scrape_interval: 1m
    static_configs:
    file_sd_configs:
    - files:
      - /data/monitor/prometheus/conf/node_conf/node_host_info.json
    honor_labels: true
  # mysql collectors
  - job_name: 'mysql_global_status'
    scrape_interval: 60s
    static_configs:
    file_sd_configs:
    - files:
      - /data/monitor/prometheus/conf/node_conf/node_mysql_info.json
B. Write node_mysql_info.json: cat /data/monitor/prometheus/conf/node_conf/node_mysql_info.json
[
  {
    "labels": {
      "desc": "slave_customer_10.8.31.101:3306",
      "group": "ba",
      "mysql_addr": "10.8.31.101:3306",
      "role": "slave_customer"
    },
    "targets": [
      "localhost:15050"
    ]
  },
  {
    "labels": {
      "desc": "slave_bi_10.8.150.188:3306",
      "group": "ba",
      "mysql_addr": "10.8.150.188:3306",
      "role": "slave_bi"
    },
    "targets": [
      "localhost:15221"
    ]
  },
  {
    "labels": {
      "desc": "slave_10.8.139.209:3306",
      "group": "ba",
      "mysql_addr": "10.8.139.209:3306",
      "role": "slave"
    },
    "targets": [
      "localhost:15052"
    ]
  },
  {
    "labels": {
      "desc": "slave_catalog_10.8.11.246:3306",
      "group": "ba",
      "mysql_addr": "10.8.11.246:3306",
      "role": "slave_catalog"
    },
    "targets": [
      "localhost:15053"
    ]
  },
  {
    "labels": {
      "desc": "master_10.8.4.126:3306",
      "group": "ba",
      "mysql_addr": "10.8.4.126:3306",
      "role": "master"
    },
    "targets": [
      "localhost:15049"
    ]
  },
  {
    "labels": {
      "desc": "slave_dc_10.8.17.124:3306",
      "group": "ba",
      "mysql_addr": "10.8.17.124:3306",
      "role": "slave_dc"
    },
    "targets": [
      "localhost:15051"
    ]
  },
  {
    "labels": {
      "desc": "master_10.8.115.3:3306",
      "group": "openapi",
      "mysql_addr": "10.8.115.3:3306",
      "role": "master"
    },
    "targets": [
      "localhost:15060"
    ]
  }
]
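Each entry in this file pairs one local exporter port with labels describing the database behind it, so the file can be produced from a short instance list. The sketch below assumes the label scheme shown above (desc is role plus address); `sd_entry`, `render` and `INSTANCES` are hypothetical names, and only two of the instances are listed.

```python
import json


def sd_entry(group, role, mysql_host, mysql_port, exporter_port):
    """Build one file_sd target group in the layout used above."""
    addr = f"{mysql_host}:{mysql_port}"
    return {
        "labels": {
            "desc": f"{role}_{addr}",
            "group": group,
            "mysql_addr": addr,
            "role": role,
        },
        # The exporter runs on the prometheus server, so the target is local.
        "targets": [f"localhost:{exporter_port}"],
    }


# (group, role, MySQL host, MySQL port, exporter listen port)
INSTANCES = [
    ("ba", "master", "10.8.4.126", 3306, 15049),
    ("ba", "slave_customer", "10.8.31.101", 3306, 15050),
]


def render(instances):
    """Serialize the instance list as file_sd JSON text."""
    return json.dumps([sd_entry(*item) for item in instances], indent=2, sort_keys=True)


print(render(INSTANCES))
```

Prometheus re-reads file_sd files when they change, so regenerating the JSON is normally enough to add or remove a target.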
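C. Restart prometheus: cd into /data/monitor/prometheus, then run sh reload.sh
Note: many metrics cannot be scraped this way, so we collect them separately with a script. I only have a Python script that pulls them from the UCloud API; if you need it, contact me on QQ: 761117826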
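The author's UCloud script is not shown, so the following is not that script. It is only a sketch of the output side: formatting externally collected values in the Prometheus text exposition format. The metric name `mysql_mem_used_rate` appears in the alert rules later in this article; the value 71.5 and the helper names `render_metric` and `escape_label_value` are assumptions made for illustration.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, labels, value):
    """Render one sample line: name{label="value",...} value"""
    label_text = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


# Hypothetical sample value; a real script would fetch it from the provider.
line = render_metric(
    "mysql_mem_used_rate",
    {"group": "ba", "role": "master", "mysql_addr": "10.8.4.126:3306"},
    71.5,
)
print(line)
```

The labels are attached here because the alert descriptions rely on `group`, `role` and `mysql_addr`. How the lines reach Prometheus (for example through a small HTTP endpoint that is scraped) depends on the setup and is left open.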
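3. Configuring grafana
A. Download the MySQL monitoring dashboard template, download link: https://pan.baidu.com/s/1xWWceAQ_A4kKEn06dUlRBA
B. To import it, follow steps h to l under "2. Configuring grafana" in the host monitoring article ( http://www.javashuo.com/article/p-ybzkorax-mn.html )
4. Configuring alertmanager
A. Define the rules in prometheus: cat /data/monitor/prometheus/conf/rule/mysql.yml . The file content is shown below; afterwards restart prometheus: cd /data/monitor/prometheus && sh reload.sh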
groups:
- name: mysql_alert
  rules:
  ### Slow queries ###
  # Default slow-query alert policy (100 slow queries within 5 minutes)
  - alert: mysql慢查詢5分鐘100條
    expr: floor(delta(mysql_global_status_slow_queries{mysql_addr!~"10.8.6.44:3306|10.8.9.20:3306|10.8.12.212:3306"}[5m])) >= 100
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }} queries], initial alert duration: 3 minutes."
  ### qps ###
  # Default qps alert policy
  - alert: mysql_qps大於8000
    expr: floor(sum(irate(mysql_global_status_commands_total{group!~"product|product_backend"}[5m])) by (group, role, mysql_addr)) > 8000
    for: 6m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}], initial alert duration: 6 minutes."
  # qps alert policy for the product databases
  - alert: mysql_qps大於25000
    expr: floor(sum(irate(mysql_global_status_commands_total{group=~"product|product_backend"}[5m])) by (group, role, mysql_addr)) > 25000
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}], initial alert duration: 3 minutes."
  ### Memory ###
  # Default memory alert policy (memory usage at 99%)
  - alert: mysql內存99%
    expr: mysql_mem_used_rate >= 99
    for: 6m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}%], initial alert duration: 6 minutes."
  ### Disk ###
  # Default disk alert policy (disk usage at 85%)
  - alert: mysql磁盤85%
    expr: mysql_disk_used_rate{mysql_addr!~"10.8.161.53:3306|10.8.115.31:3306"} >= 85
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}%], initial alert duration: 3 minutes."
  # Disk 95% alert policy
  - alert: mysql磁盤95%
    expr: mysql_disk_used_rate{mysql_addr=~"10.8.161.53:3306|10.8.115.31:3306"} >= 95
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}%], initial alert duration: 3 minutes."
  #### IO limit alerts ###
  ## IO limit alert policy for SSD disks
  # - alert: mysqlSSD盤IO上限預警
  #   expr: (floor(mysql_ioops) >= mysql_disk_total_size * 50 * 0.9) and (mysql_ssd == 1) and on() hour() >= 0 < 16
  #   for: 6m
  #   labels:
  #     severity: warning
  #   annotations:
  #     description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}], initial alert duration: 6 minutes."
  #
  ## IO limit alert policy for ordinary disks
  # - alert: mysql普通盤IO上限預警
  #   expr: (floor(mysql_ioops) >= mysql_disk_total_size * 10 * 0.9) and (mysql_ssd == 0) and on() hour() >= 0 < 16
  #   for: 6m
  #   labels:
  #     severity: warning
  #   annotations:
  #     description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}], initial alert duration: 6 minutes."
  ### Connections ###
  # Default connection-count alert policy (80% of max_connections)
  - alert: mysql鏈接數80%
    expr: floor(mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100) >= 80
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}%], initial alert duration: 3 minutes."
  ### Running threads ###
  # Default running-threads alert policy (growth of 150 or more within 5 minutes)
  - alert: mysql運行進程數5分鐘增加>150
    expr: floor(delta(mysql_global_status_threads_running{mysql_addr!~"10.8.136.10:3306|10.10.129.116:3306|10.8.67.153:3306"}[5m])) >= 150
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}], initial alert duration: 3 minutes."
  # Running-threads alert policy with a 6-minute hold
  - alert: mysql運行進程數5分鐘增加>150
    expr: floor(delta(mysql_global_status_threads_running{mysql_addr=~"10.8.136.10:3306|10.10.129.116:3306|10.8.67.153:3306"}[5m])) >= 150
    for: 6m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}], initial alert duration: 6 minutes."
  ### Replication failure ###
  # Default replication alert policy
  - alert: mysql主從同步異常
    expr: (mysql_slave_status_slave_io_running{role!="master"} == 0) or (mysql_slave_status_slave_sql_running{role!="master"} == 0)
    for: 1m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], replication is broken, initial alert duration: 1 minute."
  ### Replication delay ###
  # Default replication-delay alert policy (30 seconds or more behind the master)
  - alert: mysql主從同步延時>30s
    expr: floor(mysql_slave_status_seconds_behind_master{mysql_addr!~"10.8.137.173:3306|10.8.11.17:3306|10.8.2.17:3306|10.10.29.6:3306|10.8.61.153:3306"}) >= 30
    for: 3m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}s], initial alert duration: 3 minutes."
  # Alert policy for replicas that tolerate a larger delay (300 seconds or more)
  - alert: mysql主從同步延時>300s
    expr: floor(mysql_slave_status_seconds_behind_master{mysql_addr=~"10.8.137.173:3306|10.8.11.17:3306|10.10.29.6:3306|10.8.61.153:3306"}) >= 300
    for: 12m
    labels:
      severity: warning
    annotations:
      description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], alert value: [{{ $value }}s], initial alert duration: 12 minutes."
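Two things in these rules are easy to misread: the connection rule compares a floored percentage, and the `for` value delays firing until the condition has held across several rule evaluations. The sketch below is a simplified model of both, assuming evenly spaced evaluations (1m, as in the prometheus.yml above); the function names and the sample figures (512 max connections) are hypothetical.

```python
import math


def connection_usage_percent(threads_connected, max_connections):
    """Mirror floor(threads_connected / max_connections * 100) from the rule."""
    return math.floor(threads_connected / max_connections * 100)


def first_firing_index(condition_by_eval, eval_interval_s, hold_s):
    """Return the index of the evaluation at which the alert starts firing.

    The alert turns pending at the first true evaluation and fires once the
    condition has stayed true for at least hold_s seconds. Any false
    evaluation resets it. Returns None if it never fires.
    """
    active_since = None
    for index, condition in enumerate(condition_by_eval):
        if not condition:
            active_since = None
            continue
        if active_since is None:
            active_since = index
        if (index - active_since) * eval_interval_s >= hold_s:
            return index
    return None


print(connection_usage_percent(448, 512))                 # 87, above the 80 threshold
print(first_firing_index([True] * 4, 60, 180))            # 3, the fourth evaluation
print(first_firing_index([True, True, False, True], 60, 180))  # None, reset by the gap
```

With `for: 3m` and a 1m evaluation interval, a spike that lasts only two or three evaluations therefore never produces a notification.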
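B. Configure alertmanager: cat /data/prometheus/alertmanager/conf/alertmanager.yml . If the recipients are the same, simply append the new resources to the existing route; if they differ, define a new receiver, then define a routing rule for the resources and bind it to the new receiver.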
global:
  resolve_timeout: 2m
  smtp_auth_password: q5AYahvxi3WLDap3 # sender mailbox password
  smtp_auth_username: itliuqs@163.com # sender mailbox
  smtp_from: itliuqs@163.com # sender mailbox
  smtp_require_tls: false
  smtp_smarthost: smtp.163.com:465 # SMTP server
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/ # WeChat Work API URL
inhibit_rules:
- equal:
  - instance
  source_match:
    alertname: "主機CPU90%"
  target_match:
    alertname: "主機負載太高"
- equal:
  - instance
  source_match:
    alertname: "mysql運行進程數5分鐘增加>150"
  target_match:
    alertname: "mysql慢查詢5分鐘100條"
- equal:
  - instance
  source_match:
    severity: error
  target_match:
    severity: warning
- equal:
  - instance
  source_match:
    severity: fatal
  target_match:
    severity: error
- equal:
  - service_name
  source_match:
    severity: error
  target_match:
    severity: warning
receivers:
- email_configs: # define the "test" receiver
  - html: '{{ template "email.default.html" . }}' # template to use
    send_resolved: true
    to: liuqs@126.com # mailbox that receives the alerts; separate multiple addresses with commas
  name: test # receiver name
  wechat_configs: # for WeChat delivery, see the WeChat Work introduction at the very end
  - agent_id: 1000002 # application id
    api_secret: hnyU1LTGnJUiBaCp47l3WVQLTEFF5RXyfNO751xlaHa # application secret
    corp_id: wwd397231fa801beaa # WeChat Work corp ID
    send_resolved: true
    to_user: LiuQingShan|liuqs # WeChat Work user IDs to notify; separate multiple users with |
- email_configs: # define the default receiver
  - html: '{{ template "email.default.html" . }}'
    send_resolved: true
    to: liuqs@126.com
  name: default_group
  wechat_configs:
  - agent_id: 1000002
    api_secret: hnyU1LTGnJUiBaCp47l3WVQLTEFF5RXyfNO751xlaHa
    corp_id: wwd397231fa801beaa
    send_resolved: true
    to_user: LiuQingShan
route: # define the alert routing rules
  group_by:
  - monitor
  group_interval: 2m
  group_wait: 30s
  receiver: default_group
  repeat_interval: 6h
  routes:
  - continue: true
    match_re:
      instance: 10.8.46.117:9100|10.8.80.126:9100|10.8.32.67:9100|10.8.9.35:9100|10.8.69.81:9100|localhost:15050|localhost:15221|localhost:15052|localhost:15053|localhost:15049|localhost:15051|localhost:15060 # resources handled by this route
    receiver: test # use the "test" receiver
templates:
- /data/monitor/alertmanager/template/*.tmpl # path of the alert content templates
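The `match_re` value in the route is a regular expression that Alertmanager matches against the whole `instance` label. The sketch below approximates that with Python's `re.fullmatch`, which behaves the same for a simple alternation like this one, although Alertmanager itself uses Go's regexp engine. `route_matches` is a hypothetical helper, and the pattern is rebuilt from the instance list in the config above.

```python
import re

# Instances listed in the route's match_re above.
INSTANCES = [
    "10.8.46.117:9100", "10.8.80.126:9100", "10.8.32.67:9100",
    "10.8.9.35:9100", "10.8.69.81:9100",
    "localhost:15050", "localhost:15221", "localhost:15052",
    "localhost:15053", "localhost:15049", "localhost:15051",
    "localhost:15060",
]
ROUTE_PATTERN = "|".join(INSTANCES)


def route_matches(pattern, instance):
    """Return True if the pattern matches the entire instance label value."""
    return re.fullmatch(pattern, instance) is not None


print(route_matches(ROUTE_PATTERN, "localhost:15049"))   # True
print(route_matches(ROUTE_PATTERN, "localhost:150490"))  # False, must match the whole value
print(route_matches(ROUTE_PATTERN, "10x8x46x117:9100"))  # True, an unescaped dot matches any character
```

The last case is harmless here, but escaping the dots (for instance by building the pattern with `re.escape`) would make the match exact. Note also that a new exporter target has to be added to this list by hand, otherwise its alerts only reach the default receiver.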