MongoDB Monitoring and Alerting

Date: 2019-11-24

Please credit the source when reposting: https://www.cnblogs.com/shining5/p/11142357.html

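Prometheus is an open-source monitoring and alerting system and time series database, originally developed at SoundCloud and written in Go. It works by periodically scraping the state of monitored components over HTTP, so any component that exposes a suitable HTTP endpoint can be monitored. The Prometheus server generates alerts and sends them to Alertmanager.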
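
To make the pull model concrete, here is a minimal sketch of a component that exposes one gauge in the Prometheus text format over HTTP, using only the Python standard library. The metric name demo_queue_length and port 9200 are made up for illustration; they are not part of the setup described in this post.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Render a {name: value} dict as Prometheus text-format gauge lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metric values on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # demo_queue_length is a hypothetical metric used only as an example.
        body = render_metrics({"demo_queue_length": 3}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Block and serve metrics; port 9200 is an arbitrary example value."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and adding host:9200 as a scrape target would be enough for Prometheus to collect this value. The exporters installed below play the same role for the operating system and for MongoDB.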

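Grafana is an open-source metrics analytics and visualization suite, commonly used to visualize time series data for infrastructure and application analytics.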

Monitoring

Goal: visualize the running state of MongoDB.
Tools: Grafana, Prometheus
Grafana has no built-in MongoDB data source, so Prometheus sits in between to monitor MongoDB.

Server-side components:
Prometheus # server
Grafana # front-end display

Client-side components:
node_exporter
mongodb_exporter

Steps

Install the Go environment

$ yum install go
$ go version
go version go1.6.3 linux/amd64

Install Prometheus

$ wget https://github.com/prometheus/prometheus/releases/download/v2.11.0-rc.0/prometheus-2.11.0-rc.0.linux-amd64.tar.gz
$ tar xvf prometheus-2.11.0-rc.0.linux-amd64.tar.gz -C /usr/local/
$ ln -sv /usr/local/prometheus-2.11.0-rc.0.linux-amd64/ /usr/local/prometheus
$ cd /usr/local/prometheus

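Note: download version 2.0 or later; otherwise an error is reported when the rules are read (the YAML rule file format used below is only supported from 2.0 on).

Edit the configuration file
Add the IP addresses to be monitored to prometheus.yml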

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. 
  - job_name: 'mongo1'
    static_configs:
      - targets: ['10.13.72.26:9001']  
  - job_name: 'node'
    static_configs:
      - targets: ['10.13.72.26:9100']

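Here '10.13.72.26:9001' is the address mongodb_exporter listens on; installing mongodb_exporter is covered below.

Start the service

nohup ./prometheus --web.enable-lifecycle &

Note: starting Prometheus with --web.enable-lifecycle enables the HTTP lifecycle endpoints, so the configuration can be reloaded without a restart via curl -X POST http://localhost:9090/-/reload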

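Prometheus has a built-in web UI, available at http://install_host:9090. On the Status -> Targets page, the configured mongo1 target shows as Down, which means no data has been scraped yet. The state only changes to Up once node_exporter and mongodb_exporter are installed and running.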
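
The same Up/Down information is also available from the Prometheus HTTP API at /api/v1/targets. The sketch below summarizes the health of each active target; the base URL is an assumption about where Prometheus runs, and the sample payload is hand-written to mirror the shape of the API response rather than captured from a real server.

```python
import json
from urllib.request import urlopen


def target_health(payload):
    """Map (job, instance) -> health from an /api/v1/targets response."""
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        result[(labels["job"], labels["instance"])] = target["health"]
    return result


def fetch_target_health(base_url="http://localhost:9090"):
    """Query a running Prometheus server; base_url is an assumption."""
    with urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return target_health(json.load(resp))


# Hand-written sample shaped like the API response, not real output.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "mongo1", "instance": "10.13.72.26:9001"},
             "health": "down"},
            {"labels": {"job": "node", "instance": "10.13.72.26:9100"},
             "health": "up"},
        ]
    },
}

for (job, instance), health in sorted(target_health(sample).items()):
    print(job, instance, health)
```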

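Install node_exporter

node_exporter is an agent that runs on the monitored server. It is written in Go and mainly collects system data such as CPU, memory, load, disk, and network information.
Once started, it listens on port 9100.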

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.14.0/node_exporter-0.14.0.linux-amd64.tar.gz
$ tar xvf node_exporter-0.14.0.linux-amd64.tar.gz -C /usr/local/
$ nohup /usr/local/node_exporter-0.14.0.linux-amd64/node_exporter &
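
To see what node_exporter actually serves, the /metrics page can be read as plain text. The following is a simplified parser sketch, assuming one sample per line with no trailing timestamp; the sample text is hand-written for illustration and is not real exporter output.

```python
def parse_samples(text):
    """Parse Prometheus text-format lines into {series: value}.

    Simplified: comment lines are skipped and each sample line is
    assumed to end with its value (no trailing timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Hand-written sample in the exposition format, for illustration only.
sample_text = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_memory_MemTotal 8.0e+09
"""

print(parse_samples(sample_text))
```

Against a live exporter, the text would come from something like urllib.request.urlopen("http://10.13.72.26:9100/metrics").read().decode().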

Install mongodb_exporter

wget https://github.com/dcu/mongodb_exporter/releases/mongodb_exporter-linux-amd64
chmod +x mongodb_exporter-linux-amd64
nohup ./mongodb_exporter-linux-amd64 &

Once started, it listens on port 9001.

Install Grafana

wget https://dl.grafana.com/oss/release/grafana-6.2.5-1.x86_64.rpm 
sudo yum localinstall grafana-6.2.5-1.x86_64.rpm

Once started, it listens on port 3000 by default.

sudo service grafana-server start

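Display MongoDB data in Grafana

Step 1: Open the Grafana front-end page at http://install_host:3000

Step 2: Add the Prometheus server as a data source under Data Sources

Step 3: Create a dashboard

Import a ready-made dashboard

https://grafana.com/dashboards/2583
Download JSON

Import this dashboard

Result (screenshot of the imported dashboard)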

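Alerting

Alerting with Prometheus is split into two parts. Alerting rules are defined on the Prometheus server, which generates alerts and sends them to Alertmanager. Alertmanager manages those alerts and sends notifications by email, PagerDuty, HipChat, and other methods.

Steps to set up alerting and notifications:
* Configure Alertmanager
* Configure Prometheus to talk to Alertmanager
* Configure the alerting rules

Our requirement is to deliver alerts to WeChat Work (企業微信).
* Register a WeChat Work account (enterprise verification is not required)
* Create a third-party application, e.g. Prometheus, and fill in its details (this application's settings are used in alertmanager.yml and for receiving the alerts)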

Download Alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.18.0-rc.0/alertmanager-0.18.0-rc.0.linux-amd64.tar.gz

tar -xzvf alertmanager-0.18.0-rc.0.linux-amd64.tar.gz

Create or edit alertmanager.yml

global:
  resolve_timeout: 2m
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
route:
  group_by: ['alertname_wechat']
  group_wait: 10s
  group_interval: 10s
  receiver: 'wechat'
  repeat_interval: 1h
receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '1000002'
    corp_id: 'w***'
    api_secret: 'W***'

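Parameters:

corp_id: the unique ID of the WeChat Work account; it can be found on the My Company page.
to_party: the group (department) that receives the messages.
agent_id: the ID of the third-party application created in the step above; it can be found on that application's detail page.
api_secret: the secret of the third-party application; it can be found on that application's detail page.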
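
Before wiring these values into Alertmanager, it can help to check them by hand. The sketch below only builds the two requests involved, based on my understanding of the WeChat Work API: a token request using corp_id and api_secret, then a text message addressed to to_party. The endpoint names gettoken and message/send and the JSON field names are assumptions taken from that API as I understand it; verify them against the official documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Same base URL as wechat_api_url in alertmanager.yml.
API_URL = "https://qyapi.weixin.qq.com/cgi-bin/"


def token_url(corp_id, api_secret):
    """URL for requesting an access token (assumed endpoint: gettoken)."""
    query = urlencode({"corpid": corp_id, "corpsecret": api_secret})
    return API_URL + "gettoken?" + query


def send_url(access_token):
    """URL for posting a message (assumed endpoint: message/send)."""
    return API_URL + "message/send?" + urlencode({"access_token": access_token})


def text_message(to_party, agent_id, content):
    """JSON body for a plain text message to one party (department)."""
    return json.dumps({
        "toparty": to_party,
        "agentid": int(agent_id),
        "msgtype": "text",
        "text": {"content": content},
    })


# Placeholder credentials; substitute the real corp_id and api_secret.
print(token_url("CORP_ID", "API_SECRET"))
print(text_message("1", "1000002", "test alert"))
```

A GET on the first URL should return a token, and POSTing the JSON body to send_url(token) should deliver the message to the party configured above.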

Start Alertmanager

nohup ./alertmanager &

Edit the Prometheus configuration file

Add the following configuration

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
rule_files:
  - "rules.yml"

Create the rules.yml file

groups:
- name: node
  rules:
  - alert: server_status
    expr: up{job="node"} == 0
    for: 15s
    annotations:
      summary: "Machine is down"
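
The for: 15s clause means the expression must stay true for 15 seconds before the alert fires; until then the alert is only pending. The sketch below simulates that state machine over a series of rule evaluations. The 5-second evaluation spacing in the example is an assumption for illustration; the real spacing is set by evaluation_interval in prometheus.yml.

```python
def alert_states(evaluations, hold_seconds):
    """Return the alert state after each (timestamp, condition_true) evaluation.

    The alert is 'pending' from the first evaluation where the condition
    holds, and becomes 'firing' once it has held for hold_seconds.
    """
    states = []
    active_since = None
    for ts, condition_true in evaluations:
        if not condition_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= hold_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# up{job="node"} == 0 becomes true at t=5 and stays true until t=20.
evaluations = [(0, False), (5, True), (10, True), (15, True), (20, True), (25, False)]
print(alert_states(evaluations, hold_seconds=15))
```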

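Stop node_exporter, and WeChat Work will receive a message.
Note: at first I struggled with how to write the rules. It turns out the Prometheus web UI can execute queries directly, so once a rule expression is written it can be run on that page to check it.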

Rule configuration

Free disk space rule: node_filesystem_avail{device="/dev/sde1",fstype="ext3",instance="hostip:9100",job="node",mountpoint="/data4"} < 1073741824  (1 GiB)
Disk usage (%): (1-  (node_filesystem_free{fstype=~"ext3|ext4|xfs",mountpoint="/data4"} / node_filesystem_size{fstype=~"ext3|ext4|xfs",mountpoint="/data4"}) ) * 100
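
The arithmetic behind these two expressions can be checked with a few lines of Python. This is only a worked example with made-up byte counts; the function names are hypothetical.

```python
GIB = 1024 ** 3  # 1073741824 bytes, the threshold used in the rule above


def disk_usage_percent(free_bytes, size_bytes):
    """Same formula as the PromQL expression: (1 - free / size) * 100."""
    return (1 - free_bytes / size_bytes) * 100


def low_on_space(avail_bytes, threshold=GIB):
    """True when available space is below the threshold (1 GiB by default)."""
    return avail_bytes < threshold


# Example: a 100 GiB filesystem with 25 GiB free is 75% used.
print(disk_usage_percent(25 * GIB, 100 * GIB))
print(low_on_space(512 * 1024 ** 2))
```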

Data synchronization:

Process is running
mongodb_connections{instance="hostip:9001",job="mongo1",state="available"}==0

Memory usage (%):
((node_memory_MemTotal - (node_memory_MemFree+node_memory_Buffers+node_memory_Cached))/node_memory_MemTotal) * 100

CPU usage (%)
 (100 - (avg by (instance)(irate(node_cpu{mode="idle"}[5m])) * 100))
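
As a rough illustration of what this expression computes: node_cpu{mode="idle"} is a counter of idle seconds per CPU, irate takes the per-second increase between the last two samples in the window, and the average idle fraction is subtracted from 100. The sketch below reproduces that arithmetic on made-up samples and ignores counter resets, which the real irate handles.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, counter_value) samples."""
    (t1, v1), (t2, v2) = samples[-2:]
    return (v2 - v1) / (t2 - t1)


def cpu_usage_percent(idle_samples_per_cpu):
    """100 minus the average idle rate across CPUs, as a percentage."""
    rates = [irate(samples) for samples in idle_samples_per_cpu]
    return 100 - (sum(rates) / len(rates)) * 100


# Two CPUs, each sampled twice, 10 seconds apart (made-up values).
cpu0 = [(0, 100.0), (10, 105.0)]   # idle for 5 of 10 seconds
cpu1 = [(0, 200.0), (10, 207.5)]   # idle for 7.5 of 10 seconds
print(cpu_usage_percent([cpu0, cpu1]))
```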

Use promtool to verify that the rules are valid

./promtool check rules alert_rule_test.yml