- The task here is to send an alert email with Alertmanager.
- The environment uses the binary releases of the Prometheus components.
- We will monitor one node's memory and send an email alert when usage exceeds 2% (an intentionally low threshold, for testing).
For a Kubernetes cluster, follow the official Prometheus documentation instead.
Environment preparation
Download the binaries from https://prometheus.io/download/
https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
https://github.com/prometheus/alertmanager/releases/download/v0.12.0/alertmanager-0.12.0.linux-amd64.tar.gz
https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
Extract the archives
/root/
├── alertmanager -> alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64
├── alertmanager-0.12.0.linux-amd64.tar.gz
├── node_exporter-0.15.2.linux-amd64
├── node_exporter-0.15.2.linux-amd64.tar.gz
├── prometheus -> prometheus-2.0.0.linux-amd64
├── prometheus-2.0.0.linux-amd64
└── prometheus-2.0.0.linux-amd64.tar.gz
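The steps above condensed into a shell sketch (versions and paths match the tree; adjust them for your environment):

cd /root
# fetch the linux-amd64 builds listed above
wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
wget https://github.com/prometheus/alertmanager/releases/download/v0.12.0/alertmanager-0.12.0.linux-amd64.tar.gz
wget https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz
# unpack each archive
tar xzf prometheus-2.0.0.linux-amd64.tar.gz
tar xzf alertmanager-0.12.0.linux-amd64.tar.gz
tar xzf node_exporter-0.15.2.linux-amd64.tar.gz
# convenience symlinks, as shown in the tree
ln -s prometheus-2.0.0.linux-amd64 prometheus
ln -s alertmanager-0.12.0.linux-amd64 alertmanager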
Lab architecture
Configure alertmanager
Create alert.yml
[root@n1 alertmanager]# ls
alertmanager  alert.yml  amtool  data  LICENSE  NOTICE  simple.yml
alert.yml defines: who sends, which events, to whom, and how they are sent, etc.
cat alert.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'maotai@163.com'
  smtp_auth_username: 'maotai@163.com'
  smtp_auth_password: '123456'

templates:
  - '/root/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver

receivers:
- name: 'default-receiver'
  email_configs:
  - to: 'maotai@foxmail.com'

- Once the config is in place, just start it: ./alertmanager -config.file=./alert.yml
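A quick sanity check that Alertmanager came up and loaded the config (assuming the default listen port 9093; the exact JSON shape depends on the Alertmanager version):

# returns JSON with the running version and the loaded configuration
curl http://localhost:9093/api/v1/status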
Configure prometheus
Alert rule configuration, rule.yml (referenced from prometheus.yml)
Send an email alert when memory usage exceeds 2% (testing threshold).
$ cat rule.yml
groups:
- name: test-rule
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 2
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: High Memory usage detected"
      description: "{{ $labels.instance }}: Memory usage is above 2% (current value is: {{ $value }})"
The key is this expression:
(node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 2
labels: attaches labels to this rule
annotations (alert description): this part is the content of the alert notification
Where do the monitoring keys (node_memory_MemTotal/node_memory_Buffers/node_memory_Cached) come from? (explained later)
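Before wiring the expression into a rule, it can be evaluated by hand against the Prometheus query API (a sketch, assuming Prometheus listens on 192.168.14.11:9090 as configured below):

# URL-encode the expression; the result is the current memory usage percentage per instance
curl -G 'http://192.168.14.11:9090/api/v1/query' \
  --data-urlencode 'query=(node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100'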
prometheus.yml configuration
- Add the node_exporter job
- Add the alerting rules via rule_files; the rule_files section references rule.yml
$ cat prometheus.yml
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]

rule_files:
  - /root/prometheus/rule.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.14.11:9090']

  - job_name: linux
    static_configs:
      - targets: ['192.168.14.11:9100']
        labels:
          instance: db1
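The Prometheus 2.x tarball ships a promtool binary next to the server; it can catch YAML and rule mistakes before you start the process (a sketch, run from /root/prometheus):

# validate prometheus.yml, including the files referenced under rule_files
./promtool check config prometheus.yml
# validate the alerting rules on their own
./promtool check rules rule.yml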
Once configured, start Prometheus and open the web UI; you can now see the node target.
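The same check works without the web UI through the targets API (address as configured above):

# lists every scrape target together with its health (up/down) and last error
curl http://192.168.14.11:9090/api/v1/targets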
View the metrics exposed by node_exporter
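node_exporter serves plain-text metrics on port 9100 by default; the node_memory_* keys used in the rule come straight from this output:

# show only the memory series referenced by the alert expression
curl -s http://192.168.14.11:9100/metrics | grep '^node_memory_'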
Check the Alerts page to see the state the alert rule is in (inactive / pending / firing).
The keys used in these expressions can be seen here (provided the corresponding exporter is installed); write the alert expressions against these keys.
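If you prefer the API over the expression browser's dropdown, the full list of metric names can also be pulled from the label-values endpoint (a sketch against the same server):

# every metric name currently known to this Prometheus server
curl http://192.168.14.11:9090/api/v1/label/__name__/values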
Check the received email
WeChat alert configuration
global:
  # The smarthost and SMTP sender used for mail notifications.
  resolve_timeout: 6m
  smtp_smarthost: '172.16.100.14:25'
  smtp_from: 'svnbuild_yf@iflytek.com'
  smtp_auth_username: 'svnbuild_yf'
  smtp_auth_password: 'tag#write@2015313'
  smtp_require_tls: false
  # The auth token for Hipchat.
  hipchat_auth_token: '1234556789'
  # Alternative host for Hipchat.
  hipchat_api_url: 'https://hipchat.foobar.org/'
  wechat_api_url: "https://qyapi.weixin.qq.com/cgi-bin/"
  wechat_api_secret: "4tQroVeB0xUcccccccc65Yfkj2Nkt90a80MH3ayI"
  wechat_api_corp_id: "wxaf5acxxxx5f8eb98"

# The directory from which notification templates are read.
templates:
  - 'templates/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 3s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 1h

  # A default receiver
  receiver: ybyang2

  routes:
  - match:
      job: "11"
      #service: "node_exporter"
    routes:
    - match:
        status: yellow
      receiver: ybyang2
    - match:
        status: orange
      receiver: berlin

# Inhibition rules allow to mute a set of alerts given that another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    service: 'up'
  target_match:
    service: 'mysql'
  # Apply inhibition if the alertname is the same.
  equal: ["instance"]
- source_match:
    service: "mysql"
  target_match:
    service: "mysql-query"
  equal: ['instance']
- source_match:
    service: "A"
  target_match:
    service: "B"
  equal: ["instance"]
- source_match:
    service: "B"
  target_match:
    service: "C"
  equal: ["instance"]

receivers:
- name: 'ybyang2'
  email_configs:
  - to: 'ybyang2@iflytek.com'
    send_resolved: true
    html: '{{ template "email.default.html" . }}'
    headers: { Subject: "[mail] 測試技術部監控告警郵件" }
- name: "berlin"
  wechat_configs:
  - send_resolved: true
    to_user: "@all"
    to_party: ""
    to_tag: ""
    agent_id: "1"
    corp_id: "wxaf5a99ccccc5f8eb98"
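To exercise the routing (for example the status: orange branch that goes to the WeChat receiver) without waiting for Prometheus to fire, a synthetic alert can be pushed straight into Alertmanager's v1 API; the label values here simply mirror the match rules above:

# push a fake firing alert; per the routes above it should reach the "berlin" wechat receiver
curl -XPOST http://localhost:9093/api/v1/alerts -d '[
  {
    "labels": {
      "alertname": "TestWechat",
      "job": "11",
      "status": "orange",
      "instance": "db1"
    },
    "annotations": {
      "summary": "manual test alert"
    }
  }
]'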