[Monitoring] An Alerting System to Pair with Graphite

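Our monitoring stack uses collectd for collection, Graphite for storage, and Grafana for visualization. The advantages are that newly added services are picked up automatically and that the graphical dashboards are easy to read. The drawback is that nothing in the stack can send a notification when a metric goes wrong. After trying several alerting systems, we settled on Seyren as the alerting component. This article also covers the other components we tried, with their strengths and weaknesses.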

Requirements for the alerting system

Our requirements for the alerting system come down to the following points:

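  1. Reads data from Graphite: since the architecture already uses Graphite to store monitoring metrics, we do not want to introduce another storage component.

  2. Centralized configuration: all metric thresholds must be configured and managed in one place so they are easy to adjust. Components that run checks and raise alerts on each individual machine are therefore ruled out.

  3. Automated configuration: when a new service is onboarded or an existing one is scaled out, it should be hooked up automatically. This requires the alerting system to support generic rules or to expose an API for adding alert rules.

  4. Extensible notification channels: we need to integrate with the company's SMS alerting platform, so the channel must be customizable.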

Evaluating the options

Grafana alerting

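Our first choice was Grafana's alerting feature, since we already use Grafana for graphing and dashboards. Grafana added alerting in the 4.x series: you can attach an alert condition to a query and pick a notification channel. The configuration screen looks like this: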

[Figure: Grafana alert configuration screen]

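There are quite a few notification channels to choose from, including Slack, Mail, and WebHook. WebHook is the way to extend the set of channels: when an alert fires, Grafana sends a POST request to the webhook URL carrying the alert details, so we can implement our own HTTP endpoint to handle the request and bridge to other alerting systems. The POST body looks like this (taken from the Grafana documentation):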

{
  "title": "My alert",
  "ruleId": 1,
  "ruleName": "Load peaking!",
  "ruleUrl": "http://url.to.grafana/db/dashboard/my_dashboard?panelId=2",
  "state": "alerting",
  "imageUrl": "http://s3.image.url",
  "message": "Load is peaking. Make sure the traffic is real and spin up more webfronts",
  "evalMatches": [
    {
      "metric": "requests",
      "tags": {},
      "value": 122
    }
  ]
}
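To make the bridging idea concrete, here is a minimal sketch of how a webhook endpoint might turn that body into a short text message for an SMS gateway. It relies only on the fields shown in the sample above; the function name `format_grafana_alert` and the message layout are our own illustrative assumptions, not part of Grafana.

```python
import json


def format_grafana_alert(body: str) -> str:
    """Turn a Grafana webhook POST body into a one-line, SMS-style message.

    Hypothetical helper: only fields from the sample payload are used.
    """
    payload = json.loads(body)
    # evalMatches lists the series that triggered the rule; it may be empty.
    matches = ", ".join(
        f"{m['metric']}={m['value']}" for m in payload.get("evalMatches", [])
    )
    text = f"[{payload['state']}] {payload['ruleName']}: {payload.get('message', '')}"
    return f"{text} ({matches})" if matches else text
```

The returned string could then be handed to whatever client the SMS platform provides; that part is specific to each environment and is left out here.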

Once an alert fires, the dashboard shows it in a different color:

[Figure: Grafana dashboard panel in the alerting state]

Problems

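The first problem is that alert queries cannot use Grafana templates. Templating neatly removes the tedious work of onboarding a new project: as long as metrics are reported according to the agreed naming rules, no new dashboard has to be created. Because the alerting module lacks template support, every server's alert query has to be spelled out explicitly, with no template variables, so onboarding a new project takes a lot of manual or semi-automated work to configure alerts.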

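The second problem is that a query expression cannot keep a separate alert state for each individual metric. Suppose we define the alert query collectd.*.cpu.percent-idle and have two servers, so the query resolves to two metrics: collectd.host1.cpu.percent-idle and collectd.host2.cpu.percent-idle. When CPU idle on host1 reaches the alert threshold, the check's state changes to ALERTING and a notification is sent. But when host2 then reaches the threshold as well, no notification is sent, because the check is already in the ALERTING state. The Grafana documentation mentions plans to support this, but it is not available yet.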
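The difference between one shared state and per-metric states can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not code from Grafana or any other tool; the function names, the "value >= threshold means breaching" rule, and the sample values are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, float]  # (metric name, latest value)


def shared_state_notifications(samples: Iterable[Sample], threshold: float) -> List[str]:
    """One state for the whole query: notify only when that single state flips."""
    latest: Dict[str, float] = {}
    state = "OK"
    sent: List[str] = []
    for target, value in samples:
        latest[target] = value
        breaching = any(v >= threshold for v in latest.values())
        new_state = "ALERTING" if breaching else "OK"
        if new_state != state:
            sent.append(f"{new_state} (triggered by {target})")
            state = new_state
    return sent


def per_target_notifications(samples: Iterable[Sample], threshold: float) -> List[str]:
    """One state per metric: each metric's transition produces its own notification."""
    states: Dict[str, str] = {}
    sent: List[str] = []
    for target, value in samples:
        new_state = "ALERTING" if value >= threshold else "OK"
        if new_state != states.get(target, "OK"):
            sent.append(f"{new_state}: {target}")
        states[target] = new_state
    return sent
```

Feeding both functions a breach on host1 followed by a breach on host2 yields one notification from the shared-state version and two from the per-target version, which is exactly the gap we ran into.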

Cabot

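Cabot is an alerting system designed mainly for Graphite data sources. Like Grafana, it raises alerts from a Graphite metric query plus a threshold, and you can write your own plugin to deliver notifications. Also like Grafana's alerting, it cannot track alert state separately for each metric when a query matches several.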

Seyren

Seyren is also an alerting system built for Graphite data sources. Its advantage is that when a query matches multiple metrics, it tracks alert state separately for each one.

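First we define a check whose metric query is collectd.base.control.jy.*.cpu.percent-idle, where the * matches the IP address of every server. The figure below shows the check configuration screen: after entering the query you set a warn threshold and an error threshold, and the screen then shows a graph of the recent data.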

[Figure: Seyren check configuration screen]
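As a rough mental model of the two thresholds, a value can be mapped to OK, WARN, or ERROR as in the sketch below. The comparison rule (higher is worse when warn <= error, lower is worse otherwise) is our assumption, chosen to agree with the sample alert payload later in this article (warn 57, error 62); it is not copied from Seyren's source.

```python
def classify(value: float, warn: float, error: float) -> str:
    """Map a metric value to OK / WARN / ERROR given two thresholds.

    Assumed rule: if warn <= error, higher values are worse;
    otherwise lower values are worse.
    """
    if warn <= error:
        if value >= error:
            return "ERROR"
        if value >= warn:
            return "WARN"
        return "OK"
    if value <= error:
        return "ERROR"
    if value <= warn:
        return "WARN"
    return "OK"
```

With warn=57 and error=62, a value of 58.17 falls in WARN and 52.63 in OK, matching the transitions in the sample payload.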

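After saving the check, its alert status is visible on the dashboard, where every matched metric has its own independently tracked state. This is what makes it possible to add servers and services automatically: a newly provisioned machine only has to report its metrics under the agreed naming convention to be covered by the check above.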

[Figure: Seyren dashboard showing per-metric status]
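Why a new machine is covered without any configuration change comes down to wildcard matching on the metric path. The sketch below imitates that matching for the `*` wildcard only, treating it as confined to a single dot-separated segment; `matches` is a hypothetical helper written for illustration, and host3 is an invented name for a newly added server.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, metric: str) -> bool:
    """Segment-wise wildcard match: '*' never crosses a '.' boundary."""
    p_parts, m_parts = pattern.split("."), metric.split(".")
    # Paths must have the same depth, and each segment must match its pattern.
    return len(p_parts) == len(m_parts) and all(
        fnmatchcase(m, p) for p, m in zip(p_parts, m_parts)
    )


CHECK_TARGET = "collectd.base.control.jy.*.cpu.percent-idle"

# A newly added server (hypothetical host3) that follows the naming convention.
new_metric = "collectd.base.control.jy.host3.cpu.percent-idle"
covered = matches(CHECK_TARGET, new_metric)
```

Here `covered` is true for host3, while a metric with a different depth or a different leaf name would not match the check's target.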

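For delivery, Seyren supports a good number of channels, such as Email, Flowdock, HipChat, HTTP, and Hubot. Here we only care about HTTP. The figure below shows an HTTP channel configuration: all you need is a URL that accepts POST requests, and the alert details are sent as JSON in the POST body.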

[Figure: Seyren HTTP channel settings]
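A receiving endpoint for such a channel can be very small. The following is a minimal sketch using Python's standard `http.server`; the names `AlertHandler`, `handle_alert_body`, and `make_server`, the in-memory `received` list (a stand-in for forwarding to the SMS platform), and the choice of status codes are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # stand-in for forwarding to an SMS gateway (hypothetical)


def handle_alert_body(raw: bytes) -> int:
    """Parse one POST body and return the HTTP status code to answer with."""
    try:
        body = json.loads(raw)
    except ValueError:  # malformed JSON or undecodable bytes
        return 400
    received.append(body)
    return 204


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status = handle_alert_body(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port: int) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), AlertHandler)
```

Calling `make_server(8083).serve_forever()` would start listening; the port is arbitrary, and the URL of that listener is what would go into the channel's URL field.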

The key part of the alert POST body is excerpted below:

"alerts": [
    {
      "checkId": "59327b84e4b0a957ebb25f77",
      "targetHash": "\ufffd\u0006LC\ufffd\ufffd\ufffd\ufffd\u0002\u007f\u0002\ufffd\ufffd\ufffdkE",
      "fromType": "OK",
      "toType": "WARN",
      "warn": 57,
      "timestamp": 1496484513227,
      "error": 62,
      "value": 58.1702216645755,
      "id": "59328aa1e4b0a957ebb26201",
      "target": "collectd.base.control.jy.host1.cpu.percent-idle"
    },
    {
      "checkId": "59327b84e4b0a957ebb25f77",
      "targetHash": "\ufffd\ufffd\ufffd\ufffd\ufffd>G\u001c\ufffd\ufffd\ufffd\u001c\ufffd9\ufffd\ufffd",
      "fromType": "WARN",
      "toType": "OK",
      "warn": 57,
      "timestamp": 1496484513227,
      "error": 62,
      "value": 52.6318006613729,
      "id": "59328aa1e4b0a957ebb26209",
      "target": "collectd.base.control.jy.host2.cpu.percent-idle"
    }
  ],

As you can see, the alerts node maintains and reports the check state separately for each host.
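Since each entry in alerts describes one host's transition, a receiver can produce one message per host. Below is a small sketch that does this using only the fields visible in the excerpt; `summarize_alerts` and the message format are illustrative assumptions, and the timestamp is interpreted as milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone
from typing import List


def summarize_alerts(body: dict) -> List[str]:
    """Build one human-readable line per entry in the 'alerts' array."""
    lines = []
    for alert in body.get("alerts", []):
        # The timestamp is treated as epoch milliseconds.
        when = datetime.fromtimestamp(alert["timestamp"] / 1000, tz=timezone.utc)
        lines.append(
            f"{alert['target']}: {alert['fromType']} -> {alert['toType']} "
            f"(value={alert['value']:.2f}, warn={alert['warn']}, error={alert['error']}) "
            f"at {when:%Y-%m-%d %H:%M:%S} UTC"
        )
    return lines
```

Each returned line can be sent as its own SMS, so an operator sees host1 going from OK to WARN and host2 recovering from WARN to OK as two separate messages.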

The complete POST body follows:

{
  "preview": "<br /><img src=http://192.168.1.1/render/?target=collectd.base.control.jy.*.cpu.percent-idle&from=10:08_20170603&until=09:08_20170603&target=alias(dashed(color(constantLine(57),%22yellow%22)),%22warn%20level%22)&target=alias(dashed(color(constantLine(62),%22red%22)),%22error%20level%22)&width=500&height=225></img>",
  "subscription": {
    "su": true,
    "mo": true,
    "tu": true,
    "we": true,
    "th": true,
    "fr": true,
    "sa": true,
    "ignoreWarn": false,
    "ignoreError": false,
    "ignoreOk": false,
    "fromTime": {
      "chronology": {
        "zone": {
          "fixed": true,
          "id": "UTC"
        }
      },
      "millisOfSecond": 0,
      "millisOfDay": 0,
      "secondOfMinute": 0,
      "hourOfDay": 0,
      "minuteOfHour": 0,
      "fieldTypes": [
        {
          "durationType": {
            "name": "hours"
          },
          "rangeDurationType": {
            "name": "days"
          },
          "name": "hourOfDay"
        },
        {
          "durationType": {
            "name": "minutes"
          },
          "rangeDurationType": {
            "name": "hours"
          },
          "name": "minuteOfHour"
        },
        {
          "durationType": {
            "name": "seconds"
          },
          "rangeDurationType": {
            "name": "minutes"
          },
          "name": "secondOfMinute"
        },
        {
          "durationType": {
            "name": "millis"
          },
          "rangeDurationType": {
            "name": "seconds"
          },
          "name": "millisOfSecond"
        }
      ],
      "values": [
        0,
        0,
        0,
        0
      ],
      "fields": [
        {
          "range": 24,
          "rangeDurationField": {
            "unitMillis": 86400000,
            "precise": true,
            "name": "days",
            "type": {
              "name": "days"
            },
            "supported": true
          },
          "maximumValue": 23,
          "lenient": false,
          "unitMillis": 3600000,
          "durationField": {
            "unitMillis": 3600000,
            "precise": true,
            "name": "hours",
            "type": {
              "name": "hours"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "hourOfDay",
          "type": {
            "durationType": {
              "name": "hours"
            },
            "rangeDurationType": {
              "name": "days"
            },
            "name": "hourOfDay"
          },
          "supported": true
        },
        {
          "range": 60,
          "rangeDurationField": {
            "unitMillis": 3600000,
            "precise": true,
            "name": "hours",
            "type": {
              "name": "hours"
            },
            "supported": true
          },
          "maximumValue": 59,
          "lenient": false,
          "unitMillis": 60000,
          "durationField": {
            "unitMillis": 60000,
            "precise": true,
            "name": "minutes",
            "type": {
              "name": "minutes"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "minuteOfHour",
          "type": {
            "durationType": {
              "name": "minutes"
            },
            "rangeDurationType": {
              "name": "hours"
            },
            "name": "minuteOfHour"
          },
          "supported": true
        },
        {
          "range": 60,
          "rangeDurationField": {
            "unitMillis": 60000,
            "precise": true,
            "name": "minutes",
            "type": {
              "name": "minutes"
            },
            "supported": true
          },
          "maximumValue": 59,
          "lenient": false,
          "unitMillis": 1000,
          "durationField": {
            "unitMillis": 1000,
            "precise": true,
            "name": "seconds",
            "type": {
              "name": "seconds"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "secondOfMinute",
          "type": {
            "durationType": {
              "name": "seconds"
            },
            "rangeDurationType": {
              "name": "minutes"
            },
            "name": "secondOfMinute"
          },
          "supported": true
        },
        {
          "range": 1000,
          "rangeDurationField": {
            "unitMillis": 1000,
            "precise": true,
            "name": "seconds",
            "type": {
              "name": "seconds"
            },
            "supported": true
          },
          "maximumValue": 999,
          "lenient": false,
          "unitMillis": 1,
          "durationField": {
            "unitMillis": 1,
            "precise": true,
            "name": "millis",
            "type": {
              "name": "millis"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "millisOfSecond",
          "type": {
            "durationType": {
              "name": "millis"
            },
            "rangeDurationType": {
              "name": "seconds"
            },
            "name": "millisOfSecond"
          },
          "supported": true
        }
      ]
    },
    "toTime": {
      "chronology": {
        "zone": {
          "fixed": true,
          "id": "UTC"
        }
      },
      "millisOfSecond": 0,
      "millisOfDay": 86340000,
      "secondOfMinute": 0,
      "hourOfDay": 23,
      "minuteOfHour": 59,
      "fieldTypes": [
        {
          "durationType": {
            "name": "hours"
          },
          "rangeDurationType": {
            "name": "days"
          },
          "name": "hourOfDay"
        },
        {
          "durationType": {
            "name": "minutes"
          },
          "rangeDurationType": {
            "name": "hours"
          },
          "name": "minuteOfHour"
        },
        {
          "durationType": {
            "name": "seconds"
          },
          "rangeDurationType": {
            "name": "minutes"
          },
          "name": "secondOfMinute"
        },
        {
          "durationType": {
            "name": "millis"
          },
          "rangeDurationType": {
            "name": "seconds"
          },
          "name": "millisOfSecond"
        }
      ],
      "values": [
        23,
        59,
        0,
        0
      ],
      "fields": [
        {
          "range": 24,
          "rangeDurationField": {
            "unitMillis": 86400000,
            "precise": true,
            "name": "days",
            "type": {
              "name": "days"
            },
            "supported": true
          },
          "maximumValue": 23,
          "lenient": false,
          "unitMillis": 3600000,
          "durationField": {
            "unitMillis": 3600000,
            "precise": true,
            "name": "hours",
            "type": {
              "name": "hours"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "hourOfDay",
          "type": {
            "durationType": {
              "name": "hours"
            },
            "rangeDurationType": {
              "name": "days"
            },
            "name": "hourOfDay"
          },
          "supported": true
        },
        {
          "range": 60,
          "rangeDurationField": {
            "unitMillis": 3600000,
            "precise": true,
            "name": "hours",
            "type": {
              "name": "hours"
            },
            "supported": true
          },
          "maximumValue": 59,
          "lenient": false,
          "unitMillis": 60000,
          "durationField": {
            "unitMillis": 60000,
            "precise": true,
            "name": "minutes",
            "type": {
              "name": "minutes"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "minuteOfHour",
          "type": {
            "durationType": {
              "name": "minutes"
            },
            "rangeDurationType": {
              "name": "hours"
            },
            "name": "minuteOfHour"
          },
          "supported": true
        },
        {
          "range": 60,
          "rangeDurationField": {
            "unitMillis": 60000,
            "precise": true,
            "name": "minutes",
            "type": {
              "name": "minutes"
            },
            "supported": true
          },
          "maximumValue": 59,
          "lenient": false,
          "unitMillis": 1000,
          "durationField": {
            "unitMillis": 1000,
            "precise": true,
            "name": "seconds",
            "type": {
              "name": "seconds"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "secondOfMinute",
          "type": {
            "durationType": {
              "name": "seconds"
            },
            "rangeDurationType": {
              "name": "minutes"
            },
            "name": "secondOfMinute"
          },
          "supported": true
        },
        {
          "range": 1000,
          "rangeDurationField": {
            "unitMillis": 1000,
            "precise": true,
            "name": "seconds",
            "type": {
              "name": "seconds"
            },
            "supported": true
          },
          "maximumValue": 999,
          "lenient": false,
          "unitMillis": 1,
          "durationField": {
            "unitMillis": 1,
            "precise": true,
            "name": "millis",
            "type": {
              "name": "millis"
            },
            "supported": true
          },
          "minimumValue": 0,
          "leapDurationField": null,
          "name": "millisOfSecond",
          "type": {
            "durationType": {
              "name": "millis"
            },
            "rangeDurationType": {
              "name": "seconds"
            },
            "name": "millisOfSecond"
          },
          "supported": true
        }
      ]
    },
    "enabled": true,
    "id": "59328a65e4b0a957ebb26200",
    "type": "HTTP",
    "target": "http://10.153.74.117:8083/sonar/1.0/alarm_str"
  },
  "check": {
    "subscriptions": [
      {
        "su": true,
        "mo": true,
        "tu": true,
        "we": true,
        "th": true,
        "fr": true,
        "sa": true,
        "ignoreWarn": false,
        "ignoreError": false,
        "ignoreOk": false,
        "fromTime": {
          "chronology": {
            "zone": {
              "fixed": true,
              "id": "UTC"
            }
          },
          "millisOfSecond": 0,
          "millisOfDay": 0,
          "secondOfMinute": 0,
          "hourOfDay": 0,
          "minuteOfHour": 0,
          "fieldTypes": [
            {
              "durationType": {
                "name": "hours"
              },
              "rangeDurationType": {
                "name": "days"
              },
              "name": "hourOfDay"
            },
            {
              "durationType": {
                "name": "minutes"
              },
              "rangeDurationType": {
                "name": "hours"
              },
              "name": "minuteOfHour"
            },
            {
              "durationType": {
                "name": "seconds"
              },
              "rangeDurationType": {
                "name": "minutes"
              },
              "name": "secondOfMinute"
            },
            {
              "durationType": {
                "name": "millis"
              },
              "rangeDurationType": {
                "name": "seconds"
              },
              "name": "millisOfSecond"
            }
          ],
          "values": [
            0,
            0,
            0,
            0
          ],
          "fields": [
            {
              "range": 24,
              "rangeDurationField": {
                "unitMillis": 86400000,
                "precise": true,
                "name": "days",
                "type": {
                  "name": "days"
                },
                "supported": true
              },
              "maximumValue": 23,
              "lenient": false,
              "unitMillis": 3600000,
              "durationField": {
                "unitMillis": 3600000,
                "precise": true,
                "name": "hours",
                "type": {
                  "name": "hours"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "hourOfDay",
              "type": {
                "durationType": {
                  "name": "hours"
                },
                "rangeDurationType": {
                  "name": "days"
                },
                "name": "hourOfDay"
              },
              "supported": true
            },
            {
              "range": 60,
              "rangeDurationField": {
                "unitMillis": 3600000,
                "precise": true,
                "name": "hours",
                "type": {
                  "name": "hours"
                },
                "supported": true
              },
              "maximumValue": 59,
              "lenient": false,
              "unitMillis": 60000,
              "durationField": {
                "unitMillis": 60000,
                "precise": true,
                "name": "minutes",
                "type": {
                  "name": "minutes"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "minuteOfHour",
              "type": {
                "durationType": {
                  "name": "minutes"
                },
                "rangeDurationType": {
                  "name": "hours"
                },
                "name": "minuteOfHour"
              },
              "supported": true
            },
            {
              "range": 60,
              "rangeDurationField": {
                "unitMillis": 60000,
                "precise": true,
                "name": "minutes",
                "type": {
                  "name": "minutes"
                },
                "supported": true
              },
              "maximumValue": 59,
              "lenient": false,
              "unitMillis": 1000,
              "durationField": {
                "unitMillis": 1000,
                "precise": true,
                "name": "seconds",
                "type": {
                  "name": "seconds"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "secondOfMinute",
              "type": {
                "durationType": {
                  "name": "seconds"
                },
                "rangeDurationType": {
                  "name": "minutes"
                },
                "name": "secondOfMinute"
              },
              "supported": true
            },
            {
              "range": 1000,
              "rangeDurationField": {
                "unitMillis": 1000,
                "precise": true,
                "name": "seconds",
                "type": {
                  "name": "seconds"
                },
                "supported": true
              },
              "maximumValue": 999,
              "lenient": false,
              "unitMillis": 1,
              "durationField": {
                "unitMillis": 1,
                "precise": true,
                "name": "millis",
                "type": {
                  "name": "millis"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "millisOfSecond",
              "type": {
                "durationType": {
                  "name": "millis"
                },
                "rangeDurationType": {
                  "name": "seconds"
                },
                "name": "millisOfSecond"
              },
              "supported": true
            }
          ]
        },
        "toTime": {
          "chronology": {
            "zone": {
              "fixed": true,
              "id": "UTC"
            }
          },
          "millisOfSecond": 0,
          "millisOfDay": 86340000,
          "secondOfMinute": 0,
          "hourOfDay": 23,
          "minuteOfHour": 59,
          "fieldTypes": [
            {
              "durationType": {
                "name": "hours"
              },
              "rangeDurationType": {
                "name": "days"
              },
              "name": "hourOfDay"
            },
            {
              "durationType": {
                "name": "minutes"
              },
              "rangeDurationType": {
                "name": "hours"
              },
              "name": "minuteOfHour"
            },
            {
              "durationType": {
                "name": "seconds"
              },
              "rangeDurationType": {
                "name": "minutes"
              },
              "name": "secondOfMinute"
            },
            {
              "durationType": {
                "name": "millis"
              },
              "rangeDurationType": {
                "name": "seconds"
              },
              "name": "millisOfSecond"
            }
          ],
          "values": [
            23,
            59,
            0,
            0
          ],
          "fields": [
            {
              "range": 24,
              "rangeDurationField": {
                "unitMillis": 86400000,
                "precise": true,
                "name": "days",
                "type": {
                  "name": "days"
                },
                "supported": true
              },
              "maximumValue": 23,
              "lenient": false,
              "unitMillis": 3600000,
              "durationField": {
                "unitMillis": 3600000,
                "precise": true,
                "name": "hours",
                "type": {
                  "name": "hours"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "hourOfDay",
              "type": {
                "durationType": {
                  "name": "hours"
                },
                "rangeDurationType": {
                  "name": "days"
                },
                "name": "hourOfDay"
              },
              "supported": true
            },
            {
              "range": 60,
              "rangeDurationField": {
                "unitMillis": 3600000,
                "precise": true,
                "name": "hours",
                "type": {
                  "name": "hours"
                },
                "supported": true
              },
              "maximumValue": 59,
              "lenient": false,
              "unitMillis": 60000,
              "durationField": {
                "unitMillis": 60000,
                "precise": true,
                "name": "minutes",
                "type": {
                  "name": "minutes"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "minuteOfHour",
              "type": {
                "durationType": {
                  "name": "minutes"
                },
                "rangeDurationType": {
                  "name": "hours"
                },
                "name": "minuteOfHour"
              },
              "supported": true
            },
            {
              "range": 60,
              "rangeDurationField": {
                "unitMillis": 60000,
                "precise": true,
                "name": "minutes",
                "type": {
                  "name": "minutes"
                },
                "supported": true
              },
              "maximumValue": 59,
              "lenient": false,
              "unitMillis": 1000,
              "durationField": {
                "unitMillis": 1000,
                "precise": true,
                "name": "seconds",
                "type": {
                  "name": "seconds"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "secondOfMinute",
              "type": {
                "durationType": {
                  "name": "seconds"
                },
                "rangeDurationType": {
                  "name": "minutes"
                },
                "name": "secondOfMinute"
              },
              "supported": true
            },
            {
              "range": 1000,
              "rangeDurationField": {
                "unitMillis": 1000,
                "precise": true,
                "name": "seconds",
                "type": {
                  "name": "seconds"
                },
                "supported": true
              },
              "maximumValue": 999,
              "lenient": false,
              "unitMillis": 1,
              "durationField": {
                "unitMillis": 1,
                "precise": true,
                "name": "millis",
                "type": {
                  "name": "millis"
                },
                "supported": true
              },
              "minimumValue": 0,
              "leapDurationField": null,
              "name": "millisOfSecond",
              "type": {
                "durationType": {
                  "name": "millis"
                },
                "rangeDurationType": {
                  "name": "seconds"
                },
                "name": "millisOfSecond"
              },
              "supported": true
            }
          ]
        },
        "enabled": true,
        "id": "59328a65e4b0a957ebb26200",
        "type": "HTTP",
        "target": "http://192.168.1.1/sonar/1.0/alarm_str"
      }
    ],
    "warn": 57,
    "until": null,
    "from": null,
    "lastCheck": 1496484513253,
    "description": null,
    "enabled": true,
    "error": 62,
    "name": "cpu",
    "id": "59327b84e4b0a957ebb25f77",
    "state": "ERROR",
    "target": "collectd.base.control.jy.*.cpu.percent-idle",
    "live": false
  },
  "alerts": [
    {
      "checkId": "59327b84e4b0a957ebb25f77",
      "targetHash": "\ufffd\u0006LC\ufffd\ufffd\ufffd\ufffd\u0002\u007f\u0002\ufffd\ufffd\ufffdkE",
      "fromType": "OK",
      "toType": "WARN",
      "warn": 57,
      "timestamp": 1496484513227,
      "error": 62,
      "value": 58.1702216645755,
      "id": "59328aa1e4b0a957ebb26201",
      "target": "collectd.base.control.jy.host1.cpu.percent-idle"
    },
    {
      "checkId": "59327b84e4b0a957ebb25f77",
      "targetHash": "\ufffd\ufffd\ufffd\ufffd\ufffd>G\u001c\ufffd\ufffd\ufffd\u001c\ufffd9\ufffd\ufffd",
      "fromType": "WARN",
      "toType": "OK",
      "warn": 57,
      "timestamp": 1496484513227,
      "error": 62,
      "value": 52.6318006613729,
      "id": "59328aa1e4b0a957ebb26209",
      "target": "collectd.base.control.jy.host2.cpu.percent-idle"
    }
  ],
  "seyrenUrl": "http://localhost:8080/seyren"
}