1. Monitoring Kubernetes cluster nodes with Prometheus Operator
2. Custom application monitoring with Prometheus Operator
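Alertmanager and Prometheus are two separate components. The Prometheus server evaluates the alerting rules and sends the resulting alerts to Alertmanager; Alertmanager then applies silencing, inhibition and aggregation, and sends notifications through channels such as email, DingTalk and HipChat.
Alertmanager handles alerts sent by clients such as the Prometheus server. It deduplicates and groups them, then routes them to the correct receiver, for example email, Slack or DingTalk. Alertmanager also supports grouping, silencing and inhibition of alerts.
DingTalk is our internal communication tool, and almost everyone has it on both computer and phone, so messages are seen right away. Alerts need to be delivered promptly, which makes DingTalk a good notification channel.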
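See the official DingTalk documentation: Custom Robot.
After adding the robot, get its hook (it seems robots can only be added inside a DingTalk group chat); it will be needed for the deployment later.
Robot hook: https://oapi.dingtalk.com/robot/send?access_token=xxxxxx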
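Before wiring the hook into Alertmanager, it can help to check that the robot accepts messages at all. The following is a minimal sketch, assuming the robot accepts the plain "text" message type described in the DingTalk custom robot documentation; the helper names build_text_message and send_text_message are made up for this illustration, and the token is still the xxxxxx placeholder. If the robot has security settings such as keywords enabled, the request may be rejected.

```python
import json
import urllib.request

# Placeholder hook; replace xxxxxx with the token of your own robot.
HOOK_URL = "https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"


def build_text_message(content):
    """Return the JSON body (as bytes) of a DingTalk 'text' message."""
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def send_text_message(hook_url, content, timeout=5):
    """POST the message to the robot hook and return the raw response text."""
    request = urllib.request.Request(
        hook_url,
        data=build_text_message(content),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs a real HTTP request, so it is left commented out):
# print(send_text_message(HOOK_URL, "test message from the alert webhook"))
```

The body is built with json.dumps rather than string formatting, so quotes or newlines in the message cannot break the JSON.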
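Alertmanager official documentation: https://github.com/prometheus/docs/blob/db2a09a8a7e193d6e474f37055908a6d432b88b5/content/docs/alerting/configuration.md#webhook_config
Modify the Alertmanager alerting configuration. The official documentation above describes every parameter in detail, so they are not explained one by one here.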
[root@node-01 prometheus]# vim prometheus-operator/values.yaml
config:
  global:
    resolve_timeout: 2m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 2m
    repeat_interval: 12h
    receiver: 'webhook'
    routes:
    - match:
        alertname: DeadMansSwitch
      receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhook_configs:
    - url: http://webhook-dingtalk/dingtalk/send/
      send_resolved: true
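To make the effect of group_by: ['job'] more concrete, here is a rough sketch of the idea: alerts whose values for the listed labels are equal end up in the same group and are notified together. This only illustrates the semantics and is not Alertmanager's actual implementation; the function group_alerts and the alert names HighCPU and DiskFull are invented for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Alerts with equal values for every group_by label share one key.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts, only for illustration.
alerts = [
    {"labels": {"alertname": "HighCPU", "job": "node-exporter"}},
    {"labels": {"alertname": "DiskFull", "job": "node-exporter"}},
    {"labels": {"alertname": "DeadMansSwitch", "job": "prometheus"}},
]

groups = group_alerts(alerts, ["job"])
for key, members in groups.items():
    print(key, [a["labels"]["alertname"] for a in members])
```

With this grouping, the two node-exporter alerts would be delivered in one notification, while the DeadMansSwitch alert forms a group of its own.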
Update prometheus-operator:
[root@node-01 prometheus]# helm upgrade p ./prometheus-operator
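Once the change has been applied, the configuration is visible on the Alertmanager status page.
Alertmanager sends data in the following JSON format to the endpoint via HTTP POST requests: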
{
  "version": "4",
  "groupKey": <string>,    // key identifying the group of alerts (e.g. to deduplicate)
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,  // backlink to the Alertmanager.
  "alerts": [
    {
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>"
    },
    ...
  ]
}
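As a small illustration of how a receiver might read this structure, the sketch below parses a body of that shape and produces one summary line per alert. The function name summarize_payload and the trimmed sample values are assumptions made for the example; only fields listed in the format above (plus the per-alert status) are used.

```python
import json


def summarize_payload(raw_body):
    """Parse a webhook body (bytes or str) and return one line per alert."""
    if isinstance(raw_body, bytes):
        raw_body = raw_body.decode("utf-8")
    payload = json.loads(raw_body)
    lines = []
    for alert in payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "<unnamed>")
        # Fall back to the group status if an alert carries no status itself.
        status = alert.get("status", payload.get("status", "unknown"))
        lines.append("%s [%s] since %s" % (name, status, alert.get("startsAt", "?")))
    return lines


# Trimmed sample body, for illustration only.
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "webhook",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "DeadMansSwitch", "severity": "none"},
            "annotations": {"message": "pipeline check"},
            "startsAt": "2019-03-08T10:02:28.680317737Z",
            "endsAt": "0001-01-01T00:00:00Z",
        }
    ],
}).encode("utf-8")

for line in summarize_payload(sample):
    print(line)
```

Using .get() with defaults keeps the parser from failing on alerts that lack an optional field.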
This is an example of the test alert data:
b'{
  "receiver":"webhook",
  "status":"firing",
  "alerts":[{
    "status":"firing",
    "labels":{
      "alertname":"DeadMansSwitch",
      "prometheus":"monitoring/p-prometheus",
      "severity":"none"
    },
    "annotations":{
      "message":"This is a DeadMansSwitch meant to ensure that the entire alerting pipeline is functional."
    },
    "startsAt":"2019-03-08T10:02:28.680317737Z",
    "endsAt":"0001-01-01T00:00:00Z",
    "generatorURL":"http://prom.cnlinux.club/graph?g0.expr=vector%281%29\\u0026g0.tab=1"
  }],
  "groupLabels":{},
  "commonLabels":{
    "alertname":"DeadMansSwitch",
    "prometheus":"monitoring/p-prometheus",
    "severity":"none"
  },
  "commonAnnotations":{
    "message":"This is a DeadMansSwitch meant to ensure that the entire alerting pipeline is functional."
  },
  "externalURL":"http://alert.cnlinux.club","version":"4",
  "groupKey":"{}/{alertname=\\"DeadMansSwitch\\"}:{}"}\n'
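DingTalk has its own requirements for the data format (see the DingTalk documentation referenced above), so the data sent by Alertmanager has to be converted.
Below we do the conversion with a Python script of our own.
The script: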
[root@node-01 prometheus]# cat app.py
#!/usr/bin/env python
import io, sys
# Force UTF-8 on stdout/stderr so non-ASCII alert text can be logged.
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

from flask import Flask
from flask import request
import requests
import logging
import json

app = Flask(__name__)

console = logging.StreamHandler()
fmt = '%(asctime)s - %(filename)s:%(lineno)s - %(name)s - %(message)s'
formatter = logging.Formatter(fmt)
console.setFormatter(formatter)
log = logging.getLogger("flask_webhook_dingtalk")
log.addHandler(console)
log.setLevel(logging.DEBUG)

# Labels that are left out of the DingTalk message.
EXCLUDE_LIST = ['prometheus', 'endpoint']

@app.route('/')
def index():
    return 'Webhook Dingtalk by Billy https://blog.51cto.com/billy98'

@app.route('/dingtalk/send/', methods=['POST'])
def hander_session():
    # The DingTalk robot hook is passed as the first command-line argument.
    profile_url = sys.argv[1]
    post_data = request.get_data()
    post_data = json.loads(post_data.decode("utf-8"))['alerts']
    # Only the first alert of the group is forwarded.
    post_data = post_data[0]
    messa_list = []
    messa_list.append('### Alert status: %s' % post_data['status'].upper())
    messa_list.append('**startsAt:** %s' % post_data['startsAt'])
    for i in post_data['labels'].keys():
        if i in EXCLUDE_LIST:
            continue
        else:
            messa_list.append("**%s:** %s" % (i, post_data['labels'][i]))
    # Not every rule defines a "message" annotation, so fall back to an empty string.
    messa_list.append('**Describe:** %s' % post_data['annotations'].get('message', ''))
    messa = (' \n\n > '.join(messa_list))
    status = alert_data(messa, post_data['labels']['alertname'], profile_url)
    log.info(status)
    return status

def alert_data(data, title, profile_url):
    headers = {'Content-Type': 'application/json'}
    # json.dumps escapes quotes and newlines in the alert text, so the body is always valid JSON.
    send_data = json.dumps({"msgtype": "markdown", "markdown": {"title": title, "text": data}})
    send_data = send_data.encode('utf-8')
    reps = requests.post(url=profile_url, data=send_data, headers=headers)
    return reps.text

if __name__ == '__main__':
    app.debug = False
    app.run(host='0.0.0.0', port=8080)
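The script above forwards only the first entry of the alerts list. If a group contains several alerts, one possible variation is to format every alert and join them into a single markdown message. The sketch below shows just the conversion step, without Flask or requests; format_alert and build_markdown_payload are hypothetical helper names, and the heading text is an assumption that can be changed freely.

```python
import json

# Labels that are left out of the message, as in the script above.
EXCLUDE_LIST = ["prometheus", "endpoint"]


def format_alert(alert):
    """Render one alert as a DingTalk markdown fragment."""
    lines = [
        "### Alert status: %s" % alert["status"].upper(),
        "**startsAt:** %s" % alert["startsAt"],
    ]
    for name, value in alert.get("labels", {}).items():
        if name not in EXCLUDE_LIST:
            lines.append("**%s:** %s" % (name, value))
    message = alert.get("annotations", {}).get("message", "")
    lines.append("**Describe:** %s" % message)
    return " \n\n > ".join(lines)


def build_markdown_payload(alertmanager_payload):
    """Convert a whole Alertmanager webhook payload into a DingTalk JSON body."""
    alerts = alertmanager_payload.get("alerts", [])
    title = alertmanager_payload.get("commonLabels", {}).get("alertname", "alerts")
    text = "\n\n---\n\n".join(format_alert(alert) for alert in alerts)
    body = {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    # json.dumps takes care of escaping quotes and newlines.
    return json.dumps(body, ensure_ascii=False)
```

The returned string can be encoded as UTF-8 and posted to the robot hook in place of the hand-built body.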
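Build the Python script above into an image and run it as a service in the k8s cluster to keep it highly available.
You can also use the image I have already built; just pull it: docker pull billy98/webhook-dingtalk:latest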
[root@node-01 prometheus]# cat Dockerfile
FROM centos:7 as build
MAINTAINER billy98 5884625@qq.com
RUN curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo && \
    yum install -y python36 python36-pip && \
    pip3.6 install flask requests werkzeug
ADD app.py /usr/local/alert-dingtalk.py

FROM gcr.io/distroless/python3
COPY --from=build /usr/local/alert-dingtalk.py /usr/local/alert-dingtalk.py
COPY --from=build usr/local/lib64/python3.6/site-packages usr/local/lib64/python3.6/site-packages
COPY --from=build usr/local/lib/python3.6/site-packages usr/local/lib/python3.6/site-packages
ENV PYTHONPATH=usr/local/lib/python3.6/site-packages:usr/local/lib64/python3.6/site-packages
EXPOSE 8080
ENTRYPOINT ["python","/usr/local/alert-dingtalk.py"]
[root@node-01 prometheus]# docker build -t billy98/webhook-dingtalk:latest .
An image built this way is only a little over 50 MB. For details on using distroless, see:
distroless: https://github.com/GoogleContainerTools/distroless
[root@node-01 prometheus]# cat webhook-dingtalk.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webhook-dingtalk
  name: webhook-dingtalk
  namespace: monitoring  # must be the same namespace as alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webhook-dingtalk
  template:
    metadata:
      labels:
        app: webhook-dingtalk
    spec:
      containers:
      - image: billy98/webhook-dingtalk:latest
        name: webhook-dingtalk
        imagePullPolicy: IfNotPresent
        args:
        - "https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"  # the DingTalk robot hook created above
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 500Mi
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
          tcpSocket:
            port: 8080
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
          httpGet:
            port: 8080
            path: /
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: webhook-dingtalk
  name: webhook-dingtalk
  namespace: monitoring  # must be the same namespace as alertmanager
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: webhook-dingtalk
  type: ClusterIP
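Once the Deployment and Service are running, the webhook can be exercised without waiting for a real alert. The sketch below builds a minimal Alertmanager-style request against the Service URL used in the Alertmanager configuration; it assumes it is run from a pod in the monitoring namespace (where the name webhook-dingtalk resolves). The alert name WebhookSmokeTest and the helper functions are invented for this example.

```python
import json
import urllib.request

# In-cluster address of the Service; it only resolves from inside the
# monitoring namespace (assumption: the script runs in a pod there).
WEBHOOK_URL = "http://webhook-dingtalk/dingtalk/send/"


def build_test_request(url=WEBHOOK_URL):
    """Build a POST request carrying one hypothetical firing alert."""
    payload = {
        "version": "4",
        "status": "firing",
        "receiver": "webhook",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "WebhookSmokeTest", "severity": "none"},
                "annotations": {"message": "Manual test of the DingTalk webhook."},
                "startsAt": "2019-03-08T10:02:28Z",
                "endsAt": "0001-01-01T00:00:00Z",
            }
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert(url=WEBHOOK_URL, timeout=5):
    """Send the test alert and return (HTTP status, response text)."""
    with urllib.request.urlopen(build_test_request(url), timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Usage (sends a real message to the DingTalk group, so it is commented out):
# print(send_test_alert())
```

The response text is whatever the webhook returns, which in the script above is the reply from DingTalk itself.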
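The alert message in DingTalk looks like this:
The alert-resolved message:
That completes all the steps.
If you have any questions, feel free to leave a comment below. Follows and likes are much appreciated, thank you!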