1. prometheus-webhook-dingtalk
GitHub: [Releases · timonwong/prometheus-webhook-dingtalk · GitHub](https://github.com/timonwong/prometheus-webhook-dingtalk/releases)
Download: [prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz](https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz)
Download the version you need from GitHub, then extract it:
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz -C /data
cd /data
mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 prometheus-webhook-dingtalk
Edit the configuration file (the message template):
# cat default.tmpl
{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})

{{ end }}{{ end }}

{{ define "ding.link.title" }}{{ template "__subject" . }}{{ end }}
{{ define "ding.link.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{ end }}
Start the service:
# cat prometheus-webhook-dingtalk.sh
#!/bin/bash
nohup prometheus-webhook-dingtalk \
  --web.listen-address="0.0.0.0:8060" \
  --ding.profile="test=https://oapi.dingtalk.com/robot/send?access_token=89f3cedfb3c3cdb031bdf10f8fc52bf1add575e9b5fb6f462a8cca6859af4" \
  >>/data/prometheus-webhook-dingtalk/nohub.out 2>&1 &
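The URL in --ding.profile is the webhook address generated by a DingTalk robot, so create your own DingTalk robot to get one. The part before the "=" ("test" here) is the profile name.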
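The profile name matters because it becomes part of the local URL that Alertmanager calls later (`/dingtalk/test/send` in the Alertmanager configuration of this post). A minimal sketch of that mapping, assuming the v0.3.0 URL layout used here; the function name `profile_endpoint` is made up for illustration:

```shell
# Derive the local webhook endpoint from a listen address and a
# --ding.profile value of the form "name=https://...".
# profile_endpoint is a hypothetical helper, not part of the tool.
profile_endpoint() {
  # $1: listen address (host:port), $2: profile string "name=url"
  # ${2%%=*} strips everything from the first "=" onward, leaving the name.
  printf 'http://%s/dingtalk/%s/send\n' "$1" "${2%%=*}"
}
```

For example, `profile_endpoint 127.0.0.1:8060 "test=https://oapi.dingtalk.com/robot/send?access_token=..."` prints the address to put into the Alertmanager webhook configuration.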
2. Alertmanager
GitHub: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)
Download: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)
Download the version you need from GitHub, then extract it:
wget https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz
tar xf alertmanager-0.15.1.linux-amd64.tar.gz -C /data
cd /data
mv alertmanager-0.15.1.linux-amd64 alertmanager
Edit the configuration file. I use DingTalk for alerting myself, so this post uses DingTalk as the receiver:
# cat alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test'
receivers:
- name: 'test'
  webhook_configs:
  - url: "http://127.0.0.1:8060/dingtalk/test/send"
    send_resolved: true
The url here is the address of prometheus-webhook-dingtalk, which converts the alert into a message format that DingTalk accepts.
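To check the whole chain (Alertmanager, then the webhook, then DingTalk) without waiting for a real alert, a test alert can be posted to Alertmanager by hand. The sketch below assumes the v1 alerts API (`/api/v1/alerts`), which the 0.15.x release used here provides; newer Alertmanager releases expose `/api/v2/alerts` instead. The alert name, label values and function names are made up for illustration:

```shell
# Build the JSON body for one test alert.
# $1: alert name, $2: summary text (must not contain double quotes or backslashes)
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"%s"}}]\n' "$1" "$2"
}

# POST a test alert to Alertmanager.
# $1: Alertmanager base URL, e.g. http://127.0.0.1:9093
send_test_alert() {
  test_alert_payload "TestAlert" "manual test alert" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$1/api/v1/alerts"
}
```

With the route settings in this post (group_wait: 10s), running `send_test_alert http://127.0.0.1:9093` should produce a message in the DingTalk group roughly ten seconds later, provided both services are up.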
Start Alertmanager:
# cat alertmanager.sh
#!/bin/bash
nohup alertmanager \
  --config.file="/data/alertmanager/alertmanager.yml" \
  --storage.path="/data/alertmanager/data" \
  --web.listen-address="0.0.0.0:9093" \
  >>/data/alertmanager/nohub.out 2>&1 &
Alertmanager web UI:
http://ip:9093
3. Prometheus
GitHub: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)
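3.1 Prometheus components
1) prometheus: the main program. It scrapes and stores the data, and exposes PromQL for querying and aggregating the monitoring data;
2) *_exporter: exposes a data-collection endpoint to the Prometheus server; Prometheus polls these exporters, then collects and stores the data;
3) alertManager: handles alerting, combined with email or DingTalk;
4) pushgateway: for short-lived processes such as batch jobs, Prometheus provides the Push Gateway. These clients push their data to the Push Gateway, which then exposes it through a pull interface for the Prometheus server to scrape;
5) Prometheus obtains data mainly by pulling, which keeps the load and resource usage on the monitored hosts low.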
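As a rough illustration of the push path described in item 4, a batch job can send a single sample to the Push Gateway with curl. This is only a sketch: it assumes a Push Gateway listening on its default port 9091 (not installed in this post), and the function names, job name and metric name are made up:

```shell
# Build the push URL for a job.
# $1: Push Gateway host:port, $2: job name
pushgateway_url() {
  printf 'http://%s/metrics/job/%s\n' "$1" "$2"
}

# Push one sample in the Prometheus text format ("name value").
# $1: host:port, $2: job name, $3: metric name, $4: value
push_metric() {
  printf '%s %s\n' "$3" "$4" |
    curl -sS --data-binary @- "$(pushgateway_url "$1" "$2")"
}
```

For example, `push_metric 127.0.0.1:9091 nightly_backup backup_duration_seconds 42` would make that sample available for Prometheus to scrape from the Push Gateway.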
3.2 Installation
Download: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)
Download the version you need from GitHub, then extract it:
wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz
tar xf prometheus-2.3.2.linux-amd64.tar.gz -C /data
cd /data
mv prometheus-2.3.2.linux-amd64 prometheus
Then edit the configuration file and define a job for each target to be monitored:
# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

#remote_write:
#  - url: "http://10.2.79.208:9201/write"
#remote_read:
#  - url: "http://10.2.79.208:9201/read"

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/data/prometheus/mongodb-rules.yml"
  - "/data/prometheus/consul-rules.yml"
  - "/data/prometheus/redis-rules.yml"
  - "/data/prometheus/nginx-rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'mongodb1'
    static_configs:
    - targets: ['10.10.8.70:9218']

  - job_name: 'mongodb1-system'
    static_configs:
    - targets: ['10.10.8.70:9100']

  - job_name: 'mongodb2'
    static_configs:
    - targets: ['10.10.5.108:9218']
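rule_files: specifies the paths of the alerting rule files, where you can define your own alerting rules. The rule files used here are: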
# cat consul-rules.yml
---
groups:
- name: consul
  rules:
  - alert: consul_catalog_service_node_healthy
    expr: consul_catalog_service_node_healthy < 1
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
      summary: 'some service is unhealthy, you must check it out by consul'
  - alert: consul_node_health
    expr: consul_exporter_build_info < 1
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} consul server is down'
      summary: 'consul server is down'
  - alert: consul_health_service_status
    expr: consul_health_service_status < 1
    for: 60s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.node }} {{ $labels.service_id }} is unhealthy'
      summary: 'some service is unhealthy, you must check it out by consul'
# cat mongodb-rules.yml
---
groups:
- name: mongodb
  rules:
  - alert: mongodb_mongod_connections
    expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections < 0
    for: 10s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} connections is low 11'
      summary: 'connections are too low, MongoDB may be down!'
  - alert: mongodb_mongod_connections
    expr: mongodb_mongod_connections{state='current'} and mongodb_mongod_connections > 600
    for: 10s
    labels:
      severity: warning
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} connections is high 570'
      summary: 'too many connections'
  - alert: mongodb_mongod_memory
    expr: mongodb_mongod_memory{type='virtual'} and mongodb_mongod_memory < 5000
    for: 5s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} of {{ $labels.job }} {{ $labels.type }} is too low'
      summary: 'MongoDB may be down'
  - alert: mongodb_mongod_replset_member_health
    expr: mongodb_mongod_replset_member_health != 1
    for: 5s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.name }} {{ $labels.state }} is down'
      summary: 'one of the replica set nodes is down'
  - alert: mongodb_mongod_replset_my_state
    expr: mongodb_mongod_replset_my_state{job='mongodb3'} and mongodb_mongod_replset_my_state != 1
    for: 5s
    labels:
      severity: critical
    annotations:
      description: 'replica set master has changed, {{ $labels.job }} is not master'
      summary: 'mongodb3 master is down, check the status'
# cat redis-rules.yml
---
groups:
- name: redis
  rules:
  - alert: redis_instantaneous_ops_per_sec
    expr: redis_instantaneous_ops_per_sec < 50
    for: 120s
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.job }} is unhealthy'
      summary: 'redis-prod ops/sec is too low, redis may be congested; check it with "redis-cli slowlog get"'
# cat nginx-rules.yml
---
groups:
- name: nginx-exporter
  rules:
  - alert: status_code_499
    expr: status_code_499 > 300
    for: 60s
    labels:
      severity: critical
    annotations:
      # $value is the value of the alert expression; a bare metric name is not valid in a template
      description: 'status_code_499: {{ $value }}'
      summary: 'too many nginx 499 status codes, check the load balancer log /var/log/nginx/share.log'
  - alert: status_code_400
    expr: status_code_400 > 50
    for: 60s
    labels:
      severity: critical
    annotations:
      description: 'status_code_400: {{ $value }}'
      summary: 'too many nginx 400 status codes, check the load balancer log /var/log/nginx/share.log'
The nginx metrics come from an exporter I wrote myself: https://github.com/cuishuaigit/nginx_exporter
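Before starting Prometheus it can be worth validating the rule files, since a single syntax error in one of them prevents the configuration from loading. The sketch below uses `promtool check rules`, which ships in the same tarball as the `prometheus` binary. The helper names are made up, and the awk parsing is a simplification that assumes rule_files is written as a plain YAML list like the one in this post:

```shell
# Print the uncommented entries of the rule_files list, one per line.
# $1: path to prometheus.yml
list_rule_files() {
  awk '
    /^rule_files:/ { in_list = 1; next }
    in_list && /^[^ #-]/ { in_list = 0 }   # the next top-level key ends the list
    in_list && /^[ \t]*-[ \t]*/ {
      line = $0
      sub(/^[ \t]*-[ \t]*/, "", line)      # drop the list marker
      gsub(/"/, "", line)                  # drop the surrounding quotes
      print line
    }
  ' "$1"
}

# Run promtool against every listed rule file; stop at the first failure.
# $1: path to prometheus.yml
check_rule_files() {
  command -v promtool >/dev/null 2>&1 || { echo "promtool not found" >&2; return 1; }
  list_rule_files "$1" | while IFS= read -r f; do
    promtool check rules "$f" || exit 1
  done
}
```

With the layout used here this would be called as `check_rule_files /data/prometheus/prometheus.yml`. Running `promtool check config /data/prometheus/prometheus.yml` is an alternative that checks the main configuration and the referenced rule files in one go.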
Start Prometheus:
# cat prometheus.sh
#!/bin/bash
nohup prometheus \
  --config.file="/data/prometheus/prometheus.yml" \
  --web.listen-address="0.0.0.0:9090" \
  --storage.tsdb.path="/data/prometheus/data" \
  --web.console.libraries="/data/prometheus/console_libraries" \
  --web.console.templates="/data/prometheus/consoles" \
  --web.enable-admin-api \
  --log.level=info \
  >>/data/prometheus/nohub.out 2>&1 &
Prometheus web UI:
http://ip:9090
4. Exporters
1. https://github.com/prometheus/node_exporter
2. https://github.com/prometheus/influxdb_exporter
3. https://github.com/prometheus/mysqld_exporter
4. https://github.com/prometheus/jmx_exporter
5. https://github.com/prometheus/consul_exporter
6. https://github.com/prometheus/haproxy_exporter
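Whichever exporter is installed, a quick way to confirm it works before adding it as a job is to fetch its /metrics page and look for a known metric. A minimal sketch: `metric_value` is a made-up helper, and it only matches samples without labels, which is enough for a smoke test:

```shell
# Read the Prometheus text format on stdin and print the value of the
# first sample whose name is exactly $1 (samples with labels are skipped).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

For example, with node_exporter on its default port, `curl -s http://10.10.8.70:9100/metrics | metric_value node_load1` prints the current one-minute load average if the exporter is reachable.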