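Environment: all services run on Docker.
This post is somewhat rough; a much better article is here:
http://www.javashuo.com/article/p-xikbiroq-dh.html
This series is also worth reading: https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/exporter/commonly-eporter-usage/use-prometheus-monitor-container
Note: adapt each step to your own situation. This post probably contains quite a few mistakes; it is only meant to give you an approach that makes things easier to understand.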
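Prometheus collects the state of the current host through node-exporter. Because everything in this environment runs in containers, node-exporter needs the relevant host directories mapped into its container: it runs inside a container, but what we want it to monitor is the state of the host.
Next, deploy the alertmanager container and map it to port 9093 on the host. Prometheus periodically evaluates the alerting rules; when a rule's condition is met, it sends an alert to alertmanager, which then forwards it to the receivers defined in its configuration file.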
$ docker pull prom/prometheus      # pull the prometheus image
$ docker pull prom/node-exporter   # pull the node-exporter image
$ docker pull grafana/grafana      # pull the grafana image
$ cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090'] # port Prometheus listens on

  - job_name: 'linux'
    static_configs:
      - targets: ['172.21.71.50:9100'] # node-exporter port
        labels:
          instance: node
Run the containers
$ docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  prom/node-exporter \
  --path.rootfs /host
# Run node-exporter. Its options are somewhat unusual; until you know it well, just run it this way.

$ sudo docker run -d \
  -p 9090:9090 \
  -v /usr/local/prometheus/file/prometheus.yml:/usr/local/prometheus/file/prometheus.yml \
  prom/prometheus \
  --config.file=/usr/local/prometheus/file/prometheus.yml \
  --web.enable-lifecycle
# Run the prometheus container.

$ git clone https://github.com/grafana/piechart-panel.git
# Pie chart plugin for Grafana.

$ docker run -d --name=grafana \
  -v /usr/local/prometheus/grafana/plugin/:/var/lib/grafana/plugins/ \
  -p 3333:3000 \
  grafana/grafana
# Run grafana. The default Grafana account and password are admin/admin.
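Since Prometheus is started with --web.enable-lifecycle, it can be asked to re-read its configuration over HTTP instead of being recreated. Below is a minimal sketch of that call. The PROM_URL default and the DRY_RUN switch are names made up for this example, not Prometheus features; only the /-/reload endpoint belongs to Prometheus.

```shell
#!/usr/bin/env bash
# Sketch: ask a running Prometheus to reload prometheus.yml.
# This only works when Prometheus was started with --web.enable-lifecycle.
# PROM_URL and DRY_RUN are illustrative names chosen for this example.

reload_prometheus() {
  local base_url="${1:-${PROM_URL:-http://localhost:9090}}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "curl -fsS -X POST ${base_url}/-/reload"
    return 0
  fi
  # -f makes curl return a non-zero status if the server answers with an error.
  curl -fsS -X POST "${base_url}/-/reload"
}
```

Calling reload_prometheus http://172.21.71.50:9090 sends the request; with DRY_RUN=1 set, the function only prints the curl command it would run.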
Pull the image
$ docker pull google/cadvisor
Run
cadvisor needs to run on the Docker host (similar to node_exporter); it then exposes its data over HTTP for Prometheus to scrape.
$ docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=9101:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
# cadvisor is also somewhat unusual; until you are familiar with it, follow these steps as shown.
Note: this maps container port 8080 to host port 9101. cadvisor also has a web UI at http://IP:9101
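For Prometheus to collect these container metrics, the cadvisor endpoint also has to be listed under scrape_configs in prometheus.yml, a step this post does not show. The sketch below prints such an entry. The job name 'cadvisor' is an assumption, and the host IP is the one used elsewhere in this post.

```shell
#!/usr/bin/env bash
# Sketch: print a scrape job for cadvisor, to be pasted under
# `scrape_configs:` in prometheus.yml. The job name 'cadvisor' is an
# assumption; the port is the host port published for the cadvisor container.

cadvisor_scrape_job() {
  local host="$1" port="${2:-9101}"
  cat <<EOF
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['${host}:${port}']
EOF
}

# Print the entry for the host used in this post.
cadvisor_scrape_job 172.21.71.50 9101
```

After adding the entry, Prometheus has to reload or restart before the new target shows up on its targets page.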
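Display container monitoring data in Grafana
Here we go to the Grafana website and use a dashboard template that someone else has already built: https://grafana.com/dashboards/4170. Download the template's JSON file and import it into your local Grafana. For how to import a dashboard template, see https://www.cnblogs.com/tchua/p/11115146.html
The next step is to change one variable in the template, because it was originally built for cadvisor;
Change it to match my settings (until you understand it well, follow the document first and adapt later).
If everything goes smoothly, the dashboard will look like the figure below.
That is still not enough. Because of version differences (the template was not built against the latest node_exporter), some values do not apply and have to be changed. The Prometheus query page can be used to confirm the correct values.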
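One way to find out which metric names your node_exporter really exports is to list them from its /metrics output and compare them with the queries in the dashboard. For example, node_exporter releases since 0.16 expose node_cpu_seconds_total where older ones exposed node_cpu. The helper name metric_names below is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the distinct metric names found in Prometheus text-format
# output read from stdin. metric_names is a hypothetical helper name.

metric_names() {
  # Drop comment lines (# HELP / # TYPE) and blank lines, then cut each
  # sample line at the first '{' or space, leaving only the metric name.
  grep -v -e '^#' -e '^[[:space:]]*$' | sed 's/[{ ].*$//' | sort -u
}
```

A possible use is curl -s http://172.21.71.50:9100/metrics | metric_names | grep cpu, which shows the CPU metric names to use in the panel queries.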
$ docker pull prom/alertmanager   # pull the alertmanager image (linuxtips/alertmanager_alpine is an alternative)
$ cat /usr/local/prometheus/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'ying.qiao'   # must match a name under receivers
receivers:
  - name: 'ying.qiao'
    webhook_configs:
      - url: 'https://hook.bearychat.com/=bwD9B/prometheus/2e31f72d81f31d322db49e85d22e1cee'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
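To check that the route and receiver work before any real rule fires, a hand-made alert can be posted to Alertmanager's v2 HTTP API once the container is running. This is only a sketch: the alert name, label values and the helper test_alert_json are invented for testing.

```shell
#!/usr/bin/env bash
# Sketch: build a hand-made alert as JSON for Alertmanager's
# /api/v2/alerts endpoint. The alert name, labels and summary text are
# made up for testing; test_alert_json is a hypothetical helper.

test_alert_json() {
  local alertname="$1" instance="$2"
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' \
    "$alertname" "$instance"
}

# Print the payload; sending it with curl is described in the note below.
test_alert_json TestAlert 172.21.71.50:9100
```

To send it, pipe the output into curl -fsS -X POST -H 'Content-Type: application/json' --data @- http://172.21.71.50:9093/api/v2/alerts. If routing works, the receiver should get a notification after group_wait has passed.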
Add alerting rules to Prometheus
$ sudo mkdir /usr/local/prometheus/rules
$ sudo vim /usr/local/prometheus/rules/node_alerts.yml
groups:
- name: node_alerts
  rules:
  - alert: InstanceDown            ## alert name
    expr: up{job='node'} == 0      ## alert condition
    for: 1m                        ## once the condition has held for 1 minute, Prometheus sends the alert to Alertmanager
    labels:
      severity: "warning"
    annotations:
      summary: Host {{ $labels.instance }} of {{ $labels.job }} is Down!
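There is a nasty pitfall here: the job value inside the braces (node in this rule) must exactly match a job name defined in prometheus.yml. The prometheus.yml shown earlier names the node-exporter job 'linux', so with that configuration the expression would have to be up{job='linux'} == 0.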
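This kind of mismatch is easy to miss, so here is a rough check that compares the job names used in the rules file with the job_name entries in prometheus.yml. The helper names are hypothetical. The matching is plain text rather than a real YAML or PromQL parser, so treat it as a sketch.

```shell
#!/usr/bin/env bash
# Sketch: find job names used in alert expressions that are not defined as
# job_name in prometheus.yml. The helper names are hypothetical, and the
# matching is plain text, not a real YAML or PromQL parser.

# Print every job_name value in a Prometheus config file, one per line.
config_jobs() {
  grep -E "^[[:space:]]*-?[[:space:]]*job_name:" "$1" \
    | sed -E "s/.*job_name:[[:space:]]*//; s/[[:space:]]*#.*\$//; s/[[:space:]]*\$//; s/^['\"]//; s/['\"]\$//" \
    | sort -u
}

# Print every value matched with job='...' or job="..." in a rules file.
rule_jobs() {
  grep -oE "job=['\"][^'\"]+['\"]" "$1" \
    | sed -E "s/^job=['\"]//; s/['\"]\$//" \
    | sort -u
}

# Print the rule jobs that have no matching job_name in the config.
unknown_rule_jobs() {
  local rules_file="$1" config_file="$2" job
  rule_jobs "$rules_file" | while IFS= read -r job; do
    config_jobs "$config_file" | grep -qxF -- "$job" || echo "$job"
  done
}
```

Run it as unknown_rule_jobs /usr/local/prometheus/rules/node_alerts.yml /usr/local/prometheus/file/prometheus.yml. Empty output means every job referenced in the rules exists in the config.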
$ sudo vim /usr/local/prometheus/file/prometheus.yml
rule_files:
  - /usr/local/prometheus/rules/node_alerts.yml   # the rules file to load
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 172.21.71.50:9093   ## Alertmanager address

## Have Prometheus monitor the Alertmanager service as well.
## Mind where each part goes in the file: this job belongs under scrape_configs.
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['172.21.71.50:9093']
Restart prometheus and start alertmanager
$ docker rm -f c1473106d0f0
# Remove the old prometheus container (use your own container ID).

$ docker run -d -p 9090:9090 \
  -v /usr/local/prometheus/file/prometheus.yml:/usr/local/prometheus/file/prometheus.yml \
  -v "/usr/local/prometheus/rules/:/usr/local/prometheus/rules/:ro" \
  prom/prometheus \
  --config.file=/usr/local/prometheus/file/prometheus.yml \
  --web.enable-lifecycle
# The rules directory is mounted so that the path listed under rule_files exists inside the container.

$ docker run -d -p 9093:9093 \
  -v /usr/local/prometheus/alertmanager/:/usr/local/prometheus/alertmanager/ \
  -v /var/lib/alertmanager:/alertmanager \
  --name alertmanager prom/alertmanager \
  --config.file="/usr/local/prometheus/alertmanager/alertmanager.yml" \
  --storage.path=/alertmanager
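Once both containers are started, it helps to confirm that they actually answer before looking at alerts. Prometheus and Alertmanager each serve a /-/healthy endpoint. The sketch below polls such a URL a few times. The helper name wait_healthy and the retry defaults are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Sketch: poll a /-/healthy endpoint until it answers successfully.
# wait_healthy is a hypothetical helper; the default of 10 tries with a
# 2 second pause is arbitrary.

wait_healthy() {
  local url="$1" tries="${2:-10}" delay="${3:-2}" i=1
  while [ "$i" -le "$tries" ]; do
    # -f turns HTTP error statuses into a non-zero exit code.
    if curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
      echo "healthy: $url"
      return 0
    fi
    i=$((i + 1))
    # Pause only if another attempt will follow.
    if [ "$i" -le "$tries" ]; then
      sleep "$delay"
    fi
  done
  echo "not healthy after $tries tries: $url" >&2
  return 1
}
```

For the setup in this post the calls would be wait_healthy http://172.21.71.50:9090/-/healthy for Prometheus and wait_healthy http://172.21.71.50:9093/-/healthy for Alertmanager. If a call returns non-zero, docker logs for that container is the next place to look.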