Learning Prometheus Monitoring

Date: 2019-11-17

Tags: prometheus, monitoring, learning


What is Prometheus

1. Brief introduction

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.


Official documentation

My Chinese translation of the documentation

Project source code

Quick installation

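2. Features

A multi-dimensional data model

A flexible query language

No reliance on distributed storage; a single server node is autonomous

Time series are collected over HTTP using a pull model

Pushing time series is also supported through an intermediary gateway

Targets are discovered through service discovery or static configuration

Multiple graphing and dashboard options are supported, and Grafana supports Prometheus as well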
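The "service discovery or static configuration" point can be illustrated with file-based service discovery, one of the simplest discovery mechanisms. This is a minimal prometheus.yml fragment, assuming a hypothetical targets/ directory next to prometheus.yml; the job names are illustrative.

```yaml
scrape_configs:
  # Static configuration: the target list is fixed inside prometheus.yml.
  - job_name: 'node_static'
    static_configs:
      - targets: ['127.0.0.1:9100']

  # File-based service discovery: targets are read from the matching files
  # (JSON or YAML) and re-read when those files change, without a restart.
  - job_name: 'node_file_sd'
    file_sd_configs:
      - files: ['targets/*.yml']   # hypothetical path
        refresh_interval: 5m       # periodic re-read as a fallback
```

A matching file such as targets/nodes.yml would hold a list of entries like `- targets: ['127.0.0.1:9100']`, optionally with a `labels:` map. Adding a machine then only means editing that file.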

Installing Prometheus

1. Installing from a precompiled binary archive

Download the archive and extract it: tar -xzvf prometheus-xxx.tar.gz

Enter the extracted directory and edit the configuration file: prometheus.yml

Start it from the command line: ./prometheus --config.file=prometheus.yml

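At runtime the console may report: transport: http2Client.notifyError got notified that the client transport was broken unexpected EOF. According to the official issue, the fix is to modify the bash configuration file and restart, after which the change takes effect.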

2. Installing from the Docker image
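A minimal sketch of running the official prom/prometheus image with Docker Compose, assuming Docker and Docker Compose are installed and a prometheus.yml sits next to a hypothetical docker-compose.yml. The image reads its configuration from /etc/prometheus/prometheus.yml, so the local file is mounted over that path.

```yaml
# docker-compose.yml (hypothetical layout: prometheus.yml in the same directory)
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"   # web UI and HTTP API
    volumes:
      # Replace the image's default configuration with the local file
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
```

Start it with `docker compose up -d` and open http://127.0.0.1:9090. Inside the container, 127.0.0.1 refers to the container itself, so exporters running on the host must be addressed by the host's IP in prometheus.yml.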

3. Building from source (requires a Go environment)

4. Using third-party configuration management systems

Using Prometheus

1. Components

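Prometheus monitoring and alerting are both built from components plus configuration. For a client to be monitored, you either install the corresponding component to collect its monitoring data or implement the Prometheus data interface yourself, and make the data available to the Prometheus server. Unless you have special requirements, choose the official components or the officially recommended third-party libraries: they are more convenient to use and their stability is better assured.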

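alertmanager: the alerting component. It receives alerts sent by Prometheus and delivers notifications through the configured channels (email, HipChat, webhook, WeChat, etc.).

node_exporter: the node monitoring component (the core official component for monitoring hardware). Once running on a machine, it collects the machine's hardware information, network information, and so on.

pushgateway: a proxy component for pushed data. The official recommendation is to use it for short-lived jobs: the job pushes its data to the Pushgateway, and Prometheus then scrapes the data from the Pushgateway.

blackbox_exporter: the black-box monitoring component, mainly used to check whether the network is reachable and whether services are available.

mysqld_exporter: the official exporter for MySQL, which collects MySQL metrics for the Prometheus server to scrape. Many pieces of software provide exporters that expose their own data to Prometheus; all we need to do is configure the exporter and run it to collect data.

grafana: not a Prometheus component; it is mainly used to visualize monitoring data. Prometheus's own dashboards are limited (its former dashboard builder, PromDash, has been officially deprecated in favor of Grafana), and Grafana 2.5.0 (2015-10-28) and later versions support Prometheus well.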
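As a rough illustration of how these exporters are wired into prometheus.yml, here is a scrape_configs fragment for the Pushgateway, mysqld_exporter and blackbox_exporter. It assumes all three run locally on their default ports (9091, 9104 and 9115) and that blackbox_exporter uses its example blackbox.yml, which defines an http_2xx module; the probed URL http://127.0.0.1:8080 is hypothetical.

```yaml
scrape_configs:
  # Pushgateway: honor_labels keeps the job/instance labels pushed by the
  # short-lived jobs instead of overwriting them with the gateway's own.
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['127.0.0.1:9091']

  # mysqld_exporter: scraped like any other exporter.
  - job_name: 'mysql'
    static_configs:
      - targets: ['127.0.0.1:9104']

  # blackbox_exporter: Prometheus calls /probe on the exporter and passes
  # the real target as the ?target= URL parameter.
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]          # must exist in blackbox.yml
    static_configs:
      - targets: ['http://127.0.0.1:8080']   # hypothetical service to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target       # becomes ?target=...
      - source_labels: [__param_target]
        target_label: instance             # keep the probed URL as instance
      - target_label: __address__
        replacement: 127.0.0.1:9115        # actually scrape the exporter
```

node_exporter is added the same way as the mysql job, with port 9100.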

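2. Configuring Prometheus

Because Prometheus is managed as components, and how the components fit together is determined entirely by configuration files, the detailed configuration is not listed here. For details, see: Prometheus configuration, configuring monitored targets, configuring the alerting component, installing and configuring Grafana (install Grafana: link).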

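3. Managing monitoring data

Where data is stored (see the official documentation):

Local storage: specify the storage location at startup: ./prometheus --storage.tsdb.path=somePath

Remote storage: specify it in the configuration file. In this case, make sure Prometheus has read/write permission on the remote files.

How long data is kept (15 days by default): make sure the server's disk can hold the monitoring data. Specify it at startup: ./prometheus --storage.tsdb.retention=15d (newer Prometheus 2.x releases deprecate this flag in favor of --storage.tsdb.retention.time).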
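For the remote case, "specified in the configuration file" means the remote_write and remote_read sections of prometheus.yml. A minimal sketch, assuming a remote storage endpoint that speaks Prometheus's remote write/read protocol; the host, port and paths are placeholders, not a real service.

```yaml
# Fragment of prometheus.yml. The URLs are hypothetical placeholders for
# whatever remote storage adapter or service is actually used.
remote_write:
  - url: "http://remote-storage.example:9201/write"   # samples are sent here

remote_read:
  - url: "http://remote-storage.example:9201/read"    # queries may read from here
```

For sizing local storage, the Prometheus storage documentation gives the rough formula needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, with about 1 to 2 bytes per sample on average.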

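4. Example environment setup

prometheus + grafana + node + alert

Download the archives:

Download the latest prometheus, node_exporter and alertmanager from the Prometheus website, and the latest Grafana archive from the Grafana website.

Extract each archive.

Add node_exporter to the monitoring server's configuration

Go to the prometheus extraction directory, run vim prometheus.yml, and add the following: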

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['127.0.0.1:9090']

  - job_name: 'node' # add node_exporter to monitoring
    static_configs:
      - targets: ['127.0.0.1:9100']

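Start node_exporter: in the node_exporter extraction directory, run ./node_exporter. Once it has started, open http://127.0.0.1:9100/metrics in a browser; on success, the page lists the exposed metrics in plain text.

Start prometheus: in the prometheus extraction directory, run ./prometheus --config.file=prometheus.yml. Once it has started, open http://127.0.0.1:9090/graph in a browser; on success, the Prometheus graph page is shown.

Click Status -> Targets. If two monitored targets are listed and both are in the UP state, the configuration works.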

Adding alerting to Prometheus

Configuration files

In the alertmanager extraction directory, run vim alertmanager.yml and add:

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'your email host:587'
  smtp_from: 'email_name@qq.com'
  smtp_auth_username: 'email_name@qq.com'
  smtp_auth_password: 'email_password'
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  email_configs:
  - to: 'receive alert email account'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
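The receiver above is named 'web.hook' but only sends email. A receiver can also post alerts as JSON to an HTTP endpoint. This sketch extends the same receiver with webhook_configs alongside the email settings, assuming a hypothetical HTTP service listening on 127.0.0.1:5001.

```yaml
receivers:
- name: 'web.hook'
  email_configs:
  - to: 'receive alert email account'
  # Each notification is also POSTed as JSON to this (hypothetical) URL.
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
    send_resolved: true   # also notify when the alert is resolved
```

The amtool binary shipped with Alertmanager can validate the edited file before starting: ./amtool check-config alertmanager.yml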

In the prometheus extraction directory, create a new file alert.rules (vim alert.rules) and add:

groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
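The APIHighRequestLatency rule above relies on an api_http_request_latencies_second metric that nothing in this prometheus + node setup exposes, so it will not fire here. A rule on node_exporter's own metrics is easier to try out. A sketch assuming node_exporter 0.16 or later (where filesystem metrics end in _bytes); the group name, alert name and 10% threshold are illustrative choices. The `- name: node` entry can be appended under the existing groups: key of alert.rules, or the whole fragment can be saved as a separate rule file listed in rule_files.

```yaml
groups:
- name: node
  rules:
  # Fires when a filesystem reported by node_exporter has had less than 10%
  # free space for 10 minutes; tmpfs mounts are excluded on both sides.
  - alert: NodeFilesystemAlmostFull
    expr: node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"} < 0.10
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Filesystem {{ $labels.mountpoint }} on {{ $labels.instance }} is almost full"
      description: "Free space ratio is {{ $value }}."
```

promtool, shipped in the Prometheus archive, can validate rule files before a restart: ./promtool check rules alert.rules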

Register the alert rules and the Alertmanager with prometheus: run vim prometheus.yml and add:

rule_files:
  - "alert.rules"
# Alerting specifies settings related to the Alertmanager.
alerting:
  alertmanagers:
    - static_configs:
      - targets: ['127.0.0.1:9093']

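Start alertmanager: in the alertmanager extraction directory, run ./alertmanager --config.file=alertmanager.yml. Once it has started, open http://127.0.0.1:9093/#/alerts in a browser; if the Alertmanager alerts page appears, it started successfully.

Restart prometheus (stop it with Ctrl+C, then start it again with the command above).

Adding Grafana for nicer dashboards

Start grafana: in the grafana extraction directory, run ./bin/grafana-server web. Open http://localhost:3000 in a browser, log in with username admin and password admin, and skip the password change to enter the UI.
Add prometheus as a data source (DataSource):
Add a dashboard template: click + -> Import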
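Instead of adding the data source by hand in the UI, Grafana 5.0 and later can provision it from a YAML file at startup. A minimal sketch, assuming the default provisioning directory conf/provisioning of an extracted Grafana archive and Prometheus on 127.0.0.1:9090; the file name prometheus.yaml is hypothetical, since Grafana reads every YAML file in that folder.

```yaml
# conf/provisioning/datasources/prometheus.yaml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus          # name shown in Grafana's data source list
    type: prometheus
    access: proxy             # Grafana's backend queries Prometheus for the browser
    url: http://127.0.0.1:9090
    isDefault: true
```

After restarting grafana-server, the data source appears under Configuration -> Data Sources and can be selected when importing a dashboard.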