Set Up Prometheus in 5 Minutes

Date: 2021-01-11

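Introduction

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has an active developer and user community.
It is now a standalone open-source project, maintained independently of any company. To emphasize this, Prometheus joined the CNCF in 2016 as the second hosted project, after Kubernetes.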

Official website: https://prometheus.io

GitHub repository: https://github.com/prometheus/prometheus

Architecture diagram

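Download and Installation

There are many ways to install Prometheus. On Windows, running the binary locally is enough. On Linux, you can deploy more flexibly, for example with Docker.

Binary

Get the archive from the binary download page, then unpack it and start the server: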

tar xvfz prometheus-*.tar.gz
cd prometheus-*
./prometheus --config.file=prometheus.yml

Of course, you can also check out the latest source and build the binary yourself.

mkdir -p $GOPATH/src/github.com/prometheus
cd $GOPATH/src/github.com/prometheus
git clone https://github.com/prometheus/prometheus.git
cd prometheus
make build
./prometheus --config.file=your_config.yml

Docker

# Use the configuration at /opt/prometheus/prometheus.yml
docker run --name prometheus -d -p 127.0.0.1:9090:9090 -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add stable https://charts.helm.sh/stable
helm repo update

# Helm 3
$ helm install [RELEASE_NAME] prometheus-community/prometheus

# Helm 2
$ helm install --name [RELEASE_NAME] prometheus-community/prometheus

Configuration File

Prometheus is now up and running, but it still needs some customization before it can collect data.

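global:
  scrape_interval: 15s # default scrape interval: scrape targets every 15s
  external_labels:
    monitor: 'prometheus-monitor'
# scrape targets
scrape_configs:
  - job_name: 'prometheus' # job name; added to every scraped series as the label job="prometheus"
    scrape_interval: 5s # scrape interval for this job (overrides the global default)
    static_configs: # targets to scrape
      - targets: ['localhost:9090']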

After restarting Prometheus, you can see these two pages in the web UI.
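The target status shown in the web UI is also available from the Prometheus HTTP API at /api/v1/targets. The following is a minimal Python sketch, not part of the original setup: the helper names summarize_targets and fetch_targets are made up for illustration, and SAMPLE is a hand-written, trimmed response rather than real output.

```python
import json
from urllib.request import urlopen


def summarize_targets(payload):
    """Map "job/instance" to the health reported by /api/v1/targets."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        key = f'{labels["job"]}/{labels["instance"]}'
        summary[key] = target["health"]
    return summary


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets document from a running Prometheus server.

    The base URL is an assumption matching the config above.
    """
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Hand-written sample with the same shape as a real response (illustrative only).
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "prometheus", "instance": "localhost:9090"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
            }
        ],
        "droppedTargets": [],
    },
}

print(summarize_targets(SAMPLE))
```

Against a running server, summarize_targets(fetch_targets()) should list one entry per configured target.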

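Installing an Exporter

Where does the data come from? The link below lists official and third-party exporters you can pick from to monitor your services.

exporters and integrations

For example, Node Exporter exposes kernel-level and machine-level metrics for UNIX systems such as Linux (Windows users should use wmi_exporter, since renamed windows_exporter). It provides many standard metrics such as CPU, memory, disk space, disk I/O and network bandwidth. It also exposes many other kernel-provided metrics, from load average to motherboard temperature.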
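To give a feel for what an exporter serves, the sketch below parses a few lines in the Prometheus text exposition format, which is what Node Exporter returns from http://localhost:9100/metrics. It is a simplified illustration in Python: parse_metrics is a hypothetical helper, the sample values are invented, and timestamps and other corner cases of the format are ignored.

```python
import re

# metric_name{label="value",...} value   (the label part is optional)
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a simple sample line; ignored in this sketch
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Invented values; the metric names are ones Node Exporter exposes.
SAMPLE_TEXT = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

for sample in parse_metrics(SAMPLE_TEXT):
    print(sample)
```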

After downloading and running Node Exporter, update prometheus.yml and restart Prometheus to load the new configuration:

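global:
  scrape_interval: 15s # default scrape interval: scrape targets every 15s
  external_labels:
    monitor: 'codelab-monitor'
# scrape targets
scrape_configs:
  - job_name: 'prometheus' # job name; added to every scraped series as the label job="prometheus"
    scrape_interval: 5s # scrape interval for this job (overrides the global default)
    static_configs: # targets to scrape
      - targets: ['localhost:9090']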
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

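Alert Notifications

If you need specific rules, for example when CPU or memory usage exceeds a threshold, and want the alerts sent to email, WeChat, DingTalk and so on, you need Alertmanager.

Alerting has two parts. First, alerting rules are added to Prometheus to define the logic that produces an alert. Second, Alertmanager turns firing alerts into notifications such as emails, pages and chat messages.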

global:
  scrape_interval: 15s # default scrape interval: scrape targets every 15s
  evaluation_interval: 10s # evaluate rules every 10s
  external_labels:
    monitor: 'codelab-monitor'
# rule files
rule_files:
  - rules.yml

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# scrape targets
scrape_configs:
  - job_name: 'prometheus' # job name; added to every scraped series as the label job="prometheus"
    scrape_interval: 5s # scrape interval for this job (overrides the global default)
    static_configs: # targets to scrape
      - targets: ['localhost:9090']
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

# Rules file: rules.yml
groups:
  - name: example
    rules:
    - alert: InstanceDown
      expr: up == 0
      for: 1m

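With this evaluation_interval, the InstanceDown rule is evaluated every 10s. If the expression up == 0 stays true for a continuous 1m, the alert fires. Until that duration is reached, the alert is in the pending state. On the Alerts page you can click an alert to see more details, including its labels.

Note: a for duration of at least 5min is generally recommended, to reduce noise and smooth over the transient conditions inherent in monitoring.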
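The pending and firing behaviour can be modelled in a few lines. This is a simplified Python sketch of how the for clause behaves, assuming one evaluation every 10s and a 1m hold as in the config above; alert_states is a hypothetical helper, not Prometheus code.

```python
def alert_states(condition_by_eval, eval_interval=10, hold_for=60):
    """Return the alert state after each rule evaluation.

    condition_by_eval: one boolean per evaluation, True when the
    expression (here: up == 0) returned a result at that evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for step, condition in enumerate(condition_by_eval):
        now = step * eval_interval
        if not condition:
            # Any evaluation where the condition is false resets the alert.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= hold_for else "pending")
    return states


# Target down for eight consecutive evaluations (0s to 70s).
print(alert_states([True] * 8))
# A short blip resets the timer, so the alert never leaves pending.
print(alert_states([True, True, False, True]))
```

In this model the alert only fires once the condition has held across a full minute of evaluations, which is why a short outage never produces a notification.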

Now that an alert is firing, Alertmanager needs to do something with it.

Alertmanager

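How do you manage alert notifications?
For example, if you only want to receive alerts during working hours, you can restrict notifications to 09:00-21:00.
If you do not want notifications for a particular service, you can temporarily silence it.

Download page

Next, create a configuration file for Alertmanager. There are many ways for Alertmanager to notify you; this example uses SMTP.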

global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'youraddress@example.org'

route:
  receiver: example-email
receivers:
- name: 'example-email'
  email_configs:
  - to: 'youraddress@example.org'

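Start Alertmanager. You can now open http://localhost:9093 in a browser to reach Alertmanager, where the firing alerts are listed. If everything is configured correctly and running, the email notification should arrive within a minute or two.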
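To check the email route without waiting for a real outage, a test alert can be posted directly to Alertmanager's v2 HTTP API (POST /api/v2/alerts). The Python sketch below is an illustration under assumptions: build_test_alert and send_alerts are hypothetical helpers, and the label and annotation values are made up.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request, urlopen


def build_test_alert(now, duration_minutes=5):
    """Build the JSON payload (a list of alerts) for POST /api/v2/alerts."""
    return [
        {
            # Illustrative labels; Alertmanager routes and groups on these.
            "labels": {
                "alertname": "InstanceDown",
                "job": "node",
                "instance": "localhost:9100",
            },
            "annotations": {"summary": "Manual test alert"},
            "startsAt": now.isoformat(),
            "endsAt": (now + timedelta(minutes=duration_minutes)).isoformat(),
        }
    ]


def send_alerts(alerts, base_url="http://localhost:9093"):
    """Post the alerts to a running Alertmanager and return the HTTP status."""
    request = Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=5) as response:
        return response.status


payload = build_test_alert(datetime.now(timezone.utc))
print(json.dumps(payload, indent=2))
```

Calling send_alerts(payload) against the local Alertmanager should make the test alert appear on its page and go through the example-email receiver.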

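Summary

This Prometheus setup consists of exporters, the Prometheus server and Alertmanager. Exporters collect data, the Prometheus server scrapes it from the exporters, and then, based on the alerting rules, pushes alerts to Alertmanager for handling. Many other components have grown up around this core, such as Pushgateway (clients push data to Pushgateway, and Prometheus scrapes it periodically) and Grafana for dashboards.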
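As an illustration of the push path, the sketch below builds a request for Pushgateway, which accepts metrics in the text exposition format at /metrics/job/<job name>. It is a minimal Python sketch under assumptions: Pushgateway is taken to listen on its default port 9091, build_push and push are hypothetical helpers, and the job and metric names are invented.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_push(job, metric, value, base_url="http://localhost:9091"):
    """Return the (url, body) pair for pushing one sample to Pushgateway."""
    url = f"{base_url}/metrics/job/{quote(job, safe='')}"
    # The body is plain exposition text and must end with a newline.
    body = f"{metric} {value}\n".encode("utf-8")
    return url, body


def push(job, metric, value):
    """Send the sample to a running Pushgateway and return the HTTP status."""
    url, body = build_push(job, metric, value)
    request = Request(url, data=body, method="POST")
    with urlopen(request, timeout=5) as response:
        return response.status


url, body = build_push("backup", "backup_last_success_timestamp_seconds", 1610355600)
print(url)
print(body.decode("utf-8"), end="")
```

Prometheus then needs a scrape job pointing at the Pushgateway address to pick the pushed value up.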