Container Monitoring in Practice: node-exporter

Date: 2020-11-23



Overview

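Prometheus joined the CNCF in 2016 and graduated in August 2018, and it has since become the standard monitoring solution for Kubernetes. The next few articles examine Prometheus (2.x) in detail.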

Prometheus can collect data from many components of a Kubernetes cluster, such as the cAdvisor built into the kubelet and the api-server; node-exporter is one of these data sources.

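Exporter is the general term for Prometheus's data-collection components. An exporter gathers data from a target and converts it into a format Prometheus understands. Unlike traditional collection agents, it does not push data to a central server; instead it waits for the central server (Prometheus) to come and scrape it. For node-exporter the default scrape address is http://CURRENT_IP:9100/metrics.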

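node-exporter collects server-level runtime metrics, covering basic host monitoring such as loadavg, filesystem and meminfo. In scope it is similar to zabbix-agent in traditional host monitoring.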
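On Linux, node-exporter's loadavg collector reads /proc/loadavg and reports node_load1, node_load5 and node_load15. The sketch below is a simplified illustration of that idea, not node-exporter's code. It parses the first three fields of a /proc/loadavg line and prints them as gauges. To keep it runnable on any OS, it takes the file contents as a string, here a made-up sample.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLoadavg extracts the 1-, 5- and 15-minute load averages from the
// contents of /proc/loadavg, e.g. "0.50 0.40 0.30 1/123 4567".
func parseLoadavg(data string) ([3]float64, error) {
	var loads [3]float64
	fields := strings.Fields(data)
	if len(fields) < 3 {
		return loads, fmt.Errorf("unexpected /proc/loadavg content: %q", data)
	}
	for i := 0; i < 3; i++ {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return loads, err
		}
		loads[i] = v
	}
	return loads, nil
}

// formatLoads renders the three values as gauges in the text exposition format.
func formatLoads(loads [3]float64) string {
	var b strings.Builder
	for i, name := range []string{"node_load1", "node_load5", "node_load15"} {
		fmt.Fprintf(&b, "# TYPE %s gauge\n%s %g\n", name, name, loads[i])
	}
	return b.String()
}

func main() {
	// On a Linux host this string would come from os.ReadFile("/proc/loadavg").
	loads, err := parseLoadavg("0.50 0.40 0.30 1/123 4567\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(formatLoads(loads))
}
```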

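node-exporter is provided and maintained by the Prometheus project. It is not bundled with Prometheus and must be installed separately, but it is effectively a must-have exporter.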

Features

node-exporter exposes hardware and operating-system metrics provided by *NIX kernels.

For Windows systems, use the WMI exporter instead.
For NVIDIA GPU metrics, use prometheus-dcgm.

Which metrics node-exporter can collect depends on the *NIX operating system, for example:

diskstats: supported on Darwin, Linux
cpu: supported on Darwin, Dragonfly, FreeBSD, Linux, Solaris, and others

For details, see: node_exporter

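Use --collector.<name> to enable a collector and --no-collector.<name> to disable one (releases before 0.15 instead took a single -collectors.enabled list). If no collector flags are given, the default set of collectors is used.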
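As a rough mental model of how these flags combine: node_exporter registers one kingpin boolean flag per collector, with a per-collector default, and kingpin boolean flags accept a --no- prefix. The sketch below is a hypothetical model of that behavior, not node_exporter's code. It assumes later flags win and skips collector-specific options such as --collector.ntp.server=.... The collector names and defaults in main are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resolveCollectors applies --collector.<name> / --no-collector.<name>
// arguments on top of each collector's default state; later flags win.
// It returns an error for collector names it does not know.
func resolveCollectors(defaults map[string]bool, args []string) (map[string]bool, error) {
	enabled := make(map[string]bool, len(defaults))
	for name, on := range defaults {
		enabled[name] = on
	}
	for _, arg := range args {
		var name string
		var on bool
		switch {
		case strings.HasPrefix(arg, "--no-collector."):
			name, on = strings.TrimPrefix(arg, "--no-collector."), false
		case strings.HasPrefix(arg, "--collector."):
			name, on = strings.TrimPrefix(arg, "--collector."), true
		default:
			continue // not a collector toggle
		}
		if strings.ContainsAny(name, ".=") {
			continue // collector option such as --collector.ntp.server=127.0.0.1
		}
		if _, ok := defaults[name]; !ok {
			return nil, fmt.Errorf("unknown collector %q", name)
		}
		enabled[name] = on
	}
	return enabled, nil
}

func main() {
	// Hypothetical defaults for illustration.
	defaults := map[string]bool{"cpu": true, "meminfo": true, "ntp": false}
	enabled, err := resolveCollectors(defaults, []string{"--collector.ntp", "--no-collector.cpu"})
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(enabled))
	for name := range enabled {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%v\n", name, enabled[name])
	}
}
```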

Deployment

Binary deployment:

Download: from https://github.com/prometheus...
Extract: tar -xvzf **.tar.gz
Run: ./node_exporter

Run ./node_exporter -h to view the help:

usage: node_exporter [<flags>]

Flags:
  -h, --help
  --collector.diskstats.ignored-devices
  --collector.filesystem.ignored-mount-points
  --collector.filesystem.ignored-fs-types      
  --collector.netdev.ignored-devices      
  --collector.netstat.fields      
  --collector.ntp.server="127.0.0.1"
  .....

Once ./node_exporter is running, open http://${IP}:9100/metrics to see the list of metrics it exposes.

Docker installation:

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter \
  --path.rootfs /host

Installing in k8s:

The node-exporter.yaml file:

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
    name: node-exporter
  name: node-exporter
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/tryk8s/node-exporter:latest
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true

kubectl create -f node-exporter.yaml

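This creates a DaemonSet and a Service. After deployment, for Prometheus to collect monitoring data from node exporter, the Prometheus configuration file has to be updated: edit prometheus.yml and add the following under the scrape_configs section: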

scrape_configs:
  # Scrape node exporter metrics
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

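Alternatively, with Kubernetes service discovery (kubernetes_sd_configs), Services carrying the prometheus.io/scrape: 'true' annotation can be picked up automatically by a relabel_configs rule that uses action: keep and regex: true on this source label: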

- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
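Conceptually, a keep rule joins the values of its source_labels with ";" (the default separator) and matches the result against the regex, which Prometheus anchors at both ends. Targets that do not match are dropped, and a missing label counts as an empty string. The Go sketch below models only that rule so the effect of the annotation is easy to see. It is an illustration, not Prometheus's relabeling code, and the two target label sets are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepRule models a relabel_configs entry with action: keep.
type keepRule struct {
	sourceLabels []string
	separator    string
	regex        *regexp.Regexp
}

func newKeepRule(sourceLabels []string, expr string) keepRule {
	// Prometheus anchors relabel regexes at both ends.
	return keepRule{
		sourceLabels: sourceLabels,
		separator:    ";",
		regex:        regexp.MustCompile("^(?:" + expr + ")$"),
	}
}

// keep reports whether a target with the given labels survives the rule.
// Missing labels contribute an empty string.
func (k keepRule) keep(labels map[string]string) bool {
	values := make([]string, len(k.sourceLabels))
	for i, name := range k.sourceLabels {
		values[i] = labels[name]
	}
	return k.regex.MatchString(strings.Join(values, k.separator))
}

func main() {
	rule := newKeepRule(
		[]string{"__meta_kubernetes_service_annotation_prometheus_io_scrape"},
		"true",
	)
	targets := []map[string]string{
		{
			"__meta_kubernetes_service_name":                            "node-exporter",
			"__meta_kubernetes_service_annotation_prometheus_io_scrape": "true",
		},
		{"__meta_kubernetes_service_name": "kube-dns"}, // no annotation
	}
	for _, t := range targets {
		fmt.Println(t["__meta_kubernetes_service_name"], rule.keep(t))
	}
}
```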

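After the configuration is done, restart Prometheus and the corresponding metrics will appear.

Viewing metrics:

Viewing directly:

For a binary or Docker deployment, once node-exporter is running, visit: http://${IP}:9100/metrics

The response uses the format below (excerpt) and contains every metric node-exporter exposes: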

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.1872e-05
go_gc_duration_seconds{quantile="0.25"} 0.000119463
go_gc_duration_seconds{quantile="0.5"} 0.000151156
go_gc_duration_seconds{quantile="0.75"} 0.000198764
go_gc_duration_seconds{quantile="1"} 0.009889647
go_gc_duration_seconds_sum 0.257232201
go_gc_duration_seconds_count 1187

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="guest_nice"} 0
node_cpu{cpu="cpu0",mode="idle"} 68859.19
node_cpu{cpu="cpu0",mode="iowait"} 167.22
node_cpu{cpu="cpu0",mode="irq"} 0
node_cpu{cpu="cpu0",mode="nice"} 19.92
node_cpu{cpu="cpu0",mode="softirq"} 17.05
node_cpu{cpu="cpu0",mode="steal"} 28.1
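Each sample line has the shape name{label="value",...} value. The simplified parser below (parseSample is a hypothetical helper) splits one such line into name, labels and value. It assumes label values contain no commas or braces and that the line carries no trailing timestamp. That holds for the excerpt above but not for the full text-format grammar.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// parseSample parses a line such as: node_cpu{cpu="cpu0",mode="idle"} 68859.19
// Simplification: label values must not contain ',' or '}'.
func parseSample(line string) (sample, error) {
	s := sample{labels: map[string]string{}}
	// The value is the last space-separated field.
	sp := strings.LastIndexByte(line, ' ')
	if sp < 0 {
		return s, fmt.Errorf("no value in %q", line)
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return s, err
	}
	s.value = v
	series := line[:sp]
	open := strings.IndexByte(series, '{')
	if open < 0 { // no labels
		s.name = series
		return s, nil
	}
	if !strings.HasSuffix(series, "}") {
		return s, fmt.Errorf("unterminated label set in %q", line)
	}
	s.name = series[:open]
	for _, pair := range strings.Split(series[open+1:len(series)-1], ",") {
		if pair == "" {
			continue // e.g. a trailing comma
		}
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return s, fmt.Errorf("bad label %q", pair)
		}
		val, err := strconv.Unquote(kv[1]) // strip the surrounding quotes
		if err != nil {
			return s, fmt.Errorf("bad label value %q: %w", kv[1], err)
		}
		s.labels[kv[0]] = val
	}
	return s, nil
}

func main() {
	s, err := parseSample(`node_cpu{cpu="cpu0",mode="idle"} 68859.19`)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.name, s.labels["cpu"], s.labels["mode"], s.value)
}
```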

Viewing in Prometheus:

Names such as go_gc_duration_seconds and node_cpu are metric names. If Prometheus is running, these metrics can be searched for in its web UI at http://${IP}:9090/.

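Commonly used metrics:

node_cpu: CPU time spent in each mode (the basis for CPU usage)
node_disk*: disk I/O
node_filesystem*: filesystem usage
node_load1: 1-minute system load average
node_memory*: memory usage
node_network*: network traffic
node_time: current system time
go_*: Go runtime metrics of the node exporter process
process_*: runtime metrics of the node exporter process itself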
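node_cpu (node_cpu_seconds_total in newer versions) is a counter of CPU seconds per mode. Utilization therefore has to be derived from the change between two scrapes; in PromQL this is commonly written as 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])). The Go sketch below applies the same idea to two hypothetical snapshots of one CPU, computing busy = 1 - Δidle / Δtotal. It skips guest and guest_nice because Linux already counts that time in user and nice.

```go
package main

import "fmt"

// cpuBusyFraction returns 1 - Δidle/Δtotal between two snapshots of the
// per-mode CPU seconds counters for one CPU (mode -> seconds).
// guest and guest_nice are skipped: on Linux that time is already included
// in user and nice, so adding it again would double-count.
func cpuBusyFraction(prev, cur map[string]float64) (float64, error) {
	var dTotal, dIdle float64
	for mode, v := range cur {
		if mode == "guest" || mode == "guest_nice" {
			continue
		}
		d := v - prev[mode]
		if d < 0 {
			return 0, fmt.Errorf("counter for mode %q went backwards (restart?)", mode)
		}
		dTotal += d
		if mode == "idle" {
			dIdle = d
		}
	}
	if dTotal == 0 {
		return 0, fmt.Errorf("no CPU time elapsed between snapshots")
	}
	return 1 - dIdle/dTotal, nil
}

func main() {
	// Hypothetical values, in seconds, from two scrapes of node_cpu{cpu="cpu0"}.
	prev := map[string]float64{"idle": 100, "user": 50, "system": 50, "iowait": 0}
	cur := map[string]float64{"idle": 130, "user": 80, "system": 90, "iowait": 0}
	busy, err := cpuBusyFraction(prev, cur)
	if err != nil {
		panic(err)
	}
	fmt.Printf("busy: %.1f%%\n", busy*100)
}
```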

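Viewing in Grafana:

Although Prometheus ships with its own web UI, it is usually paired with Grafana, a more specialized visualization tool. Grafana offers many dashboard templates that present the metrics more readably, such as Node Exporter for Prometheus.

After configuring the variables and importing the template in Grafana, you get the corresponding host-monitoring dashboard.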

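In-depth analysis

node-exporter is one of the exporters officially recommended by Prometheus; there are other similar ones.

Officially recommended exporters are all hosted under https://github.com/prometheus. The exporters page also lists many third-party exporters developed and published by individuals or organizations. If you have custom collection needs, you can write your own exporter; see the later [Custom Exporter] article for a worked example.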

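Version issues

Because node_exporter is a relatively old component, its earlier releases did not follow some best practices, such as the Prometheus naming conventions (https://prometheus.io/docs/pr...). Version 0.16 renamed many metrics to comply, so queries and dashboards written for older releases no longer match. As of January 2019 the latest version is 0.17.

Some of the metric name changes (see the detailed comparison):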

* node_cpu ->  node_cpu_seconds_total
* node_memory_MemTotal -> node_memory_MemTotal_bytes
* node_memory_MemFree -> node_memory_MemFree_bytes
* node_filesystem_avail -> node_filesystem_avail_bytes
* node_filesystem_size -> node_filesystem_size_bytes
* node_disk_io_time_ms -> node_disk_io_time_seconds_total
* node_disk_reads_completed -> node_disk_reads_completed_total
* node_disk_sectors_written -> node_disk_written_bytes_total
* node_time -> node_time_seconds
* node_boot_time -> node_boot_time_seconds
* node_intr -> node_intr_total

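There are two ways to deal with the version issue:

First, run both the old and the new node-exporter on the machine (on different ports, for example via --web.listen-address) and have Prometheus scrape both.
Second, use a metric converter, which rewrites the old metric names into the new ones.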
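The second approach can be pictured as a line-by-line rename of the exposition text. In the sketch below, renames is a hypothetical subset of the table above, restricted to entries whose values keep the same unit. Entries such as node_disk_io_time_ms -> node_disk_io_time_seconds_total or node_disk_sectors_written -> node_disk_written_bytes_total also change units, so their values would need converting, and this sketch does not handle label changes either. It illustrates the idea only and is not any converter's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// renames is a hypothetical subset of the old -> new mapping, limited to
// pure renames where the value's unit is unchanged.
var renames = map[string]string{
	"node_cpu":              "node_cpu_seconds_total",
	"node_memory_MemTotal":  "node_memory_MemTotal_bytes",
	"node_memory_MemFree":   "node_memory_MemFree_bytes",
	"node_filesystem_avail": "node_filesystem_avail_bytes",
	"node_filesystem_size":  "node_filesystem_size_bytes",
	"node_time":             "node_time_seconds",
	"node_boot_time":        "node_boot_time_seconds",
	"node_intr":             "node_intr_total",
}

// renameLine rewrites the metric name in one exposition-format line.
// Names are matched exactly, so node_cpu does not touch node_cpu_seconds_total.
func renameLine(line string) string {
	// Comment lines: "# HELP <name> <text>" or "# TYPE <name> <type>".
	if strings.HasPrefix(line, "# HELP ") || strings.HasPrefix(line, "# TYPE ") {
		fields := strings.SplitN(line, " ", 4)
		if len(fields) >= 3 {
			if n, ok := renames[fields[2]]; ok {
				fields[2] = n
			}
		}
		return strings.Join(fields, " ")
	}
	if line == "" || strings.HasPrefix(line, "#") {
		return line
	}
	// Sample lines: the name ends at the first '{' or space.
	end := strings.IndexAny(line, "{ ")
	if end < 0 {
		return line
	}
	if n, ok := renames[line[:end]]; ok {
		return n + line[end:]
	}
	return line
}

func main() {
	in := []string{
		"# HELP node_cpu Seconds the cpus spent in each mode.",
		"# TYPE node_cpu counter",
		`node_cpu{cpu="cpu0",mode="idle"} 68859.19`,
	}
	for _, line := range in {
		fmt.Println(renameLine(line))
	}
}
```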

For Grafana, look for dashboard templates that support both sets of metric names.

Collector

The beginning of node-exporter's collector package (collector/collector.go):

// Package collector includes all individual collectors to gather and export system metrics.
package collector

import (
    "fmt"
    "sync"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/common/log"
    "gopkg.in/alecthomas/kingpin.v2"
)

// Namespace defines the common namespace to be used by all metrics.
const namespace = "node"

// ... (the rest of collector.go, which uses the imports above, is omitted)

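As shown above, implementing an exporter requires the github.com/prometheus/client_golang/prometheus library. client_golang is the official Prometheus client library for Go: it can be used to instrument existing applications, and it also serves as a base library for talking to the Prometheus HTTP API.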
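The namespace constant "node" is why every node-exporter metric starts with node_. Full metric names are built by joining namespace, subsystem and name with underscores, which client_golang provides as prometheus.BuildFQName. The stdlib-only function below re-implements that joining rule for illustration; buildFQName is our own name, and the real function lives in client_golang.

```go
package main

import (
	"fmt"
	"strings"
)

// namespace mirrors the constant in collector/collector.go.
const namespace = "node"

// buildFQName joins the non-empty parts with "_", following the same rule as
// client_golang's prometheus.BuildFQName; an empty name yields "".
func buildFQName(ns, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{ns, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	fmt.Println(buildFQName(namespace, "cpu", "seconds_total"))
	fmt.Println(buildFQName(namespace, "", "load1"))
}
```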

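For example, it defines the basic metric types and their methods:

Counter: a monotonically increasing value, such as a count of events
Gauge: a current value that can go up and down, such as the number of database connections
Histogram: samples observations, such as response latencies, into configurable buckets
Summary: also samples observations, similar to Histogram, but calculates quantiles on the client side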
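To make the semantics concrete, here is a stdlib-only toy version of Counter, Gauge and Histogram (Summary is omitted). It mirrors a few client_golang behaviors: Counter.Add panics on a negative value, and histogram buckets are cumulative, so each le bucket counts every observation less than or equal to its upper bound. It is a simplified sketch with our own type names, not client_golang's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// counter only goes up.
type counter struct {
	mu sync.Mutex
	v  float64
}

func (c *counter) Add(d float64) {
	if d < 0 {
		panic("counter cannot decrease")
	}
	c.mu.Lock()
	c.v += d
	c.mu.Unlock()
}

func (c *counter) Inc() { c.Add(1) }

// gauge can be set, raised or lowered.
type gauge struct {
	mu sync.Mutex
	v  float64
}

func (g *gauge) Set(v float64) { g.mu.Lock(); g.v = v; g.mu.Unlock() }
func (g *gauge) Add(d float64) { g.mu.Lock(); g.v += d; g.mu.Unlock() }

// histogram keeps cumulative bucket counts plus the _sum and _count series.
type histogram struct {
	mu     sync.Mutex
	bounds []float64 // ascending upper bounds; the +Inf bucket is implicit
	counts []uint64  // counts[i] = number of observations <= bounds[i]
	sum    float64
	count  uint64 // also the value of the +Inf bucket
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

func (h *histogram) Observe(v float64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	var requests counter
	requests.Inc()
	requests.Add(2)

	var conns gauge
	conns.Set(10)
	conns.Add(-3)

	latency := newHistogram(0.25, 0.5, 1)
	for _, v := range []float64{0.25, 0.5, 0.75, 2} {
		latency.Observe(v)
	}

	fmt.Println("requests:", requests.v, "conns:", conns.v)
	for i, b := range latency.bounds {
		fmt.Printf("le=%g: %d\n", b, latency.counts[i])
	}
	fmt.Printf("le=+Inf: %d sum=%g\n", latency.count, latency.sum)
}
```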

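// Excerpt: map a parsed metric's protobuf type (dto.MetricType) to the
// client_golang prometheus.ValueType used when emitting it.
switch metricType {
case dto.MetricType_COUNTER:
    valType = prometheus.CounterValue
    val = metric.Counter.GetValue()

case dto.MetricType_GAUGE:
    valType = prometheus.GaugeValue
    val = metric.Gauge.GetValue()

case dto.MetricType_UNTYPED:
    valType = prometheus.UntypedValue
    val = metric.Untyped.GetValue()

// ... other metric types omitted
}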

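For a detailed analysis of the client_golang library, see: theory-source-code

This article is part of the Container Monitoring in Practice series; for the full content, see: container-monitor-book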