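A quick start with Prometheus + Grafana for monitoring a host's CPU, GPU, memory, and I/O status.
Node Exporter collects metrics from hosts running a UNIX-like kernel. Download and unpack it here: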
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
tar xvfz node_exporter-1.1.2.linux-amd64.tar.gz
cd node_exporter-1.1.2.linux-amd64
nohup ./node_exporter &
Check the metrics:
$ curl http://localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
...
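The raw output is long, so it can help to strip the `# HELP` / `# TYPE` comment lines and keep one metric family. A minimal sketch, assuming a helper named `filter_metrics` (hypothetical, not part of node_exporter) that reads the exposition text on stdin:

```shell
# Keep only sample lines (no "# HELP" / "# TYPE" comments) whose metric
# name starts with the prefix given as the first argument.
filter_metrics() {
  grep -v '^#' | grep "^$1"
}

# Usage (assumes node_exporter is listening on localhost:9100):
#   curl -s http://localhost:9100/metrics | filter_metrics node_cpu_seconds_total
```

The prefix is used as a plain `grep` pattern, so `node_` would list every host-level metric at once.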
DCGM Exporter collects metrics from NVIDIA GPUs. Run it as a Docker image:
docker run -d --restart=always --gpus all -p 9400:9400 nvidia/dcgm-exporter
Check the metrics:
$ curl localhost:9400/metrics
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
...
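To see which GPU metrics the exporter offers, the `# HELP` lines alone already form a compact catalogue. A rough sketch, assuming a hypothetical helper `metric_help` that reads the exposition text on stdin and prints each metric name followed by its description:

```shell
# Print "<metric name> <description>" for every "# HELP" line on stdin.
metric_help() {
  grep '^# HELP ' | cut -d' ' -f3-
}

# Usage (assumes dcgm-exporter is listening on localhost:9400):
#   curl -s localhost:9400/metrics | metric_help
```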
Configure ~/prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['192.167.200.91:9100']
  # DCGM Exporter
  - job_name: dcgm
    static_configs:
      - targets: ['192.167.200.91:9400']
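The targets use the host's IP rather than `localhost` because, with Docker's default bridge networking, `localhost` inside the Prometheus container refers to the container itself. A small sketch for producing the same file for another host; the function name `make_prometheus_yml` is hypothetical, and the `promtool` invocation in the comment assumes the `prom/prometheus` image ships `promtool` on its PATH:

```shell
# Emit a prometheus.yml that scrapes both exporters on one host.
# $1 is the host address as reachable from the Prometheus container.
make_prometheus_yml() {
  host="$1"
  cat <<EOF
global:
  scrape_interval: 15s
scrape_configs:
  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['${host}:9100']
  # DCGM Exporter
  - job_name: dcgm
    static_configs:
      - targets: ['${host}:9400']
EOF
}

# Usage:
#   make_prometheus_yml 192.167.200.91 > ~/prometheus.yml
# Optional syntax check before starting Prometheus (assumption: promtool
# is available in the image):
#   docker run --rm -v ~/prometheus.yml:/etc/prometheus/prometheus.yml \
#     --entrypoint promtool prom/prometheus \
#     check config /etc/prometheus/prometheus.yml
```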
Run the Prometheus Docker image:
docker run -d --restart=always \
  -p 9090:9090 \
  -v ~/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
Visit http://localhost:9090/ to open the Prometheus web UI.
Visit http://localhost:9090/targets to check the status of the scrape targets.
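The same target status is also exposed as JSON by the Prometheus HTTP API at `/api/v1/targets`, which is handy on a headless server. A crude sketch, assuming the response is compact JSON with a `"health"` key per active target; the helper name `target_health` is hypothetical, and a real JSON parser such as `jq` would be more robust than `grep`:

```shell
# Read the /api/v1/targets JSON on stdin and print one health value
# (for example "up" or "down") per target, in response order.
target_health() {
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4
}

# Usage (assumes Prometheus is listening on localhost:9090):
#   curl -s http://localhost:9090/api/v1/targets | target_health
```

With the two jobs configured above, two lines reading `up` would mean both exporters are being scraped.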
Run the Grafana Docker image:
docker run -d --restart=always -p 3000:3000 grafana/grafana
Log in with admin/admin.
Add Prometheus as a data source.
Click Save & Test.
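The data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). A minimal sketch, assuming the default admin/admin credentials are still in place; the helper name `datasource_payload` and the data source name `Prometheus` are illustrative choices:

```shell
# Build the JSON body for a Prometheus data source.
# $1 is the Prometheus URL as reachable from the Grafana container.
datasource_payload() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1"
}

# Usage (assumes Grafana on localhost:3000 and Prometheus on the host IP):
#   datasource_payload http://192.167.200.91:9090 |
#     curl -s -u admin:admin -H 'Content-Type: application/json' \
#          -X POST --data @- http://localhost:3000/api/datasources
```

As with the scrape targets, the URL should use the host's IP rather than `localhost`, since Grafana runs in its own container.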
Import dashboard 8919, Node Exporter for Prometheus Dashboard by StarsL.cn.
View the dashboard.
Import dashboard 12239, NVIDIA DCGM Exporter Dashboard by nvidia.
View the dashboard.
GoCoding shares experience from my own hands-on practice; follow the official account!