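Monitoring Server Nodes with Prometheus

To monitor a server's CPU, memory, disk, I/O, and similar metrics, you first need to install node_exporter. node_exporter collects system-level data from the machine it runs on.

Download locations:
https://github.com/prometheus/node_exporter/releases/
https://prometheus.io/download/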

wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
tar xvf node_exporter-0.18.1.linux-amd64.tar.gz
mv node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
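
In the download URL, the release tag (v<version>) and the version embedded in the archive name have to agree, otherwise the download fails. As a minimal sketch, the URL can be built from a single version string. The helper name node_exporter_url and its optional architecture argument are assumptions made for illustration, not part of node_exporter.

```shell
# Build the node_exporter download URL from one version string, so the
# release tag and the archive file name cannot drift apart.
# node_exporter_url is a hypothetical helper name.
node_exporter_url() {
    version="$1"                 # e.g. 0.18.1 (without the leading "v")
    arch="${2:-linux-amd64}"     # assumed default platform
    printf 'https://github.com/prometheus/node_exporter/releases/download/v%s/node_exporter-%s.%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

# Usage (not executed here):
#   wget "$(node_exporter_url 0.18.1)"
```

Deriving both occurrences of the version from one variable also makes later upgrades a one-word change.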

Create the user

groupadd prometheus
useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /usr/local/node_exporter

Create the systemd service

cat > /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
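
Start the service

systemctl daemon-reload
systemctl start node_exporter
systemctl status node_exporter

● node_exporter.service - node_exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-05 09:18:56 GMT; 3s ago
 Main PID: 11050 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─11050 /usr/local/prometheus/node_exporter/node_exporter

The output above was captured on a host where the binary lives under /usr/local/prometheus/node_exporter; with the ExecStart path used in the unit file, the CGroup line shows /usr/local/node_exporter/node_exporter instead.

Enable the service at boot:

systemctl enable node_exporter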

By default, Node Exporter exposes its metrics for scraping at http://IP:9100/metrics
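
To confirm that the endpoint is actually serving data, one option is to fetch it with curl and pull out a single metric. The sketch below assumes the exporter runs on localhost and that samples carry no trailing timestamp. metric_value is a hypothetical helper name, while node_load5 is one of the metrics used by the alerting rules in this article.

```shell
# Print the value of every sample whose metric name is exactly "$1",
# reading Prometheus text exposition format from stdin.
# metric_value is a hypothetical helper name.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                       # skip "# HELP" / "# TYPE" lines
        {
            metric = $1
            sub(/[{].*/, "", metric)        # drop any {label="..."} part
            if (metric == name) print $NF   # last field is the sample value
        }'
}

# Usage (not executed here; assumes the exporter listens on localhost):
#   curl -s http://localhost:9100/metrics | metric_value node_load5
```

An empty result suggests either that the exporter is not reachable or that the metric name is misspelled.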

Configure Prometheus

vim /usr/local/prometheus/prometheus.yml

  - job_name: 'linux'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: node1

prometheus.yml now defines two scrape jobs: one monitors the Prometheus service itself, the other monitors the Linux server. Here is a complete example:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'linux'
    static_configs:
      - targets: ['NODE_IP:9100']
        labels:
          instance: node1

Restart Prometheus

systemctl restart prometheus

Open the Prometheus web UI. On the Status -> Targets page you should see the two configured targets, both with State UP.
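
The same check can be done from the command line through the Prometheus HTTP API endpoint /api/v1/targets. This is a rough sketch, assuming Prometheus listens on localhost:9090. count_up_targets is a hypothetical helper name, and matching JSON with grep is a shortcut; a JSON-aware tool such as jq would be more robust.

```shell
# Count the targets reported as healthy in the JSON returned by
# GET /api/v1/targets, read from stdin.
# count_up_targets is a hypothetical helper name.
count_up_targets() {
    # Each healthy target object carries "health":"up"; count those matches.
    grep -Eo '"health":[[:space:]]*"up"' | wc -l | tr -d '[:space:]'
}

# Usage (not executed here; assumes Prometheus listens on localhost:9090):
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

With the two jobs configured in this article, a result of 2 would match what the Targets page shows.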

Configuring Prometheus alerting rules for nodes

groups:
  - name: example
    rules:
      - alert: "Instance lost"
        expr: up{job="linux"} == 0
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Server instance {{ $labels.instance }} lost"
          description: "Job {{ $labels.job }} on {{ $labels.instance }} has been down for more than 1 minute"
      - alert: "Disk free space below 5%"
        expr: 100 - ((node_filesystem_avail_bytes{job="linux",mountpoint=~".*",fstype=~"ext4|xfs|ext2|ext3"} * 100) / node_filesystem_size_bytes{job="linux",mountpoint=~".*",fstype=~"ext4|xfs|ext2|ext3"}) > 95
        for: 30s
        annotations:
          summary: "Server instance {{ $labels.instance }} low disk space alert"
          description: "{{ $labels.instance }}: disk {{ $labels.device }} has less than 5% free, current value: {{ $value }}"
      - alert: "Free memory below 20%"
        expr: ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / (node_memory_MemTotal_bytes)) * 100 > 80
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Server instance {{ $labels.instance }} low memory alert"
          description: "{{ $labels.instance }}: less than 20% of memory is free, current value: {{ $value }}"
      - alert: "CPU load average above 4"
        expr: node_load5 > 4
        for: 30s
        annotations:
          summary: "Server instance {{ $labels.instance }} CPU load alert"
          description: "{{ $labels.instance }}: CPU load average (5 minutes) has exceeded 4, current value: {{ $value }}"
      - alert: "Disk read I/O above 30MB/s"
        expr: irate(node_disk_read_bytes_total{device="sda"}[1m]) > 30000000
        for: 30s
        annotations:
          summary: "Server instance {{ $labels.instance }} read I/O load alert"
          description: "{{ $labels.instance }}: read I/O has exceeded 30MB/s, current value: {{ $value }}"
      - alert: "Disk write I/O above 30MB/s"
        expr: irate(node_disk_written_bytes_total{device="sda"}[1m]) > 30000000
        for: 30s
        annotations:
          summary: "Server instance {{ $labels.instance }} write I/O load alert"
          description: "{{ $labels.instance }}: write I/O has exceeded 30MB/s, current value: {{ $value }}"
      - alert: "Network egress rate above 10MB/s"
        expr: irate(node_network_transmit_bytes_total{device!~"lo"}[1m]) > 10000000
        for: 30s
        annotations:
          summary: "Server instance {{ $labels.instance }} network traffic alert"
          description: "{{ $labels.instance }}: traffic on interface {{ $labels.device }} has exceeded 10MB/s, current value: {{ $value }}"
      - alert: "CPU usage above 90%"
        expr: 100 - ((avg by (instance,job,env)(irate(node_cpu_seconds_total{mode="idle"}[30s]))) * 100) > 90
        for: 30s
        annotations:
          summary: "Server instance {{ $labels.instance }} CPU usage alert"
          description: "{{ $labels.instance }}: CPU usage has exceeded 90%, current value: {{ $value }}"
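
Prometheus only evaluates rules from files listed under rule_files in prometheus.yml, and a syntax error in a rule file can keep Prometheus from starting. The promtool binary shipped in the Prometheus release archive can check a rule file beforehand. A minimal sketch follows; validate_rules is a hypothetical wrapper name and the path in the usage comment is an assumed location, not one given in this article.

```shell
# Validate an alerting-rule file with promtool before Prometheus loads it.
# Returns non-zero if the file is missing, promtool is not on PATH,
# or promtool reports an error. validate_rules is a hypothetical name.
validate_rules() {
    rules_file="$1"
    if [ ! -f "$rules_file" ]; then
        echo "rule file not found: $rules_file" >&2
        return 1
    fi
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    promtool check rules "$rules_file"
}

# Usage (not executed here; the path is an assumed location):
#   validate_rules /usr/local/prometheus/rules/node_rules.yml \
#       && systemctl restart prometheus
```

Chaining the restart behind the check means a broken rule file never reaches the running server.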
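
Author: fish_man
Link: https://www.jianshu.com/p/7bec152d1a1f
Source: Jianshu
Copyright belongs to the author. For reproduction in any form, please contact the author for authorization and cite the source.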