The Right Way to Monitor a Redis Cluster with Prometheus

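Monitoring a Redis cluster with Prometheus follows the same pattern as monitoring anything else: use an exporter.
The exporter collects the metrics and exposes them over HTTP for Prometheus to pull. Grafana draws dashboards from those metrics. Prometheus also checks the collected data against the alerting rules you configure and decides whether to send an alert to Alertmanager; Alertmanager then decides whether to actually send out a notification.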
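
To see what Prometheus actually pulls, here is a minimal Python sketch (standard library only) that reads the exporter's /metrics page and turns the sample lines into a dict. It assumes the exporter listens on its default port 9121 on localhost; the sample text in the __main__ block is made up for illustration, not real exporter output.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9121/metrics", timeout=5.0):
    """Download the plain-text page that Prometheus would scrape (assumed default address)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Map each sample's name (including any {labels}) to its value.

    Skips blank lines and '# HELP' / '# TYPE' comments.
    Simplification: assumes label values contain no spaces.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: <name>{<labels>} <value> [<timestamp>]
        parts = line.split()
        if len(parts) < 2:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


if __name__ == "__main__":
    # Hypothetical text standing in for fetch_metrics(), so this runs without an exporter.
    sample = "# TYPE redis_up gauge\nredis_up 1\nredis_connected_clients 12\n"
    values = parse_samples(sample)
    print(values["redis_up"], values["redis_connected_clients"])
```

Against a running exporter, parse_samples(fetch_metrics()) should contain redis_up, which tells you whether the exporter could reach Redis at all.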

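An alert goes through three states (strictly speaking these are states of the alerting rule inside Prometheus; Alertmanager only receives alerts that have reached Firing):

  • Inactive: the rule's expression currently matches nothing, so no alert is active.
  • Pending: the expression matches, but has not yet held for the wait time you set, i.e. the for field in the rule.
  • Firing: the condition has held for the full for duration; Prometheus sends the alert to Alertmanager, which delivers it by email, DingTalk and the like.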
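
How the for duration turns Pending into Firing can be sketched in Python as a tiny state machine. This is a simplified model, assuming a single alert series evaluated at the timestamps you pass in; it is not Prometheus's actual code, and the function name evaluate is made up for illustration.

```python
from enum import Enum


class AlertState(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


def evaluate(evaluations, for_seconds):
    """Replay (timestamp, condition_is_true) pairs and return the state after each one.

    - condition false                 -> INACTIVE, and the pending timer resets
    - condition true, held < for      -> PENDING
    - condition true, held >= for     -> FIRING (the point where Alertmanager gets the alert)
    """
    states = []
    active_since = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None
            states.append(AlertState.INACTIVE)
            continue
        if active_since is None:
            active_since = ts  # first evaluation where the expression matched
        if ts - active_since >= for_seconds:
            states.append(AlertState.FIRING)
        else:
            states.append(AlertState.PENDING)
    return states


if __name__ == "__main__":
    # redis_up == 0 checked once a minute, with for: 5m (300 s)
    history = [(0, False), (60, True), (180, True), (360, True), (420, False)]
    print([s.value for s in evaluate(history, 300)])
```

With for: 5m, a Redis node that is down for only a minute or two never leaves Pending, which is exactly what the for clause is meant to filter out.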

Enough digression; on to monitoring the Redis cluster.

Monitoring a Redis cluster with redis_exporter

Whatever application you want to monitor, you can look up the matching exporter on the official site: EXPORTERS AND INTEGRATIONS

For Redis, use redis_exporter. Link: redis_exporter

It supports Redis 2.x to 5.x.

Installation and flags

Download:

wget https://github.com/oliver006/redis_exporter/releases/download/v1.3.5/redis_exporter-v1.3.5.linux-amd64.tar.gz   
tar zxvf redis_exporter-v1.3.5.linux-amd64.tar.gz
cd redis_exporter-v1.3.5.linux-amd64/
./redis_exporter <flags>

redis_exporter supports a lot of flags, but only a few matter for us:

./redis_exporter --help
Usage of ./redis_exporter:
    -redis.addr string
    	Address of the Redis instance to scrape (default "redis://localhost:6379")
    -redis.password string
    	Password of the Redis instance to scrape
    -web.listen-address string
    	Address to listen on for web interface and telemetry. (default ":9121")

Monitoring a single Redis instance

nohup ./redis_exporter -redis.addr 172.18.11.138:6379 -redis.password xxxxx &

Add the single instance to Prometheus:

- job_name: redis_since
  static_configs:
  - targets: ['172.18.11.138:9121']

Options for monitoring a Redis cluster

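This one took real effort. Most of what I found online only covers single instances. The one post that does cover a cluster is "Monitoring a Redis cluster with Prometheus", and, as luck would have it, his cluster has no password.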

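What I tried:
Both of the following fail with an authentication error. A likely cause: -redis.addr takes only one address, and Go's flag parser stops at the first non-flag argument (here the second address), so the -redis.password at the end of the command is never read.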

level=error msg="Redis INFO err: NOAUTH Authentication required."

Method 1

nohup ./redis_exporter -redis.addr 172.18.11.139:7000 172.18.11.139:7001 172.18.11.140:7002 172.18.11.140:7003 172.18.11.141:7004 172.18.11.141:7005 -redis.password xxxxx &

Method 2

nohup ./redis_exporter -redis.addr redis://h:Lcsmy.312==/@172.18.11.139:7000 redis://h:Lcsmy.312==/@172.18.11.139:7001 redis://h:Lcsmy.312==/@172.18.11.140:7002 redis://h:Lcsmy.312==/@172.18.11.140:7003 redis://h:Lcsmy.312==/@172.18.11.141:7004 redis://h:Lcsmy.312==/@172.18.11.141:7005 -redis.password xxxxx &

Next I considered the crudest approach: one redis_exporter per instance. But then many of the cluster-level queries stop working, cluster_slot_fail for example, so I dropped it:

nohup ./redis_exporter -redis.addr 172.18.11.139:7000  -redis.password xxxxxx  -web.listen-address 172.18.11.139:9121 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.139:7001  -redis.password xxxxxx  -web.listen-address 172.18.11.139:9122 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.140:7002  -redis.password xxxxxx  -web.listen-address 172.18.11.139:9123 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.140:7003  -redis.password xxxxxx  -web.listen-address 172.18.11.139:9124 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.141:7004  -redis.password xxxxxx  -web.listen-address 172.18.11.139:9125 > /dev/null 2>&1 &
nohup ./redis_exporter -redis.addr 172.18.11.141:7005  -redis.password xxxxxx  -web.listen-address 172.18.11.139:9126 > /dev/null 2>&1 &

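In the end I had to open an issue on GitHub. After trading messages with the author in my Chinglish, I finally got it... and it was in the official documentation all along: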

scrape_configs:
  ## config for the multiple Redis targets that the exporter will scrape
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://first-redis-host:6379
        - redis://second-redis-host:6379
        - redis://second-redis-host:6380
        - redis://second-redis-host:6381
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <<REDIS-EXPORTER-HOSTNAME>>:9121
  
  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - <<REDIS-EXPORTER-HOSTNAME>>:9121
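
Here is what the three relabel rules do to one target, replayed by hand in Python. It is a simplified sketch: only these three rules with the default replace action are modeled, and exporter:9121 is a placeholder for <<REDIS-EXPORTER-HOSTNAME>>:9121. The point is the final URL: Prometheus connects to the exporter and passes the Redis address as the target query parameter.

```python
from urllib.parse import urlencode


def relabel(target, exporter_address, metrics_path="/scrape"):
    """Apply the three relabel_configs rules above to one static target."""
    labels = {"__address__": target, "__metrics_path__": metrics_path, "__scheme__": "http"}
    # Rule 1: source_labels [__address__] -> target_label __param_target
    labels["__param_target"] = labels["__address__"]
    # Rule 2: source_labels [__param_target] -> target_label instance
    labels["instance"] = labels["__param_target"]
    # Rule 3: no source label, fixed replacement -> __address__ now points at the exporter
    labels["__address__"] = exporter_address
    return labels


def scrape_url(labels):
    """Build the URL Prometheus requests: every __param_<name> label becomes ?<name>=..."""
    params = {k[len("__param_"):]: v for k, v in labels.items() if k.startswith("__param_")}
    return "{}://{}{}?{}".format(
        labels["__scheme__"], labels["__address__"], labels["__metrics_path__"], urlencode(params)
    )


if __name__ == "__main__":
    lbls = relabel("redis://first-redis-host:6379", "exporter:9121")
    print(lbls["instance"])
    print(scrape_url(lbls))
```

Applied to every redis:// entry, this gives one scrape per Redis node, all served by the same exporter, while the instance label keeps the Redis address so dashboards and alerts can tell the nodes apart.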

Putting it into practice on the Redis cluster

Start redis_exporter:

nohup ./redis_exporter -redis.password xxxxx  &

The key part:
how to configure Prometheus:

scrape_configs:
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://172.18.11.139:7000
        - redis://172.18.11.139:7001
        - redis://172.18.11.140:7002
        - redis://172.18.11.140:7003
        - redis://172.18.11.141:7004
        - redis://172.18.11.141:7005
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.18.11.139:9121
  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - 172.18.11.139:9121

With this, the cluster metrics are collected. But the log keeps showing:

time="2019-12-17T09:10:49+08:00" level=error msg="Couldn't connect to redis instance"

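During my lunch break it suddenly clicked: as long as the exporter can connect to one node of the cluster, it can get at the metrics of the other nodes too. (Most likely the error came from the redis_exporter job: the exporter's own /metrics endpoint always scrapes the -redis.addr instance, which defaults to redis://localhost:6379, and nothing was listening there. Pointing -redis.addr at any cluster node makes it go away.) So I changed the start command to: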

nohup ./redis_exporter -redis.addr 172.18.11.141:7005  -redis.password xxxxx &

The Prometheus configuration stays the same.
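
To check both paths by hand (the exporter's own /metrics, which uses -redis.addr, and /scrape?target=..., which the redis_exporter_targets job uses), something like the following Python sketch can be run from the Prometheus host. It assumes the exporter address 172.18.11.139:9121 and the six node addresses from the config above; the helper names endpoint_url, read_redis_up and check_all are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed addresses, copied from the Prometheus config above; adjust for your setup.
EXPORTER = "http://172.18.11.139:9121"
NODES = [
    "redis://172.18.11.139:7000",
    "redis://172.18.11.139:7001",
    "redis://172.18.11.140:7002",
    "redis://172.18.11.140:7003",
    "redis://172.18.11.141:7004",
    "redis://172.18.11.141:7005",
]


def endpoint_url(exporter, target=None):
    """No target: the exporter's own /metrics (scrapes -redis.addr).
    With a target: /scrape?target=..., the path the redis_exporter_targets job uses."""
    if target is None:
        return exporter + "/metrics"
    return exporter + "/scrape?" + urlencode({"target": target})


def read_redis_up(url, timeout=5.0):
    """Return the redis_up value on the page (1.0 means the exporter reached Redis), or None.
    Assumes label values contain no spaces."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line.startswith("redis_up ") or line.startswith("redis_up{"):
                return float(line.split()[1])
    return None


def check_all(exporter=EXPORTER, nodes=NODES):
    """Print redis_up for the exporter's default instance and for every cluster node."""
    for target in [None] + list(nodes):
        name = target or "-redis.addr (/metrics)"
        try:
            print(name, read_redis_up(endpoint_url(exporter, target)))
        except OSError as exc:  # connection refused, timeout, or HTTP error status
            print(name, "error:", exc)
```

Call check_all() from a machine that can reach the exporter. With the first start command (no -redis.addr), the /metrics line would likely show 0.0 while the six /scrape lines show 1.0, which matches the log message above.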


Alerting rules

groups:
- name:  Redis
  rules: 
    - alert: RedisDown
      expr: redis_up  == 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Redis down (instance {{ $labels.instance }})"
        description: "Redis is down, damn it\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: MissingBackup
      expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Missing backup (instance {{ $labels.instance }})"
        description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: OutOfMemory
      expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Out of memory (instance {{ $labels.instance }})"
        description: "Redis is running out of memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: ReplicationBroken
      expr: delta(redis_connected_slaves[1m]) < 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Replication broken (instance {{ $labels.instance }})"
        description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: TooManyConnections
      expr: redis_connected_clients > 1000
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Too many connections (instance {{ $labels.instance }})"
        description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"       
    - alert: NotEnoughConnections
      expr: redis_connected_clients < 5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Not enough connections (instance {{ $labels.instance }})"
        description: "Redis instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
    - alert: RejectedConnections
      expr: increase(redis_rejected_connections_total[1m]) > 0
      for: 5m
      labels:
        severity: error
      annotations:
        summary: "Rejected connections (instance {{ $labels.instance }})"
        description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
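
To sanity-check the thresholds before deploying, the arithmetic of a few of these expressions can be replayed in Python against hand-entered numbers. This is a sketch with made-up values: it does not run PromQL and ignores the for: 5m hold time; the function names are hypothetical.

```python
import time


def missing_backup(last_save_ts, now=None):
    """time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24"""
    now = time.time() if now is None else now
    return now - last_save_ts > 60 * 60 * 24


def out_of_memory(used_bytes, total_system_bytes):
    """redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90"""
    return used_bytes / total_system_bytes * 100 > 90


def connections_alert(connected_clients):
    """Which of TooManyConnections (> 1000) / NotEnoughConnections (< 5) would match, if any."""
    if connected_clients > 1000:
        return "TooManyConnections"
    if connected_clients < 5:
        return "NotEnoughConnections"
    return None


if __name__ == "__main__":
    gib = 2 ** 30
    print(missing_backup(1_000_000, now=1_090_000))  # 25 hours since the last save
    print(out_of_memory(15 * gib, 16 * gib))         # 15 GiB used on a 16 GiB host
    print(connections_alert(3))
```

One thing the numbers make visible: redis_total_system_memory_bytes is the RAM of the whole host (Redis reports it as total_system_memory in INFO), so with two instances per machine, as in this cluster, neither is likely to reach 90% on its own; comparing against the configured maxmemory may suit a shared host better.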