Building a Slick MySQL Monitoring Platform with Prometheus + Grafana

Date: 2020-04-24



Source: https://blog.51cto.com/xiaolu...
Author: 小羅ge11

Overview

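Many of you have probably built a MySQL monitoring platform already: some based on Lepus (天兔), others on customized Zabbix setups, and plenty of colleagues are running them in production. My choice is Prometheus + Grafana. In short, our production environment already uses Prometheus, and Grafana covers my day-to-day needs. For an introduction and installation instructions, see: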

https://blog.51cto.com/cloumn...

1. First, here is what the monitoring looks like. MySQL master-slave replication:

2. MySQL status:

3. Buffer pool status:

Deploying the exporter

1. Install mysqld_exporter:

    [root@controller2 opt]# wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.10.0/mysqld_exporter-0.10.0.linux-amd64.tar.gz
    [root@controller2 opt]# tar -xf mysqld_exporter-0.10.0.linux-amd64.tar.gz

2. Create a MySQL account for the exporter:

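    -- The password must match the one in .my.cnf (step 3).
    -- On MySQL 8.0+, GRANT no longer creates users: run CREATE USER 'exporter'@'%' IDENTIFIED BY '123456'; first and drop IDENTIFIED BY from the GRANT.
    GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'exporter'@'%' IDENTIFIED BY '123456';
    flush privileges;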

3. Create the credentials file the exporter uses to log in:

    [root@controller2 mysqld_exporter-0.10.0.linux-amd64]# cat /opt/mysqld_exporter-0.10.0.linux-amd64/.my.cnf
    [client]
    user=exporter
    password=123456

4. Create a systemd service for the exporter:

    [root@controller2 mysqld_exporter-0.10.0.linux-amd64]# cat /etc/systemd/system/mysql_exporter.service
    [Unit]
    Description=mysql Monitoring System
    Documentation=https://github.com/prometheus/mysqld_exporter

    [Service]
    ExecStart=/opt/mysqld_exporter-0.10.0.linux-amd64/mysqld_exporter \
             -collect.info_schema.processlist \
             -collect.info_schema.innodb_tablespaces \
             -collect.info_schema.innodb_metrics \
             -collect.perf_schema.tableiowaits \
             -collect.perf_schema.indexiowaits \
             -collect.perf_schema.tablelocks \
             -collect.engine_innodb_status \
             -collect.perf_schema.file_events \
             -collect.binlog_size \
             -collect.info_schema.clientstats \
             -collect.perf_schema.eventswaits \
             -config.my-cnf=/opt/mysqld_exporter-0.10.0.linux-amd64/.my.cnf

    [Install]
    WantedBy=multi-user.target

    [root@controller2 mysqld_exporter-0.10.0.linux-amd64]# systemctl daemon-reload
    [root@controller2 mysqld_exporter-0.10.0.linux-amd64]# systemctl enable mysql_exporter
    [root@controller2 mysqld_exporter-0.10.0.linux-amd64]# systemctl start mysql_exporter

5. Add a scrape job to the Prometheus server configuration (the scrape_configs section of prometheus.yml):

    scrape_configs:
      - job_name: 'mysql'
        static_configs:
          - targets: ['192.168.1.11:9104', '192.168.1.12:9104']

6. Check that the exporter returns data:

http://192.168.1.12:9104/metrics

The mysql_up metric tells us whether monitoring is working: 1 means the exporter connected to MySQL successfully, 0 means it could not.

    # HELP mysql_up Whether the MySQL server is up.
    # TYPE mysql_up gauge
    mysql_up 1
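To check mysql_up from a script instead of a browser, the sketch below parses the Prometheus text exposition format using only the standard library. It is a deliberately simplified parser written for this example (it ignores timestamps and assumes label values contain no `}`); `parse_metrics`, `mysql_is_up` and `fetch_metrics` are hypothetical helper names. The official Python client, prometheus_client, ships a complete parser (`text_string_to_metric_families`) if you need one.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    # Map "name{labels}" (or just "name") to its float value.
    # Simplified: skips "# HELP"/"# TYPE" comment lines, ignores an optional
    # trailing timestamp, and assumes label values contain no "}".
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            key, _, rest = line.partition("}")
            key += "}"
        else:
            key, _, rest = line.partition(" ")
        samples[key] = float(rest.split()[0])
    return samples


def mysql_is_up(metrics_text: str) -> bool:
    # mysql_up is 1 when the exporter could connect to MySQL, 0 otherwise.
    return parse_metrics(metrics_text).get("mysql_up") == 1.0


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    # Download the raw /metrics page from the exporter.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Against the host from step 6 the call would be `mysql_is_up(fetch_metrics("http://192.168.1.12:9104/metrics"))`.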

Key metrics to monitor

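Whenever we monitor anything, we need to be clear about what we are monitoring and which metrics matter; only then can we monitor the service well. For MySQL, we usually assess its health through the following: master-slave replication status, query throughput, slow queries, connection counts, buffer pool usage, and query execution performance.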

Replication metrics:

1. Replication threads:

Most companies run MySQL with master-slave replication, so monitoring the two replication threads is very important. In MySQL we usually check them with:

    MariaDB [(none)]> show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 172.16.1.1
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000045
              Read_Master_Log_Pos: 72904854
                   Relay_Log_File: mariadb-relay-bin.000127
                    Relay_Log_Pos: 72905142
            Relay_Master_Log_File: mysql-bin.000045
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes

If both Slave_IO_Running and Slave_SQL_Running are Yes, the replication cluster is healthy.

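In the samples returned by MySQLD Exporter, mysql_slave_status_slave_sql_running (and mysql_slave_status_slave_io_running for the IO thread) reports the health of replication, with Yes mapped to 1: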

    # HELP mysql_slave_status_slave_sql_running Generic metric from SHOW SLAVE STATUS.
    # TYPE mysql_slave_status_slave_sql_running untyped
    mysql_slave_status_slave_sql_running{channel_name="",connection_name="",master_host="172.16.1.1",master_uuid=""} 1
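For an ad-hoc check without the exporter, a script can parse the vertical output of SHOW SLAVE STATUS shown above and apply the same rule: both threads must report Yes. This is a minimal sketch with hypothetical helper names; it assumes you already have the text output (for example captured from the mysql client) and does not connect to MySQL itself.

```python
def parse_vertical_output(text: str) -> dict:
    # Parse the mysql client's vertical output (one "Field: value" per line)
    # into a dict; the "*** 1. row ***" separator lines are skipped.
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_threads_ok(status: dict) -> bool:
    # Healthy only if both threads say "Yes"; "No" or "Connecting" fail the check.
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes")
```

Because the value is split at the first colon only, fields such as Slave_IO_State keep their full text.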

2. Replication lag:

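SHOW SLAVE STATUS also reports another key field, Seconds_Behind_Master. In MySQL replication, the slave first pulls the binlog from the master into a local relay log (via the IO thread) and then replays it (via the SQL thread). Seconds_Behind_Master reflects the delay between the SQL thread and the IO thread: it is computed from the original timestamp of the event the SQL thread is currently applying, so it describes the part of the local relay log that has not been executed yet. Once the slave has executed everything in the relay log it has pulled (the relay log is really just the binlog; on the slave it is conventionally called the relay log), show slave status reports 0. This also means it can read 0 while the IO thread itself is behind the master, for example on a slow network: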

    Seconds_Behind_Master: 0

MySQLD Exporter exposes this field as mysql_slave_status_seconds_behind_master:

    # HELP mysql_slave_status_seconds_behind_master Generic metric from SHOW SLAVE STATUS.
    # TYPE mysql_slave_status_seconds_behind_master untyped
    mysql_slave_status_seconds_behind_master{channel_name="",connection_name="",master_host="172.16.1.1",master_uuid=""} 0
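One detail the sample above does not show: Seconds_Behind_Master is reported as NULL rather than a number when the lag cannot be computed, for example while the SQL thread is stopped. A checking script should treat that as unknown, not as zero lag. The sketch below assumes a dict of SHOW SLAVE STATUS fields (field name to string value) and a 30-second threshold taken from the alerting rule later in this article; the function names are hypothetical.

```python
from typing import Optional


def seconds_behind_master(status: dict) -> Optional[int]:
    # Return the lag in seconds, or None when MySQL reports NULL (or the field
    # is missing), so that "unknown" is never confused with "0 seconds behind".
    raw = status.get("Seconds_Behind_Master")
    if raw is None or raw.strip().upper() == "NULL":
        return None
    return int(raw)


def lag_state(status: dict, threshold_s: int = 30) -> str:
    # Classify replication lag as "ok", "lagging" or "unknown".
    lag = seconds_behind_master(status)
    if lag is None:
        return "unknown"
    return "lagging" if lag > threshold_s else "ok"
```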

Query throughput:

How do we measure throughput? Generally, by counting the statements MySQL executes: inserts, selects, deletes, updates and so on.

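To track throughput, MySQL has an internal counter called Questions (a server status variable, in MySQL terminology) that is incremented every time a client sends a statement. The client-centric view provided by Questions is often easier to interpret than the related Queries counter, which also counts statements executed inside stored programs, as well as commands such as PREPARE and DEALLOCATE PREPARE run as part of server-side prepared statements. You can query it with: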

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Questions";
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | Questions     | 15071 |
    +---------------+-------+

In the samples returned by MySQLD Exporter, mysql_global_status_questions reflects the current value of the Questions counter:

    # HELP mysql_global_status_questions Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_questions untyped
    mysql_global_status_questions 13253

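Because Prometheus has a rich query language, we can use this cumulative counter to compute how fast queries are arriving over a short window and alert on thresholds. For example, the per-second query rate averaged over the last 2 minutes: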

    rate(mysql_global_status_questions[2m])
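To make clear what rate() computes, here is a simplified Python version: the per-second increase of a counter between the first and last samples in the window, where any drop in value is treated as a counter reset (for instance after a MySQL restart, when Questions starts again from zero). Unlike the real rate(), this sketch does not extrapolate to the edges of the window, so results can differ slightly; the function name and the (timestamp, value) input format are assumptions of this example.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    # samples: (unix_timestamp_seconds, counter_value) pairs in time order.
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter was reset; count the new value as the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

Three scrapes 60 seconds apart in which Questions rises by 120 in total give 1.0 query per second.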

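That is the total. We can also break it down into reads and writes to better understand the database workload and find possible bottlenecks. Reads are usually captured by the Com_select counter, while writes increment one of three status variables, depending on the statement: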

    Writes = Com_insert + Com_update + Com_delete

For example, to get the number of inserts:

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Com_insert";
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | Com_insert    | 10578 |
    +---------------+-------+

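In the samples from MySQLD Exporter's /metrics endpoint, mysql_global_status_commands_total gives the number of times each command type has been executed on the instance (the command label is the Com_ variable name without its prefix, e.g. Com_insert becomes command="insert"):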

    # HELP mysql_global_status_commands_total Total number of executed MySQL commands.
    # TYPE mysql_global_status_commands_total counter
    mysql_global_status_commands_total{command="create_trigger"} 0
    mysql_global_status_commands_total{command="create_udf"} 0
    mysql_global_status_commands_total{command="create_user"} 1
    mysql_global_status_commands_total{command="create_view"} 0
    mysql_global_status_commands_total{command="dealloc_sql"} 0
    mysql_global_status_commands_total{command="delete"} 3369
    mysql_global_status_commands_total{command="delete_multi"} 0
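Putting the Writes formula together with this metric: the sketch below takes two scrapes of mysql_global_status_commands_total, given as hypothetical dicts keyed by the command label, and computes writes per second as the increase of insert + update + delete divided by the interval. It does not handle counter resets, and, like the formula, it ignores variants such as delete_multi or replace.

```python
from typing import Dict

# Commands counted as writes, following Writes = Com_insert + Com_update + Com_delete.
WRITE_COMMANDS = ("insert", "update", "delete")


def write_rate(before: Dict[str, float], after: Dict[str, float], interval_s: float) -> float:
    # before/after map the `command` label to the counter value from two scrapes
    # taken interval_s seconds apart. Counter resets are not handled.
    writes = sum(after.get(c, 0.0) - before.get(c, 0.0) for c in WRITE_COMMANDS)
    return writes / interval_s
```

In Prometheus itself the same breakdown is a single query: sum by (instance) (rate(mysql_global_status_commands_total{command=~"insert|update|delete"}[5m])).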

Slow queries

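On the performance side, slow queries are an important alerting metric. MySQL provides a Slow_queries counter that is incremented each time a query takes longer than long_query_time seconds to execute. The default is 10 seconds; you can check the current setting with: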

    MariaDB [(none)]> SHOW VARIABLES LIKE 'long_query_time';
    +-----------------+-----------+
    | Variable_name   | Value     |
    +-----------------+-----------+
    | long_query_time | 10.000000 |
    +-----------------+-----------+
    1 row in set (0.00 sec)

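We can also change the threshold. SET GLOBAL applies only to connections opened after the change and is lost on restart, so also set long_query_time in my.cnf to make it permanent: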

    MariaDB [(none)]> SET GLOBAL long_query_time = 5;
    Query OK, 0 rows affected (0.00 sec)

Then we can query the number of slow queries on the MySQL instance:

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Slow_queries";
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | Slow_queries  | 0     |
    +---------------+-------+
    1 row in set (0.00 sec)

In the samples returned by MySQLD Exporter, mysql_global_status_slow_queries shows the current value of Slow_queries:

    # HELP mysql_global_status_slow_queries Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_slow_queries untyped
    mysql_global_status_slow_queries 0

Likewise, PromQL gives us its rate of increase over a time window:

    rate(mysql_global_status_slow_queries[5m])

Connections

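Monitoring client connections is critical: once all available connections are used up, new client connections are refused. MySQL's default connection limit is 151.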

    MariaDB [(none)]> SHOW VARIABLES LIKE 'max_connections';
    +-----------------+-------+
    | Variable_name   | Value |
    +-----------------+-------+
    | max_connections | 151   |
    +-----------------+-------+

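You can raise this value in the configuration file. Its counterpart is the current number of connections: when connections exceed the configured maximum, clients get the familiar "Too many connections" error. Here is how to check the current number of connections: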

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Threads_connected";
    +-------------------+-------+
    | Variable_name     | Value |
    +-------------------+-------+
    | Threads_connected | 41    |
    +-------------------+-------+

MySQL also provides Threads_running, which separates threads that are actively processing queries at any given moment from connections that are open but idle.

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Threads_running";
    +-----------------+-------+
    | Variable_name   | Value |
    +-----------------+-------+
    | Threads_running | 10    |
    +-----------------+-------+

If the server actually hits the max_connections limit, it starts refusing new connections. Connection_errors_max_connections then starts to increase, as does Aborted_connects, which counts all failed connection attempts.

In the samples returned by MySQLD Exporter:

    # HELP mysql_global_variables_max_connections Generic gauge metric from SHOW GLOBAL VARIABLES.
    # TYPE mysql_global_variables_max_connections gauge
    mysql_global_variables_max_connections 151

The maximum number of connections (max_connections).

    # HELP mysql_global_status_threads_connected Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_threads_connected untyped
    mysql_global_status_threads_connected 41

The current number of connections (Threads_connected).

    # HELP mysql_global_status_threads_running Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_threads_running untyped
    mysql_global_status_threads_running 1

The number of currently active connections (Threads_running).

    # HELP mysql_global_status_aborted_connects Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_aborted_connects untyped
    mysql_global_status_aborted_connects 31

The cumulative number of failed connection attempts (Aborted_connects).

    # HELP mysql_global_status_connection_errors_total Total number of MySQL connection errors.
    # TYPE mysql_global_status_connection_errors_total counter
    mysql_global_status_connection_errors_total{error="internal"} 0
    # connections refused due to internal server errors (e.g. out of memory, failure to start a thread)
    mysql_global_status_connection_errors_total{error="max_connections"} 0
    # connections refused because max_connections was reached

With PromQL we can compute how many connections are still available:

    mysql_global_variables_max_connections - mysql_global_status_threads_connected
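The remaining-connections expression is plain arithmetic, and it is often more useful as a fraction of max_connections, since a fixed number means different things on servers with different limits. A minimal sketch with a hypothetical helper and an assumed 80% warning level:

```python
def connection_headroom(max_connections: int, threads_connected: int,
                        warn_ratio: float = 0.8) -> dict:
    # remaining = max_connections - Threads_connected, as in the PromQL expression;
    # usage is the fraction of max_connections in use; warn flags usage >= warn_ratio.
    remaining = max_connections - threads_connected
    usage = threads_connected / max_connections
    return {"remaining": remaining, "usage": usage, "warn": usage >= warn_ratio}
```

With the sample values (151 allowed, 41 connected) this leaves 110 connections, about 27% usage. The PromQL counterpart of the usage ratio would be mysql_global_status_threads_connected / mysql_global_variables_max_connections.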

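Failed connection attempts to MySQL (a cumulative counter, so wrap it in increase() or rate() over a window to see recent failures):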

    mysql_global_status_aborted_connects

Buffer pool:

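InnoDB, MySQL's default storage engine, uses an area of memory called the buffer pool to cache table and index data. Buffer pool metrics are resource metrics rather than work metrics, so they are more useful for investigating performance problems than for detecting them. If database performance starts to degrade while disk I/O keeps rising, enlarging the buffer pool often helps.
By default the buffer pool is relatively small, 128 MiB. MySQL suggests it can be increased to as much as 80% of physical memory on a dedicated database server. Let's check the current size: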

    MariaDB [(none)]> show global variables like 'innodb_buffer_pool_size';
    +-------------------------+-----------+
    | Variable_name           | Value     |
    +-------------------------+-----------+
    | innodb_buffer_pool_size | 134217728 |
    +-------------------------+-----------+
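To turn the 80% guideline into a concrete setting, the sketch below takes the physical RAM of a dedicated database server and rounds 80% of it down to a multiple of 128 MiB. The rounding unit is an assumption: it matches the default innodb_buffer_pool_chunk_size in MySQL 5.7 and later with a single buffer pool instance, and MySQL adjusts innodb_buffer_pool_size to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, so check your own values. The function name is hypothetical.

```python
MIB = 1024 * 1024


def suggested_buffer_pool_bytes(physical_ram_bytes: int, fraction: float = 0.8,
                                chunk_bytes: int = 128 * MIB) -> int:
    # Take `fraction` of RAM and round down to a whole number of chunks,
    # never going below one chunk (128 MiB, also the default buffer pool size).
    target = int(physical_ram_bytes * fraction)
    return max(chunk_bytes, target // chunk_bytes * chunk_bytes)
```

For a 16 GiB server this suggests 13056 MiB (12.75 GiB), i.e. innodb_buffer_pool_size = 13690208256, compared with the 134217728-byte (128 MiB) default above.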

MySQLD Exporter exposes innodb_buffer_pool_size as mysql_global_variables_innodb_buffer_pool_size:

    # HELP mysql_global_variables_innodb_buffer_pool_size Generic gauge metric from SHOW GLOBAL VARIABLES.
    # TYPE mysql_global_variables_innodb_buffer_pool_size gauge
    mysql_global_variables_innodb_buffer_pool_size 1.34217728e+08

Innodb_buffer_pool_read_requests counts the logical read requests made against the buffer pool. You can check it with:

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_read_requests";
    +----------------------------------+-------+
    | Variable_name                    | Value |
    +----------------------------------+-------+
    | Innodb_buffer_pool_read_requests | 38465 |
    +----------------------------------+-------+

MySQLD Exporter exposes it as mysql_global_status_innodb_buffer_pool_read_requests:

    # HELP mysql_global_status_innodb_buffer_pool_read_requests Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_innodb_buffer_pool_read_requests untyped
    mysql_global_status_innodb_buffer_pool_read_requests 2.7711547168e+10

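When the buffer pool cannot satisfy a read, MySQL has to read the data from disk. Innodb_buffer_pool_reads counts these reads that had to go to disk. Reading from memory is usually much faster than reading from disk, so if Innodb_buffer_pool_reads starts to climb, the database may have a performance problem. You can check Innodb_buffer_pool_reads with: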

    MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_reads";
    +--------------------------+-------+
    | Variable_name            | Value |
    +--------------------------+-------+
    | Innodb_buffer_pool_reads | 138   |
    +--------------------------+-------+
    1 row in set (0.00 sec)

MySQLD Exporter exposes it as mysql_global_status_innodb_buffer_pool_reads:

    # HELP mysql_global_status_innodb_buffer_pool_reads Generic metric from SHOW GLOBAL STATUS.
    # TYPE mysql_global_status_innodb_buffer_pool_reads untyped
    mysql_global_status_innodb_buffer_pool_reads 138

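With these metrics and our real-world monitoring scenarios, we can quickly build a number of monitoring items in PromQL. For example, the per-second rate of disk reads over the last two minutes: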

    rate(mysql_global_status_innodb_buffer_pool_reads[2m])
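The two buffer pool counters are often combined into a hit ratio, the share of logical read requests that did not have to go to disk: 1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests. Both counters are cumulative since startup, so pass in their increases over an interval to see the current ratio rather than the lifetime average. A minimal sketch with a hypothetical function name:

```python
from typing import Optional


def buffer_pool_hit_ratio(read_requests: float, disk_reads: float) -> Optional[float]:
    # Fraction of logical reads served from memory; None if there were no reads.
    if read_requests <= 0:
        return None
    return 1.0 - disk_reads / read_requests
```

With the SHOW GLOBAL STATUS values above (38465 requests, 138 disk reads) the ratio is about 99.64%. In PromQL the interval-based version would be 1 - rate(mysql_global_status_innodb_buffer_pool_reads[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests[5m]).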

Official dashboard template IDs

Those are just a few of the available metrics. Next, we use Grafana to add dashboards for MySQLD Exporter:

Master-slave cluster monitoring (template 7371):
MySQL status monitoring (template 7362):
Buffer pool status (template 7365):

Simple alerting rules

Dashboards alone are not enough: monitoring without alerting rules is incomplete. Here are our alerting rules:

    groups:
    - name: MySQL-rules
      rules:
      - alert: MySQL Status
        # mysql_up == 0: the exporter runs but cannot connect to MySQL;
        # up{job="mysql"} == 0: the exporter itself cannot be scraped.
        expr: mysql_up == 0 or up{job="mysql"} == 0
        for: 5s
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: MySQL has stopped !!!"
          description: "Checks whether the MySQL database is running"

      - alert: MySQL Slave IO Thread Status
        expr: mysql_slave_status_slave_io_running == 0
        for: 5s
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: MySQL Slave IO Thread has stopped !!!"
          description: "Checks whether the replication IO thread is running"

      - alert: MySQL Slave SQL Thread Status
        expr: mysql_slave_status_slave_sql_running == 0
        for: 5s
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: MySQL Slave SQL Thread has stopped !!!"
          description: "Checks whether the replication SQL thread is running"

      - alert: MySQL Slave Delay Status
        # Use the actual lag (Seconds_Behind_Master): mysql_slave_status_sql_delay is the
        # configured delay of delayed replication, and "== 30" would only match exactly 30s.
        expr: mysql_slave_status_seconds_behind_master > 30
        for: 5s
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: MySQL Slave Delay is more than 30s !!!"
          description: "Checks master-slave replication lag"

      - alert: Mysql_Too_Many_Connections
        # threads_connected is a gauge, so compare it directly instead of wrapping it in rate().
        # Assumes max_connections has been raised above 200 (the default is 151).
        expr: mysql_global_status_threads_connected > 200
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: too many connections"
          description: "{{$labels.instance}}: too many connections, please investigate (current value is: {{ $value }})"

      - alert: Mysql_Too_Many_slow_queries
        expr: rate(mysql_global_status_slow_queries[5m]) > 3
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: too many slow queries, please investigate"
          description: "{{$labels.instance}}: Mysql slow_queries is more than 3 per second (current value is: {{ $value }})"
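Each rule's `for` clause controls when an alert actually fires: the first evaluation at which expr returns a result puts the alert in the pending state, and it moves to firing only once the condition has held for at least the `for` duration; if the condition clears, the alert returns to inactive. The sketch below replays that logic over a hypothetical list of (evaluation time, condition) pairs. It is a simplified model of Prometheus's behaviour that tracks a single series and ignores other rule options.

```python
from typing import List, Optional, Tuple


def alert_states(evaluations: List[Tuple[float, bool]], for_s: float) -> List[str]:
    # evaluations: (evaluation_time_seconds, expr_returned_a_result) in time order.
    # Returns the alert state after each evaluation.
    states = []
    active_since: Optional[float] = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None          # condition cleared: back to inactive
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts            # first evaluation where the condition holds
        states.append("firing" if ts - active_since >= for_s else "pending")
    return states
```

With the default 1-minute evaluation_interval, `for: 5s` behaves much like `for: 1m` in this model: the alert is pending at the first evaluation and fires at the next, while `for: 2m` needs three consecutive evaluations.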

2. Add the rules to Prometheus: