故障分析 | binlog flush 失敗致使的 Crash

時間 2020-02-22

標籤故障分析 binlog flush 失敗致使 crash 简体版

原文原文鏈接

做者：xuty
開個坑，記錄本身平時因爲解決問題須要或是興趣研究進行的 MySQL 源碼跟蹤學習過程。

1、問題現象

某項目上出現 MySQL Crash，相關 errorlog 日誌以下，從日誌能夠看出是 binlog error 致使的問題，可是沒法確認具體緣由，網上大部分資料都說是因爲空間已滿致使，後來在系統日誌(/var/log/message)中確實找到了 / 分區空間已滿的信息，因此基本能夠確認 binlog error 是因爲磁盤空間已滿致使，進而形成 MySQL Crash。html

2019-09-24T14:17:47.786460+08:00 527621 [ERROR] /usr/local/mysql/bin/mysqld: Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.
06:17:47 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=16777216
max_used_connections=79
max_threads=2500
thread_count=35
connection_count=34
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 122921649 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fbc3400ef40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fbcd01d9ea8 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x35)[0xf4fbe5]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x4a4)[0x7d1f54]
/lib64/libpthread.so.0(+0xf370)[0x7fc5897fe370]
/lib64/libc.so.6(gsignal+0x37)[0x7fc5881f21d7]
/lib64/libc.so.6(abort+0x148)[0x7fc5881f38c8]
/usr/local/mysql/bin/mysqld[0xee626a]
/usr/local/mysql/bin/mysqld(_ZN13MYSQL_BIN_LOG33handle_binlog_flush_or_sync_errorEP3THDb+0x17b)[0xef289b]
/usr/local/mysql/bin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x3ca)[0xef480a]
/usr/local/mysql/bin/mysqld(_ZN13MYSQL_BIN_LOG6commitEP3THDb+0x51d)[0xef4f7d]
/usr/local/mysql/bin/mysqld(_Z15ha_commit_transP3THDbb+0x174)[0x823434]
/usr/local/mysql/bin/mysqld(_Z12trans_commitP3THD+0x49)[0xdd38f9]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x2c89)[0xd1a1a9]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x40d)[0xd1c86d]
/usr/local/mysql/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x11a5)[0xd1da95]
/usr/local/mysql/bin/mysqld(_Z10do_commandP3THD+0x194)[0xd1e944]
/usr/local/mysql/bin/mysqld(handle_connection+0x2b4)[0xdeffa4]
/usr/local/mysql/bin/mysqld(pfs_spawn_thread+0x174)[0xfc8f64]
/lib64/libpthread.so.0(+0x7dc5)[0x7f097d411dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f097becf73d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f00a80008f0): is an invalid pointer
Connection ID (thread ID): 98525
Status: NOT_KILLED

2、問題疑惑

對於這個問題，一開始存在幾點疑惑，須要實驗和源碼追蹤驗證。mysql

binlog error 是因爲 / 分區空間已滿形成的，爲何會形成 MySQL Crash？
/ 分區明明有十幾個 G 的剩餘空間，爲何會滿？是什麼文件撐滿的？爲啥過後空間卻自動釋放了？

這裏就已知的知識先猜想下問題過程，是否正確則須要測試驗證，內部機制可能就須要去查看源碼。linux

大概過程如上，項目上 /tmp 默認爲 MySQL 的臨時文件目錄，其所在分區通常爲 / 分區(根分區)，而 MySQL 的數據目錄都存放在 /data/mysql_data 爲另外一個單獨的分區(/data 分區)，項目上通常均是如此。sql

#bin包默認值
datadir = /data/mysql_data
tmpdir = /tmp

當 /tmp 空間不足，也就是 / 分區已滿的狀況下，致使 binlog flush 沒法進行下去，拋出 binlog_error 的異常，致使 MySQL crash`!數據庫

3、資料查閱

針對這個問題，網上進行大量資料查閱後，相關知識點總結以下:緩存

與這次問題相關最主要的 2 個 MySQL 參數。安全

#bin包默認值
binlog_cache_size=32768
binlog_error_action = ABORT_SERVER

binlog_cache_size：當事務開始時，MySQL 會分配一個 binlog_cache_size 大小的緩衝區（默認 32K）給 session 用於存儲二進制日誌的緩存。若是事務大於 binlog_cache_size（默認 32K），線程將打開一個 臨時文件 用於存儲事務(默認 /tmp下），當線程結束時，臨時文件會自動被刪除。session

binlog_error_action：默認值爲 ABORT_SERVER，會使 MySQL 在寫 binlog 遇到嚴重錯誤時直接退出(即Crash動做)，好比磁盤滿了，文件系統不可寫入等。在 ABORT_SERVER 選項下，binlog 和從庫都是安全的，這是官方選擇此默認值的緣由。併發

此時咱們就已經能夠回答以前提的問題了~app

binlog error 是因爲 /分區空間已滿 形成的，爲何會形成 MySQL Crash？

由於 binlog_error_action 參數默認值爲 ABORT_SERVER，在 binlog 寫入錯誤時會選擇直接退出保證 binlog 安全。

/ 分區明明有十幾個 G 的剩餘空間，爲何會滿，是什麼文件撐滿的，爲啥過後空間卻自動釋放了。

猜想是因爲事務較大，超過了 binlog_cache_size，因此會在 /tmp(即/分區) 下生成臨時文件，致使 / 分區空間不足，引發 MySQL Crash 後，空間又自動釋放，因此過後看/分區空間是正常的。

四. 測試模擬

本地環境（爲方便測試，與上述分區略有不一樣）：

Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root   96G   57G   39G  60% /               <-------- datadir = /data/mysql_data
/dev/mapper/cl-tmp   3.8G   33M  3.8G   1% /data/tmp       <-------- tmpdir  = /data/tmp

模擬 多條 SQL 組成的超大事務，在存儲過程當中顯示開啓事務，循環插入 1kw 數據，觀察 tmpdir 目錄使用狀況.

直接在 /data/tmp 下沒法看到臨時文件，只有經過 lsof 命令才能夠看到在 tmpdir 目錄下確實生成了臨時文件，且一直在增加，佔用了 tmpdir 目錄下的大量空間。

存儲過程最終執行報錯，報 No space left on device 即臨時目錄空間不足，可是此時 MySQL 並無 Crash，errorlog 裏未出現報錯！！！

通過測試驗證，當報錯的時候，若是你在 同一個會話下 執行 commit 或者 再開啓一個事務 就會當即觸發 binlog_error_action，致使 MySQL 當即崩掉，且 errorlog 裏會出現 binlog error 報錯，與項目上現象一致。

PS：當 MySQL 解釋器遇到 start transaction 時候會觸發 commit… !!!

mysql> call test_insert();
ERROR 1026 (HY000): Error writing file '/data/tmp/ML1aShmH' (errno: 28 - No space left on device)
mysql> commit;
ERROR 1598 (HY000): Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.

mysql> call test_insert();
ERROR 1026 (HY000): Error writing file '/data/tmp/ML5b0Mj6' (errno: 28 - No space left on device)
mysql> start transaction;
ERROR 1598 (HY000): Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.

5、源碼跟蹤

跟蹤 MySQL 源碼驗證上述猜測及測試，不感興趣的童鞋能夠直接跳過~

爲了查看上述測試過程究竟是在源碼的哪一步最早報了錯誤，因此對比了 2 份 DBUG 日誌，左圖 是 報錯 的跟蹤日誌，右圖 爲 不報錯 的跟蹤日誌。

能夠看到報錯的跟蹤日誌最終是在 my_write函數 中拋出錯誤，而不報錯的日誌則正常執行完 my_write 函數。

//Write a chunk of bytes to a file
size_t my_write(File Filedes, const uchar *Buffer, size_t Count, myf MyFlags)
{
    size_t writtenbytes;
    size_t sum_written= 0;
    const size_t initial_count= Count;
    
    for (;;)
    {
        //...
        #ifdef _WIN32
            writtenbytes= my_win_write(Filedes, Buffer, Count);
        #else
            writtenbytes= write(Filedes, Buffer, Count); //writtenbytes爲實際寫入的字節數
        //...
        if (writtenbytes == Count)  //判斷實際寫入的字節數與要求寫入的字節數是否相符
        {
          //寫入正常則直接break跳出循環,正常結束.
          sum_written+= writtenbytes;
          break;
        }
        //...
    }
    //...
    if (MyFlags & (MY_NABP | MY_FNABP))
    {
        if (sum_written == initial_count)  //寫入正常走這個邏輯，返回0表明成功
          DBUG_RETURN(0);   
        if (MyFlags & (MY_WME | MY_FAE | MY_FNABP)) //寫入過程報錯走這個邏輯
        {
          char errbuf[MYSYS_STRERROR_SIZE];
          my_error(EE_WRITE, MYF(0), my_filename(Filedes),
               my_errno(), my_strerror(errbuf, sizeof(errbuf), my_errno())); //打印錯誤信息
        }
        DBUG_RETURN(MY_FILE_ERROR); //返回錯誤
    }
    //...
}

/*
源碼如上，MySQL最終會走Linux內核方法 write(Filedes, Buffer, Count)來刷新binlog緩衝至磁盤文件中。
1. Filedes 爲文件號（對應/proc/mysqld_pid/fd下的文件編號）
2. Buffer 爲指向所需寫入緩衝的指針。
3. Count 爲所須要寫入的字節數。

my: fd: 51  Buffer: 0x7f24c49e9e30  Count: 27

因爲 /data/tmp 磁盤已滿，沒法寫入 Count 所需的字節數，致使 writtenbytes!=Count，而後引發了後續一系列的代碼報錯，最終誘發 binlog_error_action->MySQL Crash.   
*/

MySQL 源碼執行 Commit 報錯過程精簡以下：

T@12: | >trans_commit
T@12: | | >ha_commit_trans
T@12: | | | >MYSQL_BIN_LOG::commit
T@12: | | | | >MYSQL_BIN_LOG::ordered_commit
T@12: | | | | | >MYSQL_BIN_LOG::process_flush_stage_queue
T@12: | | | | | | >binlog_cache_data::flush
T@12: | | | | | | | >MYSQL_BIN_LOG::write_cache(THD *, binlog_cache_data *, bool)
T@12: | | | | | | | | >MYSQL_BIN_LOG::do_write_cache
T@12: | | | | | | | | | >reinit_io_cache
T@12: | | | | | | | | | | >my_b_flush_io_cache
T@12: | | | | | | | | | | | >my_write
T@12: | | | | | | | | | | | | my: fd: 52  Buffer: 0x7f9ccc050db0  Count: 27  MyFlags: 20
T@12: | | | | | | | | | | | | error: Write only -1 bytes, error: 28
T@12: | | | | | | | | | | | | >my_error
T@12: | | | | | | | | | | | | | >my_message_sql
T@12: | | | | | | | | | | | | | | error: error: 3  message: 'Error writing file '/data/tmp/MLJRp8sa' (Errcode: 28 - No space left on device)'
T@12: | | | | | | | | | | | | | <my_message_sql 2281
T@12: | | | | | | | | | | | | <my_error 211
T@12: | | | | | | | | | | | <my_write 122
T@12: | | | | | | | | | | <my_b_flush_io_cache 1583
T@12: | | | | | | | | | <reinit_io_cache 367
T@12: | | | | | | | | <MYSQL_BIN_LOG::do_write_cache 7679
T@12: | | | | | | | <MYSQL_BIN_LOG::write_cache(THD *, binlog_cache_data *, bool) 8031
T@12: | | | | | | <binlog_cache_data::flush 1740
T@12: | | | | | <MYSQL_BIN_LOG::process_flush_stage_queue 8848
T@12: | | | | | >my_error
T@12: | | | | | | >my_message_sql
T@12: | | | | | | | error: error: 1598  message: 'Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.'
T@12: | | | | | | | >sql_print_error
T@12: | | | | | | | | >error_log_print
T@12: | | | | | | | | | >print_buffer_to_file
T@12: | | | | | | | | | | enter: buffer: /opt/mysql-5.7.24-linux-glibc2.12-x86_64/bin/mysqld-debug: Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.
2019-11-11T11:54:45.434783+08:00 12 [ERROR] /opt/mysql-5.7.24-linux-glibc2.12-x86_64/bin/mysqld-debug: Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.
T@12: | | | | | | | | | <print_buffer_to_file 2109
T@12: | | | | | | | | <error_log_print 2134
T@12: | | | | | | | <sql_print_error 2148
T@12: | | | | | | <my_message_sql 2281
T@12: | | | | | <my_error 211

6、擴展內容

下圖這個報錯其實就是本次問題的一個平常例子，不少童鞋經過 navicate 還原 MySQL 的時候可能就會碰到這個問題，本質就是 navicate 還原數據庫的時候默認會走 事務，當表數據庫較大時，事務大小超過 binlog_cache_size，就會用臨時表記錄事務數據，當 tmdir 目錄磁盤不足就會報錯，可是這種狀況並不會形成 MySQL Crash，由於 navicate 在還原報錯後並不會繼續去執行 commit 或 另開一個事務，而是直接斷開鏈接。

7、總結

這個問題目前在項目上不多碰到，此次也是出於好奇拿來學習探討，下面總結下這個問題出現的場景：

1.代碼中存在較大事務，超過 binlog_cache_size，高併發下生成大量臨時文件，佔滿 tmpdir。
2.代碼在事務執行過程當中碰到 tmpdir 磁盤已滿錯誤，未處理異常執行回滾，後續執行 Commit 致使。
3.代碼在事務執行過程當中碰到 tmpdir 磁盤已滿錯誤，未處理異常執行回滾，繼續執行碰到嵌套事務，引起 Commit 致使。

也許不少童鞋想到能夠 加大binlog_cache_size 來減小臨時文件的產生，可是這樣會增長內存消耗，試想如下，假如 binlog_cache_size 增長到 32MB，當有 300個數據庫鏈接 時，每一個鏈接都會分配 32MB 的 binlog_cache(無論你用多少)，那麼就是將近 10G，很容易致使內存溢出，被系統 OOM。

所以 binlog_cache_size 不可能增長不少，根本解決方法是減小事務大小，避免在高併發下同時產生大量臨時文件，撐滿 tmpdir；另外能夠增長 tmpdir 所在分區的大小。