Wang Dong is a database expert at Ctrip's Technology Assurance Center, with a strong interest in troubleshooting tricky database problems and building automated, intelligent database operations tools.
We recently hit a brief hang on one of our MySQL instances: the number of concurrently running threads shot up for a moment and the connection count surged. Going through the Error Log, we found page_cleaner timeout messages, which drew our attention:
2019-08-24T23:47:09.361836+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 24915ms. The settings might not be optimal. (flushed=182 and evicted=0, during the time.)
2019-08-24T23:47:16.211740+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4849ms. The settings might not be optimal. (flushed=240 and evicted=0, during the time.)
2019-08-24T23:47:23.362286+08:00 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 6151ms. The settings might not be optimal. (flushed=215 and evicted=0, during the time.)
From the source file storage/innobase/buf/buf0flu.cc we can see that this message is written to the error log when curr_time > next_loop_time + 3000 holds, i.e., when more than 4 seconds have passed since the last loop. next_loop_time is the previous loop's timestamp plus 1000 ms; in other words, the page cleaner is meant to flush pages once per second.
if (ret_sleep == OS_SYNC_TIME_EXCEEDED) {
	ulint	curr_time = ut_time_ms();

	if (curr_time > next_loop_time + 3000) {
		if (warn_count == 0) {
			ib::info() << "page_cleaner: 1000ms"
				" intended loop took "
				<< 1000 + curr_time - next_loop_time
				<< "ms. The settings might not"
				" be optimal. (flushed="
				<< n_flushed_last
				<< " and evicted=" << n_evicted
				<< ", during the time.)";
			if (warn_interval > 300) {
				warn_interval = 600;
			} else {
				warn_interval *= 2;
			}
			warn_count = warn_interval;
		} else {
			--warn_count;
		}
	} else {
		/* reset counter */
		warn_interval = 1;
		warn_count = 0;
	}

	next_loop_time = curr_time + 1000;
	n_flushed_last = n_evicted = 0;
}
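The throttling around warn_count and warn_interval can be condensed into a small Python sketch (our own model, not MySQL code; the variable names mirror the source). It shows that a warning is printed only when a loop overruns its 1-second budget by more than 3 seconds, and that repeated warnings back off exponentially so the log is not flooded:

```python
def simulate(loop_durations_ms):
    """loop_durations_ms[i]: elapsed ms between cleaner loop i and loop i+1."""
    warn_interval = 1
    warn_count = 0
    warnings = []
    curr_time = 0
    next_loop_time = 1000              # first loop is due 1000 ms in
    for took in loop_durations_ms:
        curr_time += took
        if curr_time > next_loop_time + 3000:   # woke up more than 3 s late
            if warn_count == 0:
                # this is the number the error log prints as "took Nms"
                warnings.append(1000 + curr_time - next_loop_time)
                # exponential back-off between warnings
                if warn_interval > 300:
                    warn_interval = 600
                else:
                    warn_interval *= 2
                warn_count = warn_interval
            else:
                warn_count -= 1
        else:
            warn_interval = 1          # on-time loop: reset the counters
            warn_count = 0
        next_loop_time = curr_time + 1000
    return warnings
```

Feeding it the 24915 ms loop from our log reproduces the printed value; a run of equally late loops shows that only every warn_interval-th overrun is reported.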
The second half of the message, (flushed=182 and evicted=0, during the time.), corresponds to the variables n_flushed_last and n_evicted, which are accumulated from the two variables below.
n_evicted += n_flushed_lru;
n_flushed_last += n_flushed_list;
As the source comments explain, n_flushed_lru is the number of pages flushed from the tail of the LRU list, which is the evicted=0 figure in the log. n_flushed_list is the number of dirty pages flushed from the flush_list, the flushed=182 figure in the log.
/**
Wait until all flush requests are finished.
@param n_flushed_lru	number of pages flushed from the end of the LRU list.
@param n_flushed_list	number of pages flushed from the end of the
flush_list.
@return true if all flush_list flushing batch were success. */
static
bool
pc_wait_finished(
	ulint*	n_flushed_lru,
	ulint*	n_flushed_list)
The pc_wait_finished function shows that the page_cleaner thread handles both LRU list and flush_list flushing, and must wait for both parts to complete.
1) LRU list flushing runs first and flush_list flushing second, and the two are mutually exclusive: while flush_list flushing is in progress, the LRU list cannot be flushed and has to wait for the next loop iteration.
2) In addition, the page cleaner coordinator waits for all page cleaner worker threads to finish before it responds to the next flush request. As a result, if one buffer pool instance is hotter than the others, the page cleaner cannot respond to it promptly.
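The combined effect of points 1) and 2) can be illustrated with a toy Python model (an illustrative sketch under our own assumptions, not InnoDB code): within each buffer pool instance the LRU scan and the flush_list flush run back to back, and the coordinator's round, like pc_wait_finished, is gated on the slowest instance.

```python
def cleaner_round_ms(instances):
    """instances: one (lru_ms, flush_list_ms) pair per buffer pool instance.

    Within an instance, LRU flushing runs first and flush_list flushing
    second (they are mutually exclusive), and the coordinator waits for
    every instance to finish before starting the next round."""
    return max(lru_ms + flush_list_ms for lru_ms, flush_list_ms in instances)
```

With seven quiet instances and one hot one, the whole round takes as long as the hot instance, so no instance gets flushed again until it is done; that is exactly the responsiveness problem described above.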
The Percona builds carry a number of optimizations for LRU list flushing.
To see what was executing in the incident window, we parsed the binlog:
mysqlbinlog -vvv binlog --start-datetime='2019-08-24 23:46:15' --stop-datetime='2019-08-24 23:47:25' | less
The status counters show that concurrent threads backed up and QPS throughput dropped during this window; once the MySQL service recovered, the backlog of requests was released all at once, driving the concurrent connection count even higher.
Innodb_buffer_pool_pages_misc is related to the adaptive hash index; here is the description from the official manual:
• Innodb_buffer_pool_pages_misc
The number of pages in the InnoDB buffer pool that are busy because they have been allocated for administrative overhead, such as row locks or the adaptive hash index. This value can also be calculated as Innodb_buffer_pool_pages_total − Innodb_buffer_pool_pages_free − Innodb_buffer_pool_pages_data. When using compressed tables, Innodb_buffer_pool_pages_misc may report an out-of-bounds value (Bug #59550).
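The identity stated in the manual can be checked directly from the three status counters; a trivial helper (our own illustration, with made-up numbers):

```python
def buffer_pool_pages_misc(total, free, data):
    # Pages busy with administrative overhead (row locks, the adaptive
    # hash index) are whatever is neither free nor holding data.
    return total - free - data
```

For example, for a hypothetical pool of 1,048,576 pages (16 GB at 16 KB per page) with 20,000 free pages and 900,000 data pages, the misc count would be 128,576 pages; watching this value shrink while Innodb_buffer_pool_pages_free grows is exactly the pattern we observe below when AHI memory is released.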
To verify this further, we simulated the impact of drop table on high-concurrency MySQL performance in a test environment.
1. Using sysbench, we first created 8 tables of 200 million rows each in the test environment, about 48 GB per table.
2. With innodb_adaptive_hash_index enabled, we ran the OLAP template for one hour to populate the AHI memory for the 8 target tables.
3. We then restarted the stress threads against sbtest1 only and observed MySQL's behavior.
4. In a new session we ran drop table test.sbtest2; dropping the 48 GB table took 64.84 seconds.
5. A custom script sampled Innodb_buffer_pool_pages_misc and Innodb_buffer_pool_pages_free every second; during the drop table, Innodb_buffer_pool_pages_misc was released in bulk while Innodb_buffer_pool_pages_free grew at the same time, and the amounts released and gained matched almost exactly, as the chart below shows.
6. During the drop table, MySQL was effectively hung, with QPS stuck at 0 for a long stretch.
7. The page_cleaner messages also reappeared in the error log.
At this point we had reproduced the problem.
1. During the stress test we captured pstack output, show engine innodb status, and the relevant mutex statistics from the performance_schema table events_waits_summary_global_by_event_name.
2. The SEMAPHORES section shows that during the hang, many threads requesting an S-lock were blocked by a lock held by one and the same thread, 140037411514112, for as long as 64 seconds:
--Thread 140037475792640 has waited at row0purge.cc line 862 for 64.00 seconds the semaphore:
S-lock on RW-latch at 0x966f6e38 created in file dict0dict.cc line 1183
a writer (thread id 140037411514112) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /mysql-5.7.23/storage/innobase/row/row0mysql.cc line 4253
--Thread 140037563102976 has waited at srv0srv.cc line 1982 for 57.00 seconds the semaphore:
X-lock on RW-latch at 0x966f6e38 created in file dict0dict.cc line 1183
a writer (thread id 140037411514112) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /mysql-5.7.23/storage/innobase/row/row0mysql.cc line 4253
3. The ROW OPERATIONS section shows that MySQL's Main Thread was blocked by the same thread 140037411514112, stuck in the state enforcing dict cache limit:
3 queries inside InnoDB, 0 queries in queue
17 read views open inside InnoDB
Process ID=39257, Main thread ID=140037563102976, state: enforcing dict cache limit
Number of rows inserted 1870023456, updated 44052478, deleted 22022445, read 9301843315
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
4. The SQL being executed by thread 140037411514112 is the drop table test.sbtest2 statement, which tells us that the locks held during drop table blocked the Main Thread and the other threads:
---TRANSACTION 44734025, ACTIVE 64 sec dropping table
10 lock struct(s), heap size 1136, 7 row lock(s), undo log entries 1
MySQL thread id 440836, OS thread handle 140037411514112, query id 475074428 localhost root checking permissions
drop table test.sbtest2
5. Below is the simplified call stack we captured for drop table; comparing the captures shows that for the whole 64 seconds drop table was executing btr_search_drop_page_hash_index, purging the table's AHI entries:
Thread 32 (Thread 0x7f5d002b2700 (LWP 8156)):
#0 ha_remove_all_nodes_to_page
#1 btr_search_drop_page_hash_index
#2 btr_search_drop_page_hash_when_freed
#3 fseg_free_extent
#4 fseg_free_step
#5 btr_free_but_not_root
#6 btr_free_if_exists
#7 dict_drop_index_tree
#8 row_upd_clust_step
#9 row_upd
#10 row_upd_step
#11 que_thr_step
#12 que_run_threads_low
#13 que_run_threads
#14 que_eval_sql
#15 row_drop_table_for_mysql
#16 ha_innobase::delete_table
#17 ha_delete_table
#18 mysql_rm_table_no_locks
#19 mysql_rm_table
#20 mysql_execute_command
#21 mysql_parse
#22 dispatch_command
#23 do_command
#24 handle_connection
#25 pfs_spawn_thread
#26 start_thread ()
#27 clone ()
6. The code shows that when drop table enters row_drop_table_for_mysql, it takes an exclusive lock on the data dictionary at the row_mysql_lock_data_dictionary(trx) call:
row_drop_table_for_mysql(
const char* name,
trx_t* trx,
bool drop_db,
bool nonatomic,
dict_table_t* handler)
{
dberr_t err;
dict_foreign_t* foreign;
dict_table_t* table = NULL;
char* filepath = NULL;
char* tablename = NULL;
bool locked_dictionary = false;
pars_info_t* info = NULL;
mem_heap_t* heap = NULL;
bool is_intrinsic_temp_table = false;
DBUG_ENTER("row_drop_table_for_mysql");
DBUG_PRINT("row_drop_table_for_mysql", ("table: '%s'", name));
ut_a(name != NULL);
/* Serialize data dictionary operations with dictionary mutex:
no deadlocks can occur then in these operations */
trx->op_info = "dropping table";
if (handler != NULL && dict_table_is_intrinsic(handler)) {
table = handler;
is_intrinsic_temp_table = true;
}
if (table == NULL) {
if (trx->dict_operation_lock_mode != RW_X_LATCH) {
/* Prevent foreign key checks etc. while we are
dropping the table */
row_mysql_lock_data_dictionary(trx);
locked_dictionary = true;
nonatomic = true;
}
7. Taking the Main Thread as an example, it was blocked while acquiring the X-lock on the RW-latch in srv_master_evict_from_table_cache:
/********************************************************************//**
Make room in the table cache by evicting an unused table.
@return number of tables evicted. */
static
ulint
srv_master_evict_from_table_cache(
/*==============================*/
	ulint	pct_check)	/*!< in: max percent to check */
{
ulint n_tables_evicted = 0;
rw_lock_x_lock(dict_operation_lock);
dict_mutex_enter_for_mysql();
n_tables_evicted = dict_make_room_in_cache(
innobase_get_table_cache_size(), pct_check);
dict_mutex_exit_for_mysql();
rw_lock_x_unlock(dict_operation_lock);
return(n_tables_evicted);
}
8. The comment on dict_operation_lock explains why: create/drop table operations reserve it in X mode, and some background threads, such as the Main Thread checking the dict cache, also need the X lock on dict_operation_lock, so they are blocked:
/** @brief the data dictionary rw-latch protecting dict_sys
table create, drop, etc. reserve this in X-mode; implicit or
backround operations purge, rollback, foreign key checks reserve this
in S-mode; we cannot trust that MySQL protects implicit or background
operations a table drop since MySQL does not know of them; therefore
we need this; NOTE: a transaction which reserves this must keep book
on the mode in trx_t::dict_operation_lock_mode */
rw_lock_t dict_operation_lock;
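The latch semantics in this comment can be condensed into a toy compatibility check (our own sketch, not InnoDB code): while drop table holds the latch in X mode, both S requests (purge, foreign key checks) and competing X requests (the Main Thread's dict cache eviction) must wait.

```python
def can_grant(held_modes, requested):
    """held_modes: latch modes currently granted on dict_operation_lock."""
    if 'X' in held_modes:
        return False                    # a held X excludes every other request
    if requested == 'X':
        return len(held_modes) == 0     # X needs the latch completely free
    return True                         # S is compatible with other S holders
```

With drop table holding X, this model reproduces what we saw in the semaphore dump: the purge thread's S request (row0purge.cc) and the Main Thread's X request (srv0srv.cc) both block on thread 140037411514112.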
9. During this time user threads hung because they could not obtain their locks; when a lock cannot be acquired immediately, row_mysql_handle_errors passes the error code up to be handled:
/****************************************************************//**
Handles user errors and lock waits detected by the database engine.
@return true if it was a lock wait and we should continue running the
query thread and in that case the thr is ALREADY in the running state. */
bool
row_mysql_handle_errors
Below is the simplified call stack of a user thread:
Thread 29 (Thread 0x7f5d001ef700 (LWP 8159)):
#0 pthread_cond_wait@@GLIBC_2.3.2
#1 wait
#2 os_event::wait_low
#3 os_event_wait_low
#4 lock_wait_suspend_thread
#5 row_mysql_handle_errors
#6 row_search_mvcc
#7 ha_innobase::index_read
#8 handler::ha_index_read_map
#9 handler::read_range_first
#10 handler::multi_range_read_next
#11 QUICK_RANGE_SELECT::get_next
#12 rr_quick
#13 mysql_update
#14 Sql_cmd_update::try_single_table_update
#15 Sql_cmd_update::execute
#16 mysql_execute_command
#17 mysql_parse
#18 dispatch_command
#19 do_command
#20 handle_connection
#21 pfs_spawn_thread
#22 start_thread
#23 clone
10. For the page_cleaner background threads we found no obvious blocking relationship; we only saw the normal call stacks below:
Thread 55 (Thread 0x7f5c7fe15700 (LWP 39287)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 os_event::timed_wait
#2 os_event::wait_time_low
#3 os_event_wait_time_low
#4 pc_sleep_if_needed
#5 buf_flush_page_cleaner_coordinator
#6 start_thread
#7 clone
Thread 54 (Thread 0x7f5c7f614700 (LWP 39288)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
#1 wait
#2 os_event::wait_low
#3 os_event_wait_low
#4 buf_flush_page_cleaner_worker
#5 start_thread
#6 clone
To summarize: the brief MySQL hang triggered by drop table occurs because dropping a table with a large AHI footprint kicks off the AHI cleanup, which takes a long time; for its whole duration the X lock on dict_operation_lock is held, blocking other background threads and user threads. Once the drop table finishes and the lock is released, the backlog of user threads runs all at once, producing the momentary spike in concurrent threads and connections. To work around the problem, consider disabling AHI before dropping a large table.
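For a concrete example, the workaround could look like the following (a sketch only; disabling AHI affects every query that benefits from it, so verify the trade-off against your workload and your version's manual first):

```sql
-- Disable the adaptive hash index before the drop, re-enable it afterwards.
SET GLOBAL innodb_adaptive_hash_index = OFF;
DROP TABLE test.sbtest2;
SET GLOBAL innodb_adaptive_hash_index = ON;
```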