原創: 洪斌php
本文目錄html
1. 問題背景mysql
2. 問題分析linux
2.1 SHOW GLOBAL VARIABLESgit
2.2 SLAVE SQL THREADgithub
2.3 SHOW GLOBAL STATUSsql
2.4 問題1: 新鏈接爲何沒法創建?數據庫
2.5 問題2:查詢操做爲何超時?app
3. 問題結論框架
4. 問題處理
某環境上有一組Percona MySQL 5.7.23-23的半同步主從,咱們採用Prometheus監控框架,按其接口規範自研了獨立的exporter用於監控數據採集,相似於mysqld_exporter,工做方式大體爲:
開啓一個http端口
採集本機MySQL狀態(簡單的show語句)
將狀態發佈到http端口
mysqld_exporter:
exporter工做模式比較簡單,理論上不會對數據庫形成嚴重影響.
但在數據庫節點上部署啓動exporter後,發現數據庫的鏈接狀態異常,對業務產生了必定影響.
1.檢查鏈接可用性
1.1經過TCP檢查鏈接可用性 -- 新鏈接沒法創建!!
1.2經過Socket檢查鏈接可用性 -- 仍然沒法創建鏈接
2.已鏈接的業務報錯查詢超時
3.檢查mysql-error.log -- 沒有任何內容
4.檢查系統日誌 -- 沒有任何相關信息
5.觀察系統資源佔用,無異常佔用
爲什麼再正常不過的狀態採集會致使如此大的問題? 因爲日誌中已經沒法繼續判斷緣由. 因此迅速採集調用棧信息:
pstack $(pgrep -xn mysqld)
Tips:
如何採集mysql調用棧信息:
https://www.ibm.com/developerworks/cn/linux/l-cn-deadlock/index.html
梳理exporter對數據庫發起的SQL語句:
show global variables
show global status
show master status
show slave status
經過SQL語句列表和調用棧信息,大體能夠觀測出MySQL內部出現了Mutex死鎖.死鎖關係以下:
1.SHOW GLOBAL VARIABLES
持有LOCK_global_system_variables, 等待LOCK_log
2.SLAVE SQL THREAD
持有LOCK_log, 等待LOCK_status
3.SHOW GLOBAL STATUS
持有LOCK_status, 等待LOCK_global_system_variables
咱們來看下每部分的調用棧信息:
源碼部分基於Percona MySQL 5.7.23-23
Thread 18 (Thread 0x7fad69737700 (LWP 3588)): #0 0x00007fdd3527642d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fdd35271dcb in _L_lock_812 () from /lib64/libpthread.so.0 #2 0x00007fdd35271c98 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000d60a5e in PolyLock_lock_log::rdlock() () #4 0x0000000000c3e4d5 in sys_var::value_ptr(THD*, THD*, enum_var_type, st_mysql_lex_string*) () #5 0x0000000000d1ce70 in get_one_variable_ext(THD*, THD*, st_mysql_show_var const*, enum_var_type, enum_mysql_show_type, system_status_var*, charset_info_st const**, char*, unsigned long*) () #6 0x0000000000d1cee1 in get_one_variable(THD*, st_mysql_show_var const*, enum_var_type, enum_mysql_show_type, system_status_var*, charset_info_st const**, char*, unsigned long*) () #7 0x0000000000d2234c in show_status_array(THD*, char const*, st_mysql_show_var*, enum_var_type, system_status_var*, char const*, TABLE_LIST*, bool, Item*) () #8 0x0000000000d27ec6 in fill_variables(THD*, TABLE_LIST*, Item*) () #9 0x0000000000d1550c in do_fill_table(THD*, TABLE_LIST*, QEP_TAB*) () #10 0x0000000000d286ac in get_schema_tables_result(JOIN*, enum_schema_table_state) () #11 0x0000000000d0afed in JOIN::prepare_result() () #12 0x0000000000c9a2cf in JOIN::exec() () #13 0x0000000000d0b93d in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) () #14 0x00000000007625d8 in execute_sqlcom_select(THD*, TABLE_LIST*) () #15 0x0000000000ccd191 in mysql_execute_command(THD*, bool) () #16 0x0000000000cd06f5 in mysql_parse(THD*, Parser_state*) () #17 0x0000000000cd12bd in dispatch_command(THD*, COM_DATA const*, enum_server_command) () #18 0x0000000000cd2cff in do_command(THD*) () #19 0x0000000000d9c8f8 in handle_connection () #20 0x0000000000f18904 in pfs_spawn_thread () #21 0x00007fdd3526fe25 in start_thread () from /lib64/libpthread.so.0 #22 0x00007fdd3344434d in clone () from /lib64/libc.so.6
/**sql\set_var.cc:267**/ uchar *sys_var::value_ptr(THD *running_thd, ...) { mysql_mutex_assert_owner(&LOCK_global_system_variables); } /**sql\sys_vars.cc:3788**/ void PolyLock_lock_log::rdlock() { mysql_mutex_lock(mysql_bin_log.get_log_lock()); }
Thread 25 (Thread 0x7fad7c0f5700 (LWP 47842)): #0 0x00007fdd3527642d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fdd35271e01 in _L_lock_1022 () from /lib64/libpthread.so.0 #2 0x00007fdd35271da2 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000e92d37 in MYSQL_BIN_LOG::publish_coordinates_for_global_status() const () #4 0x0000000000e99791 in MYSQL_BIN_LOG::ordered_commit(THD*, bool, bool) () #5 0x0000000000e9ccbf in MYSQL_BIN_LOG::commit(THD*, bool) () #6 0x00000000008084d1 in ha_commit_trans(THD*, bool, bool) () #7 0x0000000000d7c9c9 in trans_commit(THD*) () #8 0x0000000000e65efb in Xid_log_event::do_commit(THD*) () #9 0x0000000000e69b45 in Xid_apply_log_event::do_apply_event(Relay_log_info const*) () #10 0x0000000000e6f5dd in Log_event::apply_event(Relay_log_info*) () #11 0x0000000000eb87f6 in apply_event_and_update_pos(Log_event**, THD*, Relay_log_info*) () #12 0x0000000000ec4e90 in handle_slave_sql () #13 0x0000000000f18904 in pfs_spawn_thread () #14 0x00007fdd3526fe25 in start_thread () from /lib64/libpthread.so.0 #15 0x00007fdd3344434d in clone () from /lib64/libc.so.6
關鍵代碼
/**sql\binlog.cc:10424*// void MYSQL_BIN_LOG::publish_coordinates_for_global_status(void) const { mysql_mutex_assert_owner(&LOCK_log); mysql_mutex_lock(&LOCK_status); }
Thread 14 (Thread 0x7fad69633700 (LWP 3613)): #0 0x00007fdd3527642d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fdd35271e01 in _L_lock_1022 () from /lib64/libpthread.so.0 #2 0x00007fdd35271da2 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000d222eb in show_status_array(THD*, char const*, st_mysql_show_var*, enum_var_type, system_status_var*, char const*, TABLE_LIST*, bool, Item*) () #4 0x0000000000d29860 in fill_status(THD*, TABLE_LIST*, Item*) () #5 0x0000000000d1550c in do_fill_table(THD*, TABLE_LIST*, QEP_TAB*) () #6 0x0000000000d286ac in get_schema_tables_result(JOIN*, enum_schema_table_state) () #7 0x0000000000d0afed in JOIN::prepare_result() () #8 0x0000000000c9a2cf in JOIN::exec() () #9 0x0000000000d0b93d in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) () #10 0x00000000007625d8 in execute_sqlcom_select(THD*, TABLE_LIST*) () #11 0x0000000000cca8a6 in mysql_execute_command(THD*, bool) () #12 0x0000000000cd06f5 in mysql_parse(THD*, Parser_state*) () #13 0x0000000000cd12bd in dispatch_command(THD*, COM_DATA const*, enum_server_command) () #14 0x0000000000cd2cff in do_command(THD*) () #15 0x0000000000d9c8f8 in handle_connection () #16 0x0000000000f18904 in pfs_spawn_thread () #17 0x00007fdd3526fe25 in start_thread () from /lib64/libpthread.so.0 #18 0x00007fdd3344434d in clone () from /lib64/libc.so.6
/**sql\sql_show.cc:8020*// int fill_status(THD *thd, TABLE_LIST *tables, Item *cond) { ... mysql_mutex_lock(&LOCK_status); } /**sql\sql_show.cc:3061**/ static bool show_status_array(THD *thd, const char *wild,SHOW_VAR *variables, ...) { ... mysql_mutex_lock(&LOCK_global_system_variables);
Thread 6 (Thread 0x7fad6942b700 (LWP 4343)): #0 0x00007fdd3527642d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fdd35271e01 in _L_lock_1022 () from /lib64/libpthread.so.0 #2 0x00007fdd35271da2 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000c88ff7 in THD::init() () #4 0x0000000000c8be9f in THD::THD(bool) () #5 0x0000000000e54a1a in Channel_info::create_thd() () #6 0x0000000000d9fede in Channel_info_tcpip_socket::create_thd() () #7 0x0000000000d9c7c2 in handle_connection () #8 0x0000000000f18904 in pfs_spawn_thread () #9 0x00007fdd3526fe25 in start_thread () from /lib64/libpthread.so.0 #10 0x00007fdd3344434d in clone () from /lib64/libc.so.6
鏈接的初始化須要LOCK_global_system_variables, 此mutex被佔用,致使新鏈接沒法初始化
/**sql\sql_class.cc:1629**/ void THD::init(void) { mysql_mutex_lock(&LOCK_global_system_variables); ... mysql_mutex_unlock(&LOCK_global_system_variables); /*
新鏈接請求發起時一直被掛起,沒有任何響應,現象以下:
[root@localhost /]# mysql -uroot -p -S /mysql/data/mysqld.sock Enter password: ^C^C^C^C^C
Thread 16 (Thread 0x7fad696b5700 (LWP 3597)): #0 0x00007fdd3527642d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fdd35271e01 in _L_lock_1022 () from /lib64/libpthread.so.0 #2 0x00007fdd35271da2 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000876508 in Item_func_get_system_var::fix_length_and_dec() () #4 0x000000000087953d in Item_func::fix_fields(THD*, Item**) () #5 0x0000000000c72485 in setup_fields(THD*, Bounds_checked_array<Item*>, List<Item>&, unsigned long, List<Item>*, bool, bool) () #6 0x0000000000d072d5 in st_select_lex::prepare(THD*) () #7 0x0000000000d0ba79 in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) () #8 0x00000000007625d8 in execute_sqlcom_select(THD*, TABLE_LIST*) () #9 0x0000000000ccd191 in mysql_execute_command(THD*, bool) () #10 0x0000000000cd06f5 in mysql_parse(THD*, Parser_state*) () #11 0x0000000000cd12bd in dispatch_command(THD*, COM_DATA const*, enum_server_command) () #12 0x0000000000cd2cff in do_command(THD*) () #13 0x0000000000d9c8f8 in handle_connection () #14 0x0000000000f18904 in pfs_spawn_thread () #15 0x00007fdd3526fe25 in start_thread () from /lib64/libpthread.so.0 #16 0x00007fdd3344434d in clone () from /lib64/libc.so.6
SQL查詢時也須要LOCK_global_system_variables, 此mutex被佔用,致使查詢被阻塞.
/**sql\item_func.cc:7275**/ void Item_func_get_system_var::fix_length_and_dec() { ... mysql_mutex_lock(&LOCK_global_system_variables); }
通過一輪分析,咱們基本找到問題緣由. 但心中還存在疑問.
LOCK_global_system_variables是熱點的mutex,大機率會出現爭用,但不至於死鎖.
show global variables操做爲什麼須要LOCK_log?
這樣經常使用的查詢真的會穩定致使數據庫死鎖麼?
進一步搜索,發現社區已有相似的缺陷反饋:
該缺陷可理解爲:
MySQL 5.7.22引入了以下兩個參數:
binlog_transaction_dependency_tracking
binlog_transaction_dependency_history_size
在5.7.23版本中,讀取這兩個參數須要LOCK_log.從而會致使各類各樣的死鎖(bug正文中提到了另一種場景).
在5.7.25版本之後,這兩個參數的讀取使用了另一種影響更低的lock,從而避免了本文以及bug中提到的死鎖.
對於環境中出現死鎖的這一組Percona MySQL 5.7.23-23半同步主從,咱們使用5.7.25-28版本進行了升級後此問題消失,至此能夠基本肯定致使問題的根本緣由.