【重溫mysql】七、死鎖與innodb狀態日誌

InnoDB 採起行級加鎖策略,雖然行級加鎖策略極大提升了程序的併發性能,但因爲鎖粒度的減少 InnoDB 執行某些操做的時候可能會同時佔用多個行鎖,加大了鎖衝突的機率;不一樣事務隔離級別對數據一致性要求不一樣,如RR級別下增長的gap鎖可能致使大量的鎖衝突;同時不當的業務設計也可能形成死鎖。死鎖會佔用系統資源阻塞請求,輕則影響 sql 執行效率拖慢系統,重則拖垮服務器致使服務不可用。所以咱們在使用過程當中須要儘可能避免死鎖的發生,在遇到死鎖時可以依據相關信息排查死鎖加以防範。html

死鎖機制

首先咱們來溫習一下致使死鎖的四個必要條件:node

  • 互斥,一個資源每次只能被一個進程使用。
  • 請求與佔用,一個進程因請求資源而阻塞時,對已得到的資源保持不放。
  • 不剝奪,進程已得到的資源,在該進程末使用完以前,不能強行剝奪。
  • 循環等待,若干進程之間造成一種頭尾相接的循環等待資源關係。

一樣,InnoDB 中的死鎖造成也須要知足上述四個條件。在 InnoDB 中形成死鎖,那麼必定是在多個線程同時操做同一份數據時纔會產生,即存在兩個或以上的事務,考慮以下場景,給定表test,表結構以下:mysql

CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
複製代碼

預置數據以下:git

id name
1 張三
2 李四
3 王五

如今按照以下操做,刪除兩條張三與李四兩條數據:github

序號 事務1 狀態 事務2 狀態
1 begain transaction;
2 delete from test where id = 1; 執行成功,持有id=1的X鎖
3 begain transaction;
4 delete from test where id = 2; 執行成功,持有id=2的X鎖
5 delete from test where id = 2; 等待事務2釋放鎖
6 delete from test where id = 1; 等待事務1釋放鎖。系統檢測到死鎖,回滾事務2
7 commit;
8 commit;

在上述操做過程當中,咱們構造了一個很是典型的死鎖條件,InnoDB 死鎖機制檢測到了死鎖,會選擇性回滾權重小的事務。經過語句能夠查詢死鎖狀態:算法

show engine innodb status;
複製代碼

從上述執行過程咱們能夠看出InnoDB 能夠檢測出死鎖,那麼 InnoDB 又是如何檢測死鎖的呢?

InnoDB 死鎖策略

InnoDB 有兩種死鎖處理方式:sql

  • 等待,直到超時(經過innodb_lock_wait_timeout參數控制)
  • 死鎖檢測,主動回滾一條事務,讓其餘事務繼續執行(innodb_deadlock_detect=on),該機制只適用於只有行鎖的情形

InnoDB 中的死鎖檢測是經過Waiting For Graph算法實現的。在 InnoDB 中,會記錄全部事務的鎖造成一個以事務爲頂點,鎖爲邊的有向圖。只需判斷該有向圖中是否存在迴環,便可知道有無死鎖存在。在上述死鎖場景中,事務1在等待事務2釋放ID=2的鎖,而事務2也在等待事務1釋放ID=1的鎖,造成了迴環結構,所以 InnoDB 判斷系統中存在死鎖。 數據庫

那麼假設存在不少個事務,事務的相互等待,造成了一條很是長的等待鏈的時候InnoDB 將會如何處理呢? 在這種狀況下,當 InnoDB 等待圖列表長度超出了 200 的時候,InnoDB 認爲當中出現了死鎖。 此外當等待鏈上事務持有鎖過多的時(須要對得到超過1,000,000個行鎖),InnoDB 也認爲當中出現了死鎖。bash

InnoDB 檢測到死鎖後,是如何釋放一個事務的呢?,InnoDB 經過事務權重來斷定事務重要程度,權重低的會釋放掉,事務權重通常是經過執行事務中的語句插入、更新、刪除語句影響數據條數得出的,事務語句影響數據條數越小,該事務權重越低。服務器

InnoDB 狀態分析

咱們能夠經過以下語句來查看系統中是否存在死鎖,如若存在死鎖,狀態日誌將顯示最後一次死鎖信息,能夠經過日誌粗略查看死鎖相關的事務與鎖狀況:

show engine innodb status;
複製代碼

一個標準的狀態日誌以下:

=====================================
2019-11-19 23:19:23 0x7fa850f58700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 48 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 852151 srv_active, 0 srv_shutdown, 4608208 srv_idle
srv_master_thread log flush and writes: 5459518
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2440006
OS WAIT ARRAY INFO: signal count 2411688
RW-shared spins 0, rounds 1948637, OS waits 972484
RW-excl spins 0, rounds 1702845, OS waits 160434
RW-sx spins 4496, rounds 134826, OS waits 4138
Spin rounds per wait: 1948637.00 RW-shared, 1702845.00 RW-excl, 29.99 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-11-19 23:19:08 0x7fa860156700
*** (1) TRANSACTION:
TRANSACTION 25321837, ACTIVE 29 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 2348540, OS thread handle 140360898553600, query id 66353526 163.125.229.225 root updating
delete from test where id = 2
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2266 page no 3 n bits 80 index PRIMARY of table `test`.`test` trx id 25321837 lock_mode X locks rec but not gap waiting
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 6; hex 000001826186; asc     a ;;
 2: len 7; hex 65000001250c88; asc e   %  ;;
 3: len 6; hex e69d8ee59b9b; asc       ;;

*** (2) TRANSACTION:
TRANSACTION 25321862, ACTIVE 15 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 2348917, OS thread handle 140361143248640, query id 66353634 163.125.229.225 root updating
delete from test where id = 1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2266 page no 3 n bits 80 index PRIMARY of table `test`.`test` trx id 25321862 lock_mode X locks rec but not gap
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
 0: len 4; hex 80000002; asc     ;;
 1: len 6; hex 000001826186; asc     a ;;
 2: len 7; hex 65000001250c88; asc e   %  ;;
 3: len 6; hex e69d8ee59b9b; asc       ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2266 page no 3 n bits 80 index PRIMARY of table `test`.`test` trx id 25321862 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 00000182616d; asc     am;;
 2: len 7; hex 58000001412fef; asc X   A/ ;;
 3: len 6; hex e5bca0e4b889; asc       ;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 25321913
Purge done for trx's n:o < 25321911 undo n:o < 0 state: running but idle History list length 35 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 421836854896032, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854885088, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854895120, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854891472, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854888736, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854886912, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854900592, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854898768, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854893296, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854892384, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854890560, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854886000, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421836854883264, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 25321912, ACTIVE 0 sec 5 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2 MySQL thread id 2346328, OS thread handle 140360896157440, query id 66354003 47.98.152.73 root Trx read view will not see trx with id >= 25321911, sees < 25321837 ---TRANSACTION 25321837, ACTIVE 44 sec 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2 MySQL thread id 2348540, OS thread handle 140360898553600, query id 66353526 163.125.229.225 root -------- FILE I/O -------- I/O thread 0 state: waiting for completed aio requests (insert buffer thread) I/O thread 1 state: waiting for completed aio requests (log thread) I/O thread 2 state: waiting for completed aio requests (read thread) I/O thread 3 state: waiting for completed aio requests (read thread) I/O thread 4 state: waiting for completed aio requests (read thread) I/O thread 5 state: waiting for completed aio requests (read thread) I/O thread 6 state: waiting for completed aio requests (write thread) I/O thread 7 state: waiting for completed aio requests (write thread) I/O thread 8 state: waiting for completed aio requests (write thread) I/O thread 9 state: waiting for completed aio requests (write thread) Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] , ibuf aio reads:, log i/o's:, sync i/o's: Pending flushes (fsync) log: 0; buffer pool: 0 846404 OS file reads, 16785073 OS file writes, 5999964 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 8.37 writes/s, 3.17 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 185, seg size 187, 28201 merges merged operations: insert 528, delete mark 7663041, delete 0 discarded operations: insert 0, delete mark 0, delete 0 Hash table size 138389, node heap has 31 buffer(s) Hash table size 138389, node heap has 2 buffer(s) Hash table size 138389, node heap has 3983 buffer(s) Hash table size 138389, node heap has 218 buffer(s) Hash table size 138389, node heap has 5 buffer(s) Hash table size 138389, node heap has 1 buffer(s) Hash table size 138389, node heap has 2 buffer(s) Hash table size 138389, node heap has 5 buffer(s) 139.08 hash searches/s, 4.46 non-hash searches/s --- LOG --- Log sequence number 24370884626 Log flushed up to 24370882478 Pages flushed up to 24370875924 Last checkpoint at 24370875333 0 pending log flushes, 0 pending chkp writes 9315962 log i/o's done, 2.31 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total large memory allocated 549715968 Dictionary memory allocated 10240009 Buffer pool size 32764 Free buffers 1024 Database pages 27493 Old database pages 10128 Modified db pages 34 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages made young 964069, not young 143276717 0.00 youngs/s, 0.00 non-youngs/s Pages read 845408, created 77961, written 6649614 0.00 reads/s, 0.04 creates/s, 5.40 writes/s Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000 Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s LRU len: 27493, unzip_LRU len: 0 I/O sum[276]:cur[1], unzip sum[0]:cur[0] -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue 1 read views open inside InnoDB Process ID=1773, Main thread ID=140361173219072, state: sleeping Number of rows inserted 19006577, updated 853025, deleted 5584787, read 210247854332 0.85 inserts/s, 0.83 updates/s, 0.06 deletes/s, 21633.22 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================ 複製代碼

InnoDB 標準監控信息一共包含了輸出頭主線程負載信號量死鎖信息事務信息IO相關信息插入緩衝和自適應哈希索引相關信息日誌信息緩衝池和內存使用狀況行操做信息。經過這些內部狀態信息咱們能夠了解到InnoDB 內部實際的負載狀況,對於數據庫運維有着重要的做用。

輸出頭

標準監控信息輸出頭包含以下幾個部分,顯示了當前監控的查詢時間、監控名稱、本次輸出與上次輸出時間間隔秒數。

後臺線程信息

該部分顯示了後臺 innodb 主線程的工做負載狀況。通常來講,srv_active 越大,srv_idle 越小代表當前mysql數據庫壓力不大。

信號量信息

信號量部分統計了線程空轉次數以及等待獲取互斥鎖或讀寫鎖信號量的次數。這裏的互斥鎖和讀寫鎖指的是mysql中的Latch,主要目的爲保證併發線程操做臨界資源的正確性。若是有大量線程在等待信號量,多是因爲磁盤 IO 性能瓶頸或 InnoDB 內部資源爭奪激烈致使,而InnoDB 內部資源通常是因爲並行查詢過多或操做系統線程調度發生了問題。如若 os_waits 次數較高,代表 latch 爭奪比較頻繁。

最新死鎖信息

該部分顯示了最後一次發生死鎖時的死鎖信息。死鎖信息裏面有一些很是有用的信息如加鎖與等待鎖的語句、加鎖使用的索引等信息,但遺憾的是該處日誌並不會顯示持有和等待的鎖,這對於多個線程引起的死鎖來說並非那麼容易分析。

事務信息

事務信息顯示了當前innodb中前一段時間的相關事務狀況,有助於咱們排查死鎖。

文件IO信息

該部分顯示了後臺線程的IO狀況。

插入緩衝與自適應哈希索引信息

該部分顯示了插入緩衝執行狀況與自適應哈希索引相關信息。

日誌信息

日誌信息顯示了重作日誌的當前狀況。

緩衝池與內存信息

這部分顯示了緩衝池中的頁讀寫統計及其內存使用統計。

行操做信息

這部分顯示了主線程正在作什麼,如各類類型的行操做的數量和性能統計。

如何避免死鎖

爲了減少死鎖對系統的影響,咱們應該儘量避免死鎖。一般有以下技巧能夠用於避免死鎖:

  • 使用事務,儘可能不用lock tables 此類表鎖,存在多種鎖類型的狀況下,InnoDB 沒法進行死鎖檢測,只能等待死鎖超時。
  • 儘可能減少事務執行時間,避免長事務,長事務可能持有鎖時間過長,更易引起死鎖。
  • 下降隔離級別,RC 級別下不存在間隙鎖,將極大減少死鎖產生的機率。
  • MDL語句修改多個表或或行數據的時候,須要保持操做順序一致,以避免不恰當的加鎖過程產生死鎖。
  • MDL語句儘可能使用主鍵或索引,InnoDB 加鎖是對索引進行加鎖,使用主鍵或索引能夠下降鎖衝突的機率。
  • 如若因爲業務須要沒法避免死鎖,能夠考慮經過表鎖或分佈式鎖加以控制。

對於 InnoDB 來講死鎖成因很是複雜,須要具體問題具體分析,在《常見的 MySQL 死鎖案例分析》一文的做者已經對常見的一些死鎖場景進行了收集整理,有興趣的同窗能夠查閱加深對死鎖的理解與分析。

感謝

相關文章
相關標籤/搜索