MySQL 死鎖套路:一次詭異的批量插入死鎖問題分析

線上最近出現了批量insert的死鎖,百思不得姐。死鎖記錄以下html

2018-10-26T11:04:41.759589Z 8530809 [Note] InnoDB: 
*** (1) TRANSACTION:

TRANSACTION 1202026765, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 8532863, OS thread handle 139858337453824, query id 16231472122 10.111.10.143 seewo update
INSERT IGNORE INTO xx_performance_type_label_relation(label_id, performance_type_id, type, create_time)
    VALUES
      
      ('bb0394e670644168a998a93a3ed521bc', '06b96ee0bab84d71bb17bf9645d3aa54', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '27d82e2331b241e1a9c9c0a74ec21099', -1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '3100b5978fb24f56b327d25732a7d7a7', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '435a1e19ce6e4e5bbb84240b3b34cf03', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '447fe27199ca40e289ef2834469d9a78', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '87a52c4d00844b5bb9eb75e8fe34202a', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', 'c6a0e26983bd4fae837d5ee2f4efeef8', 1, now())
2018-10-26T11:04:41.759635Z 8530809 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026765 lock_mode X locks gap before rec insert intention waiting
2018-10-26T11:04:41.759674Z 8530809 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 1202026764, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 8530809, OS thread handle 139858469242624, query id 16231472119 10.111.10.153 seewo update
INSERT IGNORE INTO xx_performance_type_label_relation(label_id, performance_type_id, type, create_time)
    VALUES
      
      ('bb0394e670644168a998a93a3ed521bc', '06b96ee0bab84d71bb17bf9645d3aa54', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '27d82e2331b241e1a9c9c0a74ec21099', -1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '3100b5978fb24f56b327d25732a7d7a7', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '435a1e19ce6e4e5bbb84240b3b34cf03', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '447fe27199ca40e289ef2834469d9a78', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '87a52c4d00844b5bb9eb75e8fe34202a', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', 'c6a0e26983bd4fae837d5ee2f4efeef8', 1, now())
2018-10-26T11:04:41.759713Z 8530809 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026764 lock mode S
2018-10-26T11:04:41.759753Z 8530809 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026764 lock_mode X locks gap before rec insert intention waiting
2018-10-26T11:04:41.759784Z 8530809 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)
複製代碼

第一反應是批量insert,insert的順序不同致使的死鎖。可是這個在這裏是不成立的。緣由有兩點java

  1. 出現問題的批量插入SQL中順序是如出一轍的,在順序同樣的狀況下,只會進行插入等待(implicit lock轉explicit X鎖)下面有實驗
  2. 若是是由於批量插入順序不一致帶來的死鎖日誌,打印的結果不是等待插入意向鎖(insert intention waiting),下面有實驗

如今採用一個簡化的表,作實驗mysql

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(5)  NOT NULL DEFAULT '',
  `b` varchar(5)  NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_name` (`a`,`b`)
) ENGINE=InnoDB;
複製代碼

實驗 01

在記錄不存在的狀況下,兩個一樣順序的批量insert同時執行,第二個會進行鎖等待狀態sql

首先truncate t1;bash

t1 t2
begin; begin;
insert ignore into t1(a, b)values("1", "1"); 成功
insert ignore into t1(a, b)values("1", "1"); 鎖等待狀態

能夠看到目前鎖的狀態ui

mysql> select * from information_schema.innodb_locks;
+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
| lock_id     | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
| 31AE:54:4:2 | 31AE        | S         | RECORD    | `d1`.`t1`  | `uk_name`  |         54 |         4 |        2 | '1', '1'  |
| 31AD:54:4:2 | 31AD        | X         | RECORD    | `d1`.`t1`  | `uk_name`  |         54 |         4 |        2 | '1', '1'  |
+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
複製代碼

在咱們執行事務t1的insert時,沒有在任何鎖的斷點處出現,這跟MySQL插入的原理有關係spa

insert 加的是隱式鎖。什麼是隱式鎖?隱式鎖的意思就是沒有鎖線程

在t1插入記錄時,是不加鎖的。這個時候事務t1還未提交的狀況下,事務t2嘗試插入的時候,發現有這條記錄,t2嘗試獲取S鎖,會斷定記錄上的事務id是否活躍,若是活躍的話,說明事務未結束,會幫t1把它的隱式鎖提高爲顯式鎖(X鎖)debug

源碼以下 日誌

t2獲取S鎖的結果:DB_LOCK_WAIT

實驗02

批量插入順序不一致的致使的死鎖日誌不是等待插入意向鎖

t1 t2
begin
insert into t1(a, b)values("1", "1"); 成功
insert into t1(a, b)values("2", "2"); 成功
insert into t1(a, b)values("2", "2"); t1嘗試獲取S鎖,把t2的隱式鎖提高爲顯式X鎖,進入DB_LOCK_WAIT
insert into t1(a, b)values("1", "1"); t2嘗試獲取S鎖,把t1的隱式鎖提高爲顯式X鎖,產生死鎖
------------------------
LATEST DETECTED DEADLOCK
------------------------
181101  9:48:36
*** (1) TRANSACTION:
TRANSACTION 3309, ACTIVE 215 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2
MySQL thread id 2, OS thread handle 0x70000a845000, query id 58 localhost root update
insert into t1(a, b)values("2", "2")
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 3309 lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 32; asc 2;;
 1: len 1; hex 32; asc 2;;
 2: len 4; hex 80000002; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 330A, ACTIVE 163 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2
MySQL thread id 3, OS thread handle 0x70000a888000, query id 59 localhost root update
insert into t1(a, b)values("1", "1")
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 330A lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 32; asc 2;;
 1: len 1; hex 32; asc 2;;
 2: len 4; hex 80000002; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 330A lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 31; asc 1;;
 1: len 1; hex 31; asc 1;;
 2: len 4; hex 80000001; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
複製代碼

到目前爲止,已經陷入了僵局,徹底無法復現死鎖的狀況。看了代碼,發如今insert以前有一個delete,可是delete與insert不在一個事務裏面,也就是delete提交之後,才進行批量insert,真正出問題的地方在批量insert的地方。一開始就排除了delete對後面的影響,難道不在一個事務,也會有影響?

寫了一個代碼去模擬,有很大機率會復現

fun test() {
    dao.delete() // 對應delete from
    // sleep for 10ms
    dao.insert() // 對應insert ignore
}
複製代碼

對應的 SQL 以下,注意是兩個事務

begin;
delete from t1 where a = '25'
commit;

begin;
INSERT ignore INTO `t1` (`a`, `b`) VALUES('25','1')
commit;
複製代碼

這個代碼在兩個線程同時調用的時候,很是容易死鎖。

後來翻遍了網上相關的死鎖案例,有一個關於purge刪除的過程可能跟這個有關係。

若是標記爲刪除,說明事務已經提交,還沒來得及 purge,這時後面的事務加S鎖等待;

在源碼中打印一些日誌。 1.在storage/innobase/row/row0ins.crow_ins_set_shared_rec_lock增長日誌,能夠看到對惟一索引增長S鎖的過程

if (dict_index_is_clust(index)) {
    err = lock_clust_rec_read_check_and_lock(
        0, block, rec, index, offsets, LOCK_S, type, thr);
} else {
    err = lock_sec_rec_read_check_and_lock(
        0, block, rec, index, offsets, LOCK_S, type, thr);
    // 增長以下日誌
    fprintf(stderr, "row_ins_set_shared_rec_lock %s %lu %d\n" , index->name, type, err);
}
複製代碼

2.在lock_rec_enqueue_waiting增長日誌,能夠看到鎖等待的狀況

static
enum db_err
lock_rec_enqueue_waiting(
{
	fprintf(stderr, "lock_rec_enqueue_waiting::::: %s %lu\n" , index->name, type_mode);
}
複製代碼

日誌大概以下

row_ins_set_shared_rec_lock uk_name 0 9 (t1獲取S鎖成功)
row_ins_set_shared_rec_lock uk_name 0 9 (t2獲取S鎖成功)

lock_rec_enqueue_waiting::::: uk_name 2563(t1 X鎖進如鎖等待)
lock_rec_enqueue_waiting::::: uk_name 2563(t2 X鎖進如鎖等待)
複製代碼

其中2563=2048+512+3=LOCK_INSERT_INTENTION+LOCK_GAP+LOCK_X

這個過程跟很是經典的三個事務同時insert,一個回滾,剩下的兩個事務一個成功,一個死鎖,實際上是如出一轍的原理。

實驗03

三個 insert ignore,一個回滾形成的死鎖

insert語句都是insert ignore into t1(a, b)values("1", "1");如下省略

t1 t2 t3 備註
begin begin begin
insert 成功
insert 把t1的隱式鎖提高爲X鎖,t2進入進入S鎖等待
insert t3進入進入S鎖等待
rollback; t1回滾之後,釋放X鎖,t2和t3同時拿到了S鎖
ok deadlock t2和t3都想拿插入意向鎖X鎖,形成死鎖條件

死鎖日誌,跟咱們案例中的如出一轍

------------------------
LATEST DETECTED DEADLOCK
------------------------
181101 23:22:59
*** (1) TRANSACTION:
TRANSACTION 5032, ACTIVE 11 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 5, OS thread handle 0x70000d736000, query id 125 localhost root update
insert ignore into t1(a, b)values("1", "1")
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5032 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 3; hex 313031; asc 101;;
 1: len 3; hex 313031; asc 101;;
 2: len 4; hex 800007b1; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 5033, ACTIVE 6 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 6, OS thread handle 0x70000d779000, query id 126 localhost root update
insert ignore into t1(a, b)values("1", "1")
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5033 lock mode S locks gap before rec
Record lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 3; hex 313031; asc 101;;
 1: len 3; hex 313031; asc 101;;
 2: len 4; hex 800007b1; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5033 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 3; hex 313031; asc 101;;
 1: len 3; hex 313031; asc 101;;
 2: len 4; hex 800007b1; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
複製代碼

目前來看,獲得的結論是:

一個已提交可是未purge掉的記錄會形成後續插入獲取S共享鎖,兩個事務同時獲取S鎖,而後嘗試獲取插入意向鎖,形成死鎖

網上大神梳理的insert流程

  • 首先對插入的間隙加插入意向鎖(Insert Intension Locks)

    • 若是該間隙已被加上了 GAP 鎖或 Next-Key 鎖,則加鎖失敗進入等待;
    • 若是沒有,則加鎖成功,表示能夠插入;
  • 而後判斷插入記錄是否有惟一鍵,若是有,則進行惟一性約束檢查

    • 若是不存在相同鍵值,則完成插入
    • 若是存在相同鍵值,則判斷該鍵值是否加鎖
      • 若是沒有鎖, 判斷該記錄是否被標記爲刪除
      • 若是標記爲刪除,說明事務已經提交,還沒來得及 purge,這時加 S 鎖等待;
      • 若是沒有標記刪除,則報 1062 duplicate key 錯誤;
    • 若是有鎖,說明該記錄正在處理(新增、刪除或更新),且事務還未提交,加 S 鎖等待;
  • 插入記錄並對記錄加 X 記錄鎖;

上面是我 debug 源碼獲得的一些結論,若是個人理解有誤的話,記得留言告訴我

如下爲參考文檔

相關文章
相關標籤/搜索