Analysis of a Deadlock Caused by select for update

This article analyzes a deadlock that select for update can cause in MySQL InnoDB under the Repeatable Read isolation level.

1. Business case

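The business needs to assign numbers to entities of various types. For example, numbers for type-x entities might look like x201712120001, x201712120002, x201712120003. Such a number consists of two parts: a prefix made of x plus the date, and a serial number (four digits here).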
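The numbering scheme can be sketched in Java as follows. The class and method names are hypothetical, and the limit of 9999 numbers per prefix is inferred from the four-digit format in the examples rather than stated by the business rules.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that builds numbers of the form <type><yyyyMMdd><4-digit serial>.
class SerialNumberFormat {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Prefix part: type "x" on 2017-12-12 gives "x20171212". */
    static String prefix(String type, LocalDate day) {
        return type + day.format(DAY);
    }

    /** Full number: ("x20171212", 1) gives "x201712120001". */
    static String format(String prefix, long serial) {
        if (serial < 1 || serial > 9999) {
            // four digits allow at most 9999 numbers per prefix (per type and day)
            throw new IllegalArgumentException("serial out of range: " + serial);
        }
        return prefix + String.format("%04d", serial);
    }

    public static void main(String[] args) {
        String p = prefix("x", LocalDate.of(2017, 12, 12));
        System.out.println(format(p, 1)); // x201712120001
        System.out.println(format(p, 2)); // x201712120002
    }
}
```

The database table below is only responsible for the serial part; the prefix is computed by the business layer.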

To allocate serial numbers with a database table, a table like the following is about all you need:

CREATE TABLE number (
  prefix VARCHAR(20) NOT NULL DEFAULT '' COMMENT '前綴碼',
  value BIGINT NOT NULL DEFAULT 0 COMMENT '流水號',
  UNIQUE KEY uk_prefix(prefix)
);

At the business layer, once the prefix (e.g. x20171212) has been derived from the business rules, the code can open a transaction and use select for update as follows:

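@Transactional
long acquire(String prefix) {
    // SELECT ... FROM number WHERE prefix = ? FOR UPDATE
    SerialNumber current = dao.selectAndLock(prefix);
    if (current == null) {
        // no row for this prefix yet: the sequence starts at 1
        dao.insert(new SerialNumber(prefix, 1));
        return 1;
    } else {
        // the row exists and is now X-locked: increment it and write it back
        current.number++;
        dao.update(current);
        return current.number;
    }
}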

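In short, the code locks and looks up the row, updates it if it exists, and inserts it otherwise. Under Repeatable Read, however, this code can deadlock. (A second problem, related to transaction propagation, is discussed later.)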

2. Analysis and solution

When the WHERE condition of select for update matches an existing row, the code above does not deadlock. When it matches no row, however, multiple threads executing acquire concurrently can deadlock.
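To make the difference between the two cases concrete, here is a simplified model (an assumption for illustration, not InnoDB code) of the lock that select ... where prefix = ? for update takes on the unique index uk_prefix under Repeatable Read: an exact match locks only that record, while a miss locks the gap before the next larger key. Keys are compared in Java string order, which agrees with the collation for the lowercase ASCII keys used in this article.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Simplified, hypothetical model of the lock taken by an equality
// SELECT ... FOR UPDATE on a unique index under Repeatable Read.
class ForUpdateLockModel {
    static String lockFor(TreeSet<String> keys, String key) {
        if (keys.contains(key)) {
            // exact match on a unique index: only the record itself is locked
            return "X,REC_NOT_GAP on '" + key + "'";
        }
        // no match: the gap before the next larger key is locked instead
        String next = keys.higher(key);
        return next == null
                ? "X on supremum"              // key is larger than every existing key
                : "X,GAP before '" + next + "'";
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("bbb", "hhh", "yyy"));
        System.out.println(lockFor(keys, "hhh")); // X,REC_NOT_GAP on 'hhh'
        System.out.println(lockFor(keys, "ddd")); // X,GAP before 'hhh'
    }
}
```

Record locks taken on the same existing row conflict, so the second session simply waits; gap locks on a missing key, as analyzed below, behave differently.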

2.1 A simple reproduction

The following simple example reproduces the scenario.
First, initialize the table with 3 rows:

insert into number select 'bbb',2;
insert into number select 'hhh',8;
insert into number select 'yyy',25;

Then run the statements in the following order:

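  1. session 1: begin;
  2. session 2: begin;
  3. session 1: select * from number where prefix='ddd' for update;
  4. session 2: select * from number where prefix='fff' for update;
  5. session 1: insert into number select 'ddd',1; (blocks in a lock wait)
  6. session 2: insert into number select 'fff',1; (while session 1 is still waiting)
  7. Result: session 2 hits a deadlock and its transaction is rolled back; session 1's lock wait ends.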

2.2 Analyzing the deadlock

Let's go through the output of show engine innodb status step by step:

2.2.1 session1 runs select for update

------------
TRANSACTIONS
------------
Trx id counter 238435
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

Transaction 238434 holds the gap lock before hhh, i.e. the gap lock on ('bbb', 'hhh').
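The hex fields in the record dump can be decoded by hand. The sketch below assumes the usual InnoDB compact layout: since the table has no primary key, uk_prefix (unique and NOT NULL) serves as the clustered index, so field 1 is the 6-byte DB_TRX_ID and field 3 is the BIGINT value, which InnoDB stores big-endian with the sign bit flipped. The class name is hypothetical.

```java
// Hypothetical decoder for two kinds of fields in an InnoDB record dump.
class RecordDumpDecoder {
    /** Parses a hex string (no "0x") as an unsigned big-endian number, e.g. DB_TRX_ID. */
    static long unsignedHex(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    /** Decodes a stored signed BIGINT by flipping the sign bit back. */
    static long signedBigint(String hex) {
        return Long.parseUnsignedLong(hex, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(signedBigint("8000000000000008")); // 8, the value of row 'hhh'
        System.out.println(unsignedHex("00000003a350"));      // 238416, trx that last wrote 'hhh'
    }
}
```

Decoding the 'ddd' row that appears later in the dump (hex 00000003a362) gives 238434, the id of the transaction that inserted it, which is consistent with field 1 being the transaction id.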

2.2.2 session2 runs select for update

------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK table test.number trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 30 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

Transaction 238435 also holds the gap lock before hhh.


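As the implementation of InnoDB's lock_rec_has_to_wait shows, a lock of type LOCK_GAP that does not carry the insert intention flag never has to wait for other locks (table locks excepted).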
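The decision can be transcribed into a simplified Java sketch. It follows the branch order of lock_rec_has_to_wait as found in MySQL 5.7 sources and uses the flag values from the InnoDB headers, but it models only S and X record locks and assumes the two locks belong to different transactions; it is an illustration, not the real function.

```java
// Simplified transcription of the wait decision in lock_rec_has_to_wait.
class RecLockWait {
    // Lock mode values and flag bits as defined in the InnoDB headers.
    static final int LOCK_S = 2, LOCK_X = 3, LOCK_MODE_MASK = 0xF;
    static final int LOCK_REC = 32, LOCK_GAP = 512, LOCK_REC_NOT_GAP = 1024, LOCK_INSERT_INTENTION = 2048;

    // For record locks only S and X matter: S is compatible with S, X with nothing.
    static boolean modesCompatible(int mode1, int mode2) {
        return mode1 == LOCK_S && mode2 == LOCK_S;
    }

    /** Must a request with typeMode wait for lock2Mode held by another transaction on the same record? */
    static boolean hasToWait(int typeMode, int lock2Mode, boolean onSupremum) {
        if (modesCompatible(typeMode & LOCK_MODE_MASK, lock2Mode & LOCK_MODE_MASK)) {
            return false;
        }
        boolean insertIntention = (typeMode & LOCK_INSERT_INTENTION) != 0;
        if ((onSupremum || (typeMode & LOCK_GAP) != 0) && !insertIntention) {
            return false; // gap locks without insert intention never wait
        }
        if (!insertIntention && (lock2Mode & LOCK_GAP) != 0) {
            return false; // a record lock need not wait for a gap lock
        }
        if ((typeMode & LOCK_GAP) != 0 && (lock2Mode & LOCK_REC_NOT_GAP) != 0) {
            return false; // a gap request need not wait for a record-only lock
        }
        if ((lock2Mode & LOCK_INSERT_INTENTION) != 0) {
            return false; // no request waits for an insert intention lock
        }
        return true;
    }

    public static void main(String[] args) {
        int heldGap = LOCK_X | LOCK_REC | LOCK_GAP;                                              // 547
        System.out.println(hasToWait(LOCK_X | LOCK_GAP, heldGap, false));                        // false
        System.out.println(hasToWait(LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION, heldGap, false)); // true
    }
}
```

The first call corresponds to the second select for update (granted although both modes are X); the second corresponds to the later insert, which has to wait.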

2.2.3 session1 attempts the insert

------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 28 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK table test.number trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 55 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
------------------
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

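As you can see, when transaction 238434 tries to insert ('ddd', 1), it finds that another transaction (238435) already holds a gap lock on this range. InnoDB therefore gives transaction 238434 an insert intention lock with mode LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION, which waits for transaction 238435 to release its gap lock.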


(This check is implemented in InnoDB's lock_rec_insert_check_and_lock.)
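A much simplified model of that check: the insert looks at the record that follows the insert position ('hhh' when inserting 'ddd') and has to wait if another transaction holds a gap lock there. Only X gap locks are modelled, duplicate-key checking is ignored, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the gap check an insert performs on the next record.
class InsertCheckModel {
    static final class GapLock {
        final long trxId;       // owner of the lock
        final String beforeKey; // the lock covers the gap before this record
        GapLock(long trxId, String beforeKey) {
            this.trxId = trxId;
            this.beforeKey = beforeKey;
        }
    }

    /** Transactions whose gap locks make trxId's insert of key wait (empty list: no wait). */
    static List<Long> blockers(TreeSet<String> keys, List<GapLock> locks, long trxId, String key) {
        String next = keys.higher(key);
        String target = next == null ? "supremum" : next;
        List<Long> result = new ArrayList<>();
        for (GapLock lock : locks) {
            // X|GAP|INSERT_INTENTION conflicts with another trx's X gap lock on the same record
            if (lock.trxId != trxId && lock.beforeKey.equals(target)) {
                result.add(lock.trxId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("bbb", "hhh", "yyy"));
        List<GapLock> locks = Arrays.asList(new GapLock(238434, "hhh"), new GapLock(238435, "hhh"));
        System.out.println(blockers(keys, locks, 238434, "ddd")); // [238435]
    }
}
```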

2.2.4 session2 attempts the insert

------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-12-21 22:50:40 0x70001028a000
*** (1) TRANSACTION:
TRANSACTION 238434, ACTIVE 81 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** (2) TRANSACTION:
TRANSACTION 238435, ACTIVE 54 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69159 localhost root executing
insert into number select 'fff',1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238435 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 84 sec
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root
TABLE LOCK table test.number trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 646464; asc ddd;;
1: len 6; hex 00000003a362; asc b;;
2: len 7; hex de000001e60110; asc ;;
3: len 8; hex 8000000000000001; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table test.number trx id 238434 lock_mode X locks gap before rec insert intention
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;

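At this point the deadlock information shows that when transaction 238435 inserts, it likewise finds transaction 238434's gap lock, is given an insert intention lock, and waits for transaction 238434 to release its gap lock. Each transaction now waits for the other, hence the deadlock.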

2.3 Debug it!

Next, we reproduce the same scenario while stepping through the MySQL source in a debugger.

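Here, session2's transaction 4445 requests a lock with type_mode 515, i.e. (LOCK_X | LOCK_GAP). Its lock mode is incompatible with that of the gap lock held by session1's transaction 4444, lock2->type_mode = 547 (LOCK_X | LOCK_REC | LOCK_GAP), since both are LOCK_X. However, because type_mode has LOCK_GAP set and does not carry the LOCK_INSERT_INTENTION flag, the function decides that no wait is needed. So session2's select for update also acquires the gap lock successfully.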
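The type_mode numbers seen in the debugger can be decoded with a few lines of Java. The sketch assumes the constants from the InnoDB headers: the lock mode in the low 4 bits (LOCK_IS=0, LOCK_IX=1, LOCK_S=2, LOCK_X=3, LOCK_AUTO_INC=4), LOCK_TABLE=16, LOCK_REC=32, LOCK_WAIT=256, LOCK_GAP=512, LOCK_REC_NOT_GAP=1024 and LOCK_INSERT_INTENTION=2048. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder turning an InnoDB type_mode value into flag names.
class TypeModeDecoder {
    private static final String[] MODES = {"LOCK_IS", "LOCK_IX", "LOCK_S", "LOCK_X", "LOCK_AUTO_INC"};
    private static final int[] FLAG_BITS = {16, 32, 256, 512, 1024, 2048};
    private static final String[] FLAG_NAMES = {
        "LOCK_TABLE", "LOCK_REC", "LOCK_WAIT", "LOCK_GAP", "LOCK_REC_NOT_GAP", "LOCK_INSERT_INTENTION"};

    static String decode(int typeMode) {
        List<String> parts = new ArrayList<>();
        int mode = typeMode & 0xF; // LOCK_MODE_MASK
        parts.add(mode < MODES.length ? MODES[mode] : "mode=" + mode);
        for (int i = 0; i < FLAG_BITS.length; i++) {
            if ((typeMode & FLAG_BITS[i]) != 0) {
                parts.add(FLAG_NAMES[i]);
            }
        }
        return String.join(" | ", parts);
    }

    public static void main(String[] args) {
        System.out.println(decode(515));  // LOCK_X | LOCK_GAP
        System.out.println(decode(547));  // LOCK_X | LOCK_REC | LOCK_GAP
        System.out.println(decode(2563)); // LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION
    }
}
```

In other words 515 = 3 + 512, 547 = 3 + 32 + 512 and 2563 = 3 + 512 + 2048.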



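Here, when session1's transaction 4444 executes the insert, type_mode is 2563 (LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION). Because the LOCK_INSERT_INTENTION flag is set, it has to wait for session2's transaction 4445 to release its gap lock. Transaction 4444 then creates an insert intention lock in the waiting state and waits for transaction 4445 to release the gap lock.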




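Here, session2's transaction 4445 also executes an insert, and its insert intention lock has to wait for session1's transaction 4444 to release its gap lock. Deadlock detection finds that the waits form a cycle, so InnoDB picks one transaction as the victim and rolls it back.
The process is roughly as follows: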

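  1. session2 tries to acquire an insert intention lock and has to wait for session1's gap lock
  2. session1's insert intention lock is already waiting
  3. session1's insert intention lock is waiting for session2's gap lock
  4. The waits form a cycle and a deadlock is detected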
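The cycle described in these steps can be illustrated with a minimal wait-for graph and a depth-first search. This shows only the idea behind deadlock detection; it is not InnoDB's actual detector, and it does not model how InnoDB chooses the victim. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal wait-for graph: an edge waiter -> holder means "waiter waits for a lock held by holder".
class WaitForGraph {
    private final Map<Long, Set<Long>> waitsFor = new HashMap<>();

    void addWait(long waiter, long holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    /** True if following wait edges from start leads back to start (a cycle). */
    boolean deadlocked(long start) {
        Deque<Long> stack = new ArrayDeque<>(waitsFor.getOrDefault(start, Collections.emptySet()));
        Set<Long> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            long trx = stack.pop();
            if (trx == start) {
                return true;
            }
            if (seen.add(trx)) {
                stack.addAll(waitsFor.getOrDefault(trx, Collections.emptySet()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        g.addWait(4444, 4445); // session1's insert intention lock waits for session2's gap lock
        System.out.println(g.deadlocked(4445)); // false: only one edge so far
        g.addWait(4445, 4444); // session2's insert intention lock waits for session1's gap lock
        System.out.println(g.deadlocked(4445)); // true: 4445 -> 4444 -> 4445
    }
}
```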

2.4 How to avoid this deadlock

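We now know the cause: when two sessions run select for update concurrently and neither matches a row, they may obtain locks on the same gap. (Whether they do depends on whether their WHERE conditions fall into the same interval. In the example above, if one session were about to insert 'ddd' and the other 'kkk', there would be no conflict, because the keys are not in the same gap.) When the two sessions then insert concurrently, the first one enters a lock wait, and when the second one inserts, a deadlock occurs. MySQL chooses one transaction to roll back based on transaction weight.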
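Whether two missed lookups collide can be checked with a small sketch, under the simplification used above: a miss locks the gap before the next existing key (or the supremum if there is none), and neither key exists yet. The names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical check: do two missing keys fall into the same index gap?
class GapConflict {
    /** The existing record whose preceding gap would be locked for key. */
    static String gapAnchor(TreeSet<String> keys, String key) {
        String next = keys.higher(key);
        return next == null ? "supremum" : next;
    }

    static boolean sameGap(TreeSet<String> keys, String a, String b) {
        return gapAnchor(keys, a).equals(gapAnchor(keys, b));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("bbb", "hhh", "yyy"));
        System.out.println(sameGap(keys, "ddd", "fff")); // true: both in ('bbb', 'hhh')
        System.out.println(sameGap(keys, "ddd", "kkk")); // false: 'kkk' is in ('hhh', 'yyy')
    }
}
```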

So how can this be avoided?
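One solution is to lower the transaction isolation level to Read Committed. Gap locks are then not taken for searches and index scans (InnoDB still uses them for foreign-key and duplicate-key checking), so in the scenario above there is no problem as long as the WHERE conditions differ, i.e. the keys eventually inserted differ. If different threads in the business code may run select for update on the same key, one of the inserts will fail with a duplicate key error on uk_prefix; catch that exception in the business code and retry.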
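The catch-and-retry advice might look like the following sketch. It assumes the JDBC driver reports MySQL's duplicate key error (1062 on uk_prefix) as java.sql.SQLIntegrityConstraintViolationException; under Spring it would typically arrive translated into DuplicateKeyException instead. The helper name is hypothetical, and each attempt passed in should run acquire in its own transaction so that a failed attempt is rolled back before the next one.

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.concurrent.Callable;

// Hypothetical retry helper for the duplicate-key case.
class DuplicateKeyRetry {
    static <T> T withRetry(Callable<T> attempt, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; ; i++) {
            try {
                return attempt.call();
            } catch (SQLIntegrityConstraintViolationException e) {
                // another session inserted the same prefix first; on the next attempt
                // select for update finds that row and takes the update branch
                if (i >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

A call would then look like withRetry(() -> acquire(prefix), 3).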
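In addition, the @Transactional annotation in the example code deserves attention for its propagation behavior: for a method like this, which has little to do with the main business transaction, the propagation should be changed to REQUIRES_NEW (in Spring, for example @Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.READ_COMMITTED)).
There are two reasons: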

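  1. The fix here lowers the isolation level. If the propagation stays at the default (REQUIRED), then whenever the outer transaction's isolation level is not RC, an IllegalTransactionStateException is thrown (provided your TransactionManager has validateExistingTransaction enabled).
  2. If the method joins the outer transaction, the locks it takes are held until the outer transaction commits, so a thread acquiring a serial number may be blocked while another thread is still executing transactional code that has nothing to do with serial numbers.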

3. References

InnoDB manual
Database Kernel Monthly Report (數據庫內核月報) - 2016/01
MySQL InnoDB source code
