A Brief Analysis of Tencent Cloud CDB Rollback Failures

Ⅰ. The problem

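First, a quick introduction to CDB's rollback feature. Rollback comes in three modes: ultra-fast, fast, and normal, which roll back specified tables, specified databases, and the entire instance, respectively.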

The console reports that the rollback task failed.

Error message: rollback table failed:SQL thread error(1146):Error 'Table xxx doesn't exist' on query. Default database: xxx, Query: 'xxxxxx'

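Taken literally, a missing table caused the SQL thread to stop. (What does the SQL thread have to do with it? That comes down to how the rollback feature works internally, which we set aside for now.)
This kind of failure only occurs in the ultra-fast and fast modes; normal rollback is not affected.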

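In plain terms: in the console I pick ultra-fast rollback for table a, but the binlog contains statements that touch other tables, for example: delete from a where (select xxx from b); Only table a is being rolled back, so replaying that statement fails because table b is not there.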

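Temporary workaround
Choose normal rollback, roll back the entire instance, and extract the data you need. This is, of course, considerably slower.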

Long-term fix
Set binlog_format to row, or set transaction_isolation to read-committed.
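
As a rough sketch of what the long-term fix looks like on a self-managed MySQL 5.7 instance (an assumption; on CDB you would likely change the same parameters through the console, and SET GLOBAL needs the SUPER privilege):

```sql
-- Check the current values first.
SHOW VARIABLES LIKE 'binlog_format';
-- Before MySQL 5.7.20 this variable is named tx_isolation.
SHOW VARIABLES LIKE 'transaction_isolation';

-- Option 1: log every change in row format.
SET GLOBAL binlog_format = 'ROW';

-- Option 2: switch the default isolation level to READ COMMITTED.
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Global changes like these only apply to sessions opened afterwards, so existing connections keep their old settings until they reconnect.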

Ⅱ. Digging deeper

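Knowing how to fix the problem is far from enough. To understand why these settings make it go away, we use this issue to review some binlog basics and touch briefly on transaction isolation levels.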

What the binlog is for

It has three main roles: replication, recovery, and a key part in two-phase commit, which helps keep master and slave data consistent.

binlog_format

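statement: logs the original SQL statement
row: logs the change made to each row
mixed: logs as statement by default and switches to row in special cases, such as nondeterministic functions like uuid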

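The statement format produces small log files, but it cannot guarantee master-slave consistency.
The row format does a better job of keeping master and slave data consistent, at the cost of larger log files.
Consistency is not covered in detail here. The mixed format is adequate for most workloads, which is why CDB defaults to mixed, but row is strongly recommended.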

Test

Test data and SQL
One ordinary statement and one cross-table statement are used here.

Data
(root@localhost) [test]> select * from t;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
|    4 | d    |
|    5 | e    |
+------+------+
5 rows in set (0.00 sec)

(root@localhost) [test]> select * from tt;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
|    4 | d    |
|    5 | e    |
+------+------+
5 rows in set (0.00 sec)

SQL
(root@localhost) [test]> delete from t where id in (select id from tt where id < 3);
Query OK, 2 rows affected (0.01 sec)

(root@localhost) [test]> delete from t where id = 5;
Query OK, 1 row affected (0.02 sec)
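
To reproduce the test, the two tables can be built roughly as follows before running the deletes. The column types are an assumption; only the row values come from the query output above.

```sql
-- Column types are assumed; the original shows only the data.
CREATE TABLE t  (id INT, name VARCHAR(10));
CREATE TABLE tt (id INT, name VARCHAR(10));

-- Same five rows in both tables, matching the SELECT output.
INSERT INTO t  VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
INSERT INTO tt VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
```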

Four rollback scenarios

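+---------------+----------------+-----------------+
| binlog_format | read-committed | repeatable-read |
+---------------+----------------+-----------------+
| row           | OK             | OK              |
| mixed         | OK             | ×               |
+---------------+----------------+-----------------+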
This table shows at a glance how to avoid the rollback error: only mixed combined with repeatable-read fails.

Analyzing the binlog
If you don't want to read through the process, skip straight to the conclusion at the bottom.

transaction_isolation = read-committed, binlog_format = row
BEGIN
/*!*/;
# at 331
#190124 11:41:58 server id 1  end_log_pos 378 CRC32 0x80a164cd  Table_map: `test`.`t` mapped to number 108
# at 378
#190124 11:41:58 server id 1  end_log_pos 427 CRC32 0x7ee92842  Delete_rows: table id 108 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=1
###   @2='a'
### DELETE FROM `test`.`t`
### WHERE
###   @1=2
###   @2='b'
# at 427
#190124 11:41:58 server id 1  end_log_pos 458 CRC32 0x97b7e158  Xid = 26
COMMIT/*!*/;

BEGIN
/*!*/;
# at 595
#190124 11:42:06 server id 1  end_log_pos 642 CRC32 0xf0a5f266  Table_map: `test`.`t` mapped to number 108
# at 642
#190124 11:42:06 server id 1  end_log_pos 684 CRC32 0x42239094  Delete_rows: table id 108 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=5
###   @2='e'
# at 684
#190124 11:42:06 server id 1  end_log_pos 715 CRC32 0xb95abaf4  Xid = 27
COMMIT/*!*/;

transaction_isolation = repeatable-read, binlog_format = row
BEGIN
/*!*/;
# at 331
#190124 12:18:50 server id 1  end_log_pos 378 CRC32 0xc4d70096  Table_map: `test`.`t` mapped to number 108
# at 378
#190124 12:18:50 server id 1  end_log_pos 427 CRC32 0x6d794dea  Delete_rows: table id 108 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=1
###   @2='a'
### DELETE FROM `test`.`t`
### WHERE
###   @1=2
###   @2='b'
# at 427
#190124 12:18:50 server id 1  end_log_pos 458 CRC32 0x3f3946c1  Xid = 10
COMMIT/*!*/;

BEGIN
/*!*/;
# at 595
#190124 12:18:58 server id 1  end_log_pos 642 CRC32 0x1ecaec0b  Table_map: `test`.`t` mapped to number 108
# at 642
#190124 12:18:58 server id 1  end_log_pos 684 CRC32 0xda32a16e  Delete_rows: table id 108 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=5
###   @2='e'
# at 684
#190124 12:18:58 server id 1  end_log_pos 715 CRC32 0x4fa0b638  Xid = 11
COMMIT/*!*/;

transaction_isolation = read-committed, binlog_format = mixed
BEGIN
/*!*/;
# at 331
#190124 12:26:37 server id 1  end_log_pos 378 CRC32 0x6cac93f1  Table_map: `test`.`t` mapped to number 108
# at 378
#190124 12:26:37 server id 1  end_log_pos 427 CRC32 0x2ec3da0f  Delete_rows: table id 108 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=1
###   @2='a'
### DELETE FROM `test`.`t`
### WHERE
###   @1=2
###   @2='b'
# at 427
#190124 12:26:37 server id 1  end_log_pos 458 CRC32 0xa4d92d55  Xid = 24
COMMIT/*!*/;

BEGIN
/*!*/;
# at 595
#190124 12:26:42 server id 1  end_log_pos 642 CRC32 0xa2926b8d  Table_map: `test`.`t` mapped to number 108
# at 642
#190124 12:26:42 server id 1  end_log_pos 684 CRC32 0x05059ae7  Delete_rows: table id 108 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=5
###   @2='e'
# at 684
#190124 12:26:42 server id 1  end_log_pos 715 CRC32 0x86e936fe  Xid = 25
COMMIT/*!*/;

transaction_isolation = repeatable-read, binlog_format = mixed
BEGIN
/*!*/;
# at 338
#190124 12:36:35 server id 1  end_log_pos 470 CRC32 0xfb5e71cd  Query   thread_id=2 exec_time=0 error_code=0
use `test`/*!*/;
SET TIMESTAMP=1548304595/*!*/;
delete from t where id in (select id from tt where id < 3)
/*!*/;
# at 470
#190124 12:36:35 server id 1  end_log_pos 501 CRC32 0xb0ab1a2a  Xid = 10
COMMIT/*!*/;

BEGIN
/*!*/;
# at 645
#190124 12:36:42 server id 1  end_log_pos 745 CRC32 0x264f35c7  Query   thread_id=2 exec_time=0 error_code=0
SET TIMESTAMP=1548304602/*!*/;
delete from t where id = 5
/*!*/;
# at 745
#190124 12:36:42 server id 1  end_log_pos 776 CRC32 0x6eb54ec8  Xid = 11
COMMIT/*!*/;

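We can see that in the first three cases the binlog is in row format and records the change to each row, while the last case records the original SQL. In this example, if you roll back only table t, replaying the binlog has to select from table tt, and that is where it breaks. A newly purchased Tencent Cloud CDB instance defaults to exactly this last case, so there is some chance that a rollback will fail.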
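
A quick way to check which format a statement was logged in, without dumping the whole file, is SHOW BINLOG EVENTS. This is only a sketch: the file name below is a placeholder, so take the real one from SHOW BINARY LOGS.

```sql
-- List the binlog files available on the server.
SHOW BINARY LOGS;

-- 'mysql-bin.000001' is a hypothetical file name.
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 20;
```

Changes logged in row format show up as Table_map and Delete_rows events, as in the first three dumps above, while a change logged in statement format shows up as a Query event holding the SQL text.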

Stock MySQL 5.7 defaults to row + repeatable-read; Tencent Cloud CDB defaults to mixed + repeatable-read.

Of course, after looking at these binlog excerpts you can also tell that mixed takes up a lot less space, right? Heh.

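Here is the question
By now we know that, put simply, as long as the binlog is in row format there is basically no problem. So why does switching the isolation level to read-committed also fix it? Careful readers will have noticed that in the test with transaction_isolation set to read-committed and binlog_format set to mixed, the delete statement involved no nondeterministic functions, yet it was still logged in row format.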

If you are using InnoDB tables and the transaction isolation level is READ COMMITTED or READ UNCOMMITTED, only row-based logging can be used. 

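This is quoted from the official documentation. In other words, under the read-committed isolation level, InnoDB changes are logged in row format even when binlog_format is set to mixed.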
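
One way to see this for yourself, sketched for a test instance where your account is allowed to change session variables (an assumption), is to force the combination for a single session and rerun the cross-table delete from the test:

```sql
-- Session-level settings only; nothing global is changed.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION binlog_format = 'MIXED';

-- A deterministic statement, yet it should be logged as row events,
-- because InnoDB under READ COMMITTED is limited to row-based logging.
DELETE FROM t WHERE id IN (SELECT id FROM tt WHERE id < 3);
```

With binlog_format set to 'STATEMENT' instead of 'MIXED', the same delete is expected to be rejected with an error rather than logged as a statement.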

As for why MySQL behaves this way, we won't analyze it here; see the chapters on transactions in 91洲際哥's blog.

Ⅲ. Summary

If a CDB rollback fails with this error, set transaction_isolation to read-committed or set binlog_format to row.
