MySQL 5.7: An Attempt at Faster Recovery Using the Replication SQL_Thread

1. Common MySQL data recovery methods

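There are generally three ways to recover MySQL data:

1. The officially recommended approach, full backup + binlog. The usual procedure is to restore the most recent full backup, then run mysqlbinlog --start-position --stop-position binlog.000xxx | mysql -uroot -p xxx -S database to replay the changes into the target database.

2. Recovery through master-slave replication. The usual procedure is to restore the most recent full backup, attach the restored instance as a slave of the existing master, and run start slave sql_thread until master_log_pos to replay up to a position just before the failure.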
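
As a rough illustration of the two conventional methods, the sketch below only prints the commands instead of running them. The helper names, the binlog file names, the positions and the socket path are placeholder assumptions, not values from a real incident. For method 2, note that the I/O thread also has to be running so that the events up to the target position actually reach the relay log.

```shell
#!/bin/sh
# Dry-run helpers: they only print the commands, nothing is executed.
# File names, positions and the socket path are placeholder values.

# Method 1: replay one binlog range through the mysql client.
# Arguments: binlog file, start position, stop position, socket path.
print_binlog_replay() {
    printf 'mysqlbinlog --start-position=%s --stop-position=%s %s | mysql -uroot -p -S %s\n' \
        "$2" "$3" "$1" "$4"
}

# Method 2: let the slave SQL thread apply events up to a master position.
# Arguments: master binlog file, master binlog position.
print_start_until() {
    printf "START SLAVE SQL_THREAD UNTIL MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$1" "$2"
}

print_binlog_replay binlog.000002 154 76364214 /tmp/mysql3307.sock
print_start_until 3307-binlog.000002 76364214
```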

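Here we try a third approach: use the binlogs from the original master to bring the slave fully up to date.

Approach:

A relay log and a binlog are essentially the same thing, so can we use MySQL's own sql_thread to apply the binlogs incrementally?

    1) Initialize a new instance and restore the full backup into it.
    2) Find the starting position in the first binlog file, and collect all the remaining binlogs.
    3) Disguise the binlogs as relay logs and let the SQL thread apply them incrementally.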

 

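Use cases:

1. The most recent full backup is far from the failure point, so recovery with the two methods above is too slow.

2. A dual-master keepalived cluster. Unlike MHA, keepalived has no mechanism for filling in missing logs, so a failure can lose data. If a failover to the slave happens while replication is badly lagging, the data ends up inconsistent and the missing logs have to be applied.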

 

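2. Experiment steps

1. Set up master-slave replication (this experiment uses traditional position-based replication; GTID should work just as well)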

M1 :

root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 | 1    |
|  2 | 3    |
|  3 | 2    |
|  4 | 3    |
|  5 | 6    |
|  6 | 7    |
|  7 | 9    |
| 10 | NULL |
| 11 | 10   |
+----+------+
9 rows in set (0.00 sec)

 M2:(slave)

root@localhost:mysql3307.sock [(none)]>select * from restore.t1;
+----+------+
| id | c1   |
+----+------+
|  1 | 1    |
|  2 | 3    |
|  3 | 2    |
|  4 | 3    |
|  5 | 6    |
|  6 | 7    |
|  7 | 9    |
| 10 | NULL |
| 11 | 10   |
+----+------+
9 rows in set (0.00 sec)

  

root@localhost:mysql3307.sock [restore]>show slave status\G	
*************************** 1. row ***************************	
               Slave_IO_State: Waiting for master to send event	
                  Master_Host: m1	
                  Master_User: repl	
                  Master_Port: 3307	
                Connect_Retry: 60	
              Master_Log_File: 3307-binlog.000002	
          Read_Master_Log_Pos: 154	
               Relay_Log_File: M2-relay-bin.000004	
                Relay_Log_Pos: 371	
        Relay_Master_Log_File: 3307-binlog.000002	
             Slave_IO_Running: Yes	
            Slave_SQL_Running: Yes	
              Replicate_Do_DB: 	
          Replicate_Ignore_DB: 	
           Replicate_Do_Table: 	
       Replicate_Ignore_Table: 	
      Replicate_Wild_Do_Table: 	
  Replicate_Wild_Ignore_Table: 	
                   Last_Errno: 0	
                   Last_Error: 	
                 Skip_Counter: 0	
          Exec_Master_Log_Pos: 154	
              Relay_Log_Space: 624	
              Until_Condition: None	
               Until_Log_File: 	
                Until_Log_Pos: 0	
           Master_SSL_Allowed: No	
           Master_SSL_CA_File: 	
           Master_SSL_CA_Path: 	
              Master_SSL_Cert: 	
            Master_SSL_Cipher: 	
               Master_SSL_Key: 	
        Seconds_Behind_Master: 0	
Master_SSL_Verify_Server_Cert: No	
                Last_IO_Errno: 0	
                Last_IO_Error: 	
               Last_SQL_Errno: 0	
               Last_SQL_Error: 	
  Replicate_Ignore_Server_Ids: 	
             Master_Server_Id: 13307	
                  Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3	
             Master_Info_File: /data/mysql/3307/data/master.info	
                    SQL_Delay: 0	
          SQL_Remaining_Delay: NULL	
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates	
           Master_Retry_Count: 86400	
                  Master_Bind: 	
      Last_IO_Error_Timestamp: 	
     Last_SQL_Error_Timestamp: 	
               Master_SSL_Crl: 	
           Master_SSL_Crlpath: 	
           Retrieved_Gtid_Set: 	
            Executed_Gtid_Set: 	
                Auto_Position: 0	
         Replicate_Rewrite_DB: 	
                 Channel_Name: 	
           Master_TLS_Version: 	
1 row in set (0.00 sec)	

 Record the slave's relay log information at this point:

[root@M2 data]# more M2-relay-bin.index 
./M2-relay-bin.000003
./M2-relay-bin.000004

[root@M2 data]# more relay-log.info 
7
./M2-relay-bin.000004
371
3307-binlog.000002
154
0
0
1
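
To make the file easier to read, here is a minimal sketch that labels the position fields. It assumes the MySQL 5.7 file-based relay-log.info layout (line 1 is a field count, followed by relay log name, relay log position, master binlog name and master binlog position); the function name show_relay_info is made up for this example.

```shell
#!/bin/sh
# Print the four position fields of a MySQL 5.7 file-based relay-log.info.
# Assumed layout: line 1 = field count, then relay log name, relay log
# position, master binlog name, master binlog position.
show_relay_info() {
    sed -n '2,5p' "$1" | {
        read -r relay_file
        read -r relay_pos
        read -r master_file
        read -r master_pos
        printf 'relay=%s:%s master=%s:%s\n' \
            "$relay_file" "$relay_pos" "$master_file" "$master_pos"
    }
}

# Demo on a temporary copy of the contents shown above.
tmp=$(mktemp)
printf '%s\n' 7 ./M2-relay-bin.000004 371 3307-binlog.000002 154 0 0 1 > "$tmp"
show_relay_info "$tmp"
rm -f "$tmp"
```

For the file above this reports relay log M2-relay-bin.000004 at 371 and master binlog 3307-binlog.000002 at 154, which matches Relay_Log_Pos and Exec_Master_Log_Pos in the slave status.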

 2. Use sysbench to make the slave fall out of sync

[root@M1 logs]# mysqladmin create sbtest
[root@M1 sysbench]# sysbench --db-driver=mysql --mysql-host=m1 --mysql-port=3307 --mysql-user=sbtest --mysql-password='sbtest' /usr/share/sysbench/oltp_common.lua --tables=4 --table-size=100000 --threads=2 --time=60 --report-interval=10 prepare

  While the master is loading data, stop replication on the slave to create an inconsistency:

root@localhost:mysql3307.sock [mysql]>stop slave

 3. After sysbench finishes, compare the data on the master and on the slave

Master:

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest1;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest2;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

  Slave:

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|    67550 |
+----------+
1 row in set (0.06 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|    70252 |
+----------+
1 row in set (0.04 sec)

  The master and the slave are clearly out of sync.

4. Check the slave status at this point:

root@localhost:mysql3307.sock [(none)]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: m1
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: 3307-binlog.000002
          Read_Master_Log_Pos: 76364214
               Relay_Log_File: M2-relay-bin.000004
                Relay_Log_Pos: 64490301
        Relay_Master_Log_File: 3307-binlog.000002
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 64490084
              Relay_Log_Space: 76364861
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
             Master_Info_File: /data/mysql/3307/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

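 The local relay log has not been fully applied yet (Exec_Master_Log_Pos is behind Read_Master_Log_Pos). To keep the experiment accurate, we first let the local relay log finish with start slave sql_thread.

Check again: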

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: m1
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: 3307-binlog.000002
          Read_Master_Log_Pos: 76364214
               Relay_Log_File: M2-relay-bin.000005
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 3307-binlog.000002
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 76364214
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: afeab8d6-b871-11e7-9b2a-005056b643b3
             Master_Info_File: /data/mysql/3307/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

  The local relay log has now been fully applied. Record the latest relay log information:

[root@M2 data]# more relay-log.info 
7
./M2-relay-bin.000005
4
3307-binlog.000002
76364214
0
0
1

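  The information above is important: it shows that the slave has executed up to position 76364214 of the master's binlog 000002. Next we copy the master's binlogs over, disguise them as relay logs, and resume recovery from this position.

5. Copy the binlogs to the target and disguise them as relay logs

Before copying, shut down the slave and add skip-slave-start to its cnf so that replication does not start automatically after a restart.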

[root@M2 data]# ll
total 185248
-rw-r----- 1 root  root       461 Oct 24 17:14 3307-binlog.000001
-rw-r----- 1 root  root  76364609 Oct 24 17:14 3307-binlog.000002
-rw-r----- 1 root  root       203 Oct 24 17:14 3307-binlog.000003
-rw-r----- 1 root  root       419 Oct 24 17:14 3307-binlog.000004
-rw-r----- 1 root  root       164 Oct 24 17:14 3307-binlog.index
-rw-r----- 1 mysql mysql       56 Oct 24 15:08 auto.cnf
-rw-r----- 1 mysql mysql     4720 Oct 24 17:14 ib_buffer_pool
-rw-r----- 1 mysql mysql 12582912 Oct 24 17:14 ibdata1
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:14 ib_logfile0
-rw-r----- 1 mysql mysql 50331648 Oct 24 17:11 ib_logfile1
-rw-r----- 1 mysql mysql      177 Oct 24 17:14 M2-relay-bin.000005
-rw-r----- 1 mysql mysql       22 Oct 24 17:11 M2-relay-bin.index
-rw-r----- 1 mysql mysql      122 Oct 24 17:14 master.info
drwxr-x--- 2 mysql mysql     4096 Oct 24 15:07 mysql
-rw------- 1 root  root         0 Oct 24 15:08 nohup.out
drwxr-x--- 2 mysql mysql     4096 Oct 24 15:07 performance_schema
-rw-r----- 1 mysql mysql       68 Oct 24 17:14 relay-log.info
drwxr-x--- 2 mysql mysql     4096 Oct 24 15:07 restore
drwxr-x--- 2 mysql mysql     4096 Oct 24 16:47 sbtest
drwxr-x--- 2 mysql mysql    12288 Oct 24 15:07 sys
-rw-r----- 1 mysql mysql       24 Oct 24 15:07 xtrabackup_binlog_pos_innodb
-rw-r----- 1 mysql mysql      577 Oct 24 15:07 xtrabackup_info

 Copy them under relay log names:

[root@M2 data]# cp 3307-binlog.000001 relay.000001
[root@M2 data]# cp 3307-binlog.000002 relay.000002
[root@M2 data]# cp 3307-binlog.000003 relay.000003
[root@M2 data]# cp 3307-binlog.000004 relay.000004 
Change the ownership:
[root@M2 data]# chown mysql.mysql -R *

 Edit the relay log index file so the server can recognize the new files:

[root@M2 data]# cat M2-relay-bin.index
./relay.000001
./relay.000002
./relay.000003
./relay.000004
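
The copy and index steps can be scripted. The following is a minimal sketch, not the exact procedure used here: the function name stage_binlogs_as_relay and its arguments are assumptions for illustration, and it should only be run while mysqld is stopped.

```shell
#!/bin/sh
# Copy every <binlog_base>.NNNNNN file in a directory to relay.NNNNNN and
# rewrite the relay log index so that it lists the copies in order.
# Run only while mysqld is stopped; names here are illustrative.
stage_binlogs_as_relay() {
    dir=$1; binlog_base=$2; index_name=$3
    : > "$dir/$index_name"                      # truncate the index file
    for f in "$dir/$binlog_base".[0-9]*; do     # skips <binlog_base>.index
        [ -e "$f" ] || continue                 # no match: nothing to copy
        seq=${f##*.}                            # numeric suffix, e.g. 000002
        cp "$f" "$dir/relay.$seq"
        printf './relay.%s\n' "$seq" >> "$dir/$index_name"
    done
}
```

With the layout of this experiment it would be called as stage_binlogs_as_relay /data/mysql/3307/data 3307-binlog M2-relay-bin.index, followed by the chown step so the copies are owned by the mysql user.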

 Edit the relay log info file to tell the server where to start applying:

[root@M2 data]# cat relay-log.info
7
./relay.000002
76364214
3307-binlog.000002
76364214
0
0
1
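
Editing this file by hand is error-prone, so a small helper can generate it. This is a sketch under the assumption of the MySQL 5.7 file-based relay-log.info layout; write_relay_info is a hypothetical name. Because the "relay log" is a byte-for-byte copy of the master binlog, the relay log position and the master binlog position are the same number, as in the file above.

```shell
#!/bin/sh
# Write a MySQL 5.7 file-based relay-log.info that points the SQL thread
# at a copied binlog. The relay log is a plain copy of the master binlog,
# so the same position is written for both the relay and the master side.
# The trailing 0, 0, 1 lines mirror the values already present in the file.
# Arguments: output file, relay log name, master binlog name, position.
write_relay_info() {
    out=$1; relay_file=$2; master_file=$3; pos=$4
    printf '%s\n' 7 "$relay_file" "$pos" "$master_file" "$pos" 0 0 1 > "$out"
}
```

For this experiment the call would be write_relay_info relay-log.info ./relay.000002 3307-binlog.000002 76364214, run in the data directory while mysqld is stopped.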

 Finally, with the instance started again, start the sql_thread to begin the fast recovery:

start slave sql_thread
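
To follow the progress of the SQL thread, one option is to pull a single field out of the show slave status\G output. The helper below is a minimal sketch; slave_status_field is a made-up name, and it is only meant for simple values such as positions or Yes/No flags.

```shell
#!/bin/sh
# Read "show slave status\G" text on stdin and print the value of one field.
# Leading padding of the field name and trailing whitespace of the value
# are stripped before comparing and printing.
slave_status_field() {
    awk -v key="$1" -F': ' '
        { gsub(/^[ \t]+/, "", $1); sub(/[ \t]+$/, "", $2) }
        $1 == key { print $2; exit }
    '
}

# Demo on two sample lines in the same format as the output above.
printf '%s\n' '             Slave_SQL_Running: Yes' \
              '          Exec_Master_Log_Pos: 76364214' |
    slave_status_field Exec_Master_Log_Pos
```

Against a live server the input would come from the mysql client, for example mysql -e 'show slave status\G' | slave_status_field Exec_Master_Log_Pos, with connection options added as needed.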

 6. Check that the data is consistent

slave:

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest4;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

root@localhost:mysql3307.sock [sbtest]>select count(1) from sbtest3;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.05 sec)

 The slave has now recovered all of the missing data.
