GTID環境中手動修復主從故障一例(Error 1146)

 
Preface
 
    In my last test of pt-heartbeat,both of master and slave were out of disk.And the mysql client was hang.In order to resolve the issue,I've tryed to fix the replicaiton environment without using mysqldump to reconfigure the slave.Let's see the details.
 
Procedure
 
I dropped test tables in database "sysbench" to release the disk space on master.
 1 [root@zlm2 08:56:51 /data/mysql/mysql3306/logs]  2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=1 --table-size=10000000 --mysql-storage-engine=innodb cleanup  3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)  4 
 5 Dropping table 'sbtest1'...  6 
 7 (zlm@192.168.1.101 3306)[(none)]>use sysbench;  8 Reading table information for completion of table and column names  9 You can turn off this feature to get a quicker startup with -A 10 
11 Database changed 12 (zlm@192.168.1.101 3306)[sysbench]>show tables; 13 +--------------------+
14 | Tables_in_sysbench |
15 +--------------------+
16 | hb                 |
17 | sbtest2            |
18 | sbtest3            |
19 | sbtest4            |
20 | sbtest5            |
21 +--------------------+
22 5 rows in set (0.00 sec) 23 
24 //Only sbtest1 was deleted.It's not enough.
25 
26 [root@zlm2 08:59:04 ~/sysbench-1.0/src/lua] 27 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=5 --table-size=10000000 --mysql-storage-engine=innodb cleanup 28 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2) 29 
30 Dropping table 'sbtest1'... 31 Dropping table 'sbtest2'... 32 Dropping table 'sbtest3'... 33 Dropping table 'sbtest4'... 34 Dropping table 'sbtest5'... 35 
36 [root@zlm2 08:59:09 ~/sysbench-1.0/src/lua] 37 #df -h 38 Filesystem               Size  Used Avail Use% Mounted on 39 /dev/mapper/centos-root  8.4G  6.9G  1.5G  83% / //I'd got 27% free space.
40 devtmpfs                 488M     0  488M   0% /dev 41 tmpfs                    497M     0  497M   0% /dev/shm 42 tmpfs                    497M  6.6M  491M   2% /run 43 tmpfs                    497M     0  497M   0% /sys/fs/cgroup 44 /dev/sda1                497M  118M  379M  24% /boot 45 none                      87G   80G  6.6G  93% /vagrant 46 
47 (zlm@192.168.1.101 3306)[(none)]>drop database sysbench; 48 Query OK, 1 row affected (0.04 sec) 49 
50 //Further more,I dropped the "sysbench".

 

The slave hung still and disk space was full.
 1 [root@zlm3 08:55:17 ~]  2 #df -h  3 Filesystem               Size  Used Avail Use% Mounted on  4 /dev/mapper/centos-root  8.4G  8.4G   20K 100% /
 5 devtmpfs                 488M     0  488M   0% /dev  6 tmpfs                    497M     0  497M   0% /dev/shm  7 tmpfs                    497M  6.5M  491M   2% /run  8 tmpfs                    497M     0  497M   0% /sys/fs/cgroup  9 /dev/sda1                497M  118M  379M  24% /boot 10 none                      87G   80G  6.6G  93% /vagrant 11 
12 (zlm@192.168.1.102 3306)[(none)]>show slave status\G 13 ^C^C -- query aborted 14 
15 
16 ^Z 17 [6]+ Stopped mysql 18 
19 [root@zlm3 08:55:58 ~] 20 #pkill mysqld 21 
22 [root@zlm3 08:56:04 ~] 23 #./mysqld.sh 
24 
25 [root@zlm3 08:56:07 ~] 26 #mysql 27 ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.1.102' (111)
28 
29 [root@zlm3 09:00:35 ~] 30 #cd /data/mysql/mysql3306/data 31 
32 [root@zlm3 09:00:46 /data/mysql/mysql3306/data] 33 #cat error.log |tail -n 30
34 2018-07-19T08:57:02.581937+01:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 17039436409
35 2018-07-19T08:57:02.581958+01:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 17039436418
36 2018-07-19T08:57:02.581963+01:00 0 [Note] InnoDB: Database was not shutdown normally!
37 2018-07-19T08:57:02.581965+01:00 0 [Note] InnoDB: Starting crash recovery. 38 2018-07-19T08:57:02.696292+01:00 0 [Note] InnoDB: Transaction 31700139 was in the XA prepared state. 39 2018-07-19T08:57:02.700688+01:00 0 [Note] InnoDB: Transaction 31700139 was in the XA prepared state. 40 2018-07-19T08:57:02.700814+01:00 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 41 2018-07-19T08:57:02.700821+01:00 0 [Note] InnoDB: Trx id counter is 31700480
42 2018-07-19T08:57:02.701719+01:00 0 [Note] InnoDB: Last MySQL binlog file position 0 99139927, file name mysql-bin.000016
43 2018-07-19T08:57:02.805965+01:00 0 [Note] InnoDB: Ignoring tablespace `zlm`.`sbtest2` because the DISCARD flag is set . 44 2018-07-19T08:57:02.806462+01:00 0 [Note] InnoDB: Creating shared tablespace for temporary tables 45 2018-07-19T08:57:02.807316+01:00 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 46 2018-07-19T08:57:02.807568+01:00 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions 47 2018-07-19T08:57:02.807594+01:00 0 [Note] InnoDB: Rollback of non-prepared transactions completed 48 2018-07-19T08:57:02.871396+01:00 0 [Warning] InnoDB: Retry attempts for writing partial data failed. 49 2018-07-19T08:57:02.871423+01:00 0 [ERROR] InnoDB: Write to file ./ibtmp1failed at offset 11534336, 1048576 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded. 50 2018-07-19T08:57:02.871441+01:00 0 [ERROR] InnoDB: Error number 28 means 'No space left on device'
51 2018-07-19T08:57:02.871446+01:00 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
52 2018-07-19T08:57:02.871451+01:00 0 [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space 53 2018-07-19T08:57:02.871456+01:00 0 [ERROR] InnoDB: Unable to create the shared innodb_temporary 54 2018-07-19T08:57:02.871459+01:00 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 55 2018-07-19T08:57:03.273011+01:00 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
56 2018-07-19T08:57:03.273029+01:00 0 [ERROR] Plugin 'InnoDB' init function returned error. 57 2018-07-19T08:57:03.273033+01:00 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 58 2018-07-19T08:57:03.273037+01:00 0 [ERROR] Failed to initialize builtin plugins. 59 2018-07-19T08:57:03.273040+01:00 0 [ERROR] Aborting 60 
61 2018-07-19T08:57:03.273046+01:00 0 [Note] Binlog end 62 2018-07-19T08:57:03.273389+01:00 0 [Note] mysqld: Shutdown complete 63 
64 //The mysqld process could not run again because of no free disk space.

 

I decided to drop all the binlogs on slave to release the disk space.
 1 [root@zlm3 09:01:34 /data/mysql/mysql3306]  2 #cd logs  3 
 4 [root@zlm3 09:01:35 /data/mysql/mysql3306/logs]  5 #ls -l  6 total 2691436
 7 -rw-r----- 1 mysql mysql   4333745 Jul 13 11:07 mysql-bin.000001
 8 -rw-r----- 1 mysql mysql       653 Jul 16 09:28 mysql-bin.000002
 9 -rw-r----- 1 mysql mysql 268435663 Jul 16 10:03 mysql-bin.000003
10 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:22 mysql-bin.000004
11 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:32 mysql-bin.000005
12 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:48 mysql-bin.000006
13 -rw-r----- 1 mysql mysql 268435798 Jul 16 11:29 mysql-bin.000007
14 -rw-r----- 1 mysql mysql  35214498 Jul 16 11:59 mysql-bin.000008
15 -rw-r----- 1 mysql mysql       249 Jul 16 11:59 mysql-bin.000009
16 -rw-r----- 1 mysql mysql 199908536 Jul 18 11:27 mysql-bin.000010
17 -rw-r----- 1 mysql mysql       249 Jul 18 11:27 mysql-bin.000011
18 -rw-r----- 1 mysql mysql 268474318 Jul 19 04:20 mysql-bin.000012
19 -rw-r----- 1 mysql mysql 268817026 Jul 19 04:24 mysql-bin.000013
20 -rw-r----- 1 mysql mysql 268802986 Jul 19 04:28 mysql-bin.000014
21 -rw-r----- 1 mysql mysql 268826152 Jul 19 04:32 mysql-bin.000015
22 -rw-r----- 1 mysql mysql  99422208 Jul 19 08:56 mysql-bin.000016
23 -rw-r----- 1 mysql mysql       704 Jul 19 04:32 mysql-bin.index 24 
25 [root@zlm3 09:01:36 /data/mysql/mysql3306/logs] 26 #rm -f *
27 
28 [root@zlm3 09:01:38 /data/mysql/mysql3306/logs] 29 #ls -l 30 total 0
31 
32 [root@zlm3 09:01:48 ~] 33 #df -h 34 Filesystem               Size  Used Avail Use% Mounted on 35 /dev/mapper/centos-root  8.4G  4.5G  4.0G  53% / //The free disk space had been reduced to 47%.
36 devtmpfs                 488M     0  488M   0% /dev 37 tmpfs                    497M     0  497M   0% /dev/shm 38 tmpfs                    497M  6.5M  491M   2% /run 39 tmpfs                    497M     0  497M   0% /sys/fs/cgroup 40 /dev/sda1                497M  118M  379M  24% /boot 41 none                      87G   80G  6.6G  93% /vagrant

 

Ran the mysqld again and dropped the database "sysbench" on slave.
 1 [root@zlm3 09:01:42 /data/mysql/mysql3306/logs]  2 #sh /root/mysqld.sh
 3 
 4 [root@zlm3 09:01:51 /data/mysql/mysql3306/logs]  5 #ps aux|grep mysqld  6 mysql     4346  7.0 17.8 1109984 181576 pts/0  Sl   09:01   0:00 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf  7 root      4380  0.0  0.0 112640   960 pts/0    R+   09:01   0:00 grep --color=auto mysqld  8 
 9 [root@zlm3 09:01:55 /data/mysql/mysql3306/logs] 10 #mysql 11 Welcome to the MySQL monitor. Commands end with ; or \g. 12 Your MySQL connection id is 2
13 Server version: 5.7.21-log MySQL Community Server (GPL) 14 
15 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. 16 
17 Oracle is a registered trademark of Oracle Corporation and/or its 18 affiliates. Other names may be trademarks of their respective 19 owners. 20 
21 Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 22 
23 (zlm@192.168.1.102 3306)[(none)]>drop database sysbench; 24 Query OK, 6 rows affected (0.11 sec)

 

Started the replication threads of slave.html

 1 (zlm@192.168.1.102 3306)[(none)]>start slave;  2 Query OK, 0 rows affected (0.00 sec)  3 
 4 (zlm@192.168.1.102 3306)[(none)]>show slave status\G  5 *************************** 1. row ***************************
 6                Slave_IO_State: Waiting for master to send event  7                   Master_Host: 192.168.1.101
 8  Master_User: repl  9                   Master_Port: 3306
10                 Connect_Retry: 60
11               Master_Log_File: mysql-bin.000019
12           Read_Master_Log_Pos: 155765801
13                Relay_Log_File: relay-bin.000068
14                 Relay_Log_Pos: 398
15         Relay_Master_Log_File: mysql-bin.000019
16  Slave_IO_Running: Yes 17  Slave_SQL_Running: No 18  Replicate_Do_DB: 19  Replicate_Ignore_DB: 20  Replicate_Do_Table: 21  Replicate_Ignore_Table: 22  Replicate_Wild_Do_Table: 23  Replicate_Wild_Ignore_Table: 24                    Last_Errno: 1146
25                    Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist' 26                  Skip_Counter: 0
27           Exec_Master_Log_Pos: 99148246
28               Relay_Log_Space: 56618146
29  Until_Condition: None 30  Until_Log_File: 31                 Until_Log_Pos: 0
32  Master_SSL_Allowed: No 33  Master_SSL_CA_File: 34  Master_SSL_CA_Path: 35  Master_SSL_Cert: 36  Master_SSL_Cipher: 37  Master_SSL_Key: 38  Seconds_Behind_Master: NULL 39 Master_SSL_Verify_Server_Cert: No 40                 Last_IO_Errno: 0
41  Last_IO_Error: 42                Last_SQL_Errno: 1146
43                Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist' //Since the database had been droppted.This error was notable.
44  Replicate_Ignore_Server_Ids: 45              Master_Server_Id: 1013306
46                   Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e 47  Master_Info_File: mysql.slave_master_info 48                     SQL_Delay: 0
49  SQL_Remaining_Delay: NULL 50  Slave_SQL_Running_State: 51            Master_Retry_Count: 86400
52  Master_Bind: 53  Last_IO_Error_Timestamp: 54      Last_SQL_Error_Timestamp: 180719 09:02:07
55  Master_SSL_Crl: 56  Master_SSL_Crlpath: 57            Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021 //It was stuck on transaction 3714549(which contained error).
58             Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3714548, 59 5c77c31b-4add-11e8-81e2-080027de0e0e:1-168
60                 Auto_Position: 1
61  Replicate_Rewrite_DB: 62  Channel_Name: 63  Master_TLS_Version: 64 1 row in set (0.00 sec) 65 
66 [root@zlm3 09:10:42 ~] 67 #perror 1146
68 MySQL error code 1146 (ER_NO_SUCH_TABLE): Table '%-.192s.%-.192s' doesn't exist
69 
70 //Error 1146 indicated the absence of table "sbtest1" in "sysbench" database. 71 //Obviously,the slave was replaying the operations relevant to this table on master.The table even the database had been dropped. 72 //How could I do next step?Do I have to generate a new mysqldump file and reconfigure the slave again? 73 //There's One thing I'm rather sure that there were no other transactions generated in the whole course except the operations on "sysbench" database. 74 //Since I'd drop "sysbentch" database on both master and slave.Maybe I can fix the issue easily.

 

Checked the Executed_Gtid_Set on master.mysql

1 (zlm@192.168.1.101 3306)[(none)]>show master status; 2 +------------------+-----------+--------------+------------------+------------------------------------------------+
3 | File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
4 +------------------+-----------+--------------+------------------+------------------------------------------------+
5 | mysql-bin.000019 | 155765801 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021 |
6 +------------------+-----------+--------------+------------------+------------------------------------------------+
7 1 row in set (0.00 sec) 8 
9 //The executed gtid was upto "3730021".

 

Tryed to fix the replica of master.sql

 1 (zlm@192.168.1.102 3306)[(none)]>show slave status\G  2 *************************** 1. row ***************************
 3                Slave_IO_State: Waiting for master to send event  4                   Master_Host: 192.168.1.101
 5  Master_User: repl  6                   Master_Port: 3306
 7                 Connect_Retry: 60
 8               Master_Log_File: mysql-bin.000019
 9           Read_Master_Log_Pos: 155765801
 10                Relay_Log_File: relay-bin.000068
 11                 Relay_Log_Pos: 398
 12         Relay_Master_Log_File: mysql-bin.000019
 13  Slave_IO_Running: Yes  14  Slave_SQL_Running: No  15  Replicate_Do_DB:  16  Replicate_Ignore_DB:  17  Replicate_Do_Table:  18  Replicate_Ignore_Table:  19  Replicate_Wild_Do_Table:  20  Replicate_Wild_Ignore_Table:  21                    Last_Errno: 1146
 22                    Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'  23                  Skip_Counter: 0
 24           Exec_Master_Log_Pos: 99148246
 25               Relay_Log_Space: 56618627
 26  Until_Condition: None  27  Until_Log_File:  28                 Until_Log_Pos: 0
 29  Master_SSL_Allowed: No  30  Master_SSL_CA_File:  31  Master_SSL_CA_Path:  32  Master_SSL_Cert:  33  Master_SSL_Cipher:  34  Master_SSL_Key:  35  Seconds_Behind_Master: NULL  36 Master_SSL_Verify_Server_Cert: No  37                 Last_IO_Errno: 0
 38  Last_IO_Error:  39                Last_SQL_Errno: 1146
 40                Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'  41  Replicate_Ignore_Server_Ids:  42              Master_Server_Id: 1013306
 43                   Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e  44  Master_Info_File: mysql.slave_master_info  45                     SQL_Delay: 0
 46  SQL_Remaining_Delay: NULL  47  Slave_SQL_Running_State:  48            Master_Retry_Count: 86400
 49  Master_Bind:  50  Last_IO_Error_Timestamp:  51      Last_SQL_Error_Timestamp: 180719 09:02:57
 52  Master_SSL_Crl:  53  Master_SSL_Crlpath:  54            Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021
 55  Executed_Gtid_Set:  56                 Auto_Position: 1
 57  Replicate_Rewrite_DB:  58  Channel_Name:  59  Master_TLS_Version:  60 1 row in set (0.00 sec)  61 
 62 (zlm@192.168.1.102 3306)[(none)]>reset master;  63 Query OK, 0 rows affected (0.02 sec)  64 
 65 (zlm@192.168.1.102 3306)[(none)]>set @@global.gtid_purged='1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021';  66 Query OK, 0 rows affected (0.00 sec)  67 
 68 //On account of surely knowing there were no other transactions at all.I set the "gtid_purged" variable to the value of "gtid_executed" on master.  69 //It means I guised that all the transactions generated on master had been replayed on slave already.The slave could retrieve new GTID at the moment.
 70 
 71 (zlm@192.168.1.102 3306)[(none)]>start slave sql_thread;  72 Query OK, 0 rows affected (0.02 sec)  73 
 74 (zlm@192.168.1.102 3306)[(none)]>show slave status\G  75 *************************** 1. row ***************************
 76                Slave_IO_State: Waiting for master to send event  77                   Master_Host: 192.168.1.101
 78  Master_User: repl  79                   Master_Port: 3306
 80                 Connect_Retry: 60
 81               Master_Log_File: mysql-bin.000019
 82           Read_Master_Log_Pos: 155765801
 83                Relay_Log_File: relay-bin.000069
 84                 Relay_Log_Pos: 438
 85         Relay_Master_Log_File: mysql-bin.000019
 86  Slave_IO_Running: Yes  87             Slave_SQL_Running: Yes //The sql_thread became "Yes".
 88  Replicate_Do_DB:  89  Replicate_Ignore_DB:  90  Replicate_Do_Table:  91  Replicate_Ignore_Table:  92  Replicate_Wild_Do_Table:  93  Replicate_Wild_Ignore_Table:  94                    Last_Errno: 0
 95  Last_Error:  96                  Skip_Counter: 0
 97           Exec_Master_Log_Pos: 155765801
 98               Relay_Log_Space: 56618434
 99  Until_Condition: None 100  Until_Log_File: 101                 Until_Log_Pos: 0
102  Master_SSL_Allowed: No 103  Master_SSL_CA_File: 104  Master_SSL_CA_Path: 105  Master_SSL_Cert: 106  Master_SSL_Cipher: 107  Master_SSL_Key: 108         Seconds_Behind_Master: 0
109 Master_SSL_Verify_Server_Cert: No 110                 Last_IO_Errno: 0
111  Last_IO_Error: 112                Last_SQL_Errno: 0
113  Last_SQL_Error: 114  Replicate_Ignore_Server_Ids: 115              Master_Server_Id: 1013306
116                   Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e 117  Master_Info_File: mysql.slave_master_info 118                     SQL_Delay: 0
119  SQL_Remaining_Delay: NULL 120       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 121            Master_Retry_Count: 86400
122  Master_Bind: 123  Last_IO_Error_Timestamp: 124  Last_SQL_Error_Timestamp: 125  Master_SSL_Crl: 126  Master_SSL_Crlpath: 127            Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021
128             Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021 //The slave had skipped those GTID(which contained error 1146) of master and waited for newer GTID.The replica had been fixed up.
129                 Auto_Position: 1
130  Replicate_Rewrite_DB: 131  Channel_Name: 132  Master_TLS_Version: 133 1 row in set (0.00 sec)

 

 Summarycentos

  • The variable "gtid_purged" cannot be set if "gtid_executed" is not empty.
  • Caution,"reset master" can only be used on slave.Keep in mind that don't do it on master anytime.
  • This case can be followed only in test environment 'cause you cannot guarantee whether all the transactions are really replayed on slave.
相關文章
相關標籤/搜索