MySQL 5.5: a failure caused by physically deleting binlog files

Symptoms:

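Shortly after 12 noon, the master of a master-slave cluster, which had not been configured with huge pages, hit an OOM during a release. The MySQL instance restarted and MHA performed a failover, which completed normally. After the failover, the old master had to be reconfigured as a slave of the new master. I found the change master to .... command in manager.log and ran it, but the replication state stayed stuck at Connecting. Naming used below: M1 is the instance that hit the OOM, and S1 is the slave that took over as master.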

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting to reconnect after a failed master event read
                  Master_Host: 10.3.171.40
                  Master_User: rep_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: centos-bin.000002
          Read_Master_Log_Pos: 107
               Relay_Log_File: relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: centos-bin.000002
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 107
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2017140

The error log, shown below, reports that the master's binlog file cannot be found on S1:

160408 12:25:40 [Note] Slave I/O thread: connected to master 'rep_user@10.3.171.40:3306',replication started in log 'centos-bin.000002' at position 107
160408 12:25:40 [ERROR] Error reading packet from server: File '/data2/mysql/centos-bin.000002' not found (Errcode: 2) ( server_errno=29)
160408 12:25:40 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'centos-bin.000002' at postion 107
160408 12:25:40 [ERROR] Error reading packet from server: File '/data2/mysql/centos-bin.000002' not found (Errcode: 2) ( server_errno=29)
160408 12:26:40 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'centos-bin.000002' at postion 107
160408 12:26:40 [ERROR] Error reading packet from server: File '/data2/mysql/centos-bin.000002' not found (Errcode: 2) ( server_errno=29)

On S1, show master status and show master logs confirm that application data is being written and the position keeps advancing. The odd part is that centos-bin.000001 has a size of 0.

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| centos-bin.000001 |         0 |
| centos-bin.000002 | 568661746 |
+-------------------+-----------+
2 rows in set (0.00 sec)

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| centos-bin.000001 |         0 |
| centos-bin.000002 | 568941034 |
+-------------------+-----------+
2 rows in set (0.00 sec)

mysql> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| centos-bin.000001 |         0 |
| centos-bin.000002 | 569017617 |
+-------------------+-----------+
2 rows in set (0.00 sec)

In the data directory, however, neither of these two files exists. The replication error says the same thing: file not found.

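So the strange situation at this point is: the application writes to the database normally, show master status shows the position advancing, yet there is no file on disk and replication cannot be established.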

[root@GZ_NS_M5_SYNC_mysql_sync1-standby_171.40 ~]# find / -name centos-bin.000002
[root@GZ_NS_M5_SYNC_mysql_sync1-standby_171.40 ~]# 
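The gap between show master logs and find is consistent with ordinary POSIX behavior: a file removed with rm stays alive for as long as some process holds it open. A minimal sketch for checking this, assuming Linux with /proc available. The helper name list_deleted_fds is made up for illustration, and the current shell stands in for mysqld; on a real host one would pass the mysqld PID instead of $$.

```shell
# Hypothetical helper: print "fd -> target" for every descriptor of process $1
# whose target has been unlinked. Linux marks such targets with " (deleted)".
list_deleted_fds() {
    for dfd_path in /proc/"$1"/fd/*; do
        dfd_target=$(readlink "$dfd_path" 2>/dev/null) || continue
        case $dfd_target in
            *' (deleted)') printf '%s -> %s\n' "${dfd_path##*/}" "$dfd_target" ;;
        esac
    done
}

# Demo: the current shell plays the role of mysqld.
check_dir=$(mktemp -d)
exec 9>"$check_dir/centos-bin.000002"   # keep the "binlog" open on descriptor 9
rm -rf "$check_dir"                     # delete the whole directory
deleted_report=$(list_deleted_fds "$$")
echo "$deleted_report"
exec 9>&-                               # close the descriptor again
```

If the report lists a binlog name followed by "(deleted)", the server is still writing to a file that no longer has a directory entry.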

 

#Reproducing the failure

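1) Start the instance normally, enable binlog, and set up replication.

2) Use rm to delete the master's binlog.index and binlog.0000X files.

3) Keep writing data; the position keeps changing.

4) The slave reports an error: the binlog file cannot be found.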
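The steps above can be mimicked without MySQL. The sketch below is only an analogy, assuming Linux with /proc: the current shell plays the server, descriptor 8 plays the open binlog, and the file name is illustrative.

```shell
repro_dir=$(mktemp -d)
repro_log="$repro_dir/binlog.000001"

exec 8>"$repro_log"          # step 1: the "server" opens its binlog
echo "event 1" >&8
rm -f "$repro_log"           # step 2: the file is removed from disk
echo "event 2" >&8           # step 3: writes still succeed and the size grows

# step 4: anything that looks the file up by name finds nothing
found=$(find "$repro_dir" -name 'binlog.000001')
# the data is still reachable through the open descriptor
open_size=$(wc -c < /proc/$$/fd/8 | tr -d ' ')
echo "found='$found' size_via_fd=$open_size"

exec 8>&-                    # closing the last descriptor frees the data
rm -rf "$repro_dir"
```

The writer never sees an error, which matches the incident: the position advances while every lookup by path fails.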

 

#Why does this happen

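Looking back, this failure most likely followed the same path as the reproduction above. The cluster was built 3 or 4 months ago. After replication was running normally, the binlog-related files on the standby were deleted; in fact the entire directory was deleted, a directory dedicated to binlogs and relay logs. When replication was set up again afterwards, change master to recreated the relay logs, but the binlogs were not recreated. Because mysqld still held the deleted binlog open, it presumably kept writing to it, which would explain why the position kept advancing with no file visible on disk. Today MHA failed over, and the standby became the master and started accepting writes. The file name and position used by MHA came from running show master status on the standby, but those files had already been deleted, so replication failed.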

 

#Further experiments

1) After binlog.00001 has been created, rm both binlog.index and binlog.00001, then keep writing so the position keeps growing. What happens at rotation, once the size exceeds 1G?

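Answer: when file 1 fills up and rotation takes place, binlog.index is missing, so the highest file ID cannot be determined and numbering starts from 1 again. Conclusion: the server keeps writing to the 00001 file.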

 

2) Keep the index file, delete 00001, and keep writing. What happens once the size exceeds 1G?

Answer: a 00002 file is created, and it is a normal binlog file that actually exists on disk.

 

#For today's failure, how can the events be recovered?

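In testing, if the binlog format is statement, the statements can be retrieved with show binlog events in 'xxxx'. If the format is row, the actual SQL statements cannot be obtained this way.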
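Another possible route, which was not tried in the original incident: on Linux, as long as mysqld has not been restarted and still holds the deleted binlog open, the content can be copied out through /proc/<pid>/fd/<n> and then decoded offline. The sketch below shows only the copy step, with the current shell standing in for mysqld. recover_deleted_fd is a hypothetical helper name, and because the server keeps writing, the copy is a snapshot up to the moment it was taken.

```shell
# Hypothetical helper: copy the content behind open descriptor $2 of process $1
# into a new regular file $3. On Linux this also works for unlinked files.
recover_deleted_fd() {
    cp "/proc/$1/fd/$2" "$3"
}

rescue_dir=$(mktemp -d)
exec 7>"$rescue_dir/centos-bin.000002"   # the "server" holds the binlog open
echo "binlog payload" >&7
rm -f "$rescue_dir/centos-bin.000002"    # the file disappears from disk

recover_deleted_fd "$$" 7 "$rescue_dir/recovered.000002"
recovered=$(cat "$rescue_dir/recovered.000002")
echo "$recovered"

exec 7>&-
rm -rf "$rescue_dir"
# On a real host, the recovered copy could then be decoded with, for example:
#   mysqlbinlog --base64-output=DECODE-ROWS -v recovered.000002
```

For row-format binlogs, the DECODE-ROWS and -v options of mysqlbinlog print each row change as commented pseudo-SQL, which is more than show binlog events provides.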
