Fixing Slave_IO_Running: No on a MySQL slave

Date: 2019-11-24


The master's host, 192.168.1.1, crashed. After it came back up, the slave 192.168.71.1 could no longer replicate from the master and reported Slave_IO_Running: No.

root@192.168.71.1:~# mysql -uroot -p --socket=/opt/mysql/3399/3399.sock 
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 452723
Server version: 5.0.51a-24+lenny2 (Debian)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show slave status\G;
*************************** 1. row ***************************
             Slave_IO_State: 
                Master_Host: 192.168.1.1
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: 99.000302
        Read_Master_Log_Pos: 165112917
             Relay_Log_File: 3399-relay-bin.000013
              Relay_Log_Pos: 165113047
      Relay_Master_Log_File: 99.000302
           Slave_IO_Running: No
          Slave_SQL_Running: Yes
            Replicate_Do_DB: 
        Replicate_Ignore_DB: mysql
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: 
               Skip_Counter: 0
        Exec_Master_Log_Pos: 165112917
            Relay_Log_Space: 165113047
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

Check the slave's error log:

mysql@192.168.71.1:/opt/mysql/3399$ cat 192.168.71.1.err
140115  1:51:01 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
140115  1:51:01 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
140115  1:51:01 [Note] Slave I/O thread exiting, read up to log '99.000302', position 165112917

Using the position from the error, look for position 165112917 in log '99.000302' on the master:

root@192.168.1.1:mysql.bin# mysqlbinlog 99.000302 > /tmp/test
root@192.168.1.1:mysql# tail -n 10 /tmp/test 
#140115  0:50:25 server id 1176  end_log_pos 165111351 	Query	thread_id=111	exec_time=0	error_code=0
SET TIMESTAMP=1389718225/*!*/;
INSERT INTO user_info_db_86.region_info_table_56 (userid, region, gameflag) VALUES (563625686, 0, 2) ON DUPLICATE KEY UPDATE gameflag = (gameflag | 2)/*!*/;
# at 165111351
#140115  0:50:25 server id 1176  end_log_pos 165111378 	Xid = 17877752
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

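The last event in the master's binlog starts at 165111351 (and ends at 165111378), which is smaller than 165112917. In other words, the position the slave asks for lies beyond the end of the master's binlog, so replication cannot resume.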
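Because binlog positions are byte offsets into the binlog file, the same comparison can be made without dumping the whole log. The following is a minimal sketch, not part of the original procedure; check_binlog_pos is a hypothetical helper name, and it assumes it is run where the binlog file is readable.

```shell
#!/bin/sh
# Sketch: decide whether a replication position can exist in a binlog file.
# check_binlog_pos is a hypothetical helper, not a MySQL tool.

# check_binlog_pos FILE POS
# Prints "impossible: ..." when POS is greater than the size of FILE in bytes,
# otherwise "ok: ...". A position equal to the file size is the end of the log.
check_binlog_pos() {
    file=$1
    pos=$2
    # wc -c may pad its output with spaces on some systems, so strip them.
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$pos" -gt "$size" ]; then
        echo "impossible: requested $pos, file ends at $size"
    else
        echo "ok: requested $pos, file ends at $size"
    fi
}

# Example with the values from the error log, run in the master's binlog directory:
# check_binlog_pos 99.000302 165112917
```

If the helper reports "impossible", the slave is asking for bytes that never reached the master's disk, which matches error 1236 above.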

Why does this happen? It occurs easily when sync_binlog=0.

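sync_binlog=0: after a transaction commits, MySQL issues no fsync-style disk sync to flush the contents of binlog_cache to disk. It leaves the operating system to decide when to sync, or syncs only once the cache is full.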

sync_binlog=n: after every n transaction commits, MySQL issues an fsync-style disk sync to force the data in binlog_cache onto disk.

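The MySQL default is sync_binlog=0 (true for the 5.0 server used here; MySQL 5.7.7 and later default to 1), meaning no forced flush to disk at all. This gives the best performance but also the highest risk: if the system crashes, all binlog information still in binlog_cache is lost. Setting it to 1 is the safest option but costs the most performance: even if the system crashes, at most one unfinished transaction in binlog_cache is lost, with no material effect on the actual data. Past experience and related tests suggest that, for systems with highly concurrent transactions, write performance with sync_binlog=0 and with sync_binlog=1 can differ by a factor of 5 or more.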

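MySQL here runs with its default configuration, so the cause of the error is as follows. With sync_binlog=0, flushing the master's binlog from the log buffer (here meaning the OS buffer for the binlog file) to disk is left to the OS. The slave I/O thread, however, gets its position from the master's dump thread, which usually reads straight from that log buffer, so the position can be far beyond the actual size of the binlog file on disk. When the host crashed, the binlog buffer had not been flushed. Once the master came back up, the slave's binlog position 165112917 was past the real end of the binlog, whose last event starts at 165111351 and ends at 165111378.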
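To reduce the chance of this happening again, sync_binlog can be inspected and raised on the master. The sketch below only prints the statements so they can be reviewed before being piped to the client; sync_binlog_sql is a hypothetical helper name, and the connection options shown in the usage comment are assumptions to adapt to the actual master.

```shell
#!/bin/sh
# Sketch: print the SQL that inspects and changes sync_binlog on the master.
# sync_binlog_sql is a hypothetical helper, not part of MySQL.

# sync_binlog_sql N
# Validates that N is a non-negative integer, then prints three statements:
# show the current value, set the new one, show it again.
sync_binlog_sql() {
    n=$1
    case $n in
        ''|*[!0-9]*)
            echo "sync_binlog must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "SHOW GLOBAL VARIABLES LIKE 'sync_binlog';"
    echo "SET GLOBAL sync_binlog = $n;"
    echo "SHOW GLOBAL VARIABLES LIKE 'sync_binlog';"
}

# Usage on the master (connection options are an assumption):
# sync_binlog_sql 1 | mysql -uroot -p
```

SET GLOBAL lasts only until the server restarts. To keep the setting, sync_binlog = 1 would also need to go under the [mysqld] section of the master's configuration file, at the performance cost described above.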

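Solution:

Use CHANGE MASTER TO to point the slave directly at the start of the next binlog. Position 98 is where the first event begins in a MySQL 5.0 binlog file.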

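-- The replication threads must be stopped before CHANGE MASTER TO.
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST='192.168.1.1',
    MASTER_USER='repl',
    MASTER_PASSWORD='replpass',
    MASTER_PORT=3306,
    MASTER_LOG_FILE='99.000303',
    MASTER_LOG_POS=98;
-- Restart replication from the new coordinates.
START SLAVE;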
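The two manual steps around this statement, working out the next binlog name and confirming afterwards that both replication threads are running, can be sketched as small shell helpers. next_binlog and slave_threads_ok are hypothetical names. The sketch assumes the master did rotate to a new binlog after its restart, which is worth confirming with SHOW MASTER LOGS on the master first.

```shell
#!/bin/sh
# Sketch: helpers for the steps before and after CHANGE MASTER TO.
# Both function names are hypothetical, not MySQL tools.

# next_binlog NAME
# Prints the binlog file name that follows NAME, keeping the six-digit,
# zero-padded sequence number (for example 99.000302 -> 99.000303).
next_binlog() {
    base=${1%.*}
    num=${1##*.}
    # expr reads the zero-padded number as decimal, not octal.
    next=$(expr "$num" + 1)
    printf '%s.%06d\n' "$base" "$next"
}

# slave_threads_ok
# Reads SHOW SLAVE STATUS\G output on stdin and prints "ok" when both
# Slave_IO_Running and Slave_SQL_Running are Yes, otherwise "broken".
slave_threads_ok() {
    awk -F': ' '
        { gsub(/[ \t\r]+$/, "", $2) }
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { if (io == "Yes" && sql == "Yes") print "ok"; else print "broken" }
    '
}

# Usage on the slave, with the socket path from the session above:
# next_binlog 99.000302
# mysql -uroot -p --socket=/opt/mysql/3399/3399.sock -e 'show slave status\G' | slave_threads_ok
```

Fed the status output shown at the top of this post, slave_threads_ok reports "broken" because Slave_IO_Running is No.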