MHA: Manual Failover Procedure (Traditional Replication & GTID Replication)

Date: 2019-11-22



This article only walks through the manual failover procedure. For an introduction to MHA, see: MySQL High-Availability Architecture: MHA

1. Basic Environment

1.1 Replication Topology

VMware 10.0 + CentOS 6.9 + MySQL 5.7.21

ROLE	HOSTNAME	BASEDIR	DATADIR	IP	PORT
Node1	ZST1	/usr/local/mysql	/data/mysql/mysql3307/data	192.168.85.132	3307
Node2	ZST2	/usr/local/mysql	/data/mysql/mysql3307/data	192.168.85.133	3307
Node3	ZST3	/usr/local/mysql	/data/mysql/mysql3307/data	192.168.85.134	3307

Traditional replication is built on Row + Position and GTID replication on Row + GTID. Both use a one-master, two-slave topology: Node1->{Node2, Node3}

1.2 MHA Configuration Files

The MHA version used here is 0.56, and both the manager and node packages are installed on all of Node1, Node2, and Node3.
The MHA configuration files are as follows:

# Global configuration file: /etc/masterha/masterha_default.conf
[root@ZST1 masterha]# cat masterha_default.conf 
[server default]
#MySQL user and password
user=mydba
password=mysql5721

#System ssh user
ssh_user=root

#Replication user
repl_user=repl
repl_password=repl

#Monitoring
ping_interval=5
#shutdown_script=/etc/masterha/send_report.sh

#Scripts invoked during switchover
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change

log_level=debug
[root@ZST1 masterha]# 


# Cluster 1 configuration file: /etc/masterha/app1.conf
[root@ZST1 masterha]# cat app1.conf 
[server default]
#MHA manager working directory
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.85.133
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.85.134
port=3307
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=1
check_repl_delay=0
[root@ZST1 masterha]#
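
As a quick sanity check on a file like app1.conf, the sketch below reads the same INI-style layout with Python's standard configparser and lists each server block together with its candidate_master flag. The embedded text is an abridged copy of the configuration above, and the helper name list_servers is hypothetical. MHA parses its configuration with its own Perl code, so this is only an illustration of the file structure.

```python
import configparser

# Abridged copy of the app1.conf shown above (binlog paths omitted).
APP1_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307
candidate_master=1

[server2]
hostname=192.168.85.133
port=3307
candidate_master=1

[server3]
hostname=192.168.85.134
port=3307
candidate_master=1
"""


def list_servers(conf_text):
    """Return (section, host, port, is_candidate) for every [serverN] block."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    servers = []
    for section in parser.sections():
        if section == "server default":
            continue  # shared settings, not a host entry
        block = parser[section]
        servers.append((
            section,
            block["hostname"],
            block.getint("port"),
            block.get("candidate_master", fallback="0") == "1",
        ))
    return servers


for entry in list_servers(APP1_CONF):
    print(entry)
```

All three hosts are flagged as candidates here, which is why the new master can be chosen freely on the command line later.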


1.3 Test Data

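By stopping the io_thread on a slave and then writing to the master, we simulate data inconsistency both between master and slave and between the two slaves.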

#First empty the table
mydba@192.168.85.132,3307 [replcrash]> truncate table py_user;

#Node1 writes the first row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
#Stop the io_thread on Node3
mydba@192.168.85.134,3307 [replcrash]> stop slave io_thread;

#Node1 writes the second row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
#Stop the io_thread on Node2
mydba@192.168.85.133,3307 [replcrash]> stop slave io_thread;

#Node1 writes the third row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;

# Final rows on each node
#Node1 has three rows
mydba@192.168.85.132,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
|   3 | 2d8900cc-325d-11e8-88e6-000c29c1 | 2018-03-28 15:54:01 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
3 rows in set (0.00 sec)
mydba@192.168.85.132,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000004 |     1303 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
#Node2 has two rows
mydba@192.168.85.133,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
2 rows in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000007 |     8859 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
#Node3 has one row
mydba@192.168.85.134,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |    10322 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)


Clearly, slave Node3 lags behind slave Node2, and slave Node2 lags behind master Node1.
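
The row counts above follow directly from the order in which the io_thread was stopped on each slave. The sketch below is a toy model of that sequence in plain Python, with no MySQL involved: the function name simulate and the stop_after mapping are hypothetical, and each INSERT is reduced to its uid.

```python
def simulate(stop_after):
    """Replay three master INSERTs against slaves whose io_thread stops early.

    stop_after maps a slave name to the number of master writes it receives
    before its io_thread is stopped (None means it is never stopped).
    """
    master_rows = []
    slave_rows = {name: [] for name in stop_after}
    for uid in (1, 2, 3):  # the three INSERTs executed on Node1
        master_rows.append(uid)
        for name, limit in stop_after.items():
            if limit is None or uid <= limit:  # io_thread still running
                slave_rows[name].append(uid)
    return master_rows, slave_rows


# Node3 was stopped after the first write, Node2 after the second.
master, slaves = simulate({"Node2": 2, "Node3": 1})
print(len(master), {name: len(rows) for name, rows in slaves.items()})
```

The result matches the tables above: three rows on Node1, two on Node2, one on Node3.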

2. Manual Failover under Traditional Replication

Manual failover applies when the master has gone down but mha_manager was not running; in that case the failover can be carried out by hand.

2.1 Manual Failover

• Shut down the database service on Node1

# Shut down the database service on Node1
mydba@192.168.85.132,3307 [replcrash]> shutdown;

# Replication status on Node2 and Node3
mydba@192.168.85.133,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.133,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 973
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 973
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]> 

mydba@192.168.85.134,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.134,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 643
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 643
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]>
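
The two status outputs above can be ranked by how far each slave has read from the master's binlog. The sketch below shows one way to make that comparison in Python, using the Master_Log_File and Read_Master_Log_Pos values copied from the output. The helper name coordinate is hypothetical, and the code assumes binlog names end in a numeric suffix such as .000004. It illustrates the comparison only and is not MHA's own implementation.

```python
def coordinate(status):
    """Turn a slave status into a sortable (binlog sequence, position) pair."""
    sequence = int(status["Master_Log_File"].rsplit(".", 1)[1])
    return (sequence, status["Read_Master_Log_Pos"])


# Values copied from the SHOW SLAVE STATUS output above.
slaves = {
    "192.168.85.133": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 973},
    "192.168.85.134": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 643},
}

latest = max(slaves, key=lambda host: coordinate(slaves[host]))
oldest = min(slaves, key=lambda host: coordinate(slaves[host]))

# Both slaves are in the same binlog file here, so the positions can be
# subtracted directly; across different files this would not be meaningful.
gap = slaves[latest]["Read_Master_Log_Pos"] - slaves[oldest]["Read_Master_Log_Pos"]
print("latest:", latest, "oldest:", oldest, "gap in bytes:", gap)
```

With these values, 192.168.85.133 (Node2) has read the furthest and 192.168.85.134 (Node3) the least.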


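At this point it makes no difference whether the slaves' io_thread is started: the master is already down, so the io_thread cannot connect to it anyway.
• Run the manual failover script, specifying Node3 as the new master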

# Manual failover away from the dead Node1
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
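
The command line above is long enough that a typo is easy to make. As a small aid, the sketch below assembles the same argument list in Python so the host and port values appear only once. It uses only the flags shown in the command above; the function name build_switch_command and its defaults are hypothetical. It only prints the command and does not execute anything.

```python
import shlex


def build_switch_command(dead_host, new_host, port=3307):
    """Assemble the masterha_master_switch arguments for a dead-master failover."""
    return [
        "masterha_master_switch",
        "--global_conf=/etc/masterha/masterha_default.conf",
        "--conf=/etc/masterha/app1.conf",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={port}",
        "--master_state=dead",
        f"--new_master_host={new_host}",
        f"--new_master_port={port}",
        "--ignore_last_failover",
    ]


cmd = build_switch_command("192.168.85.132", "192.168.85.134")
print(shlex.join(cmd))  # review the printed line before running it by hand
```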


The replication topology before the switch is Node1->{Node2, Node3}; after the manual failover it becomes Node3->{Node2}.
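
The topology change can be written down as a tiny data transformation. The sketch below models a topology as a mapping from master to its list of slaves. The function fail_over is a hypothetical helper for illustration and does nothing against a real server.

```python
def fail_over(topology, dead_master, new_master):
    """Return the topology after promoting one slave of the dead master."""
    slaves = topology[dead_master]
    if new_master not in slaves:
        raise ValueError(f"{new_master} is not a slave of {dead_master}")
    # The promoted slave becomes the master; the remaining slaves re-point to it.
    return {new_master: [s for s in slaves if s != new_master]}


before = {"Node1": ["Node2", "Node3"]}
after = fail_over(before, "Node1", "Node3")
print(before, "->", after)
```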

2.2 Switchover Flow

Log output of the manual failover

# Manual failover
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
Wed Mar 28 16:01:07 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Wed Mar 28 16:01:07 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Mar 28 16:01:07 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Mar 28 16:01:07 2018 - [info] MHA::MasterFailover version 0.56.
Wed Mar 28 16:01:07 2018 - [info] Starting master failover.
Wed Mar 28 16:01:07 2018 - [info] 
==================== 1. Configuration check phase, Start ====================
Wed Mar 28 16:01:07 2018 - [info] * Phase 1: Configuration Check Phase..
Wed Mar 28 16:01:07 2018 - [info] 
Wed Mar 28 16:01:08 2018 - [debug] Connecting to servers..
Wed Mar 28 16:01:09 2018 - [debug]  Connected to: 192.168.85.133(192.168.85.133:3307), user=mydba
Wed Mar 28 16:01:09 2018 - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:3307): 0
Wed Mar 28 16:01:09 2018 - [debug]  Connected to: 192.168.85.134(192.168.85.134:3307), user=mydba
Wed Mar 28 16:01:09 2018 - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:3307): 0
Wed Mar 28 16:01:09 2018 - [debug]  Comparing MySQL versions..
Wed Mar 28 16:01:09 2018 - [debug]   Comparing MySQL versions done.
Wed Mar 28 16:01:09 2018 - [debug] Connecting to servers done.
Wed Mar 28 16:01:09 2018 - [info] GTID failover mode = 0
Wed Mar 28 16:01:09 2018 - [info] Dead Servers:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info] Checking master reachability via MySQL(double check)...
Wed Mar 28 16:01:09 2018 - [info]  ok.
Wed Mar 28 16:01:09 2018 - [info] Alive Servers:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)
Wed Mar 28 16:01:09 2018 - [info] Alive Slaves:
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar 28 16:01:09 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:09 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:09 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:09 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
******************** Choose whether to proceed ********************
Master 192.168.85.132(192.168.85.132:3307) is dead. Proceed? (yes/NO): yes
Wed Mar 28 16:01:30 2018 - [info] Starting Non-GTID based failover.
Wed Mar 28 16:01:30 2018 - [info] 
Wed Mar 28 16:01:30 2018 - [info] ** Phase 1: Configuration Check Phase completed.
==================== 1. Configuration check phase, End ====================
Wed Mar 28 16:01:30 2018 - [info] 
==================== 2. Dead master shutdown phase, Start ====================
Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Mar 28 16:01:30 2018 - [info] 
Wed Mar 28 16:01:30 2018 - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:30 2018 - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:3307)..
Wed Mar 28 16:01:30 2018 - [debug]  Stop IO thread on 192.168.85.134(192.168.85.134:3307) done.
Wed Mar 28 16:01:30 2018 - [debug]  Stop IO thread on 192.168.85.133(192.168.85.133:3307) done.
Wed Mar 28 16:01:30 2018 - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Wed Mar 28 16:01:30 2018 - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
Wed Mar 28 16:01:30 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Mar 28 16:01:30 2018 - [info] Executing master IP deactivation script:
Wed Mar 28 16:01:30 2018 - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --command=stopssh --ssh_user=root  
Wed Mar 28 16:01:30 2018 - [info]  done.
Wed Mar 28 16:01:30 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Mar 28 16:01:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
==================== 2. Dead master shutdown phase, End ====================
Wed Mar 28 16:01:30 2018 - [info] 
==================== 3. New master recovery phase, Start ====================
Wed Mar 28 16:01:30 2018 - [info] * Phase 3: Master Recovery Phase..
Wed Mar 28 16:01:30 2018 - [info] 
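==================== 3.1 Get the latest slave ====================
******************** The latest slave serves two purposes. 1: it supplies the relay log that the other slaves are missing; 2: it gives the start position for saving the dead master's binlog ********************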
Wed Mar 28 16:01:30 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Mar 28 16:01:30 2018 - [info] 
Wed Mar 28 16:01:30 2018 - [debug] Fetching current slave status..
Wed Mar 28 16:01:30 2018 - [debug]  Fetching current slave status done.
Wed Mar 28 16:01:30 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:973
Wed Mar 28 16:01:30 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Mar 28 16:01:30 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:30 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:30 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar 28 16:01:30 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:643
Wed Mar 28 16:01:30 2018 - [info] Oldest slaves:
Wed Mar 28 16:01:30 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Mar 28 16:01:30 2018 - [debug]    Relay log info repository: FILE
Wed Mar 28 16:01:30 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Wed Mar 28 16:01:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar 28 16:01:30 2018 - [info] 
==================== 3.2 Save the dead master's binlog ====================
Wed Mar 28 16:01:30 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Mar 28 16:01:30 2018 - [info] 
Wed Mar 28 16:01:30 2018 - [info] Fetching dead master's binary logs..
******************** Executed on the dead master; takes the portion after the latest slave's position ********************
Wed Mar 28 16:01:30 2018 - [info] Executing command on the dead master 192.168.85.132(192.168.85.132:3307): save_binary_logs --command=save --start_file=mysql-bin.000004  --start_pos=973 --binlog_dir=/data/mysql/mysql3307/logs --output_file=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug 
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000004 pos 973 to mysql-bin.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog ..
parse_init_headers: file=mysql-bin.000004 event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
 Binlog Checksum enabled
parse_init_headers: file=mysql-bin.000004 event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
 Got previous gtids log event: 154.
parse_init_headers: file=mysql-bin.000004 event_type=34 server_id=1323307 length=65 nextmpos=219 prevrelay=154 cur(post)relay=219
  Dumping binlog format description event, from position 0 to 154.. ok.
  Dumping effective binlog data from /data/mysql/mysql3307/logs/mysql-bin.000004 position 973 to tail(1326).. ok.
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
 Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
 Got previous gtids log event: 154.
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=34 server_id=1323307 length=65 nextmpos=1038 prevrelay=154 cur(post)relay=219
 Concat succeeded.
saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  100%  507     0.5KB/s   00:00    
******************** scp the saved master binlog to the working directory on the node running mha-manager / the manual failover ********************
Wed Mar 28 16:01:31 2018 - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
Wed Mar 28 16:01:31 2018 - [debug] SSH connection test to 192.168.85.133, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Wed Mar 28 16:01:31 2018 - [info] HealthCheck: SSH to 192.168.85.133 is reachable.
Wed Mar 28 16:01:37 2018 - [debug] SSH connection test to 192.168.85.134, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Wed Mar 28 16:01:38 2018 - [info] HealthCheck: SSH to 192.168.85.134 is reachable.
Wed Mar 28 16:01:38 2018 - [info] 
==================== 3.3 Elect the new master ====================
Wed Mar 28 16:01:38 2018 - [info] * Phase 3.3: Determining New Master Phase..
Wed Mar 28 16:01:38 2018 - [info] 
******************** Check whether the latest slave holds the relay log that the other slaves are missing ********************
Wed Mar 28 16:01:38 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Mar 28 16:01:38 2018 - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..
Wed Mar 28 16:01:38 2018 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000004 --latest_rmlp=973 --target_mlf=mysql-bin.000004 --target_rmlp=643 --server_id=1333307 --workdir=/var/log/masterha/app1 --timestamp=20180328160107 --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info  --relay_dir=/data/mysql/mysql3307/data/  --debug  :
    Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
    Relay log found at /data/mysql/mysql3307/data, up to relay-bin.000005
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-bin.000005, start_pos:856.
Target relay log FOUND!
Wed Mar 28 16:01:39 2018 - [info] OK. 192.168.85.133 has all relay logs.
Wed Mar 28 16:01:39 2018 - [info] 192.168.85.134 can be new master.
Wed Mar 28 16:01:39 2018 - [info] New master is 192.168.85.134(192.168.85.134:3307)
Wed Mar 28 16:01:39 2018 - [info] Starting master failover..
Wed Mar 28 16:01:39 2018 - [info] 
From:
192.168.85.132(192.168.85.132:3307) (current master)
 +--192.168.85.133(192.168.85.133:3307)
 +--192.168.85.134(192.168.85.134:3307)

To:
192.168.85.134(192.168.85.134:3307) (new master)
 +--192.168.85.133(192.168.85.133:3307)

******************** Choose whether to perform the switch ********************
Starting master switch from 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307)? (yes/NO): yes
Wed Mar 28 16:01:42 2018 - [info] New master decided manually is 192.168.85.134(192.168.85.134:3307)
Wed Mar 28 16:01:42 2018 - [info] 
Wed Mar 28 16:01:42 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Mar 28 16:01:42 2018 - [info] 
******************** On the latest slave, generate the relay log that the new master is missing relative to the latest slave ********************
Wed Mar 28 16:01:42 2018 - [info] Server 192.168.85.134 received relay logs up to: mysql-bin.000004:643
Wed Mar 28 16:01:42 2018 - [info] Need to get diffs from the latest slave(192.168.85.133) up to: mysql-bin.000004:973 (using the latest slave's relay logs)
Wed Mar 28 16:01:43 2018 - [info] Connecting to the latest slave host 192.168.85.133, generating diff relay log files..
Wed Mar 28 16:01:43 2018 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.85.134 --latest_mlf=mysql-bin.000004 --latest_rmlp=973 --target_mlf=mysql-bin.000004 --target_rmlp=643 --server_id=1333307 --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --timestamp=20180328160107 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info  --relay_dir=/data/mysql/mysql3307/data/  --debug 
Wed Mar 28 16:01:45 2018 - [info] 
    Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
    Relay log found at /data/mysql/mysql3307/data, up to relay-bin.000005
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-bin.000005, start_pos:856.
 Concat binary/relay logs from relay-bin.000005 pos 856 to relay-bin.000005 EOF into /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog ..
parse_init_headers: file=relay-bin.000005 event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
 Binlog Checksum enabled
parse_init_headers: file=relay-bin.000005 event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
 Got previous gtids log event: 154.
parse_init_headers: file=relay-bin.000005 event_type=4 server_id=1323307 length=47 nextmpos=0 prevrelay=154 cur(post)relay=201
parse_init_headers: file=relay-bin.000005 event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=201 cur(post)relay=320
 Binlog Checksum enabled
parse_init_headers: file=relay-bin.000005 event_type=4 server_id=0 length=47 nextmpos=367 prevrelay=320 cur(post)relay=367
parse_init_headers: file=relay-bin.000005 event_type=34 server_id=1323307 length=65 nextmpos=219 prevrelay=367 cur(post)relay=432
  Dumping binlog format description event, from position 0 to 367.. ok.
  Dumping effective binlog data from /data/mysql/mysql3307/data/relay-bin.000005 position 856 to tail(1186).. ok.
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
 Binlog Checksum enabled
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
 Got previous gtids log event: 154.
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=4 server_id=1323307 length=47 nextmpos=0 prevrelay=154 cur(post)relay=201
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=201 cur(post)relay=320
 Binlog Checksum enabled
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=4 server_id=0 length=47 nextmpos=367 prevrelay=320 cur(post)relay=367
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type=34 server_id=1323307 length=65 nextmpos=708 prevrelay=367 cur(post)relay=432
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog .
******************** scp the generated relay log to the new master's working directory ********************
 scp ZST2:/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to root@192.168.85.134(22) succeeded.
Wed Mar 28 16:01:45 2018 - [info]  Generating diff files succeeded.
Wed Mar 28 16:01:45 2018 - [info] Sending binlog..
saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  100%  507     0.5KB/s   00:00    
******************** scp the dead master's binlog from the working directory on the node running mha-manager / the manual failover to the new master's working directory ********************
Wed Mar 28 16:01:45 2018 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.134:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
Wed Mar 28 16:01:45 2018 - [info] 
==================== 3.4 New master applies the diff logs ====================
Wed Mar 28 16:01:45 2018 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Mar 28 16:01:45 2018 - [info] 
Wed Mar 28 16:01:45 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Mar 28 16:01:45 2018 - [info] Starting recovery on 192.168.85.134(192.168.85.134:3307)..
Wed Mar 28 16:01:45 2018 - [info]  Generating diffs succeeded.
******************** Wait for the new master to finish applying its own relay log ********************
Wed Mar 28 16:01:45 2018 - [info] Waiting until all relay logs are applied.
Wed Mar 28 16:01:45 2018 - [info]  done.
Wed Mar 28 16:01:45 2018 - [debug]  Stopping SQL thread on 192.168.85.134(192.168.85.134:3307)..
Wed Mar 28 16:01:45 2018 - [debug]   done.
Wed Mar 28 16:01:45 2018 - [info] Getting slave status..
Wed Mar 28 16:01:45 2018 - [info] This slave(192.168.85.134)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000004:643). No need to recover from Exec_Master_Log_Pos.
Wed Mar 28 16:01:45 2018 - [debug] Current max_allowed_packet is 4194304.
Wed Mar 28 16:01:45 2018 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Wed Mar 28 16:01:45 2018 - [info] Connecting to the target slave host 192.168.85.134, running recover script..
******************** The new master applies, in order, the relay log it is missing relative to the latest slave, then the binlog saved from the dead master ********************
Wed Mar 28 16:01:45 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134  --slave_port=3307 --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.21-log --timestamp=20180328160107 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug  --slave_pass=xxx
Wed Mar 28 16:01:46 2018 - [info] 
******************** Merge all of the missing relay log and binlog into total_binlog ********************
 Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180328160107.binlog ..
 Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180328160107.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog.. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
 Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
 Got previous gtids log event: 154.
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type=34 server_id=1323307 length=65 nextmpos=1038 prevrelay=154 cur(post)relay=219
dumped up to pos 154. ok.
 /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog has effective binlog events from pos 154.
  Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog position 154 to tail(507).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180328160107.binlog .
MySQL client version is 5.7.21. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.134:3307. This may take long time...
Applying log files succeeded.
Wed Mar 28 16:01:46 2018 - [debug] Setting max_allowed_packet back to 4194304 succeeded.
Wed Mar 28 16:01:46 2018 - [info]  All relay logs were successfully applied.
******************** Once the new master has applied all of the relay log and binlog, get its current position ********************
Wed Mar 28 16:01:46 2018 - [info] Getting new master's binlog name and position..
Wed Mar 28 16:01:46 2018 - [info]  mysql-bin.000002:10948
Wed Mar 28 16:01:46 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=3307, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=10948, MASTER_USER='repl', MASTER_PASSWORD='xxx';
******************** Bring up the virtual IP; the new master can now serve traffic ********************
Wed Mar 28 16:01:46 2018 - [info] Executing master IP activate script:
Wed Mar 28 16:01:46 2018 - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port=3307 --new_master_user='mydba' --new_master_password='mysql5721'  
Set read_only=0 on the new master.
Wed Mar 28 16:01:52 2018 - [info]  OK.
Wed Mar 28 16:01:52 2018 - [info] ** Finished master recovery successfully.
Wed Mar 28 16:01:52 2018 - [info] * Phase 3: Master Recovery Phase completed.
==================== 3. New master recovery phase, End ====================
Wed Mar 28 16:01:52 2018 - [info] 
==================== 4. Slave recovery phase, Start ====================
******************** Slave recovery is similar to the new master's: first obtain the relay log diff against the latest slave, then obtain the dead master's binlog ********************
Wed Mar 28 16:01:52 2018 - [info] * Phase 4: Slaves Recovery Phase..
Wed Mar 28 16:01:52 2018 - [info] 
==================== 4.1 Generate the diff log between the latest slave and each slave ====================
Wed Mar 28 16:01:52 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Mar 28 16:01:52 2018 - [info] 
Wed Mar 28 16:01:52 2018 - [info] -- Slave diff file generation on host 192.168.85.133(192.168.85.133:3307) started, pid: 3488. Check tmp log /var/log/masterha/app1/192.168.85.133_3307_20180328160107.log if it takes time..
Wed Mar 28 16:01:52 2018 - [info] 
Wed Mar 28 16:01:52 2018 - [info] Log messages from 192.168.85.133 ...
Wed Mar 28 16:01:52 2018 - [info] 
Wed Mar 28 16:01:52 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Mar 28 16:01:52 2018 - [info] End of log messages from 192.168.85.133.
Wed Mar 28 16:01:52 2018 - [info] -- 192.168.85.133(192.168.85.133:3307) has the latest relay log events.
Wed Mar 28 16:01:52 2018 - [info] Generating relay diff files from the latest slave succeeded.
Wed Mar 28 16:01:52 2018 - [info] 
==================== 4.2 Slaves apply the diff logs ====================
Wed Mar 28 16:01:52 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Mar 28 16:01:52 2018 - [info] 
Wed Mar 28 16:01:52 2018 - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:3307) started, pid: 3490. Check tmp log /var/log/masterha/app1/192.168.85.133_3307_20180328160107.log if it takes time..
saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  100%  507     0.5KB/s   00:00    
Wed Mar 28 16:01:54 2018 - [debug] Explicitly disabled relay_log_purge.
Wed Mar 28 16:01:54 2018 - [info] 
Wed Mar 28 16:01:54 2018 - [info] Log messages from 192.168.85.133 ...
Wed Mar 28 16:01:54 2018 - [info] 
Wed Mar 28 16:01:52 2018 - [info] Sending binlog..
******************** scp the dead Master's binlog from the working directory on the MHA manager node (where the manual failover runs) to the Slave's working directory ********************
Wed Mar 28 16:01:53 2018 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.133:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
Wed Mar 28 16:01:53 2018 - [info] Starting recovery on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:53 2018 - [info]  Generating diffs succeeded.
Wed Mar 28 16:01:53 2018 - [info] Waiting until all relay logs are applied.
Wed Mar 28 16:01:53 2018 - [info]  done.
Wed Mar 28 16:01:53 2018 - [debug]  Stopping SQL thread on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:53 2018 - [debug]   done.
Wed Mar 28 16:01:53 2018 - [info] Getting slave status..
Wed Mar 28 16:01:53 2018 - [info] This slave(192.168.85.133)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000004:973). No need to recover from Exec_Master_Log_Pos.
Wed Mar 28 16:01:53 2018 - [debug] Current max_allowed_packet is 4194304.
Wed Mar 28 16:01:53 2018 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Wed Mar 28 16:01:53 2018 - [info] Connecting to the target slave host 192.168.85.133, running recover script..
******************** The Slave applies, in order, the relay-log events it is missing relative to the latest Slave, then the binlog saved from the dead Master ********************
Wed Mar 28 16:01:53 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.133 --slave_ip=192.168.85.133  --slave_port=3307 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.21-log --timestamp=20180328160107 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug  --slave_pass=xxx
Wed Mar 28 16:01:54 2018 - [info] 
MySQL client version is 5.7.21. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.133:3307. This may take long time...
Applying log files succeeded.
Wed Mar 28 16:01:54 2018 - [debug] Setting max_allowed_packet back to 4194304 succeeded.
Wed Mar 28 16:01:54 2018 - [info]  All relay logs were successfully applied.
Wed Mar 28 16:01:54 2018 - [info]  Resetting slave 192.168.85.133(192.168.85.133:3307) and starting replication from the new master 192.168.85.134(192.168.85.134:3307)..
Wed Mar 28 16:01:54 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:54 2018 - [debug]   done.
Wed Mar 28 16:01:54 2018 - [info]  Executed CHANGE MASTER.
Wed Mar 28 16:01:54 2018 - [debug]  Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
Wed Mar 28 16:01:54 2018 - [debug]   done.
Wed Mar 28 16:01:54 2018 - [info]  Slave started.
Wed Mar 28 16:01:54 2018 - [info] End of log messages from 192.168.85.133.
Wed Mar 28 16:01:54 2018 - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:3307) succeeded.
Wed Mar 28 16:01:54 2018 - [info] All new slave servers recovered successfully.
==================== 4. Slave recovery phase, End ====================
Wed Mar 28 16:01:54 2018 - [info] 
==================== 5. New Master cleanup phase, Start ====================
Wed Mar 28 16:01:54 2018 - [info] * Phase 5: New master cleanup phase..
Wed Mar 28 16:01:54 2018 - [info] 
Wed Mar 28 16:01:54 2018 - [info] Resetting slave info on the new master..
Wed Mar 28 16:01:54 2018 - [debug]  Clearing slave info..
Wed Mar 28 16:01:54 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
Wed Mar 28 16:01:54 2018 - [debug]   done.
Wed Mar 28 16:01:54 2018 - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Wed Mar 28 16:01:54 2018 - [info]  192.168.85.134: Resetting slave info succeeded.
==================== 5. New Master cleanup phase, End ====================
Wed Mar 28 16:01:54 2018 - [info] Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
Wed Mar 28 16:01:54 2018 - [debug]  Disconnected from 192.168.85.133(192.168.85.133:3307)
Wed Mar 28 16:01:54 2018 - [debug]  Disconnected from 192.168.85.134(192.168.85.134:3307)
Wed Mar 28 16:01:54 2018 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307) succeeded

Master 192.168.85.132(192.168.85.132:3307) is down!

Check MHA Manager logs at ZST3 for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)
The latest slave 192.168.85.133(192.168.85.133:3307) has all relay logs for recovery.
Selected 192.168.85.134(192.168.85.134:3307) as a new master.
192.168.85.134(192.168.85.134:3307): OK: Applying all logs succeeded.
192.168.85.134(192.168.85.134:3307): OK: Activated master IP address.
192.168.85.133(192.168.85.133:3307): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.85.133(192.168.85.133:3307): OK: Applying all logs succeeded. Slave started, replicating from 192.168.85.134(192.168.85.134:3307)
192.168.85.134(192.168.85.134:3307): Resetting slave info succeeded.
Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
[root@ZST3 app1]#


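Manual Failover flow

Manual Failover (traditional replication)
1. Configuration check: connect to each instance, check service status, and check the master-slave relationships
2. Dead Master shutdown: stop the IO thread on every Slave and remove the dead Master's virtual IP (stopssh)
3. New Master recovery
    3.1 Get the latest Slave
        It is used to fill in the data missing on the new Master and the other Slaves, and it provides the start position for saving the dead Master's binlog
    3.2 Save the dead Master's binlog
        Run save_binary_logs on the dead Master (taking only the part after the latest Slave's position)
        scp the resulting binlog to the working directory where the manual Failover runs
    3.3 Elect the new Master
        Check whether the latest Slave still has the relay logs that the oldest Slave is missing
        Determine the new Master and derive the topology before and after the switch
        Generate the relay-log diff between the latest Slave and the new Master, and copy it to the new Master's working directory
        scp the dead Master's binlog from the manual Failover working directory to the new Master's working directory
    3.4 New Master applies the diff logs
        Wait for the new Master to finish applying its own relay log
        Apply, in order, the relay-log events it is missing relative to the latest Slave, then the binlog saved from the dead Master
        Merge all missing relay-log and binlog events into total_binlog
        Record the new Master's binlog:pos; the other Slaves will start replicating from this position
        Bind the virtual IP so that the new Master can serve clients
4. Recovery of the other Slaves
    4.1 Generate the diff logs
        Generate the relay-log diff between the latest Slave and this Slave and copy it to the Slave's working directory; scp the dead Master's binlog from the manual Failover working directory to the Slave's working directory
    4.2 Slave applies the diff logs
        Wait for the Slave to finish applying its own relay log; apply, in order, the relay-log events it is missing relative to the latest Slave, then the binlog saved from the dead Master; repoint the Slave's replication to the new Master
    4.3 If there are several Slaves, repeat the steps above for each
5. New Master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;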
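
The selection logic behind steps 3.1 to 4.2 can be sketched in a few lines of Python. This is a simplified illustration and not MHA's actual implementation: `SlavePos` and `plan_recovery` are hypothetical names, the step labels are informal, and the read position used for 192.168.85.134 is an assumed value chosen only so that it lags behind the latest Slave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlavePos:
    host: str
    log_file: str  # Master_Log_File: master binlog the IO thread last read from
    read_pos: int  # Read_Master_Log_Pos: offset reached within that file

    def key(self):
        # Binlog names end in a zero-padded sequence number, so order by (sequence, offset).
        return (int(self.log_file.rsplit(".", 1)[1]), self.read_pos)


def plan_recovery(slaves, new_master_host):
    """Return the latest Slave and, per host, the ordered list of things to apply."""
    latest = max(slaves, key=SlavePos.key)
    plan = {}
    for s in slaves:
        steps = ["own relay log"]
        if s.key() < latest.key():
            # Only Slaves that lag behind the latest Slave need a relay-log diff.
            steps.append(f"relay diff from {latest.host}")
        steps.append("saved dead-master binlog")
        if s.host != new_master_host:
            steps.append(f"CHANGE MASTER TO {new_master_host}")
        plan[s.host] = steps
    return latest, plan


slaves = [
    SlavePos("192.168.85.133", "mysql-bin.000004", 973),
    SlavePos("192.168.85.134", "mysql-bin.000004", 643),  # assumed position, for illustration
]
latest, plan = plan_recovery(slaves, new_master_host="192.168.85.134")
for host, steps in plan.items():
    print(host, "->", steps)
```

The latest Slave skips the relay-diff step, which matches the "This server has all relay logs" message in the log for 192.168.85.133.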


2.3 Directory files

The switchover has to fill in missing data, so it produces several kinds of files

# Dead Master
[root@ZST1 app1]# ll
total 4
-rw-r--r-- 1 root root 507 Mar 28 16:01 saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
[root@ZST1 app1]#

Dead Master

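saved_master_binlog_from_**: the binlog diff between the dead Master and the latest Slave. It is generated on the dead Master and then copied to the working directory of the MHA manager node (where the manual Failover runs)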

# Latest Slave
[root@ZST2 app1]# ll
total 12
-rw-r--r--. 1 root root  697 Mar 28 16:01 relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
-rw-r--r--. 1 root root 2867 Mar 28 16:01 relay_log_apply_for_192.168.85.133_3307_20180328160107_err.log
-rw-r--r--. 1 root root  507 Mar 28 16:01 saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
[root@ZST2 app1]#

Latest Slave

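relay_from_read_to_latest_**: the relay-log diff between the latest Slave and another Slave. It is generated on the latest Slave and then copied to the corresponding Slave
saved_master_binlog_from_**: copied from the manager node; it originates on the dead Master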

# New Master
[root@ZST3 app1]# ll
total 16
-rw-r--r--. 1 root root    0 Mar 28 16:01 app1.failover.complete
-rw-r--r--. 1 root root  697 Mar 28 16:01 relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
-rw-r--r--. 1 root root 3629 Mar 28 16:01 relay_log_apply_for_192.168.85.134_3307_20180328160107_err.log
-rw-r--r--. 1 root root  507 Mar 28 16:01 saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
-rw-r--r--. 1 root root 1050 Mar 28 16:01 total_binlog_for_192.168.85.134_3307.20180328160107.binlog
[root@ZST3 app1]#

New Master

relay_from_read_to_latest_**: copied from the latest Slave
saved_master_binlog_from_**: copied from the manager node; it originates on the dead Master
total_binlog_for_**: all missing relay-log and binlog events merged into one file
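
When cleaning up or auditing a working directory, the three file kinds can be told apart by name alone. The sketch below is an illustration only: the regular expressions are derived from the filenames in the listings above, `classify` is a hypothetical helper, and reading the host in the name as the host the file is meant for is an inference from those listings.

```python
import re

# Patterns follow the filenames shown in the directory listings; the role text paraphrases the article.
PATTERNS = [
    (re.compile(r"^saved_master_binlog_from_(?P<host>[\d.]+)_(?P<port>\d+)_(?P<ts>\d{14})\.binlog$"),
     "binlog saved from the dead Master"),
    (re.compile(r"^relay_from_read_to_latest_(?P<host>[\d.]+)_(?P<port>\d+)_(?P<ts>\d{14})\.binlog$"),
     "relay-log diff generated on the latest Slave for this host"),
    (re.compile(r"^total_binlog_for_(?P<host>[\d.]+)_(?P<port>\d+)\.(?P<ts>\d{14})\.binlog$"),
     "all missing events merged for this host"),
]


def classify(filename):
    """Return (role, host, port, timestamp) for a known MHA work file, else None."""
    for pattern, role in PATTERNS:
        m = pattern.match(filename)
        if m:
            return role, m.group("host"), int(m.group("port")), m.group("ts")
    return None


for name in [
    "saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog",
    "relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog",
    "total_binlog_for_192.168.85.134_3307.20180328160107.binlog",
    "app1.failover.complete",
]:
    print(name, "->", classify(name))
```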
• Parse the diff logs to inspect the events they contain

# relay-log diff between the latest Slave and the other Slave
[root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#180328 15:41:18 server id 1333307  end_log_pos 123 CRC32 0x152b7e41    Start: binlog v 4, server v 5.7.21-log created 180328 15:41:18
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at 123
#180328 15:41:18 server id 1333307  end_log_pos 154 CRC32 0x5ea2e9c6    Previous-GTIDs
# [empty]
# at 154
#700101  8:00:00 server id 1323307  end_log_pos 0 CRC32 0x2076d50b      Rotate to mysql-bin.000004  pos: 4
# at 201
#180328 15:49:33 server id 1323307  end_log_pos 123 CRC32 0x9b1488de    Start: binlog v 4, server v 5.7.21-log created 180328 15:49:33 at startup
ROLLBACK/*!*/;
# at 320
#180328 15:41:18 server id 0  end_log_pos 367 CRC32 0x838279dd  Rotate to mysql-bin.000004  pos: 154
# at 367
#180328 15:53:50 server id 1323307  end_log_pos 708 CRC32 0x9fba3aa7    Anonymous_GTID  last_committed=2        sequence_number=3       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 432
#180328 15:53:50 server id 1323307  end_log_pos 793 CRC32 0x112f5399    Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1522223630/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 517
#180328 15:53:50 server id 1323307  end_log_pos 856 CRC32 0x890cf300    Table_map: `replcrash`.`py_user` mapped to number 108
# at 580
#180328 15:53:50 server id 1323307  end_log_pos 942 CRC32 0xccb038f5    Write_rows: table id 108 flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @3='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at 666
#180328 15:53:50 server id 1323307  end_log_pos 973 CRC32 0xbfda64ba    Xid = 31
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST3 app1]# 


# binlog diff between the dead Master and the latest Slave
[root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#180328 15:49:33 server id 1323307  end_log_pos 123 CRC32 0x9b1488de    Start: binlog v 4, server v 5.7.21-log created 180328 15:49:33 at startup
ROLLBACK/*!*/;
# at 123
#180328 15:49:33 server id 1323307  end_log_pos 154 CRC32 0x37f9307d    Previous-GTIDs
# [empty]
# at 154
#180328 15:54:01 server id 1323307  end_log_pos 1038 CRC32 0x74680cfa   Anonymous_GTID  last_committed=3        sequence_number=4       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 219
#180328 15:54:01 server id 1323307  end_log_pos 1123 CRC32 0x3774a1d0   Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1522223641/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 304
#180328 15:54:01 server id 1323307  end_log_pos 1186 CRC32 0x1468e6b1   Table_map: `replcrash`.`py_user` mapped to number 108
# at 367
#180328 15:54:01 server id 1323307  end_log_pos 1272 CRC32 0x79523051   Write_rows: table id 108 flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @3='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at 453
#180328 15:54:01 server id 1323307  end_log_pos 1303 CRC32 0xb93ce981   Xid = 32
COMMIT/*!*/;
# at 484
#180328 15:57:10 server id 1323307  end_log_pos 1326 CRC32 0x577dc41e   Stop
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST3 app1]# 


# all missing relay-log and binlog events
[root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows total_binlog_for_192.168.85.134_3307.20180328160107.binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#180328 15:41:18 server id 1333307  end_log_pos 123 CRC32 0x152b7e41    Start: binlog v 4, server v 5.7.21-log created 180328 15:41:18
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at 123
#180328 15:41:18 server id 1333307  end_log_pos 154 CRC32 0x5ea2e9c6    Previous-GTIDs
# [empty]
# at 154
#700101  8:00:00 server id 1323307  end_log_pos 0 CRC32 0x2076d50b      Rotate to mysql-bin.000004  pos: 4
# at 201
#180328 15:49:33 server id 1323307  end_log_pos 123 CRC32 0x9b1488de    Start: binlog v 4, server v 5.7.21-log created 180328 15:49:33 at startup
ROLLBACK/*!*/;
# at 320
#180328 15:41:18 server id 0  end_log_pos 367 CRC32 0x838279dd  Rotate to mysql-bin.000004  pos: 154
# at 367
#180328 15:53:50 server id 1323307  end_log_pos 708 CRC32 0x9fba3aa7    Anonymous_GTID  last_committed=2        sequence_number=3       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 432
#180328 15:53:50 server id 1323307  end_log_pos 793 CRC32 0x112f5399    Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1522223630/*!*/;
SET @@session.pseudo_thread_id=2/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 517
#180328 15:53:50 server id 1323307  end_log_pos 856 CRC32 0x890cf300    Table_map: `replcrash`.`py_user` mapped to number 108
# at 580
#180328 15:53:50 server id 1323307  end_log_pos 942 CRC32 0xccb038f5    Write_rows: table id 108 flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @3='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at 666
#180328 15:53:50 server id 1323307  end_log_pos 973 CRC32 0xbfda64ba    Xid = 31
COMMIT/*!*/;
# at 697
#180328 15:54:01 server id 1323307  end_log_pos 1038 CRC32 0x74680cfa   Anonymous_GTID  last_committed=3        sequence_number=4       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 762
#180328 15:54:01 server id 1323307  end_log_pos 1123 CRC32 0x3774a1d0   Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1522223641/*!*/;
BEGIN
/*!*/;
# at 847
#180328 15:54:01 server id 1323307  end_log_pos 1186 CRC32 0x1468e6b1   Table_map: `replcrash`.`py_user` mapped to number 108
# at 910
#180328 15:54:01 server id 1323307  end_log_pos 1272 CRC32 0x79523051   Write_rows: table id 108 flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @3='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @4='1323307' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at 996
#180328 15:54:01 server id 1323307  end_log_pos 1303 CRC32 0xb93ce981   Xid = 32
COMMIT/*!*/;
# at 1027
#180328 15:57:10 server id 1323307  end_log_pos 1326 CRC32 0x577dc41e   Stop
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST3 app1]#
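
The file sizes and "# at" offsets in the three dumps are consistent with total_binlog being the relay-log diff followed by the saved master binlog without its first 154 bytes (the header that ends where the first transaction starts). That reading is an inference from the numbers shown, not something the MHA log states. The short Python check below uses illustrative variable names.

```python
# Sizes come from the `ll` listings; offsets come from the "# at" lines in the mysqlbinlog output.
HEADER_END = 154          # offset of the first transaction in the saved master binlog
relay_diff_size = 697     # relay_from_read_to_latest_*
saved_binlog_size = 507   # saved_master_binlog_from_*

# Appending the saved binlog minus its header shifts each of its events by this amount.
shift = relay_diff_size - HEADER_END

saved_offsets = [154, 219, 304, 367, 453, 484]
total_offsets = [pos + shift for pos in saved_offsets]

total_size = relay_diff_size + (saved_binlog_size - HEADER_END)

print(shift)          # 543
print(total_offsets)  # [697, 762, 847, 910, 996, 1027]
print(total_size)     # 1050
```

The computed offsets match the "# at 697" through "# at 1027" lines in the total_binlog dump, and 1050 matches the size of total_binlog_for_* in the new Master's directory listing.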


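After the manual failover the topology is Node3->{Node2}, and the missing data was filled in automatically

3. Manual Failover under GTID replication

3.1 MHA configuration file adjustments

In GTID mode, MHA needs a [binlog*] section, which can point to a dedicated Binlog Server or to the master's binlog directory. Without [binlog*], MHA does not pull binlogs from the master even if the master host is still reachable, so any events that had not yet reached a slave are lost

# Append the Binlog Server section to the end of app1.conf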
[root@ZST1 masterha]# cat app1.conf 
...
[binlog1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs
no_master=1
[root@ZST1 masterha]#
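
A quick pre-flight check can confirm that the application config really contains a [binlog*] section before relying on GTID failover. The sketch below is only an illustration: `binlog_sections` is a hypothetical helper, and it assumes the file is plain INI that Python's `configparser` can read, which holds for the fragments shown in this article but is not guaranteed for every MHA config.

```python
import configparser


def binlog_sections(conf_text):
    """Return the names of all [binlog*] sections found in an MHA-style config."""
    # Interpolation is disabled so that a literal '%' in a value cannot raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [name for name in parser.sections() if name.lower().startswith("binlog")]


SAMPLE = """\
[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307

[binlog1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs
no_master=1
"""

found = binlog_sections(SAMPLE)
if found:
    print("binlog sections:", found)
else:
    print("warning: no [binlog*] section; unsent master events would be lost in GTID mode")
```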


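3.2 Manual Failover

With a one-master, two-slave topology built on Row+GTID, Node1->{Node2, Node3}, regenerate the test data, stop the database service on Node1, and run the manual Failover script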
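
If the switch has to be scripted, the command line can be assembled from its parts so that hosts and ports are not retyped by hand. The Python sketch below only builds and prints the command; it does not run it. `build_failover_command` is a hypothetical helper, and it uses only the `masterha_master_switch` options that appear in this article.

```python
import shlex


def build_failover_command(dead_host, dead_port, new_host, new_port,
                           global_conf="/etc/masterha/masterha_default.conf",
                           conf="/etc/masterha/app1.conf"):
    """Assemble the manual-failover argv using the options shown in this article."""
    return [
        "masterha_master_switch",
        f"--global_conf={global_conf}",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={dead_port}",
        "--master_state=dead",
        f"--new_master_host={new_host}",
        f"--new_master_port={new_port}",
        "--ignore_last_failover",
    ]


argv = build_failover_command("192.168.85.132", 3307, "192.168.85.134", 3307)
print(shlex.join(argv))
```

Keeping the arguments as a list means the same value can later be handed to `subprocess.run` without shell quoting concerns; the tool still asks its interactive yes/NO questions.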

# GTID + manual Failover
[root@ZST1 masterha]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
Thu Mar 29 15:00:32 2018 - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Thu Mar 29 15:00:32 2018 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Thu Mar 29 15:00:32 2018 - [info] Reading server configuration from /etc/masterha/app1.conf..
Thu Mar 29 15:00:32 2018 - [info] MHA::MasterFailover version 0.56.
Thu Mar 29 15:00:32 2018 - [info] Starting master failover.
Thu Mar 29 15:00:32 2018 - [info] 
==================== 1. Configuration check phase, Start ====================
Thu Mar 29 15:00:32 2018 - [info] * Phase 1: Configuration Check Phase..
Thu Mar 29 15:00:32 2018 - [info] 
Thu Mar 29 15:00:32 2018 - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Thu Mar 29 15:00:32 2018 - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
Thu Mar 29 15:00:32 2018 - [info] Binlog server 192.168.85.132 is reachable.
Thu Mar 29 15:00:32 2018 - [debug] Connecting to servers..
Thu Mar 29 15:00:32 2018 - [debug]  Connected to: 192.168.85.133(192.168.85.133:3307), user=mydba
Thu Mar 29 15:00:32 2018 - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:3307): 0
Thu Mar 29 15:00:32 2018 - [debug]  Connected to: 192.168.85.134(192.168.85.134:3307), user=mydba
Thu Mar 29 15:00:32 2018 - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:3307): 0
Thu Mar 29 15:00:32 2018 - [debug]  Comparing MySQL versions..
Thu Mar 29 15:00:32 2018 - [debug]   Comparing MySQL versions done.
Thu Mar 29 15:00:32 2018 - [debug] Connecting to servers done.
Thu Mar 29 15:00:32 2018 - [info] GTID failover mode = 1
Thu Mar 29 15:00:32 2018 - [info] Dead Servers:
Thu Mar 29 15:00:32 2018 - [info]   192.168.85.132(192.168.85.132:3307)
Thu Mar 29 15:00:32 2018 - [info] Checking master reachability via MySQL(double check)...
Thu Mar 29 15:00:32 2018 - [info]  ok.
Thu Mar 29 15:00:32 2018 - [info] Alive Servers:
Thu Mar 29 15:00:32 2018 - [info]   192.168.85.133(192.168.85.133:3307)
Thu Mar 29 15:00:32 2018 - [info]   192.168.85.134(192.168.85.134:3307)
Thu Mar 29 15:00:32 2018 - [info] Alive Slaves:
Thu Mar 29 15:00:32 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu Mar 29 15:00:32 2018 - [info]     GTID ON
Thu Mar 29 15:00:32 2018 - [debug]    Relay log info repository: FILE
Thu Mar 29 15:00:32 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Thu Mar 29 15:00:32 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Mar 29 15:00:32 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu Mar 29 15:00:32 2018 - [info]     GTID ON
Thu Mar 29 15:00:32 2018 - [debug]    Relay log info repository: FILE
Thu Mar 29 15:00:32 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Thu Mar 29 15:00:32 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
******************** Choose whether to proceed ********************
Master 192.168.85.132(192.168.85.132:3307) is dead. Proceed? (yes/NO): yes
Thu Mar 29 15:00:34 2018 - [info] Starting GTID based failover.
Thu Mar 29 15:00:34 2018 - [info] 
Thu Mar 29 15:00:34 2018 - [info] ** Phase 1: Configuration Check Phase completed.
==================== 1. Configuration check phase, End ====================
Thu Mar 29 15:00:34 2018 - [info] 
==================== 2. Dead Master shutdown phase, Start ====================
Thu Mar 29 15:00:34 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Mar 29 15:00:34 2018 - [info] 
Thu Mar 29 15:00:34 2018 - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5
Thu Mar 29 15:00:34 2018 - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:34 2018 - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:3307)..
Thu Mar 29 15:00:34 2018 - [debug]  Stop IO thread on 192.168.85.133(192.168.85.133:3307) done.
Thu Mar 29 15:00:34 2018 - [debug]  Stop IO thread on 192.168.85.134(192.168.85.134:3307) done.
Thu Mar 29 15:00:34 2018 - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
Thu Mar 29 15:00:35 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Mar 29 15:00:35 2018 - [info] Executing master IP deactivation script:
Thu Mar 29 15:00:35 2018 - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --command=stopssh --ssh_user=root  
Thu Mar 29 15:00:35 2018 - [info]  done.
Thu Mar 29 15:00:35 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Mar 29 15:00:35 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
==================== 2. Dead Master shutdown phase, End ====================
Thu Mar 29 15:00:35 2018 - [info] 
==================== 3. New Master recovery phase, Start ====================
Thu Mar 29 15:00:35 2018 - [info] * Phase 3: Master Recovery Phase..
Thu Mar 29 15:00:35 2018 - [info] 
==================== 3.1 Get the latest Slave ====================
******************** The latest Slave is used to fill in the data missing on the new Master, and it provides the start position for saving the dead Master's binlog ********************
Thu Mar 29 15:00:35 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Mar 29 15:00:35 2018 - [info] 
Thu Mar 29 15:00:35 2018 - [debug] Fetching current slave status..
Thu Mar 29 15:00:35 2018 - [debug]  Fetching current slave status done.
Thu Mar 29 15:00:35 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:1013
Thu Mar 29 15:00:35 2018 - [info] Retrieved Gtid Set: 90b30799-9215-11e7-8645-000c29c1025c:8-11
Thu Mar 29 15:00:35 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Mar 29 15:00:35 2018 - [info]   192.168.85.133(192.168.85.133:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu Mar 29 15:00:35 2018 - [info]     GTID ON
Thu Mar 29 15:00:35 2018 - [debug]    Relay log info repository: FILE
Thu Mar 29 15:00:35 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Thu Mar 29 15:00:35 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Mar 29 15:00:35 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:683
Thu Mar 29 15:00:35 2018 - [info] Retrieved Gtid Set: 90b30799-9215-11e7-8645-000c29c1025c:8-10
Thu Mar 29 15:00:35 2018 - [info] Oldest slaves:
Thu Mar 29 15:00:35 2018 - [info]   192.168.85.134(192.168.85.134:3307)  Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Thu Mar 29 15:00:35 2018 - [info]     GTID ON
Thu Mar 29 15:00:35 2018 - [debug]    Relay log info repository: FILE
Thu Mar 29 15:00:35 2018 - [info]     Replicating from 192.168.85.132(192.168.85.132:3307)
Thu Mar 29 15:00:35 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Mar 29 15:00:35 2018 - [info] 
==================== 3.3 Elect the new Master ====================
Thu Mar 29 15:00:35 2018 - [info] * Phase 3.3: Determining New Master Phase..
Thu Mar 29 15:00:35 2018 - [info] 
Thu Mar 29 15:00:35 2018 - [info] 192.168.85.134 can be new master.
Thu Mar 29 15:00:35 2018 - [info] New master is 192.168.85.134(192.168.85.134:3307)
Thu Mar 29 15:00:35 2018 - [info] Starting master failover..
Thu Mar 29 15:00:35 2018 - [info] 
From:
192.168.85.132(192.168.85.132:3307) (current master)
 +--192.168.85.133(192.168.85.133:3307)
 +--192.168.85.134(192.168.85.134:3307)

To:
192.168.85.134(192.168.85.134:3307) (new master)
 +--192.168.85.133(192.168.85.133:3307)

******************** Choose whether to perform the switch ********************
Starting master switch from 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307)? (yes/NO): yes
Thu Mar 29 15:00:47 2018 - [info] New master decided manually is 192.168.85.134(192.168.85.134:3307)
Thu Mar 29 15:00:47 2018 - [info] 
Thu Mar 29 15:00:47 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Mar 29 15:00:47 2018 - [info] 
******************** Wait for the new Master to finish applying its own relay log ********************
Thu Mar 29 15:00:47 2018 - [info]  Waiting all logs to be applied.. 
Thu Mar 29 15:00:47 2018 - [info]   done.
Thu Mar 29 15:00:47 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:47 2018 - [debug]   done.
Thu Mar 29 15:00:47 2018 - [info]  Replicating from the latest slave 192.168.85.133(192.168.85.133:3307) and waiting to apply..
******************** Wait for the latest Slave to finish applying its own relay log ********************
Thu Mar 29 15:00:47 2018 - [info]  Waiting all logs to be applied on the latest slave.. 
******************** Run CHANGE MASTER on the new Master to point it at the latest Slave and fill in the missing data ********************
Thu Mar 29 15:00:47 2018 - [info]  Resetting slave 192.168.85.134(192.168.85.134:3307) and starting replication from the new master 192.168.85.133(192.168.85.133:3307)..
Thu Mar 29 15:00:47 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:47 2018 - [debug]   done.
Thu Mar 29 15:00:47 2018 - [info]  Executed CHANGE MASTER.
Thu Mar 29 15:00:47 2018 - [debug]  Starting slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:48 2018 - [debug]   done.
Thu Mar 29 15:00:48 2018 - [info]  Slave started.
Thu Mar 29 15:00:48 2018 - [info]  Waiting to execute all relay logs on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:48 2018 - [info]  master_pos_wait(mysql-bin.000009:3095) completed on 192.168.85.134(192.168.85.134:3307). Executed 0 events.
Thu Mar 29 15:00:48 2018 - [info]   done.
Thu Mar 29 15:00:48 2018 - [debug]  Stopping SQL thread on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:48 2018 - [debug]   done.
Thu Mar 29 15:00:48 2018 - [info]   done.
Thu Mar 29 15:00:48 2018 - [info] -- Saving binlog from host 192.168.85.132 started, pid: 6161
Thu Mar 29 15:00:48 2018 - [info] 
Thu Mar 29 15:00:48 2018 - [info] Log messages from 192.168.85.132 ...
Thu Mar 29 15:00:48 2018 - [info] 
******************** Runs on the dead Master / Binlog Server; takes only the part after the latest Slave's position ********************
Thu Mar 29 15:00:48 2018 - [info] Fetching binary logs from binlog server 192.168.85.132..
Thu Mar 29 15:00:48 2018 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=1013 --output_file=/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.21-log  --debug  --binlog_dir=/data/mysql/mysql3307/logs 
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000009 pos 1013 to mysql-bin.000009 EOF into /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog ..
Executing command: mysqlbinlog --start-position=1013  /data/mysql/mysql3307/logs/mysql-bin.000009 >> /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog
 Concat succeeded.
******************** scp the resulting binlog to the working directory where the manual failover runs ********************
Thu Mar 29 15:00:48 2018 - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog to local:/var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog succeeded.
Thu Mar 29 15:00:48 2018 - [info] End of log messages from 192.168.85.132.
Thu Mar 29 15:00:48 2018 - [info] Saved mysqlbinlog size from 192.168.85.132 is 2373 bytes.
Thu Mar 29 15:00:48 2018 - [info] Applying differential binlog /var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog ..
Thu Mar 29 15:00:48 2018 - [info] Differential log apply from binlog server succeeded.
******************** The new Master has applied the binlog; get its current position ********************
Thu Mar 29 15:00:48 2018 - [info] Getting new master's binlog name and position..
Thu Mar 29 15:00:48 2018 - [info]  mysql-bin.000004:3408
Thu Mar 29 15:00:48 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=3307, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Mar 29 15:00:48 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000004, 3408, 90b30799-9215-11e7-8645-000c29c1025c:1-12
******************** Activate the virtual IP so that the new Master can serve clients ********************
Thu Mar 29 15:00:48 2018 - [info] Executing master IP activate script:
Thu Mar 29 15:00:48 2018 - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port=3307 --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port=3307 --new_master_user='mydba' --new_master_password='mysql5721'  
Set read_only=0 on the new master.
RTNETLINK answers: Cannot assign requested address
RTNETLINK answers: File exists
Thu Mar 29 15:00:49 2018 - [info]  OK.
Thu Mar 29 15:00:49 2018 - [info] ** Finished master recovery successfully.
Thu Mar 29 15:00:49 2018 - [info] * Phase 3: Master Recovery Phase completed.
==================== 3、New master recovery phase, End ====================
Thu Mar 29 15:00:49 2018 - [info] 
==================== 4、Slave recovery phase, Start ====================
Thu Mar 29 15:00:49 2018 - [info] * Phase 4: Slaves Recovery Phase..
Thu Mar 29 15:00:49 2018 - [info] 
Thu Mar 29 15:00:49 2018 - [info] 
==================== 4.1、Slaves directly CHANGE MASTER TO the new master ====================
Thu Mar 29 15:00:49 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Mar 29 15:00:49 2018 - [info] 
Thu Mar 29 15:00:49 2018 - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:3307) started, pid: 6201. Check tmp log /var/log/masterha/app1/192.168.85.133_3307_20180329150032.log if it takes time..
Thu Mar 29 15:00:50 2018 - [info] 
Thu Mar 29 15:00:50 2018 - [info] Log messages from 192.168.85.133 ...
Thu Mar 29 15:00:50 2018 - [info] 
Thu Mar 29 15:00:49 2018 - [info]  Resetting slave 192.168.85.133(192.168.85.133:3307) and starting replication from the new master 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:49 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
Thu Mar 29 15:00:49 2018 - [debug]   done.
Thu Mar 29 15:00:49 2018 - [info]  Executed CHANGE MASTER.
Thu Mar 29 15:00:49 2018 - [debug]  Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:3307)..
Thu Mar 29 15:00:50 2018 - [debug]   done.
Thu Mar 29 15:00:50 2018 - [info]  Slave started.
Thu Mar 29 15:00:50 2018 - [info]  gtid_wait(90b30799-9215-11e7-8645-000c29c1025c:1-12) completed on 192.168.85.133(192.168.85.133:3307). Executed 0 events.
Thu Mar 29 15:00:50 2018 - [info] End of log messages from 192.168.85.133.
Thu Mar 29 15:00:50 2018 - [info] -- Slave on host 192.168.85.133(192.168.85.133:3307) started.
Thu Mar 29 15:00:50 2018 - [info] All new slave servers recovered successfully.
==================== 4、Slave recovery phase, End ====================
Thu Mar 29 15:00:50 2018 - [info] 
==================== 5、New master cleanup phase, Start ====================
Thu Mar 29 15:00:50 2018 - [info] * Phase 5: New master cleanup phase..
Thu Mar 29 15:00:50 2018 - [info] 
Thu Mar 29 15:00:50 2018 - [info] Resetting slave info on the new master..
Thu Mar 29 15:00:50 2018 - [debug]  Clearing slave info..
Thu Mar 29 15:00:50 2018 - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:3307)..
Thu Mar 29 15:00:50 2018 - [debug]   done.
Thu Mar 29 15:00:50 2018 - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Thu Mar 29 15:00:50 2018 - [info]  192.168.85.134: Resetting slave info succeeded.
==================== 5、New master cleanup phase, End ====================
Thu Mar 29 15:00:50 2018 - [info] Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
Thu Mar 29 15:00:50 2018 - [debug]  Disconnected from 192.168.85.133(192.168.85.133:3307)
Thu Mar 29 15:00:50 2018 - [debug]  Disconnected from 192.168.85.134(192.168.85.134:3307)
Thu Mar 29 15:00:50 2018 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307) to 192.168.85.134(192.168.85.134:3307) succeeded

Master 192.168.85.132(192.168.85.132:3307) is down!

Check MHA Manager logs at ZST1 for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)
Selected 192.168.85.134(192.168.85.134:3307) as a new master.
192.168.85.134(192.168.85.134:3307): OK: Applying all logs succeeded.
192.168.85.134(192.168.85.134:3307): OK: Activated master IP address.
192.168.85.133(192.168.85.133:3307): OK: Slave started, replicating from 192.168.85.134(192.168.85.134:3307)
192.168.85.134(192.168.85.134:3307): Resetting slave info succeeded.
Master failover to 192.168.85.134(192.168.85.134:3307) completed successfully.
[root@ZST1 masterha]#

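Manual failover flow

Manual failover (GTID)
1、Configuration check: connect to each instance, check service status, and check the master/slave relationships
2、Shut down the failed master: stop the IO thread on every slave, and remove the virtual IP from the failed master (stopssh)
3、New master recovery
    3.1、Identify the latest slave
        Used to fill in the data the new master is missing, and as the starting point for saving the failed master's binlog
    3.2、Elect the new master
        Decide on the new master and obtain the topology before and after the switch
    3.3、Recover the new master
        3.3.1、Close the gap between the new master and the latest slave
            Wait for the new master to finish applying its own relay log; wait for the latest slave to finish applying its own relay log; CHANGE MASTER TO the latest slave on the new master to pull the missing data
        3.3.2、Close the gap between the new master and the failed master
            Run save_binary_logs on the failed master/binlog server; scp the resulting binlog to the working directory of the manual failover run; the new master applies the binlog and its current position is recorded; bind the virtual IP so the new master can serve clients
4、Recover the other slaves
    4.1、Reset replication: RESET SLAVE; CHANGE MASTER TO the new master;
    4.2、If there are multiple slaves, repeat the step above for each
5、New master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;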
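
Steps 4 and 5 of the flow above can be sketched as a dry run that only prints the SQL. This is a minimal sketch, not MHA's own code: the function names are made up for illustration, the host, port, and replication user are taken from the log above, and the password stays masked as xxx exactly as MHA prints it.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the SQL for steps 4 and 5 instead of executing it.
# Function names are illustrative and not part of MHA.

NEW_MASTER_HOST="192.168.85.134"   # new master chosen in the log above
NEW_MASTER_PORT=3307
REPL_USER="repl"
REPL_PASSWORD="xxx"                # masked, as in the MHA log

# Step 4: statements to run on each remaining slave.
slave_recovery_sql() {
  cat <<SQL
STOP SLAVE;
RESET SLAVE;
CHANGE MASTER TO MASTER_HOST='${NEW_MASTER_HOST}', MASTER_PORT=${NEW_MASTER_PORT}, MASTER_AUTO_POSITION=1, MASTER_USER='${REPL_USER}', MASTER_PASSWORD='${REPL_PASSWORD}';
START SLAVE;
SQL
}

# Step 5: statements to run on the new master to drop its old replication info.
new_master_cleanup_sql() {
  cat <<SQL
STOP SLAVE;
RESET SLAVE ALL;
SQL
}

echo "-- on each remaining slave (here 192.168.85.133:3307)"
slave_recovery_sql
echo "-- on the new master ${NEW_MASTER_HOST}:${NEW_MASTER_PORT}"
new_master_cleanup_sql
```

The printed statements can be reviewed first and then run by hand in a mysql session on the corresponding node.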


3.3、Differences between manual failover with traditional and GTID replication

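To get a detailed switchover log, it is recommended to:
• Set log_level=debug in the MHA configuration file
• Simulate data differences between Node1, Node2, and Node3
• Choose Node2 and Node3 in turn as the new master
For manual failover (GTID), it also helps to enable the general log so you can see how the data gap between the new master and the latest slave is filled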

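	Traditional	GTID
Whether missing data is filled in	As long as the master's host itself is still up, all data is filled in by default	The master/binlog server must be configured in a [binlog*] section of the configuration file for the differential log on the dead master to be applied; otherwise data is only applied up to the latest slave
How missing data is filled in	The new master/other slaves pull the latest slave's relay log	The new master pulls the latest slave's binlog
	The new master and every other slave generate the relay log that differs from the latest slave and apply it (files relay_from_read_to_latest_**)	The new master runs CHANGE MASTER TO the latest slave to fill in the difference from the latest slave
	The new master/other slaves apply the differential binlog between the latest slave and the dead master (files saved_master_binlog_from_**)	Once the new master has caught up with the latest slave, save_binary_logs generates the differential binlog against the dead master, which is then applied (files saved_binlog_binlog1_**)
		Other slaves do not need to apply any differential log; they simply CHANGE MASTER TO the new master
Files generated	relay_from_read_to_latest_**: the differential relay log between the latest slave and another slave; generated on the latest slave, then copied to the corresponding slave	saved_binlog_binlog1_**: the differential binlog between the failed master and the latest slave; generated on the failed master/binlog server, then copied to the working directory of the manual failover run
	saved_master_binlog_from_**: the differential binlog between the failed master and the latest slave; generated on the failed master, copied first to the working directory of the manual failover run, and then to the other slaves
	The files can be parsed with mysqlbinlog	The files could not be parsed with mysqlbinlog (･ω･); perhaps I invoked it the wrong way, although the commands that produce the two kinds of file do differ slightly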
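
A [binlog*] section of the kind the GTID column refers to might look like the output of the sketch below. The section name binlog1 and the host 192.168.85.132 are inferred from the saved_binlog_binlog1_** file name and the scp source in the log; the master_binlog_dir value is the one used in the [server*] sections; the helper binlog_section is hypothetical, and the actual section used in this test may have contained further parameters.

```shell
#!/usr/bin/env bash
# Prints a [binlogN] section for an MHA application config such as app1.conf.
# The helper name is illustrative; review the output before appending it.
binlog_section() {
  local name="$1" host="$2" dir="$3"
  printf '[%s]\nhostname=%s\nmaster_binlog_dir=%s\n' "$name" "$host" "$dir"
}

# Values assumed from the log and the [server*] sections of app1.conf.
binlog_section binlog1 192.168.85.132 /data/mysql/mysql3307/logs
```

Redirecting the output with >> /etc/masterha/app1.conf would append the section; MHA consults [binlog*] sections only during GTID-based failover.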

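In a GTID environment, save_binary_logs is used only to handle the dead master's data (the master is down, so nothing can CHANGE MASTER TO it); everything else is filled in by the replication threads after a direct CHANGE MASTER TO. It also no longer depends on the latest slave's relay log.

On the whole, MHA is somewhat heavyweight in a GTID environment, and those who are able to can handle failover with their own script: determine Latest_Slave -> New_Master: change master to Latest_Slave -> mysqlbinlog ./binlogserver/binlog --start-position > New_Master -> Other_Slave: change master to New_Master.

With enhanced semi-synchronous replication, it is almost guaranteed that every binlog event on Dead_Master has reached Latest_Slave, which makes failover simpler still (⊙_⊙)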
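
The hand-rolled order described above could be sketched roughly as follows. This is only a dry run under stated assumptions: pick_latest_slave and print_plan are hypothetical helpers, the slave positions fed in are made-up sample values, and the binlog path and start position are borrowed from the mysqlbinlog command in the log. It prints a plan and does not touch MySQL.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a hand-rolled GTID failover order; prints a plan only.

PORT=3307

# Reads lines "<host> <Master_Log_File> <Read_Master_Log_Pos>" on stdin and
# prints the host that has read furthest from the dead master
# (file name compared as text, position compared numerically).
pick_latest_slave() {
  LC_ALL=C sort -t' ' -k2,2 -k3,3n | tail -n 1 | cut -d' ' -f1
}

# Args: latest slave, new master, other slave, binlog file, start position.
print_plan() {
  local latest="$1" new_master="$2" other="$3" binlog="$4" start_pos="$5"
  if [ "$latest" != "$new_master" ]; then
    echo "1. on ${new_master}: CHANGE MASTER TO MASTER_HOST='${latest}', MASTER_PORT=${PORT}, MASTER_AUTO_POSITION=1; START SLAVE; (wait until caught up)"
  else
    echo "1. skipped: ${new_master} is already the latest slave"
  fi
  echo "2. mysqlbinlog --start-position=${start_pos} ${binlog} | mysql -h ${new_master} -P ${PORT} -u mydba -p"
  echo "3. on ${other}: CHANGE MASTER TO MASTER_HOST='${new_master}', MASTER_PORT=${PORT}, MASTER_AUTO_POSITION=1; START SLAVE;"
}

# Sample positions are hypothetical, not taken from the test above.
latest="$(printf '%s\n' \
  '192.168.85.134 mysql-bin.000009 738' \
  '192.168.85.133 mysql-bin.000009 1013' | pick_latest_slave)"

print_plan "$latest" 192.168.85.134 192.168.85.133 \
  /data/mysql/mysql3307/logs/mysql-bin.000009 1013
```

Because GTID replication skips transactions that are already in the executed set, replaying a slightly too-wide range in step 2 is generally harmless; the start position mainly saves time.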
