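In day-to-day work you will run into situations such as a MySQL upgrade or a hardware upgrade on the master server, where write operations have to be moved to another server. How can this switch be done online, quickly, and with little impact on the application?
MHA offers a graceful way to do this: it blocks the application for only 0.5 to 2 s, and during that window the application can neither read nor write.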
Cluster information
Role              IP address       ServerID  Type
Master            192.168.244.10   1         Write
Candidate master  192.168.244.20   2         Read
Slave             192.168.244.30   3         Read
Monitor host      192.168.244.40             Monitors the cluster
For the detailed MHA setup steps and how MHA works, see another post on this blog.
Steps for the online switch
1. Stop MHA monitoring
# masterha_stop --conf=/etc/masterha/app1.cnf
2. Switch online
# /usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.244.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
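Where:
--orig_master_is_new_slave makes the original master a slave of the new master. It is not set by default.
--running_updates_limit defaults to 1 s: if the replication delay (Seconds_Behind_Master) or the running time of a DML statement in SHOW PROCESSLIST on the master exceeds 1 s, the switch is not performed.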
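The command line from step 2 can also be assembled programmatically. The following is only a minimal sketch: the helper name build_switch_command is hypothetical, it uses only the options shown above, and it builds the argument list without running anything.

```python
import shlex


def build_switch_command(conf, new_master_host, new_master_port=3306,
                         orig_master_is_new_slave=True,
                         running_updates_limit=None):
    """Return the masterha_master_switch argument list for an online switch.

    Hypothetical helper: it only mirrors the options used in this post.
    """
    args = [
        "masterha_master_switch",
        f"--conf={conf}",
        "--master_state=alive",
        f"--new_master_host={new_master_host}",
        f"--new_master_port={new_master_port}",
    ]
    if orig_master_is_new_slave:
        # Without this flag the original master does not rejoin as a slave.
        args.append("--orig_master_is_new_slave")
    if running_updates_limit is not None:
        # Omitting the flag leaves MHA's default of 1 s in effect.
        args.append(f"--running_updates_limit={running_updates_limit}")
    return args


cmd = build_switch_command("/etc/masterha/app1.cnf", "192.168.244.20",
                           running_updates_limit=10000)
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it easy to hand them to subprocess.run later, once masterha_stop has been executed.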
Output of the online switch
Tue Apr 11 15:28:32 2017 - [info] MHA::MasterRotate version 0.56.
Tue Apr 11 15:28:32 2017 - [info] Starting online master switch..
Tue Apr 11 15:28:32 2017 - [info]
Tue Apr 11 15:28:32 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Apr 11 15:28:32 2017 - [info]
Tue Apr 11 15:28:32 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 11 15:28:32 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:32 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:34 2017 - [info] GTID failover mode = 0
Tue Apr 11 15:28:34 2017 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Tue Apr 11 15:28:34 2017 - [info] Alive Slaves:
Tue Apr 11 15:28:34 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Tue Apr 11 15:28:34 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306)
Tue Apr 11 15:28:34 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Apr 11 15:28:34 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Tue Apr 11 15:28:34 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:3306)? (YES/no): yes
Tue Apr 11 15:28:47 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Apr 11 15:28:47 2017 - [info] ok.
Tue Apr 11 15:28:47 2017 - [info] Checking MHA is not monitoring or doing failover..
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on 192.168.244.20..
Tue Apr 11 15:28:47 2017 - [info] ok.
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on 192.168.244.30..
Tue Apr 11 15:28:47 2017 - [info] ok.
Tue Apr 11 15:28:47 2017 - [info] 192.168.244.20 can be new master.
Tue Apr 11 15:28:47 2017 - [info]
From:
192.168.244.10(192.168.244.10:3306) (current master)
 +--192.168.244.20(192.168.244.20:3306)
 +--192.168.244.30(192.168.244.30:3306)

To:
192.168.244.20(192.168.244.20:3306) (new master)
 +--192.168.244.30(192.168.244.30:3306)
 +--192.168.244.10(192.168.244.10:3306)

Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO): yes
Tue Apr 11 15:29:00 2017 - [info] Checking whether 192.168.244.20(192.168.244.20:3306) is ok for the new master..
Tue Apr 11 15:29:00 2017 - [info] ok.
Tue Apr 11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Apr 11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): Resetting slave pointing to the dummy host.
Tue Apr 11 15:29:00 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr 11 15:29:00 2017 - [info]
Tue Apr 11 15:29:00 2017 - [info] * Phase 2: Rejecting updates Phase..
Tue Apr 11 15:29:00 2017 - [info]
Tue Apr 11 15:29:00 2017 - [info] Executing master ip online change script to disable write on the current master:
Tue Apr 11 15:29:00 2017 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Tue Apr 11 15:29:00 2017 476501 Set read_only on the new master.. ok.
Tue Apr 11 15:29:00 2017 911951 Set read_only=1 on the orig master.. ok.
Tue Apr 11 15:29:00 2017 919517 Killing all application threads..
Tue Apr 11 15:29:00 2017 919552 done.
Disabling the VIP an old master: 192.168.244.10
SIOCSIFFLAGS: Cannot assign requested address
Tue Apr 11 15:29:00 2017 - [info] ok.
Tue Apr 11 15:29:00 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Apr 11 15:29:00 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Apr 11 15:29:00 2017 - [info] ok.
Tue Apr 11 15:29:00 2017 - [info] Orig master binlog:pos is mysql-bin.000016:211.
Tue Apr 11 15:29:00 2017 - [info] Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:01 2017 - [info] master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.20(192.168.244.20:3306). Executed 0 events.
Tue Apr 11 15:29:01 2017 - [info] done.
Tue Apr 11 15:29:01 2017 - [info] Getting new master's binlog name and position..
Tue Apr 11 15:29:01 2017 - [info] mysql-bin.000009:211
Tue Apr 11 15:29:01 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.244.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=211, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Apr 11 15:29:01 2017 - [info] Executing master ip online change script to allow write on the new master:
Tue Apr 11 15:29:01 2017 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Tue Apr 11 15:29:01 2017 109040 Set read_only=0 on the new master.
Enabling the VIP 192.168.244.188 on the new master: 192.168.244.20
Tue Apr 11 15:29:01 2017 - [info] ok.
Tue Apr 11 15:29:01 2017 - [info]
Tue Apr 11 15:29:01 2017 - [info] * Switching slaves in parallel..
Tue Apr 11 15:29:01 2017 - [info]
Tue Apr 11 15:29:01 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) started, pid: 17651
Tue Apr 11 15:29:01 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] Log messages from 192.168.244.30 ...
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:01 2017 - [info] Waiting to execute all relay logs on 192.168.244.30(192.168.244.30:3306)..
Tue Apr 11 15:29:01 2017 - [info] master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.30(192.168.244.30:3306). Executed 0 events.
Tue Apr 11 15:29:01 2017 - [info] done.
Tue Apr 11 15:29:01 2017 - [info] Resetting slave 192.168.244.30(192.168.244.30:3306) and starting replication from the new master 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:01 2017 - [info] Executed CHANGE MASTER.
Tue Apr 11 15:29:01 2017 - [info] Slave started.
Tue Apr 11 15:29:02 2017 - [info] End of log messages from 192.168.244.30 ...
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) succeeded.
Tue Apr 11 15:29:02 2017 - [info] Unlocking all tables on the orig master:
Tue Apr 11 15:29:02 2017 - [info] Executing UNLOCK TABLES..
Tue Apr 11 15:29:02 2017 - [info] ok.
Tue Apr 11 15:29:02 2017 - [info] Starting orig master as a new slave..
Tue Apr 11 15:29:02 2017 - [info] Resetting slave 192.168.244.10(192.168.244.10:3306) and starting replication from the new master 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:02 2017 - [info] Executed CHANGE MASTER.
Tue Apr 11 15:29:02 2017 - [info] Slave started.
Tue Apr 11 15:29:02 2017 - [info] All new slave servers switched successfully.
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] * Phase 5: New master cleanup phase..
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] 192.168.244.20: Resetting slave info succeeded.
Tue Apr 11 15:29:02 2017 - [info] Switching master to 192.168.244.20(192.168.244.20:3306) completed successfully.
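The output can be used to estimate how long writes were blocked. The sketch below is a rough illustration, not part of MHA: it assumes the six-digit field printed by master_ip_online_change is microseconds, and it treats the span from "Set read_only=1 on the orig master" to "Set read_only=0 on the new master" as the write-blocked window. The function names are hypothetical.

```python
from datetime import datetime


def parse_script_timestamp(line):
    """Parse a leading stamp such as 'Tue Apr 11 15:29:00 2017 911951'.

    Assumption: the sixth field holds microseconds.
    """
    fields = line.split()
    stamp = " ".join(fields[:6])
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y %f")


def write_blocked_window(lines):
    """Return the time between read_only=1 on the old master and
    read_only=0 on the new master as a timedelta."""
    start = end = None
    for line in lines:
        if "Set read_only=1 on the orig master" in line:
            start = parse_script_timestamp(line)
        elif "Set read_only=0 on the new master" in line:
            end = parse_script_timestamp(line)
    if start is None or end is None:
        raise ValueError("both read_only markers are required")
    return end - start


# The two relevant lines from the output above.
sample = [
    "Tue Apr 11 15:29:00 2017 911951 Set read_only=1 on the orig master.. ok.",
    "Tue Apr 11 15:29:01 2017 109040 Set read_only=0 on the new master.",
]
window = write_blocked_window(sample)
print(window.total_seconds())
```

This measures only the read_only window. It does not include the time the application needs to reconnect through the VIP.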
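How the MHA online switch works
1. Check the current configuration and the state of the master and slaves
This includes reading the MHA configuration file /etc/masterha/app1.cnf and checking the health of the current slaves.
2. Block updates on the current master
This mainly involves the following steps:
1> Wait up to 1.5 s ($time_until_read_only * 100 ms) for current connections to disconnect.
2> Set read_only=1 to block new DML statements.
3> Wait up to 0.5 s ($time_until_kill_threads * 100 ms) for running DML statements to finish.
4> Kill all connections.
5> FLUSH NO_WRITE_TO_BINLOG TABLES
6> FLUSH TABLES WITH READ LOCK
3. Wait for the new master to finish executing all relay logs
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..
4. Set read_only to off on the new master and add the VIP
5. Switch the slave to the new master.
1> Wait for the slave (192.168.244.30) to apply the relay logs produced by the original replication, then run CHANGE MASTER to point it at the new master.
2> Release the lock held on the original master.
3> Because --orig_master_is_new_slave was given on the masterha_master_switch command line, the original master is also switched to be a slave of the new master.
6. Clean up the new master's replication information.
This mainly runs RESET SLAVE ALL to clear the previous replication information.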
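The first four sub-steps of step 2 amount to two bounded polling loops around a read_only change. The sketch below only illustrates that control flow: the callbacks get_threads, set_read_only and kill are hypothetical stand-ins for the real database calls, and the tick counts 15 and 5 are taken from the 1.5 s and 0.5 s waits described above.

```python
import time


def wait_for_threads(get_threads, max_ticks, tick_seconds=0.1, sleep=time.sleep):
    """Poll until no application threads remain or max_ticks polls have passed.

    Returns the threads still present when the wait ends.
    """
    threads = get_threads()
    ticks_left = max_ticks
    while ticks_left > 0 and threads:
        sleep(tick_seconds)
        ticks_left -= 1
        threads = get_threads()
    return threads


def reject_updates(get_threads, set_read_only, kill, sleep=time.sleep):
    """Sketch of sub-steps 1> to 4>; all callbacks are hypothetical."""
    # 1> Up to 15 * 100 ms for current connections to exit on their own.
    wait_for_threads(get_threads, 15, sleep=sleep)
    # 2> Block new DML statements.
    set_read_only()
    # 3> Up to 5 * 100 ms for running statements to finish.
    remaining = wait_for_threads(get_threads, 5, sleep=sleep)
    # 4> Kill whatever is still connected.
    for thread_id in remaining:
        kill(thread_id)
    return remaining


# Dry run with one connection (id 101) that never goes away.
events = []
slept = []
remaining = reject_updates(
    get_threads=lambda: [101],
    set_read_only=lambda: events.append("read_only=1"),
    kill=lambda thread_id: events.append(f"kill {thread_id}"),
    sleep=slept.append,
)
print(events, len(slept))
```

Passing sleep as a parameter keeps the dry run instantaneous. With an idle server both loops exit at once, which is why the real switch can finish well under the 2 s upper bound.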
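Conditions for an MHA online switch
Before performing an online switch, MHA checks the current replication state. The switch is performed only if all of the following conditions are met:
1. The IO thread and the SQL thread are running on every slave.
2. Seconds_Behind_Master on every slave is less than or equal to running_updates_limit. If this parameter is not specified explicitly, it defaults to 1 s.
3. On the master, SHOW PROCESSLIST shows no DML statement that has been running longer than running_updates_limit.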
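The three conditions can be expressed as a small pre-flight check. This is a sketch under stated assumptions rather than MHA's own logic: the SlaveStatus and ProcessRow types are hypothetical containers for SHOW SLAVE STATUS and SHOW PROCESSLIST rows, and a statement counts as DML here only if it starts with INSERT, UPDATE, DELETE or REPLACE.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveStatus:
    host: str
    io_running: bool
    sql_running: bool
    seconds_behind_master: Optional[int]  # None stands for NULL


@dataclass
class ProcessRow:
    command: str
    time: int
    info: Optional[str]


def is_dml(row: ProcessRow) -> bool:
    """Assumption: DML means INSERT, UPDATE, DELETE or REPLACE."""
    if row.command != "Query" or not row.info:
        return False
    words = row.info.split()
    return bool(words) and words[0].upper() in {"INSERT", "UPDATE", "DELETE", "REPLACE"}


def switch_blockers(slaves: List[SlaveStatus], processlist: List[ProcessRow],
                    running_updates_limit: int = 1) -> List[str]:
    """Return the reasons an online switch would be refused (empty if none)."""
    blockers = []
    for slave in slaves:
        if not (slave.io_running and slave.sql_running):
            blockers.append(f"{slave.host}: IO or SQL thread not running")
        elif (slave.seconds_behind_master is None
              or slave.seconds_behind_master > running_updates_limit):
            blockers.append(f"{slave.host}: Seconds_Behind_Master above limit")
    for row in processlist:
        if is_dml(row) and row.time > running_updates_limit:
            blockers.append(f"master: DML running for {row.time} s")
    return blockers


slaves = [
    SlaveStatus("192.168.244.20", True, True, 0),
    SlaveStatus("192.168.244.30", True, True, 5),
]
processlist = [ProcessRow("Query", 3, "UPDATE t SET c = 1"),
               ProcessRow("Sleep", 120, None)]
print(switch_blockers(slaves, processlist))
print(switch_blockers(slaves, processlist, running_updates_limit=10000))
```

The second call shows why the example command passes --running_updates_limit=10000: with such a high limit, neither the lagging slave nor the 3 s update blocks the switch.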
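Operations logged on each server with the general log enabled during the online switch
Note: masterha_master_switch asks for confirmation twice while it runs:
1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:3306)? (YES/no):
2. Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO):
Each of the outputs below contains two gaps. The output before the first gap was logged before the first confirmation, and the output before the second gap was logged before the second confirmation.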
Original master 192.168.244.10
170412 16:52:38 23 Connect monitor@node4 on
                23 Query set autocommit=1
                23 Query SELECT CONNECTION_ID() AS Value
170412 16:52:39 24 Connect monitor@node4 on
                24 Query set autocommit=1
                24 Query SELECT CONNECTION_ID() AS Value
                24 Query SET wait_timeout=86400
                24 Query SELECT @@global.server_id As Value
                24 Query SELECT VERSION() AS Value
                24 Query SELECT @@global.gtid_mode As Value
                24 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
                24 Query SHOW MASTER STATUS
                24 Query SELECT @@global.datadir AS Value
                24 Query SELECT @@global.slave_parallel_workers AS Value
                24 Query SHOW SLAVE STATUS
                24 Query SELECT @@global.read_only As Value
                24 Query SELECT @@global.relay_log_purge As Value

170412 16:54:06 24 Query FLUSH NO_WRITE_TO_BINLOG TABLES
                24 Query SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value
                24 Query SHOW PROCESSLIST

170412 16:55:51 24 Query SHOW SLAVE STATUS
                24 Query CHANGE MASTER TO MASTER_HOST='dummy_host'
170412 16:55:52 24 Query SHOW SLAVE STATUS
                24 Query RESET SLAVE /*!50516 ALL */
                24 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value
                24 Quit
                25 Connect monitor@node4 on
                25 Query set autocommit=1
                25 Query SELECT CONNECTION_ID() AS Value
                25 Query SET sql_log_bin=0
                25 Query SHOW PROCESSLIST
                25 Query SELECT @@global.read_only As Value
                25 Query SET GLOBAL read_only=1
                25 Query SELECT @@global.read_only As Value
                25 Query SHOW PROCESSLIST
                25 Query SET sql_log_bin=1
                25 Quit
                26 Connect monitor@node4 on
                26 Query set autocommit=1
                26 Query SELECT CONNECTION_ID() AS Value
                26 Query SET wait_timeout=86400
                26 Query FLUSH TABLES WITH READ LOCK
                26 Query SHOW MASTER STATUS
170412 16:55:53 26 Query UNLOCK TABLES
                26 Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120
                26 Query SET GLOBAL relay_log_purge=0
                26 Query START SLAVE
                27 Connect Out repl@192.168.244.20:3306
                26 Query SHOW SLAVE STATUS
                26 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                26 Quit
New master 192.168.244.20
170412 16:52:38 23 Connect monitor@node4 on
                23 Query set autocommit=1
                23 Query SELECT CONNECTION_ID() AS Value
170412 16:52:39 24 Connect monitor@node4 on
                24 Query set autocommit=1
                24 Query SELECT CONNECTION_ID() AS Value
                24 Query SET wait_timeout=86400
                24 Query SELECT @@global.server_id As Value
                24 Query SELECT VERSION() AS Value
                24 Query SELECT @@global.gtid_mode As Value
                24 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
                24 Query SHOW MASTER STATUS
                24 Query SELECT @@global.datadir AS Value
                24 Query SELECT @@global.slave_parallel_workers AS Value
                24 Query SHOW SLAVE STATUS
                24 Query SELECT @@global.read_only As Value
                24 Query SELECT @@global.relay_log_purge As Value
                24 Query SELECT @@global.relay_log_info_repository AS Value
                24 Query SELECT @@global.datadir AS Value
                24 Query SELECT @@global.relay_log_info_file AS Value
                24 Query SHOW SLAVE STATUS
                24 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

170412 16:54:06 24 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                24 Query SHOW SLAVE STATUS
                24 Query SHOW SLAVE STATUS

170412 16:55:52 24 Query SHOW PROCESSLIST
                25 Connect monitor@node4 on
                25 Query set autocommit=1
                25 Query SELECT CONNECTION_ID() AS Value
                25 Query SELECT @@global.read_only As Value
                25 Query SELECT @@global.read_only As Value
                25 Quit
                24 Query SHOW SLAVE STATUS
                24 Query SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                24 Query STOP SLAVE SQL_THREAD
                24 Query SHOW SLAVE STATUS
                24 Query SHOW MASTER STATUS
                26 Connect monitor@node4 on
                26 Query set autocommit=1
                26 Query SELECT CONNECTION_ID() AS Value
                26 Query SET sql_log_bin=0
                26 Query SELECT @@global.read_only As Value
                26 Query SET GLOBAL read_only=0
                26 Query SET sql_log_bin=1
                26 Quit
                24 Query SELECT @@global.read_only As Value
                27 Connect repl@node3 on
                27 Query SELECT UNIX_TIMESTAMP()
                27 Query SHOW VARIABLES LIKE 'SERVER_ID'
                27 Query SET @master_heartbeat_period= 1799999979520
                27 Query SET @master_binlog_checksum= @@global.binlog_checksum
                27 Query SELECT @master_binlog_checksum
                27 Query SELECT @@GLOBAL.GTID_MODE
                27 Query SHOW VARIABLES LIKE 'SERVER_UUID'
                27 Query SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715'
                27 Binlog Dump Log: 'mysql-bin.000010' Pos: 120
170412 16:55:53 28 Connect repl@node1 on
                28 Query SELECT UNIX_TIMESTAMP()
                28 Query SHOW VARIABLES LIKE 'SERVER_ID'
                28 Query SET @master_heartbeat_period= 1799999979520
                28 Query SET @master_binlog_checksum= @@global.binlog_checksum
                28 Query SELECT @master_binlog_checksum
                28 Query SELECT @@GLOBAL.GTID_MODE
                28 Query SHOW VARIABLES LIKE 'SERVER_UUID'
                24 Query STOP SLAVE
                28 Query SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704'
                28 Binlog Dump Log: 'mysql-bin.000010' Pos: 120
                24 Query SHOW SLAVE STATUS
                24 Query RESET SLAVE /*!50516 ALL */
                24 Query SHOW SLAVE STATUS
                24 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                24 Quit
Slave 192.168.244.30
170412 16:52:37 16 Connect monitor@node4 on
                16 Query set autocommit=1
                16 Query SELECT CONNECTION_ID() AS Value
170412 16:52:38 17 Connect monitor@node4 on
                17 Query set autocommit=1
                17 Query SELECT CONNECTION_ID() AS Value
                17 Query SET wait_timeout=86400
                17 Query SELECT @@global.server_id As Value
                17 Query SELECT VERSION() AS Value
                17 Query SELECT @@global.gtid_mode As Value
                17 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
                17 Query SHOW MASTER STATUS
                17 Query SELECT @@global.datadir AS Value
                17 Query SELECT @@global.slave_parallel_workers AS Value
                17 Query SHOW SLAVE STATUS
                17 Query SELECT @@global.read_only As Value
                17 Query SELECT @@global.relay_log_purge As Value
                17 Query SELECT @@global.relay_log_info_repository AS Value
                17 Query SELECT @@global.datadir AS Value
                17 Query SELECT @@global.relay_log_info_file AS Value
                17 Query SHOW SLAVE STATUS
                17 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

170412 16:54:05 17 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                17 Query SHOW SLAVE STATUS
                17 Query SHOW SLAVE STATUS

170412 16:55:50 17 Query SHOW SLAVE STATUS
170412 16:55:51 17 Query SHOW SLAVE STATUS
                17 Query SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                17 Query STOP SLAVE SQL_THREAD
                17 Query SHOW SLAVE STATUS
                17 Query STOP SLAVE
                17 Query STOP SLAVE
                17 Query SHOW SLAVE STATUS
                17 Query RESET SLAVE
                17 Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120
                17 Query SET GLOBAL relay_log_purge=0
                17 Query START SLAVE
                18 Connect Out repl@192.168.244.20:3306
                17 Query SHOW SLAVE STATUS
170412 16:55:52 17 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                17 Quit
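The two confirmation prompts show up in the general logs as jumps between consecutive timestamps. The sketch below locates them from the timestamps of the original master's log. It is only an illustration: find_pauses is a hypothetical helper, and the 60 s threshold is an arbitrary assumption chosen to separate operator pauses from normal activity.

```python
from datetime import datetime


def find_pauses(stamps, min_gap_seconds=60):
    """Return (before, after, gap_in_seconds) for each jump between
    consecutive general-log timestamps of at least min_gap_seconds."""
    times = [datetime.strptime(stamp, "%y%m%d %H:%M:%S") for stamp in stamps]
    pauses = []
    for earlier, later in zip(times, times[1:]):
        gap = int((later - earlier).total_seconds())
        if gap >= min_gap_seconds:
            pauses.append((earlier.strftime("%H:%M:%S"),
                           later.strftime("%H:%M:%S"), gap))
    return pauses


# Timestamps that appear in the original master's general log.
master_stamps = [
    "170412 16:52:38",
    "170412 16:52:39",
    "170412 16:54:06",
    "170412 16:55:51",
    "170412 16:55:52",
    "170412 16:55:53",
]
for before, after, gap in find_pauses(master_stamps):
    print(f"pause of {gap} s between {before} and {after}")
```

The first pause is the wait at the FLUSH NO_WRITE_TO_BINLOG TABLES prompt and the second is the wait at the "Starting master switch" prompt.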
References
《深入淺出MySQL》