The MHA setup is taken from: http://www.cnblogs.com/xuanzhi201111/p/4231412.html
When the master server fails, MHA can be invoked manually to perform the failover. The commands are as follows:
First stop MHA Manager:
192.168.2.131 [root ~]$ masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 (wd: /usr/local/bin)
(wd now: ~)
192.168.2.131 [root ~]$
On the Manager host, run the following:
192.168.2.131 [root bin]$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.2.128 --dead_master_port=3306 --new_master_host=192.168.2.129 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.2.128.
Mon Jan 19 00:42:18 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 19 00:42:18 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Jan 19 00:42:18 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Mon Jan 19 00:42:18 2015 - [info] MHA::MasterFailover version 0.56.
Mon Jan 19 00:42:18 2015 - [info] Starting master failover.
Mon Jan 19 00:42:18 2015 - [info]
Mon Jan 19 00:42:18 2015 - [info] * Phase 1: Configuration Check Phase..
Mon Jan 19 00:42:18 2015 - [info]
Mon Jan 19 00:42:19 2015 - [info] Dead Servers:
Mon Jan 19 00:42:19 2015 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.
Mon Jan 19 00:42:19 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
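The command failed. The reason: MHA Manager found no dead server, so it reported an error and aborted the failover. In other words, the master has to be shut down by hand before the switch can proceed: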
192.168.2.128 [root ~]$ /etc/init.d/mysqld stop
Shutting down MySQL... SUCCESS!
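Since MHA aborts with "None of server is dead" while the master still answers, it can help to confirm the shutdown from the Manager host before retrying. The following is only a minimal sketch, assuming the `mysqladmin` client is installed there. The function names and the `PING_CMD` override are hypothetical, introduced for illustration, and the failover command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: refuse to proceed with a manual failover while the old master still answers.
# PING_CMD is a hypothetical override so the check can be exercised without a real
# server; by default the real mysqladmin client is used.
is_master_down() {
    host=$1
    port=${2:-3306}
    # "mysqladmin ping" exits 0 when the server is running, non-zero otherwise.
    if "${PING_CMD:-mysqladmin}" -h "$host" -P "$port" ping >/dev/null 2>&1; then
        return 1    # server answered: the master is still alive
    fi
    return 0        # no answer: treat the master as dead (or unreachable from here)
}

# Print (do not run) the failover command used in this walkthrough,
# but only once the old master no longer responds.
failover_cmd_if_dead() {
    dead=$1
    new=$2
    if is_master_down "$dead" 3306; then
        echo "masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=$dead --dead_master_port=3306 --new_master_host=$new --new_master_port=3306 --ignore_last_failover"
    else
        echo "master $dead is still alive; stop it first" >&2
        return 1
    fi
}
```

Calling `failover_cmd_if_dead 192.168.2.128 192.168.2.129` prints the command only when the ping fails. Note that a network problem between the Manager and the master also makes the ping fail, so this check is weaker than MHA's own reachability test.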
Now run the manual failover command again:
192.168.2.131 [root bin]$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.2.128 --dead_master_port=3306 --new_master_host=192.168.2.129 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.2.128.
Sun Jan 18 19:49:20 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jan 18 19:49:20 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sun Jan 18 19:49:20 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sun Jan 18 19:49:20 2015 - [info] MHA::MasterFailover version 0.53.
Sun Jan 18 19:49:20 2015 - [info] Starting master failover.
Sun Jan 18 19:49:20 2015 - [info]
Sun Jan 18 19:49:20 2015 - [info] * Phase 1: Configuration Check Phase..
Sun Jan 18 19:49:20 2015 - [info]
Sun Jan 18 19:49:20 2015 - [info] Dead Servers:
Sun Jan 18 19:49:20 2015 - [info] 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:20 2015 - [info] Checking master reachability via mysql(double check)..
Sun Jan 18 19:49:20 2015 - [info] ok.
Sun Jan 18 19:49:20 2015 - [info] Alive Servers:
Sun Jan 18 19:49:20 2015 - [info] 192.168.2.129(192.168.2.129:3306)
Sun Jan 18 19:49:20 2015 - [info] 192.168.2.130(192.168.2.130:3306)
Sun Jan 18 19:49:20 2015 - [info] Alive Slaves:
Sun Jan 18 19:49:20 2015 - [info] 192.168.2.129(192.168.2.129:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:20 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:20 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jan 18 19:49:20 2015 - [info] 192.168.2.130(192.168.2.130:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:20 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Master 192.168.2.128 is dead. Proceed? (yes/NO): yes
Sun Jan 18 19:49:24 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:24 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:24 2015 - [info] HealthCheck: SSH to 192.168.2.128 is reachable.
Sun Jan 18 19:49:24 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Jan 18 19:49:24 2015 - [info] Executing master IP deactivatation script:
Sun Jan 18 19:49:24 2015 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.2.88/24===
Disabling the VIP on old master: 192.168.2.128
Sun Jan 18 19:49:24 2015 - [info] done.
Sun Jan 18 19:49:24 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Jan 18 19:49:24 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:24 2015 - [info] * Phase 3: Master Recovery Phase..
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:24 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:24 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000016:107
Sun Jan 18 19:49:24 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Jan 18 19:49:24 2015 - [info] 192.168.2.129(192.168.2.129:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jan 18 19:49:24 2015 - [info] 192.168.2.130(192.168.2.130:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000016:107
Sun Jan 18 19:49:24 2015 - [info] Oldest slaves:
Sun Jan 18 19:49:24 2015 - [info] 192.168.2.129(192.168.2.129:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jan 18 19:49:24 2015 - [info] 192.168.2.130(192.168.2.130:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:24 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sun Jan 18 19:49:24 2015 - [info]
Sun Jan 18 19:49:25 2015 - [info] Fetching dead master's binary logs..
Sun Jan 18 19:49:25 2015 - [info] Executing command on the dead master 192.168.2.128(192.168.2.128:3306): save_binary_logs --command=save --start_file=mysql-bin.000016 --start_pos=107 --binlog_dir=/data/mysql --output_file=/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
Creating /tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000016 pos 107 to mysql-bin.000016 EOF into /tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog ..
Dumping binlog format description event, from position 0 to 107.. ok.
Dumping effective binlog data from /data/mysql/mysql-bin.000016 position 107 to tail(126).. ok.
Concat succeeded.
saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog 100% 126 0.1KB/s 00:00
Sun Jan 18 19:49:25 2015 - [info] scp from root@192.168.2.128:/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog to local:/var/log/masterha/app1.log/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog succeeded.
Sun Jan 18 19:49:25 2015 - [info] HealthCheck: SSH to 192.168.2.129 is reachable.
Sun Jan 18 19:49:26 2015 - [info] HealthCheck: SSH to 192.168.2.130 is reachable.
Sun Jan 18 19:49:26 2015 - [info]
Sun Jan 18 19:49:26 2015 - [info] * Phase 3.3: Determining New Master Phase..
Sun Jan 18 19:49:26 2015 - [info]
Sun Jan 18 19:49:26 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sun Jan 18 19:49:26 2015 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sun Jan 18 19:49:26 2015 - [info] 192.168.2.129 can be new master.
Sun Jan 18 19:49:26 2015 - [info] New master is 192.168.2.129(192.168.2.129:3306)
Sun Jan 18 19:49:26 2015 - [info] Starting master failover..
Sun Jan 18 19:49:26 2015 - [info]
From:
192.168.2.128 (current master)
 +--192.168.2.129
 +--192.168.2.130

To:
192.168.2.129 (new master)
 +--192.168.2.130

Starting master switch from 192.168.2.128(192.168.2.128:3306) to 192.168.2.129(192.168.2.129:3306)? (yes/NO): yes
Sun Jan 18 19:49:31 2015 - [info] New master decided manually is 192.168.2.129(192.168.2.129:3306)
Sun Jan 18 19:49:31 2015 - [info]
Sun Jan 18 19:49:31 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sun Jan 18 19:49:31 2015 - [info]
Sun Jan 18 19:49:31 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sun Jan 18 19:49:31 2015 - [info] Sending binlog..
saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog 100% 126 0.1KB/s 00:00
Sun Jan 18 19:49:31 2015 - [info] scp from local:/var/log/masterha/app1.log/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog to root@192.168.2.129:/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog succeeded.
Sun Jan 18 19:49:31 2015 - [info]
Sun Jan 18 19:49:31 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Sun Jan 18 19:49:31 2015 - [info]
Sun Jan 18 19:49:31 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sun Jan 18 19:49:31 2015 - [info] Starting recovery on 192.168.2.129(192.168.2.129:3306)..
Sun Jan 18 19:49:31 2015 - [info] Generating diffs succeeded.
Sun Jan 18 19:49:31 2015 - [info] Waiting until all relay logs are applied.
Sun Jan 18 19:49:31 2015 - [info] done.
Sun Jan 18 19:49:31 2015 - [info] Getting slave status..
Sun Jan 18 19:49:31 2015 - [info] This slave(192.168.2.129)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000016:107). No need to recover from Exec_Master_Log_Pos.
Sun Jan 18 19:49:31 2015 - [info] Connecting to the target slave host 192.168.2.129, running recover script..
Sun Jan 18 19:49:31 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.2.129 --slave_ip=192.168.2.129 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog --workdir=/tmp --target_version=5.5.30-log --timestamp=20150118194920 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sun Jan 18 19:49:32 2015 - [info] Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog on 192.168.2.129:3306. This may take long time...
Applying log files succeeded.
Sun Jan 18 19:49:32 2015 - [info] All relay logs were successfully applied.
Sun Jan 18 19:49:32 2015 - [info] Getting new master's binlog name and position..
Sun Jan 18 19:49:32 2015 - [info] mysql-bin.000005:61791
Sun Jan 18 19:49:32 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sun Jan 18 19:49:32 2015 - [info] Executing master IP activate script:
Sun Jan 18 19:49:32 2015 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.2.88/24===
Enabling the VIP - 192.168.2.88/24 on the new master - 192.168.2.129
Sun Jan 18 19:49:32 2015 - [info] OK.
Sun Jan 18 19:49:32 2015 - [info] ** Finished master recovery successfully.
Sun Jan 18 19:49:32 2015 - [info] * Phase 3: Master Recovery Phase completed.
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] * Phase 4: Slaves Recovery Phase..
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] -- Slave diff file generation on host 192.168.2.130(192.168.2.130:3306) started, pid: 20692. Check tmp log /var/log/masterha/app1.log/192.168.2.130_3306_20150118194920.log if it takes time..
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] Log messages from 192.168.2.130 ...
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Sun Jan 18 19:49:32 2015 - [info] End of log messages from 192.168.2.130.
Sun Jan 18 19:49:32 2015 - [info] -- 192.168.2.130(192.168.2.130:3306) has the latest relay log events.
Sun Jan 18 19:49:32 2015 - [info] Generating relay diff files from the latest slave succeeded.
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sun Jan 18 19:49:32 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] -- Slave recovery on host 192.168.2.130(192.168.2.130:3306) started, pid: 20694. Check tmp log /var/log/masterha/app1.log/192.168.2.130_3306_20150118194920.log if it takes time..
saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog 100% 126 0.1KB/s 00:00
Sun Jan 18 19:49:33 2015 - [info]
Sun Jan 18 19:49:33 2015 - [info] Log messages from 192.168.2.130 ...
Sun Jan 18 19:49:33 2015 - [info]
Sun Jan 18 19:49:32 2015 - [info] Sending binlog..
Sun Jan 18 19:49:32 2015 - [info] scp from local:/var/log/masterha/app1.log/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog to root@192.168.2.130:/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog succeeded.
Sun Jan 18 19:49:33 2015 - [info] Starting recovery on 192.168.2.130(192.168.2.130:3306)..
Sun Jan 18 19:49:33 2015 - [info] Generating diffs succeeded.
Sun Jan 18 19:49:33 2015 - [info] Waiting until all relay logs are applied.
Sun Jan 18 19:49:33 2015 - [info] done.
Sun Jan 18 19:49:33 2015 - [info] Getting slave status..
Sun Jan 18 19:49:33 2015 - [info] This slave(192.168.2.130)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000016:107). No need to recover from Exec_Master_Log_Pos.
Sun Jan 18 19:49:33 2015 - [info] Connecting to the target slave host 192.168.2.130, running recover script..
Sun Jan 18 19:49:33 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.2.130 --slave_ip=192.168.2.130 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog --workdir=/tmp --target_version=5.5.25-log --timestamp=20150118194920 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sun Jan 18 19:49:33 2015 - [info] Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog on 192.168.2.130:3306. This may take long time...
Applying log files succeeded.
Sun Jan 18 19:49:33 2015 - [info] All relay logs were successfully applied.
Sun Jan 18 19:49:33 2015 - [info] Resetting slave 192.168.2.130(192.168.2.130:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Sun Jan 18 19:49:33 2015 - [info] Executed CHANGE MASTER.
Sun Jan 18 19:49:33 2015 - [info] Slave started.
Sun Jan 18 19:49:33 2015 - [info] End of log messages from 192.168.2.130.
Sun Jan 18 19:49:33 2015 - [info] -- Slave recovery on host 192.168.2.130(192.168.2.130:3306) succeeded.
Sun Jan 18 19:49:33 2015 - [info] All new slave servers recovered successfully.
Sun Jan 18 19:49:33 2015 - [info]
Sun Jan 18 19:49:33 2015 - [info] * Phase 5: New master cleanup phease..
Sun Jan 18 19:49:33 2015 - [info]
Sun Jan 18 19:49:33 2015 - [info] Resetting slave info on the new master..
Sun Jan 18 19:49:33 2015 - [info] 192.168.2.129: Resetting slave info succeeded.
Sun Jan 18 19:49:33 2015 - [info] Master failover to 192.168.2.129(192.168.2.129:3306) completed successfully.
Sun Jan 18 19:49:33 2015 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.2.128 to 192.168.2.129 succeeded

Master 192.168.2.128 is down!

Check MHA Manager logs at localhost.localdomain for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.2.128.
The latest slave 192.168.2.129(192.168.2.129:3306) has all relay logs for recovery.
Selected 192.168.2.129 as a new master.
192.168.2.129: OK: Applying all logs succeeded.
192.168.2.129: OK: Activated master IP address.
192.168.2.130: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.2.130: OK: Applying all logs succeeded. Slave started, replicating from 192.168.2.129.
192.168.2.129: Resetting slave info succeeded.
Master failover to 192.168.2.129(192.168.2.129:3306) completed successfully.
Sun Jan 18 19:49:33 2015 - [info] Sending mail..

Summary: based on tests run on virtual machines, this mode fits the following situation, with these caveats:
1. The manager is not running.
2. The master is damaged.
3. After this switch the cluster becomes plain master-slave replication. If the new master then goes down, the remaining slave is not promoted to master. (This was observed with only one slave left; the case with several remaining slaves was not tested.)
4. Once the old master is repaired, it cannot rejoin the cluster by itself. masterha_check_repl reports "there are 2 non-slave servers", meaning the cluster contains two nodes that are not slaves.
5. How do you add a second master node to an MHA cluster that has none? After the switch, first make the old master replicate from the new master as a slave and update the configuration file, then verify with masterha_check_repl. As soon as it reports the cluster health as OK, the job is done. You can also perform an online switch at a suitable time, which makes the switch lossless.
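The rejoin step in the last point can reuse the CHANGE MASTER statement that MHA printed during the failover. The sketch below assumes the failover output was saved to a file. The function name `extract_change_master` is hypothetical, and the password appears masked as 'xxx' in the log, so it has to be replaced with the real replication password before use. Whether the reported position is safe for the repaired old master depends on it holding no events beyond those MHA already saved and applied, which this sketch does not check.

```shell
#!/bin/sh
# Sketch: pull the CHANGE MASTER statement that MHA printed for the other slaves
# out of saved failover output, so it can be reviewed and replayed on the old master.
extract_change_master() {
    # Reads log text on stdin; prints the first statement up to and including ';'.
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO [^;]*;\).*/\1/p' | head -n 1
}

# Manual steps afterwards (not executed by this sketch):
#   1. On the repaired old master, run the extracted statement in the mysql client
#      (with the real password), then START SLAVE;
#   2. Update /etc/masterha/app1.cnf so the old master is listed as a server again.
#   3. On the Manager host: masterha_check_repl --conf=/etc/masterha/app1.cnf
```

For example, `extract_change_master < failover_output.log` (the file name is an assumption) prints the statement with `MASTER_HOST='192.168.2.129'` and position `mysql-bin.000005:61791` from the run above.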