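MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. The tool applies only to MySQL Replication environments, and its purpose is to keep the master highly available. MHA is a software package for automatic master failover and slave promotion, built on standard MySQL replication (asynchronous or semi-synchronous).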
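MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node).
MHA Manager periodically probes the master node of the cluster. When the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover process is fully transparent to applications.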
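During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it performs the failover anyway and the most recent data is lost. Semi-synchronous replication greatly reduces this risk, and MHA can be combined with it: as long as even one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep the data consistent across all nodes.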
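(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that holds the most recent updates;
(3) Apply the differential relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.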
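Step (2), identifying the slave with the most recent updates, can be illustrated with a small shell sketch. This is not MHA's actual implementation (MHA does this in Perl and also takes settings such as `candidate_master` into account); the input format `host binlog_file position` and the function name `latest_slave` are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read lines of "host master_log_file read_pos" on stdin
# and print the host that has received the most binlog data.
# Binlog file names end in a zero-padded sequence number, so a plain string
# sort orders them correctly; the position is compared numerically.
latest_slave() {
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}
```

The three columns could be collected from `SHOW SLAVE STATUS` on each slave (`Master_Log_File` and `Read_Master_Log_Pos`) and piped into the function.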
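Manager tools
`masterha_check_ssh` checks the SSH configuration used by MHA
`masterha_check_repl` checks the state of MySQL replication
`masterha_manager` starts MHA
`masterha_check_status` reports the current running state of MHA
`masterha_master_monitor` detects whether the master is down
`masterha_master_switch` controls failover (automatic or manual)
`masterha_conf_host` adds or removes server entries in the configuration
Node tools
`save_binary_logs` saves and copies the master's binary logs
`apply_diff_relay_logs` identifies differential relay log events and applies them to the other slaves
`filter_mysqlbinlog` removes unnecessary ROLLBACK events (MHA no longer uses this tool)
`purge_relay_logs` purges relay logs (without blocking the SQL thread)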
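Once the packages are installed, a quick way to confirm that the tools listed above are on the `PATH` is a loop around `command -v`. The helper name `missing_tools` and the two variables are illustrative assumptions, not part of MHA.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names as listed above (manager side, then node side).
MHA_MANAGER_TOOLS="masterha_check_ssh masterha_check_repl masterha_manager masterha_check_status masterha_master_monitor masterha_master_switch masterha_conf_host"
MHA_NODE_TOOLS="save_binary_logs apply_diff_relay_logs purge_relay_logs"
```

Running `missing_tools $MHA_MANAGER_TOOLS $MHA_NODE_TOOLS` prints nothing when everything is installed.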
System | Hostname | MySQL role | MHA role | server_id | MySQL version | VIP |
---|---|---|---|---|---|---|
CentOS 7.5 | mysqldb1(100) | Master | node | 1003306 | 5.7.23 | 192.168.56.111 |
CentOS 7.5 | mysqldb2(200) | Slave | node, manager | 2003306 | 5.7.23 | |
CentOS 7.5 | mysqldb3(210) | Slave | node | 2103306 | 5.7.23 | |
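For the replication setup itself, see "MySQL replication using GTID".
Create the MHA management account

    # Run on the master (mysqldb1)
    GRANT ALL PRIVILEGES ON *.* TO 'mha_rep'@'192.168.56.%' IDENTIFIED BY '123456';

Run on the slaves

    # Enable read-only mode. read_only=1 blocks data changes by ordinary users,
    # but not by users that hold the SUPER privilege.
    set global read_only=1;
    # Disable automatic relay log purging
    set global relay_log_purge=0;
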
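During a switchover, MHA relies on relay log information to recover the slaves, so automatic relay log purging must be turned OFF and the relay logs purged manually instead. By default, a slave deletes its relay logs automatically once the SQL thread has finished executing them. In an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging must take replication delay into account: on an ext3 file system, deleting a large file takes time and can cause serious replication delay. To avoid this, a hard link to the relay log is created temporarily, because on Linux removing a large file through a hard link is fast. (The same hard-link technique is commonly used when dropping large tables in MySQL.)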
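The manual purge described above is usually scheduled through cron with the `purge_relay_logs` tool. The sketch below only prints crontab entries; the helper name `purge_cron_line`, the schedule, the work directory and the log file path are assumptions, and the account must be allowed to connect from the slave's own address.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry that runs purge_relay_logs daily.
#   $1 = minute, $2 = hour
#   $3 = work directory (should be on the same file system as the relay logs,
#        otherwise hard links cannot be created)
#   $4 = address of the local MySQL instance
purge_cron_line() {
    printf '%s %s * * * /usr/local/bin/purge_relay_logs --user=mha_rep --password=123456 --host=%s --port=3306 --workdir=%s --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1\n' "$1" "$2" "$4" "$3"
}

# Stagger the schedule so that the slaves do not purge at the same moment.
purge_cron_line 0 4 /data/mysql/mysql3306/tmp 192.168.56.200    # for mysqldb2
purge_cron_line 20 4 /data/mysql/mysql3306/tmp 192.168.56.210   # for mysqldb3
```

The `--disable_relay_log_purge` flag makes the tool set `relay_log_purge=0` again after it finishes, which matches the setting required above.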
(1) Generate a key pair on each of the three machines

    [root@mysqldb1 11:10:53 /root]
    # ssh-keygen -t rsa
    [root@mysqldb2 11:10:56 /root]
    # ssh-keygen -t rsa
    [root@mysqldb3 11:10:58 /root]
    # ssh-keygen -t rsa

(2) Use ssh-copy-id to copy the public key to every host

    [root@mysqldb1 11:15:35 /root]
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.100
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.200
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.210
    [root@mysqldb2 11:11:15 /root]
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.100
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.200
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.210
    [root@mysqldb3 11:17:53 /root]
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.100
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.200
    # ssh-copy-id -i .ssh/id_rsa.pub root@192.168.56.210

[Note]: once this is configured, verify it with `ssh [hostname] date`.
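Because the same three commands are repeated on every machine, the key distribution and the `ssh [hostname] date` check can be generated by a loop. The sketch below is a dry run that only prints the commands; the function name `ssh_setup_commands` and the `BatchMode=yes` option (which makes `ssh` fail instead of prompting for a password) are additions for illustration.

```shell
#!/bin/sh
# IP addresses of mysqldb1, mysqldb2 and mysqldb3 from the table above.
HOSTS="192.168.56.100 192.168.56.200 192.168.56.210"

# Print the commands for one machine instead of running them (dry run),
# so the list can be reviewed first and then piped to "sh".
ssh_setup_commands() {
    for host in $HOSTS; do
        printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub root@%s\n' "$host"
    done
    for host in $HOSTS; do
        printf 'ssh -o BatchMode=yes root@%s date\n' "$host"
    done
}
```

On each of the three machines, `ssh_setup_commands | sh` would then copy the key and run the verification.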
URL:
https://github.com/yoshinorim/mha4mysql-manager
https://github.com/wubx/mha4mysql-manager (recommended)
File: mha4mysql-manager-master.zip
URL:
https://github.com/yoshinorim/mha4mysql-node
https://github.com/wubx/mha4mysql-node (recommended)
File: mha4mysql-node-master.zip
Install both manager and node on every machine, but start the manager only on mysqldb2.
1) Unpack the archives

    unzip mha4mysql-manager-master.zip
    unzip mha4mysql-node-master.zip

2) Install the dependencies

    # yum search perl | grep install
    yum install cpan
    yum install perl-Module-Install.noarch
    yum install perl-DBI
    yum install perl-DBD-MySQL
    yum install perl-Time-HiRes.x86_64
    yum install perl-Config-Tiny.noarch
    yum install perl-Log-Dispatch.noarch
    yum install perl-Parallel-ForkManager.noarch

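Before building, it can help to confirm that Perl can actually load the modules these packages provide. The helper below is a small sketch (the name `missing_perl_modules` is made up here); it simply tries to load each module and prints the ones that fail.

```shell
#!/bin/sh
# Print each Perl module from the argument list that cannot be loaded.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}
```

For the packages installed above, the call would be `missing_perl_modules DBI DBD::mysql Time::HiRes Config::Tiny Log::Dispatch Parallel::ForkManager`; empty output means every module loads.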
3) Build and install (node)

    # cd /opt/mha4mysql-node-master/
    # perl Makefile.PL
    include /opt/mha4mysql-node-master/inc/Module/Install.pm
    include inc/Module/Install/Metadata.pm
    include inc/Module/Install/Base.pm
    include inc/Module/Install/Makefile.pm
    include inc/Module/Install/Scripts.pm
    include inc/Module/Install/AutoInstall.pm
    include inc/Module/Install/Include.pm
    include inc/Module/AutoInstall.pm
    *** Module::AutoInstall version 1.06
    *** Checking for Perl dependencies...
    [Core Features]
    - DBI ...loaded. (1.627)
    - DBD::mysql ...loaded. (4.023)
    *** Module::AutoInstall configuration finished.
    include inc/Module/Install/WriteAll.pm
    include inc/Module/Install/Win32.pm
    include inc/Module/Install/Can.pm
    include inc/Module/Install/Fetch.pm
    Checking if your kit is complete...
    Warning: the following files are missing in your kit:
        META.yml
    Please inform the author.
    Writing Makefile for mha4mysql::node
    Writing MYMETA.yml and MYMETA.json
    Writing META.yml
    # make && make install
    cp lib/MHA/BinlogManager.pm blib/lib/MHA/BinlogManager.pm
    cp lib/MHA/BinlogPosFindManager.pm blib/lib/MHA/BinlogPosFindManager.pm
    cp lib/MHA/BinlogPosFinderXid.pm blib/lib/MHA/BinlogPosFinderXid.pm
    cp lib/MHA/BinlogHeaderParser.pm blib/lib/MHA/BinlogHeaderParser.pm
    cp lib/MHA/BinlogPosFinder.pm blib/lib/MHA/BinlogPosFinder.pm
    cp lib/MHA/BinlogPosFinderElp.pm blib/lib/MHA/BinlogPosFinderElp.pm
    cp lib/MHA/NodeUtil.pm blib/lib/MHA/NodeUtil.pm
    cp lib/MHA/SlaveUtil.pm blib/lib/MHA/SlaveUtil.pm
    cp lib/MHA/NodeConst.pm blib/lib/MHA/NodeConst.pm
    cp bin/filter_mysqlbinlog blib/script/filter_mysqlbinlog
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/filter_mysqlbinlog
    cp bin/apply_diff_relay_logs blib/script/apply_diff_relay_logs
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/apply_diff_relay_logs
    cp bin/purge_relay_logs blib/script/purge_relay_logs
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/purge_relay_logs
    cp bin/save_binary_logs blib/script/save_binary_logs
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/save_binary_logs
    Manifying blib/man1/filter_mysqlbinlog.1
    Manifying blib/man1/apply_diff_relay_logs.1
    Manifying blib/man1/purge_relay_logs.1
    Manifying blib/man1/save_binary_logs.1
    Installing /usr/local/share/perl5/MHA/BinlogManager.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm
    Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm
    Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm
    Installing /usr/local/share/perl5/MHA/NodeUtil.pm
    Installing /usr/local/share/perl5/MHA/SlaveUtil.pm
    Installing /usr/local/share/perl5/MHA/NodeConst.pm
    Installing /usr/local/share/man/man1/filter_mysqlbinlog.1
    Installing /usr/local/share/man/man1/apply_diff_relay_logs.1
    Installing /usr/local/share/man/man1/purge_relay_logs.1
    Installing /usr/local/share/man/man1/save_binary_logs.1
    Installing /usr/local/bin/filter_mysqlbinlog
    Installing /usr/local/bin/apply_diff_relay_logs
    Installing /usr/local/bin/purge_relay_logs
    Installing /usr/local/bin/save_binary_logs
    Appending installation info to /usr/lib64/perl5/perllocal.pod

4) Build and install (manager)

    # cd mha4mysql-manager-master/
    # perl Makefile.PL
    include /opt/mha4mysql-manager-master/inc/Module/Install.pm
    include inc/Module/Install/Metadata.pm
    include inc/Module/Install/Base.pm
    include inc/Module/Install/Makefile.pm
    include inc/Module/Install/Scripts.pm
    include inc/Module/Install/AutoInstall.pm
    include inc/Module/Install/Include.pm
    include inc/Module/AutoInstall.pm
    *** Module::AutoInstall version 1.06
    *** Checking for Perl dependencies...
    [Core Features]
    - DBI ...loaded. (1.627)
    - DBD::mysql ...loaded. (4.023)
    - Time::HiRes ...loaded. (1.9725)
    - Config::Tiny ...loaded. (2.14)
    - Log::Dispatch ...loaded. (2.41)
    - Parallel::ForkManager ...loaded. (1.18)
    - MHA::NodeConst ...loaded. (0.57)
    *** Module::AutoInstall configuration finished.
    include inc/Module/Install/WriteAll.pm
    include inc/Module/Install/Win32.pm
    include inc/Module/Install/Can.pm
    include inc/Module/Install/Fetch.pm
    Checking if your kit is complete...
    Warning: the following files are missing in your kit:
        META.yml
    Please inform the author.
    Writing Makefile for mha4mysql::manager
    Writing MYMETA.yml and MYMETA.json
    Writing META.yml
    # make && make install
    cp lib/MHA/ManagerUtil.pm blib/lib/MHA/ManagerUtil.pm
    cp lib/MHA/Config.pm blib/lib/MHA/Config.pm
    cp lib/MHA/HealthCheck.pm blib/lib/MHA/HealthCheck.pm
    cp lib/MHA/ManagerConst.pm blib/lib/MHA/ManagerConst.pm
    cp lib/MHA/ServerManager.pm blib/lib/MHA/ServerManager.pm
    cp lib/MHA/FileStatus.pm blib/lib/MHA/FileStatus.pm
    cp lib/MHA/ManagerAdmin.pm blib/lib/MHA/ManagerAdmin.pm
    cp lib/MHA/ManagerAdminWrapper.pm blib/lib/MHA/ManagerAdminWrapper.pm
    cp lib/MHA/MasterFailover.pm blib/lib/MHA/MasterFailover.pm
    cp lib/MHA/MasterMonitor.pm blib/lib/MHA/MasterMonitor.pm
    cp lib/MHA/MasterRotate.pm blib/lib/MHA/MasterRotate.pm
    cp lib/MHA/SSHCheck.pm blib/lib/MHA/SSHCheck.pm
    cp lib/MHA/Server.pm blib/lib/MHA/Server.pm
    cp lib/MHA/DBHelper.pm blib/lib/MHA/DBHelper.pm
    cp bin/masterha_stop blib/script/masterha_stop
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop
    cp bin/masterha_conf_host blib/script/masterha_conf_host
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host
    cp bin/masterha_check_repl blib/script/masterha_check_repl
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl
    cp bin/masterha_check_status blib/script/masterha_check_status
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status
    cp bin/masterha_master_monitor blib/script/masterha_master_monitor
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor
    cp bin/masterha_check_ssh blib/script/masterha_check_ssh
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh
    cp bin/masterha_master_switch blib/script/masterha_master_switch
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch
    cp bin/masterha_secondary_check blib/script/masterha_secondary_check
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check
    cp bin/masterha_manager blib/script/masterha_manager
    /usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager
    Manifying blib/man1/masterha_stop.1
    Manifying blib/man1/masterha_conf_host.1
    Manifying blib/man1/masterha_check_repl.1
    Manifying blib/man1/masterha_check_status.1
    Manifying blib/man1/masterha_master_monitor.1
    Manifying blib/man1/masterha_check_ssh.1
    Manifying blib/man1/masterha_master_switch.1
    Manifying blib/man1/masterha_secondary_check.1
    Manifying blib/man1/masterha_manager.1
    Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
    Installing /usr/local/share/perl5/MHA/Config.pm
    Installing /usr/local/share/perl5/MHA/HealthCheck.pm
    Installing /usr/local/share/perl5/MHA/ManagerConst.pm
    Installing /usr/local/share/perl5/MHA/ServerManager.pm
    Installing /usr/local/share/perl5/MHA/FileStatus.pm
    Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
    Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
    Installing /usr/local/share/perl5/MHA/MasterFailover.pm
    Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
    Installing /usr/local/share/perl5/MHA/MasterRotate.pm
    Installing /usr/local/share/perl5/MHA/SSHCheck.pm
    Installing /usr/local/share/perl5/MHA/Server.pm
    Installing /usr/local/share/perl5/MHA/DBHelper.pm
    Installing /usr/local/share/man/man1/masterha_stop.1
    Installing /usr/local/share/man/man1/masterha_conf_host.1
    Installing /usr/local/share/man/man1/masterha_check_repl.1
    Installing /usr/local/share/man/man1/masterha_check_status.1
    Installing /usr/local/share/man/man1/masterha_master_monitor.1
    Installing /usr/local/share/man/man1/masterha_check_ssh.1
    Installing /usr/local/share/man/man1/masterha_master_switch.1
    Installing /usr/local/share/man/man1/masterha_secondary_check.1
    Installing /usr/local/share/man/man1/masterha_manager.1
    Installing /usr/local/bin/masterha_stop
    Installing /usr/local/bin/masterha_conf_host
    Installing /usr/local/bin/masterha_check_repl
    Installing /usr/local/bin/masterha_check_status
    Installing /usr/local/bin/masterha_master_monitor
    Installing /usr/local/bin/masterha_check_ssh
    Installing /usr/local/bin/masterha_master_switch
    Installing /usr/local/bin/masterha_secondary_check
    Installing /usr/local/bin/masterha_manager
    Appending installation info to /usr/lib64/perl5/perllocal.pod

    mkdir -p /etc/masterha
    mkdir -p /var/log/masterha/app1
    cp /opt/mha4mysql-manager-master/samples/conf/app1.cnf /etc/masterha/
    cd /etc/masterha/
    # Edit the app1.cnf configuration file
    vi app1.cnf

    [server default]
    # Working directory of the manager
    manager_workdir=/var/log/masterha/app1
    # Log file of the manager
    manager_log=/var/log/masterha/app1/manager.log
    # Script that checks over several network routes whether the master can be
    # reached; if one route is down, the master is checked through another one
    secondary_check_script=/usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1
    # Failover script
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    #master_ip_failover_script=/usr/bin/master_ip_failover
    #master_ip_online_change_script=/usr/bin/master_ip_online_change
    #shutdown_script=/usr/bin/power_manager
    #report_script=/usr/bin/send_report
    # Monitoring user
    user=mha_rep
    # Password of the monitoring user
    password=123456
    # SSH login user
    ssh_user=root
    # Replication user of the replication environment
    repl_user=repl
    # Password of the replication user
    repl_password=wanbin
    # Interval in seconds between pings sent to the master. The default is 3;
    # after three attempts without a response, failover starts automatically
    ping_interval=1

    [server1]
    hostname=192.168.56.100
    # Candidate master
    candidate_master=1
    master_binlog_dir=/data/mysql/mysql3306/logs

    [server2]
    hostname=192.168.56.200
    # Candidate master
    candidate_master=1
    master_binlog_dir=/data/mysql/mysql3306/logs

    [server3]
    hostname=192.168.56.210
    # This server will never become the new master
    no_master=1
    master_binlog_dir=/data/mysql/mysql3306/logs

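To double-check which servers a configuration file like the one above declares, the `[serverN]` sections can be listed with `awk`. This is only a reading aid written for this article, not an MHA tool; the function name `list_mha_hosts` is hypothetical, and the sketch assumes `key=value` lines without spaces around `=`.

```shell
#!/bin/sh
# Hypothetical helper: print "section hostname" for every [serverN] block
# of the MHA configuration file given as $1.
list_mha_hosts() {
    awk -F= '
        /^\[server[0-9]+\]/ { section = substr($0, 2, index($0, "]") - 2) }
        /^hostname=/        { print section, $2 }
    ' "$1"
}
```

With the file above, `list_mha_hosts /etc/masterha/app1.cnf` would list `server1`, `server2` and `server3` next to their addresses.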
    # cd /opt/mha4mysql-manager-master/samples/scripts/
    # ls
    master_ip_failover  master_ip_online_change  power_manager  send_report
    # cp master_ip_failover /usr/local/bin/
    # Edit the master_ip_failover script

    #!/usr/bin/env perl

    #  Copyright (C) 2011 DeNA Co.,Ltd.
    #
    #  This program is free software; you can redistribute it and/or modify
    #  it under the terms of the GNU General Public License as published by
    #  the Free Software Foundation; either version 2 of the License, or
    #  (at your option) any later version.
    #
    #  This program is distributed in the hope that it will be useful,
    #  but WITHOUT ANY WARRANTY; without even the implied warranty of
    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #  GNU General Public License for more details.
    #
    #  You should have received a copy of the GNU General Public License
    #   along with this program; if not, write to the Free Software
    #  Foundation, Inc.,
    #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

    ## Note: This is a sample script and is not complete. Modify the script based on your environment.

    use strict;
    use warnings FATAL => 'all';

    use Getopt::Long;

    my (
      $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
      $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
    );

    my $vip = '192.168.56.111/24';
    my $key = '0';
    my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip up";
    my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
    # ARP reply mode. Without it, other hosts keep using the old entry until
    # their ARP cache for the VIP expires, and the VIP is unreachable meanwhile.
    my $ssh_Bcast_arp = "/usr/bin/arping -c 3 -A 192.168.56.111";

    GetOptions(
      'command=s'          => \$command,
      'ssh_user=s'         => \$ssh_user,
      'orig_master_host=s' => \$orig_master_host,
      'orig_master_ip=s'   => \$orig_master_ip,
      'orig_master_port=i' => \$orig_master_port,
      'new_master_host=s'  => \$new_master_host,
      'new_master_ip=s'    => \$new_master_ip,
      'new_master_port=i'  => \$new_master_port,
    );

    exit &main();

    sub main {

      print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

      if ( $command eq "stop" || $command eq "stopssh" ) {

        # Take the VIP away from the old master
        my $exit_code = 1;
        eval {
          print "Disabling the VIP on old master: $orig_master_host \n";
          &stop_vip();
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "start" ) {

        # Bring the VIP up on the new master and announce it via ARP
        my $exit_code = 10;
        eval {
          print "Enabling the VIP - $vip on the new master - $new_master_host \n";
          &start_vip();
          &start_arp();
          $exit_code = 0;
        };
        if ($@) {
          warn $@;
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
      }
      else {
        &usage();
        exit 1;
      }
    }

    sub start_vip() {
      `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }

    sub start_arp() {
      `ssh $ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
    }

    sub stop_vip() {
      `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }

    sub usage {
      print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
    }

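When a virtual IP is used, it must be assigned to the master before MHA is started, because MHA does not set the VIP up by itself. Once the VIP has been set for the first time, the failover script moves it automatically on later switchovers.

    # ifconfig eth0:0 192.168.56.111 up
    # /sbin/arping -c 3 -A 192.168.56.111 -I eth0
    ARPING 192.168.56.111 from 192.168.56.111 eth0
    Sent 3 probes (3 broadcast(s))
    Received 0 response(s)
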
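The `ifconfig` command used here belongs to the net-tools package, which may not be present on a minimal CentOS 7 installation. As an alternative, the same VIP handling can be expressed with the `ip` command from iproute2. The sketch below is a dry run that only prints the commands; the function name `vip_commands` is made up, and the interface name `eth0` and label `eth0:0` are taken from the script above.

```shell
#!/bin/sh
VIP="192.168.56.111"
PREFIX="24"
DEV="eth0"
LABEL="eth0:0"

# Print (not execute) the iproute2 commands for bringing the VIP up or down.
vip_commands() {
    case "$1" in
        start)
            printf 'ip addr add %s/%s dev %s label %s\n' "$VIP" "$PREFIX" "$DEV" "$LABEL"
            # Announce the new location of the VIP with unsolicited ARP replies.
            printf 'arping -c 3 -A -I %s %s\n' "$DEV" "$VIP"
            ;;
        stop)
            printf 'ip addr del %s/%s dev %s\n' "$VIP" "$PREFIX" "$DEV"
            ;;
        *)
            echo "usage: vip_commands start|stop" >&2
            return 1
            ;;
    esac
}
```

To set the VIP on the master for the first time, the output of `vip_commands start` could be reviewed and then piped to `sh` as root.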
Check SSH with the MHA tool

    # masterha_check_ssh --conf=/etc/masterha/app1.cnf
    Mon Oct 29 09:55:27 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct 29 09:55:27 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 09:55:27 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 09:55:27 2018 - [info] Starting SSH connection tests..
    Mon Oct 29 09:55:28 2018 - [debug]
    Mon Oct 29 09:55:27 2018 - [debug] Connecting via SSH from root@192.168.56.100(192.168.56.100:22) to root@192.168.56.200(192.168.56.200:22)..
    Mon Oct 29 09:55:27 2018 - [debug] ok.
    Mon Oct 29 09:55:27 2018 - [debug] Connecting via SSH from root@192.168.56.100(192.168.56.100:22) to root@192.168.56.210(192.168.56.210:22)..
    Mon Oct 29 09:55:27 2018 - [debug] ok.
    Mon Oct 29 09:55:28 2018 - [debug]
    Mon Oct 29 09:55:27 2018 - [debug] Connecting via SSH from root@192.168.56.200(192.168.56.200:22) to root@192.168.56.100(192.168.56.100:22)..
    Mon Oct 29 09:55:28 2018 - [debug] ok.
    Mon Oct 29 09:55:28 2018 - [debug] Connecting via SSH from root@192.168.56.200(192.168.56.200:22) to root@192.168.56.210(192.168.56.210:22)..
    Mon Oct 29 09:55:28 2018 - [debug] ok.
    Mon Oct 29 09:55:29 2018 - [debug]
    Mon Oct 29 09:55:28 2018 - [debug] Connecting via SSH from root@192.168.56.210(192.168.56.210:22) to root@192.168.56.100(192.168.56.100:22)..
    Mon Oct 29 09:55:28 2018 - [debug] ok.
    Mon Oct 29 09:55:28 2018 - [debug] Connecting via SSH from root@192.168.56.210(192.168.56.210:22) to root@192.168.56.200(192.168.56.200:22)..
    Mon Oct 29 09:55:29 2018 - [debug] ok.
    Mon Oct 29 09:55:29 2018 - [info] All SSH connection tests passed successfully.


    # /usr/local/bin/masterha_check_repl --conf=/etc/masterha/app1.cnf
    Mon Oct 29 10:08:13 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct 29 10:08:13 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 10:08:13 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 10:08:13 2018 - [info] MHA::MasterMonitor version 0.57.
    Mon Oct 29 10:08:14 2018 - [info] GTID failover mode = 1
    Mon Oct 29 10:08:14 2018 - [info] Dead Servers:
    Mon Oct 29 10:08:14 2018 - [info] Alive Servers:
    Mon Oct 29 10:08:14 2018 - [info] 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:08:14 2018 - [info] 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 10:08:14 2018 - [info] 192.168.56.210(192.168.56.210:3306)
    Mon Oct 29 10:08:14 2018 - [info] Alive Slaves:
    Mon Oct 29 10:08:14 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:08:14 2018 - [info] GTID ON
    Mon Oct 29 10:08:14 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:08:14 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:08:14 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:08:14 2018 - [info] GTID ON
    Mon Oct 29 10:08:14 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:08:14 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:08:14 2018 - [info] Current Alive Master: 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:08:14 2018 - [info] Checking slave configurations..
    Mon Oct 29 10:08:14 2018 - [info] read_only=1 is not set on slave 192.168.56.200(192.168.56.200:3306).
    Mon Oct 29 10:08:14 2018 - [info] read_only=1 is not set on slave 192.168.56.210(192.168.56.210:3306).
    Mon Oct 29 10:08:14 2018 - [info] Checking replication filtering settings..
    Mon Oct 29 10:08:14 2018 - [info] binlog_do_db= , binlog_ignore_db=
    Mon Oct 29 10:08:14 2018 - [info] Replication filtering check ok.
    Mon Oct 29 10:08:14 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
    Mon Oct 29 10:08:14 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Mon Oct 29 10:08:14 2018 - [info] HealthCheck: SSH to 192.168.56.100 is reachable.
    Mon Oct 29 10:08:14 2018 - [info]
    192.168.56.100(192.168.56.100:3306) (current master)
     +--192.168.56.200(192.168.56.200:3306)
     +--192.168.56.210(192.168.56.210:3306)
    Mon Oct 29 10:08:14 2018 - [info] Checking replication health on 192.168.56.200..
    Mon Oct 29 10:08:14 2018 - [info] ok.
    Mon Oct 29 10:08:14 2018 - [info] Checking replication health on 192.168.56.210..
    Mon Oct 29 10:08:14 2018 - [info] ok.
    Mon Oct 29 10:08:14 2018 - [info] Checking master_ip_failover_script status:
    Mon Oct 29 10:08:14 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port=3306
    IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.56.111/24 up===
    Checking the Status of the script.. OK
    Mon Oct 29 10:08:14 2018 - [info] OK.
    Mon Oct 29 10:08:14 2018 - [warning] shutdown_script is not defined.
    Mon Oct 29 10:08:14 2018 - [info] Got exit code 0 (Not master dead).
    MySQL Replication Health is OK.


    # masterha_manager --conf=/etc/masterha/app1.cnf > /var/log/masterha/app1/manager.log &
    # tail -100f /var/log/masterha/app1/manager.log
    Mon Oct 29 10:13:40 2018 - [info] MHA::MasterMonitor version 0.57.
    Mon Oct 29 10:13:41 2018 - [info] GTID failover mode = 1
    Mon Oct 29 10:13:41 2018 - [info] Dead Servers:
    Mon Oct 29 10:13:41 2018 - [info] Alive Servers:
    Mon Oct 29 10:13:41 2018 - [info] 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:13:41 2018 - [info] 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 10:13:41 2018 - [info] 192.168.56.210(192.168.56.210:3306)
    Mon Oct 29 10:13:41 2018 - [info] Alive Slaves:
    Mon Oct 29 10:13:41 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:13:41 2018 - [info] GTID ON
    Mon Oct 29 10:13:41 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:13:41 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:13:41 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:13:41 2018 - [info] GTID ON
    Mon Oct 29 10:13:41 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:13:41 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:13:41 2018 - [info] Current Alive Master: 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:13:41 2018 - [info] Checking slave configurations..
    Mon Oct 29 10:13:41 2018 - [info] Checking replication filtering settings..
    Mon Oct 29 10:13:41 2018 - [info] binlog_do_db= , binlog_ignore_db=
    Mon Oct 29 10:13:41 2018 - [info] Replication filtering check ok.
    Mon Oct 29 10:13:41 2018 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
    Mon Oct 29 10:13:41 2018 - [info] Checking SSH publickey authentication settings on the current master..
    Mon Oct 29 10:13:42 2018 - [info] HealthCheck: SSH to 192.168.56.100 is reachable.
    Mon Oct 29 10:13:42 2018 - [info]
    192.168.56.100(192.168.56.100:3306) (current master)
     +--192.168.56.200(192.168.56.200:3306)
     +--192.168.56.210(192.168.56.210:3306)
    Mon Oct 29 10:13:42 2018 - [info] Checking master_ip_failover_script status:
    Mon Oct 29 10:13:42 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port=3306
    IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.56.111/24 up===
    Checking the Status of the script.. OK
    Mon Oct 29 10:13:42 2018 - [info] OK.
    Mon Oct 29 10:13:42 2018 - [warning] shutdown_script is not defined.
    Mon Oct 29 10:13:42 2018 - [info] Set master ping interval 1 seconds.
    Mon Oct 29 10:13:42 2018 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1
    Mon Oct 29 10:13:42 2018 - [info] Starting ping health check on 192.168.56.100(192.168.56.100:3306)..
    Mon Oct 29 10:13:42 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

    # Check the status
    # /usr/local/bin/masterha_check_status --conf=/etc/masterha/app1.cnf
    app1 (pid:3579) is running(0:PING_OK), master:192.168.56.100

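For monitoring scripts it can be handy to pull the state code and the current master out of the `masterha_check_status` line shown above. The `sed` expression below is a sketch written against that one sample line only; the function name `parse_mha_status` is hypothetical, and a line in any other shape simply produces no output.

```shell
#!/bin/sh
# Read one line of masterha_check_status output on stdin and print
# "<STATE> <master address>", e.g. for a line of the form
#   app1 (pid:3579) is running(0:PING_OK), master:192.168.56.100
parse_mha_status() {
    sed -n 's/^.*is running([0-9]*:\([A-Z_]*\)), master:\(.*\)$/\1 \2/p'
}
```

A typical call would be `masterha_check_status --conf=/etc/masterha/app1.cnf | parse_mha_status`.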
(1) Check that the VIP moves and that MySQL switches over automatically

    # Kill the mysqld process on mysqldb1
    # ps -ef|grep mysqld
    avahi      741     1  0 08:50 ?        00:00:00 avahi-daemon: running [mysqldb1.local]
    mysql     2349  2271  0 09:08 pts/0    00:00:29 mysqld --defaults-file=/etc/my3306.cnf
    root      2986  2657  0 09:45 pts/1    00:00:00 ssh mysqldb1
    root      4285  4228  0 10:36 pts/3    00:00:00 grep --color=auto mysqld
    # kill -9 2349

(2) Watch the manager log on mysqldb2:

    # tail -f /var/log/masterha/app1/manager.log
    IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.56.111/24 up===
    Checking the Status of the script.. OK
    Mon Oct 29 10:13:42 2018 - [info] OK.
    Mon Oct 29 10:13:42 2018 - [warning] shutdown_script is not defined.
    Mon Oct 29 10:13:42 2018 - [info] Set master ping interval 1 seconds.
    Mon Oct 29 10:13:42 2018 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1
    Mon Oct 29 10:13:42 2018 - [info] Starting ping health check on 192.168.56.100(192.168.56.100:3306)..
    Mon Oct 29 10:13:42 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
    Mon Oct 29 10:37:56 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
    Mon Oct 29 10:37:56 2018 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s mysqldb2 -s mysqldb1 --user=root --master_host=192.168.56.100 --master_ip=192.168.56.100 --master_port=3306 --master_user=mha_rep --master_password=123456 --ping_type=SELECT
    Mon Oct 29 10:37:56 2018 - [info] Executing SSH check script: exit 0
    Mon Oct 29 10:37:56 2018 - [info] HealthCheck: SSH to 192.168.56.100 is reachable.
    Monitoring server mysqldb2 is reachable, Master is not reachable from mysqldb2. OK.
    Monitoring server mysqldb1 is reachable, Master is not reachable from mysqldb1. OK.
    Mon Oct 29 10:37:57 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
    Mon Oct 29 10:37:57 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.56.100' (111))
    Mon Oct 29 10:37:57 2018 - [warning] Connection failed 2 time(s)..
    Mon Oct 29 10:37:58 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.56.100' (111))
    Mon Oct 29 10:37:58 2018 - [warning] Connection failed 3 time(s)..
    Mon Oct 29 10:37:59 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.56.100' (111))
    Mon Oct 29 10:37:59 2018 - [warning] Connection failed 4 time(s)..
    Mon Oct 29 10:37:59 2018 - [warning] Master is not reachable from health checker!
    Mon Oct 29 10:37:59 2018 - [warning] Master 192.168.56.100(192.168.56.100:3306) is not reachable!
    Mon Oct 29 10:37:59 2018 - [warning] SSH is reachable.
    Mon Oct 29 10:37:59 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
    Mon Oct 29 10:37:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct 29 10:37:59 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 10:37:59 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 10:38:00 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct 29 10:38:00 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 10:38:00 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 10:38:00 2018 - [info] GTID failover mode = 1
    Mon Oct 29 10:38:00 2018 - [info] Dead Servers:
    Mon Oct 29 10:38:00 2018 - [info] 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:00 2018 - [info] Alive Servers:
    Mon Oct 29 10:38:00 2018 - [info] 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 10:38:00 2018 - [info] 192.168.56.210(192.168.56.210:3306)
    Mon Oct 29 10:38:00 2018 - [info] Alive Slaves:
    Mon Oct 29 10:38:00 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:00 2018 - [info] GTID ON
    Mon Oct 29 10:38:00 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:00 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:38:00 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:00 2018 - [info] GTID ON
    Mon Oct 29 10:38:00 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:00 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:38:00 2018 - [info] Checking slave configurations..
    Mon Oct 29 10:38:00 2018 - [info] Checking replication filtering settings..
    Mon Oct 29 10:38:00 2018 - [info] Replication filtering check ok.
    Mon Oct 29 10:38:00 2018 - [info] Master is down!
    Mon Oct 29 10:38:00 2018 - [info] Terminating monitoring script.
    Mon Oct 29 10:38:00 2018 - [info] Got exit code 20 (Master dead).
    Mon Oct 29 10:38:00 2018 - [info] MHA::MasterFailover version 0.57.
    Mon Oct 29 10:38:00 2018 - [info] Starting master failover.
    Mon Oct 29 10:38:00 2018 - [info]
    Mon Oct 29 10:38:00 2018 - [info] * Phase 1: Configuration Check Phase..
    Mon Oct 29 10:38:00 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] GTID failover mode = 1
    Mon Oct 29 10:38:01 2018 - [info] Dead Servers:
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Checking master reachability via MySQL(double check)...
    Mon Oct 29 10:38:01 2018 - [info] ok.
    Mon Oct 29 10:38:01 2018 - [info] Alive Servers:
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.210(192.168.56.210:3306)
    Mon Oct 29 10:38:01 2018 - [info] Alive Slaves:
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:38:01 2018 - [info] Starting GTID based failover.
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] ** Phase 1: Configuration Check Phase completed.
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] Forcing shutdown so that applications never connect to the current master..
    Mon Oct 29 10:38:01 2018 - [info] Executing master IP deactivation script:
    Mon Oct 29 10:38:01 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port=3306 --command=stopssh --ssh_user=root
    IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.56.111/24 up===
    Disabling the VIP on old master: 192.168.56.100
    Mon Oct 29 10:38:01 2018 - [info] done.
    Mon Oct 29 10:38:01 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Mon Oct 29 10:38:01 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] * Phase 3: Master Recovery Phase..
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] The latest binary log file/position on all slaves is my3306_binlog.000025:194
    Mon Oct 29 10:38:01 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:38:01 2018 - [info] The oldest binary log file/position on all slaves is my3306_binlog.000025:194
    Mon Oct 29 10:38:01 2018 - [info] Oldest slaves:
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] * Phase 3.3: Determining New Master Phase..
    Mon Oct 29 10:38:01 2018 - [info]
    Mon Oct 29 10:38:01 2018 - [info] Searching new master from slaves..
    Mon Oct 29 10:38:01 2018 - [info] Candidate masters from the configuration file:
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 10:38:01 2018 - [info] Non-candidate masters:
    Mon Oct 29 10:38:01 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 10:38:01 2018 - [info] GTID ON
    Mon Oct 29 10:38:01 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306)
    Mon Oct 29 10:38:01 2018 - [info] Not candidate for the new Master (no_master is set)
    Mon Oct 29 10:38:01 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Oct 29 10:38:01 2018 - [info] New master is 192.168.56.200(192.168.56.200:3306) Mon Oct 29 10:38:01 2018 - [info] Starting master failover.. Mon Oct 29 10:38:01 2018 - [info] From: 192.168.56.100(192.168.56.100:3306) (current master) +--192.168.56.200(192.168.56.200:3306) +--192.168.56.210(192.168.56.210:3306) To: 192.168.56.200(192.168.56.200:3306) (new master) +--192.168.56.210(192.168.56.210:3306) Mon Oct 29 10:38:01 2018 - [info] Mon Oct 29 10:38:01 2018 - [info] * Phase 3.3: New Master Recovery Phase.. Mon Oct 29 10:38:01 2018 - [info] Mon Oct 29 10:38:01 2018 - [info] Waiting all logs to be applied.. Mon Oct 29 10:38:01 2018 - [info] done. Mon Oct 29 10:38:01 2018 - [info] Getting new master's binlog name and position.. Mon Oct 29 10:38:01 2018 - [info] my3306_binlog.000015:210 Mon Oct 29 10:38:01 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.56.200', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Mon Oct 29 10:38:01 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: my3306_binlog.000015, 210, 7390a401-b705-11e8-9ed9-080027b0b461:1-140922 Mon Oct 29 10:38:01 2018 - [info] Executing master IP activate script: Mon Oct 29 10:38:01 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port=3306 --new_master_host=192.168.56.200 --new_master_ip=192.168.56.200 --new_master_port=3306 --new_master_user='mha_rep' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.56.111/24 up=== Enabling the VIP - 192.168.56.111/24 on the new master - 192.168.56.200 bash: /usr/bin/arping: No such file or directory Mon Oct 29 10:38:02 2018 - [info] OK. 
Mon Oct 29 10:38:02 2018 - [info] Setting read_only=0 on 192.168.56.200(192.168.56.200:3306).. Mon Oct 29 10:38:02 2018 - [info] ok. Mon Oct 29 10:38:02 2018 - [info] ** Finished master recovery successfully. Mon Oct 29 10:38:02 2018 - [info] * Phase 3: Master Recovery Phase completed. Mon Oct 29 10:38:02 2018 - [info] Mon Oct 29 10:38:02 2018 - [info] * Phase 4: Slaves Recovery Phase.. Mon Oct 29 10:38:02 2018 - [info] Mon Oct 29 10:38:02 2018 - [info] Mon Oct 29 10:38:02 2018 - [info] * Phase 4.1: Starting Slaves in parallel.. Mon Oct 29 10:38:02 2018 - [info] Mon Oct 29 10:38:02 2018 - [info] -- Slave recovery on host 192.168.56.210(192.168.56.210:3306) started, pid: 5507. Check tmp log /var/log/masterha/app1/192.168.56.210_3306_20181029103800.log if it takes time.. Mon Oct 29 10:38:03 2018 - [info] Mon Oct 29 10:38:03 2018 - [info] Log messages from 192.168.56.210 ... Mon Oct 29 10:38:03 2018 - [info] Mon Oct 29 10:38:02 2018 - [info] Resetting slave 192.168.56.210(192.168.56.210:3306) and starting replication from the new master 192.168.56.200(192.168.56.200:3306).. Mon Oct 29 10:38:02 2018 - [info] Executed CHANGE MASTER. Mon Oct 29 10:38:02 2018 - [info] Slave started. Mon Oct 29 10:38:02 2018 - [info] gtid_wait(7390a401-b705-11e8-9ed9-080027b0b461:1-140922) completed on 192.168.56.210(192.168.56.210:3306). Executed 0 events. Mon Oct 29 10:38:03 2018 - [info] End of log messages from 192.168.56.210. Mon Oct 29 10:38:03 2018 - [info] -- Slave on host 192.168.56.210(192.168.56.210:3306) started. Mon Oct 29 10:38:03 2018 - [info] All new slave servers recovered successfully. Mon Oct 29 10:38:03 2018 - [info] Mon Oct 29 10:38:03 2018 - [info] * Phase 5: New master cleanup phase.. Mon Oct 29 10:38:03 2018 - [info] Mon Oct 29 10:38:03 2018 - [info] Resetting slave info on the new master.. Mon Oct 29 10:38:03 2018 - [info] 192.168.56.200: Resetting slave info succeeded. 

    Mon Oct 29 10:38:03 2018 - [info] Master failover to 192.168.56.200(192.168.56.200:3306) completed successfully.
    Mon Oct 29 10:38:03 2018 - [info]

    ----- Failover Report -----

    app1: MySQL Master failover 192.168.56.100(192.168.56.100:3306) to 192.168.56.200(192.168.56.200:3306) succeeded

    Master 192.168.56.100(192.168.56.100:3306) is down!

    Check MHA Manager logs at mysqldb2:/var/log/masterha/app1/manager.log for details.

    Started automated(non-interactive) failover.
    Invalidated master IP address on 192.168.56.100(192.168.56.100:3306)
    Selected 192.168.56.200(192.168.56.200:3306) as a new master.
    192.168.56.200(192.168.56.200:3306): OK: Applying all logs succeeded.
    192.168.56.200(192.168.56.200:3306): OK: Activated master IP address.
    192.168.56.210(192.168.56.210:3306): OK: Slave started, replicating from 192.168.56.200(192.168.56.200:3306)
    192.168.56.200(192.168.56.200:3306): Resetting slave info succeeded.
    Master failover to 192.168.56.200(192.168.56.200:3306) completed successfully.

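The failover log is long, so it can help to pull just the outcome out of it. The following is a minimal sketch, assuming the manager log is stored one entry per line; `failover_target` is a hypothetical helper name, not part of MHA. It reads log text on stdin and prints the host named in the "Master failover to ... completed successfully." line.

```shell
# Hypothetical helper (not an MHA tool): print the new master host reported
# by the last "Master failover to <host>(...) completed successfully." line.
# Reads log text on stdin; returns non-zero if no such line exists.
failover_target() {
  sed -n 's/.*Master failover to \([^( ]*\)(.*) completed successfully\..*/\1/p' \
    | tail -n 1 \
    | grep .
}

# Usage (path taken from the failover report in this walkthrough):
#   failover_target < /var/log/masterha/app1/manager.log
```

A non-zero exit status means the log contains no success line, which is a reasonable signal to read the full log by hand.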
(3) Check whether the VIP has moved to mysqldb2

    # ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
            inet 192.168.56.200 netmask 255.255.255.0 broadcast 192.168.56.255
            ...
    eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
            inet 192.168.56.111 netmask 255.255.255.0 broadcast 192.168.56.255
            ether 08:00:27:5b:8a:9a txqueuelen 1000 (Ethernet)
            ...

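Reading the `ifconfig` output by eye works for one host, but the same check can be scripted. This is a small sketch under the assumption that the interface listing uses the `inet <address>` form shown above; `has_vip` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: succeed if the given VIP appears as an "inet" address
# in ifconfig-style text read from stdin.
has_vip() {
  # Escape the dots so the address is matched literally.
  vip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -Eq "inet (addr:)?${vip_re}([ /]|\$)"
}

# Usage (run against the expected new master):
#   ssh root@192.168.56.200 /sbin/ifconfig | has_vip 192.168.56.111 && echo "VIP is on mysqldb2"
```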
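Once MHA has performed a failover, the Manager process exits, so the test cannot simply be repeated. First repair the failed database and add it back into the MHA environment with CHANGE MASTER. The manager process can be started again only if `app1.failover.complete` does not exist, or if the `--ignore_last_failover` option is passed to skip that check.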
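The cleanup step can be sketched as a small shell function. It assumes the marker file lives in the manager working directory, taken here to be `/var/log/masterha/app1` because that is where this walkthrough's manager log is written; adjust the path if your `manager_workdir` differs. `clear_failover_marker` is a hypothetical name.

```shell
# Hypothetical helper: remove <app>.failover.complete from the manager
# working directory if it exists, and report what was done.
#   $1 = manager working directory (assumption: /var/log/masterha/app1)
#   $2 = application name (app1 in this walkthrough)
clear_failover_marker() {
  marker="$1/$2.failover.complete"
  if [ -e "$marker" ]; then
    rm -f -- "$marker" && echo "removed $marker"
  else
    echo "no marker at $marker"
  fi
}

# Usage (not executed here):
#   clear_failover_marker /var/log/masterha/app1 app1
#   nohup masterha_manager --conf=/etc/masterha/app1.cnf \
#       > /var/log/masterha/app1/manager.log 2>&1 &
# Alternatively, leave the file in place and start the manager with:
#   masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover
```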

    # mysqld --defaults-file=/etc/my3306.cnf &
    mysql> CHANGE MASTER TO master_host='192.168.56.200', master_port=3306, master_user='repl', master_password='wanbin', master_auto_position=1;
    mysql> start slave;

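After `start slave`, it is worth confirming that both replication threads are running before the old master is treated as a healthy slave. A minimal sketch, assuming the vertical (`\G`) output format of `SHOW SLAVE STATUS`; `replication_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if both Slave_IO_Running and
# Slave_SQL_Running are "Yes" in SHOW SLAVE STATUS\G text read from stdin.
replication_ok() {
  awk -F': *' '
    /^[[:space:]]*Slave_IO_Running:/  { io = $2;  sub(/[[:space:]]+$/, "", io) }
    /^[[:space:]]*Slave_SQL_Running:/ { sql = $2; sub(/[[:space:]]+$/, "", sql) }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Usage (connection options omitted; supply your own credentials):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication running"
```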
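(1) Check the MHA status
A manual switch requires that MHA automatic failover is not running, so confirm that the manager is stopped first.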

    # masterha_check_status --conf=/etc/masterha/app1.cnf
    app1 is stopped(2:NOT_RUNNING).

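The same precondition can be checked in a script before calling `masterha_master_switch`. This sketch only inspects the status text shown above and makes no assumption about the command's exit code; `manager_is_stopped` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if masterha_check_status text on stdin
# reports that the manager is not running.
manager_is_stopped() {
  grep -q 'is stopped(2:NOT_RUNNING)'
}

# Usage:
#   masterha_check_status --conf=/etc/masterha/app1.cnf | manager_is_stopped \
#     || { echo "manager still running, stop it before switching" >&2; exit 1; }
```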
(2) Manual switch

    # masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.56.100 --orig_master_is_new_slave
    # Parameter notes:
    #   --master_state:             the current master's state is alive
    #   --new_master_host:          the new master after the switch is 192.168.56.100
    #   --orig_master_is_new_slave: the original master becomes a slave of the new master
    Mon Oct 29 13:38:09 2018 - [info] MHA::MasterRotate version 0.57.
    Mon Oct 29 13:38:09 2018 - [info] Starting online master switch..
    Mon Oct 29 13:38:09 2018 - [info]
    Mon Oct 29 13:38:09 2018 - [info] * Phase 1: Configuration Check Phase..
    Mon Oct 29 13:38:09 2018 - [info]
    Mon Oct 29 13:38:09 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct 29 13:38:09 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 13:38:09 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Mon Oct 29 13:38:10 2018 - [info] GTID failover mode = 1
    Mon Oct 29 13:38:10 2018 - [info] Current Alive Master: 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 13:38:10 2018 - [info] Alive Slaves:
    Mon Oct 29 13:38:10 2018 - [info] 192.168.56.100(192.168.56.100:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 13:38:10 2018 - [info] GTID ON
    Mon Oct 29 13:38:10 2018 - [info] Replicating from 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 13:38:10 2018 - [info] Primary candidate for the new Master (candidate_master is set)
    Mon Oct 29 13:38:10 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
    Mon Oct 29 13:38:10 2018 - [info] GTID ON
    Mon Oct 29 13:38:10 2018 - [info] Replicating from 192.168.56.200(192.168.56.200:3306)
    Mon Oct 29 13:38:10 2018 - [info] Not candidate for the new Master (no_master is set)

    It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.56.200(192.168.56.200:3306)?

(YES/no): YES Mon Oct 29 13:38:19 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Mon Oct 29 13:38:19 2018 - [info] ok. Mon Oct 29 13:38:19 2018 - [info] Checking MHA is not monitoring or doing failover.. Mon Oct 29 13:38:19 2018 - [info] Checking replication health on 192.168.56.100.. Mon Oct 29 13:38:19 2018 - [info] ok. Mon Oct 29 13:38:19 2018 - [info] Checking replication health on 192.168.56.210.. Mon Oct 29 13:38:19 2018 - [info] ok. Mon Oct 29 13:38:19 2018 - [info] 192.168.56.100 can be new master. Mon Oct 29 13:38:19 2018 - [info] From: 192.168.56.200(192.168.56.200:3306) (current master) +--192.168.56.100(192.168.56.100:3306) +--192.168.56.210(192.168.56.210:3306) To: 192.168.56.100(192.168.56.100:3306) (new master) +--192.168.56.210(192.168.56.210:3306) +--192.168.56.200(192.168.56.200:3306) Starting master switch from 192.168.56.200(192.168.56.200:3306) to 192.168.56.100(192.168.56.100:3306)? (yes/NO): yes Mon Oct 29 13:38:28 2018 - [info] Checking whether 192.168.56.100(192.168.56.100:3306) is ok for the new master.. Mon Oct 29 13:38:28 2018 - [info] ok. Mon Oct 29 13:38:28 2018 - [info] 192.168.56.200(192.168.56.200:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Mon Oct 29 13:38:28 2018 - [info] 192.168.56.200(192.168.56.200:3306): Resetting slave pointing to the dummy host. Mon Oct 29 13:38:28 2018 - [info] ** Phase 1: Configuration Check Phase completed. Mon Oct 29 13:38:28 2018 - [info] Mon Oct 29 13:38:28 2018 - [info] * Phase 2: Rejecting updates Phase.. Mon Oct 29 13:38:28 2018 - [info] master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? 
(yes/NO): yes Mon Oct 29 13:38:31 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Mon Oct 29 13:38:31 2018 - [info] Executing FLUSH TABLES WITH READ LOCK.. Mon Oct 29 13:38:31 2018 - [info] ok. Mon Oct 29 13:38:31 2018 - [info] Orig master binlog:pos is my3306_binlog.000016:250. Mon Oct 29 13:38:31 2018 - [info] Waiting to execute all relay logs on 192.168.56.100(192.168.56.100:3306).. Mon Oct 29 13:38:31 2018 - [info] master_pos_wait(my3306_binlog.000016:250) completed on 192.168.56.100(192.168.56.100:3306). Executed 0 events. Mon Oct 29 13:38:31 2018 - [info] done. Mon Oct 29 13:38:31 2018 - [info] Getting new master's binlog name and position.. Mon Oct 29 13:38:31 2018 - [info] my3306_binlog.000026:598 Mon Oct 29 13:38:31 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.56.100', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Mon Oct 29 13:38:31 2018 - [info] Mon Oct 29 13:38:31 2018 - [info] * Switching slaves in parallel.. Mon Oct 29 13:38:31 2018 - [info] Mon Oct 29 13:38:31 2018 - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:3306) started, pid: 12548 Mon Oct 29 13:38:31 2018 - [info] Mon Oct 29 13:38:32 2018 - [info] Log messages from 192.168.56.210 ... Mon Oct 29 13:38:32 2018 - [info] Mon Oct 29 13:38:31 2018 - [info] Waiting to execute all relay logs on 192.168.56.210(192.168.56.210:3306).. Mon Oct 29 13:38:31 2018 - [info] master_pos_wait(my3306_binlog.000016:250) completed on 192.168.56.210(192.168.56.210:3306). Executed 0 events. Mon Oct 29 13:38:31 2018 - [info] done. Mon Oct 29 13:38:31 2018 - [info] Resetting slave 192.168.56.210(192.168.56.210:3306) and starting replication from the new master 192.168.56.100(192.168.56.100:3306).. Mon Oct 29 13:38:31 2018 - [info] Executed CHANGE MASTER. Mon Oct 29 13:38:31 2018 - [info] Slave started. 
Mon Oct 29 13:38:32 2018 - [info] End of log messages from 192.168.56.210 ... Mon Oct 29 13:38:32 2018 - [info] Mon Oct 29 13:38:32 2018 - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:3306) succeeded. Mon Oct 29 13:38:32 2018 - [info] Unlocking all tables on the orig master: Mon Oct 29 13:38:32 2018 - [info] Executing UNLOCK TABLES.. Mon Oct 29 13:38:32 2018 - [info] ok. Mon Oct 29 13:38:32 2018 - [info] Starting orig master as a new slave.. Mon Oct 29 13:38:32 2018 - [info] Resetting slave 192.168.56.200(192.168.56.200:3306) and starting replication from the new master 192.168.56.100(192.168.56.100:3306).. Mon Oct 29 13:38:32 2018 - [info] Executed CHANGE MASTER. Mon Oct 29 13:38:33 2018 - [info] Slave started. Mon Oct 29 13:38:33 2018 - [info] All new slave servers switched successfully. Mon Oct 29 13:38:33 2018 - [info] Mon Oct 29 13:38:33 2018 - [info] * Phase 5: New master cleanup phase.. Mon Oct 29 13:38:33 2018 - [info] Mon Oct 29 13:38:33 2018 - [info] 192.168.56.100: Resetting slave info succeeded. Mon Oct 29 13:38:33 2018 - [info] Switching master to 192.168.56.100(192.168.56.100:3306) completed successfully.
After the manual switch the VIP did not move to mysqldb1 automatically, because the `master_ip_online_change` script had not been configured yet.
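The switch log above reports "master_ip_online_change_script is not defined", which points at the MHA configuration file. As a rough sketch, the following helper checks whether that parameter is present in a config file such as `/etc/masterha/app1.cnf`; `has_online_change_script` is a hypothetical name, and the check is a plain text match rather than a full config parser.

```shell
# Hypothetical helper: succeed if the given MHA config file sets
# master_ip_online_change_script to a non-empty value.
has_online_change_script() {
  grep -Eq '^[[:space:]]*master_ip_online_change_script[[:space:]]*=[[:space:]]*[^[:space:]]+' "$1"
}

# Usage:
#   has_online_change_script /etc/masterha/app1.cnf \
#     || echo "add master_ip_online_change_script=/usr/local/bin/master_ip_online_change"
```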
(3) Edit the `master_ip_online_change` script

    # cp /opt/mha4mysql-manager-master/samples/scripts/master_ip_online_change /usr/local/bin/master_ip_online_change
    # vi /usr/local/bin/master_ip_online_change

    #!/usr/bin/env perl
    use strict;
    use warnings FATAL => 'all';

    use Getopt::Long;
    use MHA::DBHelper;
    use MHA::NodeUtil;
    use Time::HiRes qw( sleep gettimeofday tv_interval );
    use Data::Dumper;

    my $_tstart;
    my $_running_interval = 0.1;
    my (
      $command,              $orig_master_is_new_slave, $orig_master_host,
      $orig_master_ip,       $orig_master_port,         $orig_master_user,
      $orig_master_password, $orig_master_ssh_user,     $new_master_host,
      $new_master_ip,        $new_master_port,          $new_master_user,
      $new_master_password,  $new_master_ssh_user,
    );

    my $vip = '192.168.56.111/24';
    my $key = '0';
    my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip up";
    my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
    my $ssh_Bcast_arp = "/usr/bin/arping -c 3 -A 192.168.56.111";

    GetOptions(
      'command=s'                => \$command,
      'orig_master_is_new_slave' => \$orig_master_is_new_slave,
      'orig_master_host=s'       => \$orig_master_host,
      'orig_master_ip=s'         => \$orig_master_ip,
      'orig_master_port=i'       => \$orig_master_port,
      'orig_master_user=s'       => \$orig_master_user,
      'orig_master_password=s'   => \$orig_master_password,
      'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
      'new_master_host=s'        => \$new_master_host,
      'new_master_ip=s'          => \$new_master_ip,
      'new_master_port=i'        => \$new_master_port,
      'new_master_user=s'        => \$new_master_user,
      'new_master_password=s'    => \$new_master_password,
      'new_master_ssh_user=s'    => \$new_master_ssh_user,
    );

    exit &main();

    sub current_time_us {
      my ( $sec, $microsec ) = gettimeofday();
      my $curdate = localtime($sec);
      return $curdate . " " . sprintf( "%06d", $microsec );
    }

    sub sleep_until {
      my $elapsed = tv_interval($_tstart);
      if ( $_running_interval > $elapsed ) {
        sleep( $_running_interval - $elapsed );
      }
    }

    sub get_threads_util {
      my $dbh                    = shift;
      my $my_connection_id       = shift;
      my $running_time_threshold = shift;
      my $type                   = shift;
      $running_time_threshold = 0 unless ($running_time_threshold);
      $type                   = 0 unless ($type);
      my @threads;

      my $sth = $dbh->prepare("SHOW PROCESSLIST");
      $sth->execute();

      while ( my $ref = $sth->fetchrow_hashref() ) {
        my $id         = $ref->{Id};
        my $user       = $ref->{User};
        my $host       = $ref->{Host};
        my $command    = $ref->{Command};
        my $state      = $ref->{State};
        my $query_time = $ref->{Time};
        my $info       = $ref->{Info};
        $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
        next if ( $my_connection_id == $id );
        next if ( defined($query_time) && $query_time < $running_time_threshold );
        next if ( defined($command)    && $command eq "Binlog Dump" );
        next if ( defined($user)       && $user eq "system user" );
        next
          if ( defined($command)
          && $command eq "Sleep"
          && defined($query_time)
          && $query_time >= 1 );

        if ( $type >= 1 ) {
          next if ( defined($command) && $command eq "Sleep" );
          next if ( defined($command) && $command eq "Connect" );
        }

        if ( $type >= 2 ) {
          next if ( defined($info) && $info =~ m/^select/i );
          next if ( defined($info) && $info =~ m/^show/i );
        }

        push @threads, $ref;
      }
      return @threads;
    }

    sub main {
      if ( $command eq "stop" ) {
        ## Gracefully killing connections on the current master
        # 1. Set read_only= 1 on the new master
        # 2. DROP USER so that no app user can establish new connections
        # 3. Set read_only= 1 on the current master
        # 4. Kill current queries
        # * Any database access failure will result in script die.
        my $exit_code = 1;
        eval {
          ## Setting read_only=1 on the new master (to avoid accident)
          my $new_master_handler = new MHA::DBHelper();

          # args: hostname, port, user, password, raise_error(die_on_error)_or_not
          $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );
          print current_time_us() . " Set read_only on the new master.. ";
          $new_master_handler->enable_read_only();
          if ( $new_master_handler->is_read_only() ) {
            print "ok.\n";
          }
          else {
            die "Failed!\n";
          }
          $new_master_handler->disconnect();

          # Connecting to the orig master, die if any database error happens
          my $orig_master_handler = new MHA::DBHelper();
          $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
            $orig_master_user, $orig_master_password, 1 );

          ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
          #$orig_master_handler->disable_log_bin_local();
          #print current_time_us() . " Drpping app user on the orig master..\n";
          #FIXME_xxx_drop_app_user($orig_master_handler);

          ## Waiting for N * 100 milliseconds so that current connections can exit
          my $time_until_read_only = 15;
          $_tstart = [gettimeofday];
          my @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
          while ( $time_until_read_only > 0 && $#threads >= 0 ) {
            if ( $time_until_read_only % 5 == 0 ) {
              printf
    "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
                current_time_us(), $#threads + 1, $time_until_read_only * 100;
              if ( $#threads < 5 ) {
                print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
                  foreach (@threads);
              }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_read_only--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
          }

          ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
          print current_time_us() . " Set read_only=1 on the orig master.. ";
          $orig_master_handler->enable_read_only();
          if ( $orig_master_handler->is_read_only() ) {
            print "ok.\n";
          }
          else {
            die "Failed!\n";
          }

          ## Waiting for M * 100 milliseconds so that current update queries can complete
          my $time_until_kill_threads = 5;
          @threads = get_threads_util( $orig_master_handler->{dbh},
            $orig_master_handler->{connection_id} );
          while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
            if ( $time_until_kill_threads % 5 == 0 ) {
              printf
    "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
                current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
              if ( $#threads < 5 ) {
                print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
                  foreach (@threads);
              }
            }
            sleep_until();
            $_tstart = [gettimeofday];
            $time_until_kill_threads--;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
          }

          print "Disabling the VIP on old master: $orig_master_host \n";
          &stop_vip();

          ## Terminating all threads
          print current_time_us() . " Killing all application threads..\n";
          $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
          print current_time_us() . " done.\n";
          #$orig_master_handler->enable_log_bin_local();
          $orig_master_handler->disconnect();

          ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "start" ) {
        ## Activating master ip on the new master
        # 1. Create app user with write privileges
        # 2. Moving backup script if needed
        # 3. Register new master's ip to the catalog database

        # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
        # If exit code is 0 or 10, MHA does not abort
        my $exit_code = 10;
        eval {
          my $new_master_handler = new MHA::DBHelper();

          # args: hostname, port, user, password, raise_error_or_not
          $new_master_handler->connect( $new_master_ip, $new_master_port,
            $new_master_user, $new_master_password, 1 );

          ## Set read_only=0 on the new master
          #$new_master_handler->disable_log_bin_local();
          print current_time_us() . " Set read_only=0 on the new master.\n";
          $new_master_handler->disable_read_only();

          ## Creating an app user on the new master
          #print current_time_us() . " Creating app user on the new master..\n";
          #FIXME_xxx_create_app_user($new_master_handler);
          #$new_master_handler->enable_log_bin_local();
          $new_master_handler->disconnect();

          ## Update master ip on the catalog database, etc
          print "Enabling the VIP - $vip on the new master - $new_master_host \n";
          &start_vip();
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
      elsif ( $command eq "status" ) {

        # do nothing
        exit 0;
      }
      else {
        &usage();
        exit 1;
      }
    }

    # A simple system call that enable the VIP on the new master
    sub start_vip() {
      `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }

    sub start_arp() {
      `ssh $new_master_ssh_user\@$new_master_host \" $ssh_Bcast_arp \"`;
    }

    # A simple system call that disable the VIP on the old_master
    sub stop_vip() {
      `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }

    sub usage {
      print
    "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n";
      die;
    }

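The only site-specific parts of the script are the VIP, the interface alias, and the two `ifconfig` commands it runs over ssh. To sanity-check those values before a real switch, the commands can be printed without executing them. The following is an illustrative sketch; `vip_command` is a hypothetical helper whose defaults mirror `$vip`, `$key`, and the `eth0` interface used in the script above.

```shell
# Hypothetical helper: print (do not run) the command the script would
# execute over ssh to start or stop the VIP.
#   $1 = start|stop, $2 = interface, $3 = alias key, $4 = VIP with prefix
vip_command() {
  action="$1"
  iface="${2:-eth0}"
  key="${3:-0}"
  vip="${4:-192.168.56.111/24}"
  case "$action" in
    start) echo "/sbin/ifconfig ${iface}:${key} ${vip} up" ;;
    stop)  echo "/sbin/ifconfig ${iface}:${key} down" ;;
    *)     echo "usage: vip_command start|stop [iface] [key] [vip]" >&2; return 1 ;;
  esac
}

# Usage: review the command, then run it by hand on the target host if needed.
#   vip_command start
#   ssh root@192.168.56.200 "$(vip_command start)"
```

Printing first keeps a mistyped interface name or netmask from being applied to a production host by accident.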
(4) Switch again and check whether the VIP moves to mysqldb2
# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.56.200 --orig_master_is_new_slave Mon Oct 29 14:41:52 2018 - [info] MHA::MasterRotate version 0.57. Mon Oct 29 14:41:52 2018 - [info] Starting online master switch.. Mon Oct 29 14:41:52 2018 - [info] Mon Oct 29 14:41:52 2018 - [info] * Phase 1: Configuration Check Phase.. Mon Oct 29 14:41:52 2018 - [info] Mon Oct 29 14:41:52 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Mon Oct 29 14:41:52 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Mon Oct 29 14:41:52 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Mon Oct 29 14:41:54 2018 - [info] GTID failover mode = 1 Mon Oct 29 14:41:54 2018 - [info] Current Alive Master: 192.168.56.100(192.168.56.100:3306) Mon Oct 29 14:41:54 2018 - [info] Alive Slaves: Mon Oct 29 14:41:54 2018 - [info] 192.168.56.200(192.168.56.200:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Oct 29 14:41:54 2018 - [info] GTID ON Mon Oct 29 14:41:54 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306) Mon Oct 29 14:41:54 2018 - [info] Primary candidate for the new Master (candidate_master is set) Mon Oct 29 14:41:54 2018 - [info] 192.168.56.210(192.168.56.210:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled Mon Oct 29 14:41:54 2018 - [info] GTID ON Mon Oct 29 14:41:54 2018 - [info] Replicating from 192.168.56.100(192.168.56.100:3306) Mon Oct 29 14:41:54 2018 - [info] Not candidate for the new Master (no_master is set) It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.56.100(192.168.56.100:3306)? (YES/no): yes Mon Oct 29 14:41:55 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Mon Oct 29 14:41:55 2018 - [info] ok. 
Mon Oct 29 14:41:55 2018 - [info] Checking MHA is not monitoring or doing failover.. Mon Oct 29 14:41:55 2018 - [info] Checking replication health on 192.168.56.200.. Mon Oct 29 14:41:55 2018 - [info] ok. Mon Oct 29 14:41:55 2018 - [info] Checking replication health on 192.168.56.210.. Mon Oct 29 14:41:55 2018 - [info] ok. Mon Oct 29 14:41:55 2018 - [info] 192.168.56.200 can be new master. Mon Oct 29 14:41:55 2018 - [info] From: 192.168.56.100(192.168.56.100:3306) (current master) +--192.168.56.200(192.168.56.200:3306) +--192.168.56.210(192.168.56.210:3306) To: 192.168.56.200(192.168.56.200:3306) (new master) +--192.168.56.210(192.168.56.210:3306) +--192.168.56.100(192.168.56.100:3306) Starting master switch from 192.168.56.100(192.168.56.100:3306) to 192.168.56.200(192.168.56.200:3306)? (yes/NO): yes Mon Oct 29 14:41:57 2018 - [info] Checking whether 192.168.56.200(192.168.56.200:3306) is ok for the new master.. Mon Oct 29 14:41:57 2018 - [info] ok. Mon Oct 29 14:41:57 2018 - [info] 192.168.56.100(192.168.56.100:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Mon Oct 29 14:41:57 2018 - [info] 192.168.56.100(192.168.56.100:3306): Resetting slave pointing to the dummy host. Mon Oct 29 14:41:57 2018 - [info] ** Phase 1: Configuration Check Phase completed. Mon Oct 29 14:41:57 2018 - [info] Mon Oct 29 14:41:57 2018 - [info] * Phase 2: Rejecting updates Phase.. 
Mon Oct 29 14:41:57 2018 - [info] Mon Oct 29 14:41:57 2018 - [info] Executing master ip online change script to disable write on the current master: Mon Oct 29 14:41:57 2018 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port=3306 --orig_master_user='mha_rep' --new_master_host=192.168.56.200 --new_master_ip=192.168.56.200 --new_master_port=3306 --new_master_user='mha_rep' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx Mon Oct 29 14:41:57 2018 500045 Set read_only on the new master.. ok. Mon Oct 29 14:41:57 2018 516171 Waiting all running 2 threads are disconnected.. (max 1500 milliseconds) {'Time' => '41','db' => undef,'Id' => '56','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'} {'Time' => '40','db' => undef,'Id' => '57','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'} Mon Oct 29 14:41:58 2018 025934 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds) {'Time' => '42','db' => undef,'Id' => '56','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'} {'Time' => '41','db' => undef,'Id' => '57','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'} Mon Oct 29 14:41:58 2018 530397 Waiting all running 2 threads are disconnected.. 
(max 500 milliseconds) {'Time' => '42','db' => undef,'Id' => '56','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'} {'Time' => '41','db' => undef,'Id' => '57','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'} Mon Oct 29 14:41:59 2018 032085 Set read_only=1 on the orig master.. ok. Mon Oct 29 14:41:59 2018 041306 Waiting all running 2 queries are disconnected.. (max 500 milliseconds) {'Time' => '43','db' => undef,'Id' => '56','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb3:47840'} {'Time' => '42','db' => undef,'Id' => '57','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => 'mysqldb2:54496'} Disabling the VIP on old master: 192.168.56.100 Mon Oct 29 14:41:59 2018 852808 Killing all application threads.. Mon Oct 29 14:41:59 2018 869546 done. Mon Oct 29 14:41:59 2018 - [info] ok. Mon Oct 29 14:41:59 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Mon Oct 29 14:41:59 2018 - [info] Executing FLUSH TABLES WITH READ LOCK.. Mon Oct 29 14:41:59 2018 - [info] ok. Mon Oct 29 14:41:59 2018 - [info] Orig master binlog:pos is my3306_binlog.000026:598. Mon Oct 29 14:41:59 2018 - [info] Waiting to execute all relay logs on 192.168.56.200(192.168.56.200:3306).. Mon Oct 29 14:41:59 2018 - [info] master_pos_wait(my3306_binlog.000026:598) completed on 192.168.56.200(192.168.56.200:3306). Executed 0 events. Mon Oct 29 14:41:59 2018 - [info] done. Mon Oct 29 14:41:59 2018 - [info] Getting new master's binlog name and position.. 
Mon Oct 29 14:41:59 2018 - [info] my3306_binlog.000016:250
Mon Oct 29 14:41:59 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.56.200', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Oct 29 14:41:59 2018 - [info] Executing master ip online change script to allow write on the new master:
Mon Oct 29 14:41:59 2018 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.56.100 --orig_master_ip=192.168.56.100 --orig_master_port=3306 --orig_master_user='mha_rep' --new_master_host=192.168.56.200 --new_master_ip=192.168.56.200 --new_master_port=3306 --new_master_user='mha_rep' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Mon Oct 29 14:42:00 2018 046752 Set read_only=0 on the new master.
Enabling the VIP - 192.168.56.111/24 on the new master - 192.168.56.200
Mon Oct 29 14:42:00 2018 - [info] ok.
Mon Oct 29 14:42:00 2018 - [info]
Mon Oct 29 14:42:00 2018 - [info] * Switching slaves in parallel..
Mon Oct 29 14:42:00 2018 - [info]
Mon Oct 29 14:42:00 2018 - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:3306) started, pid: 13737
Mon Oct 29 14:42:00 2018 - [info]
Mon Oct 29 14:42:01 2018 - [info] Log messages from 192.168.56.210 ...
Mon Oct 29 14:42:01 2018 - [info]
Mon Oct 29 14:42:00 2018 - [info] Waiting to execute all relay logs on 192.168.56.210(192.168.56.210:3306)..
Mon Oct 29 14:42:00 2018 - [info] master_pos_wait(my3306_binlog.000026:598) completed on 192.168.56.210(192.168.56.210:3306). Executed 0 events.
Mon Oct 29 14:42:00 2018 - [info] done.
Mon Oct 29 14:42:00 2018 - [info] Resetting slave 192.168.56.210(192.168.56.210:3306) and starting replication from the new master 192.168.56.200(192.168.56.200:3306)..
Mon Oct 29 14:42:00 2018 - [info] Executed CHANGE MASTER.
Mon Oct 29 14:42:00 2018 - [info] Slave started.
Mon Oct 29 14:42:01 2018 - [info] End of log messages from 192.168.56.210 ...
Mon Oct 29 14:42:01 2018 - [info]
Mon Oct 29 14:42:01 2018 - [info] -- Slave switch on host 192.168.56.210(192.168.56.210:3306) succeeded.
Mon Oct 29 14:42:01 2018 - [info] Unlocking all tables on the orig master:
Mon Oct 29 14:42:01 2018 - [info] Executing UNLOCK TABLES..
Mon Oct 29 14:42:01 2018 - [info] ok.
Mon Oct 29 14:42:01 2018 - [info] Starting orig master as a new slave..
Mon Oct 29 14:42:01 2018 - [info] Resetting slave 192.168.56.100(192.168.56.100:3306) and starting replication from the new master 192.168.56.200(192.168.56.200:3306)..
Mon Oct 29 14:42:01 2018 - [info] Executed CHANGE MASTER.
Mon Oct 29 14:42:01 2018 - [info] Slave started.
Mon Oct 29 14:42:01 2018 - [info] All new slave servers switched successfully.
Mon Oct 29 14:42:01 2018 - [info]
Mon Oct 29 14:42:01 2018 - [info] * Phase 5: New master cleanup phase..
Mon Oct 29 14:42:01 2018 - [info]
Mon Oct 29 14:42:01 2018 - [info] 192.168.56.200: Resetting slave info succeeded.
Mon Oct 29 14:42:01 2018 - [info] Switching master to 192.168.56.200(192.168.56.200:3306) completed successfully.

# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.200 netmask 255.255.255.0 broadcast 192.168.56.255
...
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.111 netmask 255.255.255.0 broadcast 192.168.56.255
ether 08:00:27:5b:8a:9a txqueuelen 1000 (Ethernet)
..
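
To get a rough feel for how long writes were blocked during the online switch, the timestamps in the log above can be compared. The following is only a sketch: it assumes the lines printed by `master_ip_online_change` carry a six-digit microsecond field after the date, and it treats the interval between "Set read_only=1 on the orig master" and "Set read_only=0 on the new master" as the write-blocked window. The function names `parse_script_line` and `write_blocked_window` are hypothetical helpers and are not part of MHA.

```python
import re
from datetime import datetime, timedelta

# Lines printed by the switch script look like:
#   "Mon Oct 29 14:41:59 2018 032085 Set read_only=1 on the orig master.. ok."
# Assumption: the six digits after the year are microseconds.
STAMP = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<usec>\d{6}) (?P<msg>.*)$"
)


def parse_script_line(line):
    """Return (timestamp, message) for a script line, or None if it does not match."""
    m = STAMP.match(line.strip())
    if m is None:
        return None
    # %a and %b are locale dependent; this expects English day and month names.
    when = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    when += timedelta(microseconds=int(m.group("usec")))
    return when, m.group("msg")


def write_blocked_window(lines):
    """Time between read_only=1 on the old master and read_only=0 on the new one."""
    start = end = None
    for line in lines:
        parsed = parse_script_line(line)
        if parsed is None:
            continue  # manager "[info]" lines and thread dumps are skipped
        when, msg = parsed
        if msg.startswith("Set read_only=1 on the orig master"):
            start = when
        elif msg.startswith("Set read_only=0 on the new master"):
            end = when
    if start is None or end is None:
        return None
    return end - start


# Two lines copied from the log above.
sample = [
    "Mon Oct 29 14:41:59 2018 032085 Set read_only=1 on the orig master.. ok.",
    "Mon Oct 29 14:42:00 2018 046752 Set read_only=0 on the new master.",
]
print(write_blocked_window(sample).total_seconds())  # 1.014667
```

For this run the window comes out at roughly one second, which matches the impression given by the log timestamps (14:41:59 to 14:42:00). Clients may notice a slightly different interval, since the VIP move and thread kills happen inside this window.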