MySQL High Availability Architecture with MHA (CentOS 7.5, MySQL 5.7.18, MHA 0.58)

Date: 2019-11-08


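Introduction

MHA (Master High Availability) is a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu while at DeNA in Japan (he later moved to Facebook) and handles failover and slave-to-master promotion in a highly available MySQL environment.

During a MySQL failover, MHA can complete the switch automatically, typically within 0 to 30 seconds, and it preserves data consistency as far as possible while doing so.

The software consists of two parts:

MHA Manager (management node)
MHA Node (data node)

MHA Manager can be deployed on a dedicated machine to manage multiple master-slave clusters, or on one of the slave nodes.

MHA Node runs on every MySQL server. MHA Manager periodically probes the master of each cluster; when the master fails, it automatically promotes the slave holding the most recent data to be the new master and then repoints all other slaves to the new master.

The whole failover process is transparent to the application.

MHA's workflow can be summarized as follows:

Save the binary log events (binlog events) from the crashed master;
Identify the slave with the most recent updates;
Apply the differential relay log events to the other slaves;
Apply the binlog events saved from the master;
Promote one slave to be the new master;
Point the other slaves at the new master for replication.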
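
The second step, identifying the slave with the most recent updates, amounts to comparing how far each slave has read in the master's binary log. The sketch below is not MHA's own code. It assumes a made-up input format of one line per slave (host, master log file, read position), and the function name latest_slave is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of MHA): print the host that has read
# furthest into the master's binary log.
# stdin: one line per slave, "<host> <master_log_file> <read_master_log_pos>"
latest_slave() {
    # Binlog file names are zero-padded, so a plain string sort orders them;
    # the position inside a file is compared numerically.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{ print $1 }'
}

# Example call:
#   printf '%s\n' '10.0.20.202 mysql-bin.000004 194' \
#                 '10.0.20.203 mysql-bin.000004 154' | latest_slave
```

MHA itself gathers this information from each slave's replication status; the sketch only illustrates the comparison.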

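Manager toolkit

Component	Description
masterha_check_ssh	Checks the MHA SSH configuration
masterha_check_repl	Checks MySQL replication status
masterha_manager	Starts MHA
masterha_check_status	Reports the current MHA running status
masterha_master_monitor	Detects whether the master is down
masterha_master_switch	Controls failover (automatic or manual)
masterha_conf_host	Adds or removes server entries in the configuration

Node toolkit

These tools are normally invoked by MHA Manager scripts and need no manual operation.

Component	Description
save_binary_logs	Saves and copies the master's binary logs
apply_diff_relay_logs	Identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog	Strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs	Purges relay logs (without blocking the SQL thread)

Note:

To minimize data loss when the master goes down because of a hardware failure, it is recommended (but not required) to configure semi-synchronous replication, available since MySQL 5.5, alongside MHA. Readers can look up how semi-synchronous replication works on their own.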

Environment preparation

OS	Kernel version	Hostname	MySQL version	IP address	Role
CentOS 7.5	5.1.3-1.el7	manager.mha	MySQL 5.7.18	10.0.20.200	Manager
CentOS 7.5	5.1.3-1.el7	node01.mha	MySQL 5.7.18	10.0.20.201	node01 mysql-master
CentOS 7.5	5.1.3-1.el7	node02.mha	MySQL 5.7.18	10.0.20.202	node02 mysql-slave
CentOS 7.5	5.1.3-1.el7	node03.mha	MySQL 5.7.18	10.0.20.203	node03 mysql-slave
CentOS 7.5	5.1.3-1.el7	node04.mha	MySQL 5.7.18	10.0.20.204	node04 mysql-slave

MHA Manager version	GitHub download	Baidu Netdisk download
v0.58	GitHub download link	Baidu Netdisk link, extraction code: lzb0

MHA Node version	GitHub download	Baidu Netdisk download
v0.58	GitHub download link	Baidu Netdisk link, extraction code: 4e6h

SSH key trust

Set up mutual SSH key trust for the root user between all machines.

Run on all machines:

Generate a key pair

ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""

Push the public key

ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.200
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.201
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.202
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.203
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.204

All machines now trust each other and can log in to one another over SSH without a password.
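
Before moving on, the trust relationship can be checked from each machine with a non-interactive login attempt. The loop below is a minimal sketch that uses the host list from the environment table. The SSH_CMD variable is an assumption added here so the loop can be dry-run without touching the network; it is not something MHA or OpenSSH defines.

```shell
#!/bin/sh
# Hosts from the environment table in this article.
HOSTS="10.0.20.200 10.0.20.201 10.0.20.202 10.0.20.203 10.0.20.204"

# Try a non-interactive login to every host and report the result.
# SSH_CMD is a hypothetical override for dry runs (e.g. SSH_CMD=echo);
# it defaults to the real ssh client.
check_ssh_trust() {
    failed=0
    for host in $HOSTS; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}
```

Calling check_ssh_trust on each of the five machines should print five OK lines; any FAIL line points at a host whose public key has not been pushed yet.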

Install base dependency packages

Run on all machines:

yum install -y perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

Install MHA components

Install the MHA Node component

Run on all nodes

[root@node01 ~]# cd /opt/soft
[root@node01 soft]# ll
total 639152
-rw-r--r-- 1 root root     56220 Jun 12 17:59 mha4mysql-node-0.58.tar.gz
-rw-r--r-- 1 root root 654430368 Jun 11 11:21 mysql-5.7.18-linux-glibc2.5-x86_64.tar.gz

Unpack and install

The command output is omitted here.

[root@node01 soft]# tar xf mha4mysql-node-0.58.tar.gz
[root@node01 soft]# cd mha4mysql-node-0.58
[root@node01 mha4mysql-node-0.58]# perl Makefile.PL
[root@node01 mha4mysql-node-0.58]# make && make install

Once installed, the Node package provides four tools:

[root@node01 mha4mysql-node-0.58]# ll /usr/local/bin/
total 48
-r-xr-xr-x 1 root root 17639 Jun 13 15:00 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 13 15:00 filter_mysqlbinlog
-r-xr-xr-x 1 root root  8337 Jun 13 15:00 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jun 13 15:00 save_binary_logs

Install the MHA Manager component

Run the installation on the Manager node

It does not need to be installed on the Node machines.

[root@manager soft]# tar xf mha4mysql-manager-0.58.tar.gz 
[root@manager soft]# cd mha4mysql-manager-0.58
[root@manager mha4mysql-manager-0.58]# ls
AUTHORS  bin  COPYING  debian  inc  lib  Makefile.PL  MANIFEST  META.yml  README  rpm  samples  t  tests
[root@manager mha4mysql-manager-0.58]# perl Makefile.PL
[root@manager mha4mysql-manager-0.58]# make && make install

List the Manager tools

[root@manager mha4mysql-manager-0.58]# ll /usr/local/bin/
total 88
-r-xr-xr-x 1 root root 17639 Jun 13 15:10 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 13 15:10 filter_mysqlbinlog
-r-xr-xr-x 1 root root  1995 Jun 13 15:13 masterha_check_repl
-r-xr-xr-x 1 root root  1779 Jun 13 15:13 masterha_check_ssh
-r-xr-xr-x 1 root root  1865 Jun 13 15:13 masterha_check_status
-r-xr-xr-x 1 root root  3201 Jun 13 15:13 masterha_conf_host
-r-xr-xr-x 1 root root  2517 Jun 13 15:13 masterha_manager
-r-xr-xr-x 1 root root  2165 Jun 13 15:13 masterha_master_monitor
-r-xr-xr-x 1 root root  2373 Jun 13 15:13 masterha_master_switch
-r-xr-xr-x 1 root root  5172 Jun 13 15:13 masterha_secondary_check
-r-xr-xr-x 1 root root  1739 Jun 13 15:13 masterha_stop
-r-xr-xr-x 1 root root  8337 Jun 13 15:10 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jun 13 15:10 save_binary_logs

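Set up MySQL with one master and three slaves

This article focuses on the MHA cluster, so for the MySQL cluster only the commands and the my.cnf configuration are listed.

Of the four Node machines, node01 is the master and the other three are slaves.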

[root@node01 mysql-5.7]# rpm -qa |grep mariadb | xargs rpm -e --nodeps
[root@node01 soft]# useradd -s /sbin/nologin -M mysql
[root@node01 soft]# tar xf mysql-5.7.18-linux-glibc2.5-x86_64.tar.gz
[root@node01 soft]# mv mysql-5.7.18-linux-glibc2.5-x86_64 mysql-5.7
[root@node01 soft]# mv mysql-5.7 /usr/local/
[root@node01 soft]# ln -s /usr/local/mysql-5.7 /usr/local/mysql
[root@node01 soft]# cd /usr/local/mysql-5.7
[root@node01 mysql-5.7]# echo 'export PATH=$PATH:/usr/local/mysql-5.7/bin' >> /etc/profile
[root@node01 mysql-5.7]# source /etc/profile
[root@node01 mysql-5.7]# mysql -V
mysql  Ver 14.14 Distrib 5.7.18, for linux-glibc2.5 (x86_64) using  EditLine wrapper
[root@node01 mysql-5.7]# cp support-files/mysql.server /etc/init.d/mysqld
[root@node01 mysql-5.7]# sed -i 's@/etc/my.cnf@/usr/local/mysql-5.7/my.cnf@g' /etc/init.d/mysqld
[root@node01 mysql-5.7]# sed -i 's@/usr/local/mysql/data@/opt/mysql_data@g' /etc/init.d/mysqld
[root@node01 mysql-5.7]# chkconfig mysqld on
[root@node01 mysql-5.7]# mkdir /opt/mysql_data
[root@node01 mysql-5.7]# chown -R mysql.mysql /usr/local/mysql-5.7
[root@node01 mysql-5.7]# chown -R mysql.mysql /opt/mysql_data
[root@node01 mysql-5.7]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@node01 mysql-5.7]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql

my.cnf configuration file

Note that the server-id value in my.cnf must be different on each of the four nodes; otherwise replication will fail to start.
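
One simple way to keep server-id unique is to derive it from the last octet of each host's IP address, which matches the value 204 used for node04 (10.0.20.204) in the configuration shown here. The helper name below is hypothetical, and the scheme only guarantees uniqueness while all servers sit in the same /24 network.

```shell
#!/bin/sh
# Hypothetical helper: use the last octet of an IPv4 address as server-id.
server_id_from_ip() {
    # ${1##*.} removes everything up to and including the last dot.
    echo "${1##*.}"
}

# Example call:
#   echo "server-id=$(server_id_from_ip 10.0.20.204)"
```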

[root@node04 mysql-5.7]# cat my.cnf 
[client]
socket = /tmp/mysql.sock
port=3306

[mysql]
default-character-set=utf8
socket = /tmp/mysql.sock

[mysqld]
socket = /tmp/mysql.sock
character-set-server=utf8
basedir=/usr/local/mysql-5.7
datadir=/opt/mysql_data
port=3306
pid-file=/opt/mysql_data/mysqld.pid

# Must be different on each of the four nodes
server-id=204

skip-name-resolve

default-storage-engine=INNODB
explicit_defaults_for_timestamp = true

gtid_mode = on  
enforce_gtid_consistency = 1
log_slave_updates = 1

plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000


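# Binary logging; the output elsewhere in this article (mysql-bin.00000x, log-bin:enabled) shows it was enabled
log-bin = mysql-bin
relay-log = mysql-relay-bin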
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=test.%
replicate-wild-ignore-table=information_schema.%

max_connections=2000
query_cache_size=0
table_open_cache=2000
tmp_table_size=246M
thread_cache_size=300
thread_stack = 192k
key_buffer_size=512M
read_buffer_size=4M
read_rnd_buffer_size=32M


innodb_data_home_dir = /opt/mysql_data
innodb_flush_log_at_trx_commit=0
innodb_log_buffer_size=16M

# Set this to 60%-80% of the memory of the machine that actually runs MySQL
innodb_buffer_pool_size=13G

innodb_log_file_size=128M
innodb_thread_concurrency=128
innodb_autoextend_increment=1000
innodb_buffer_pool_instances=8
innodb_concurrency_tickets=5000
innodb_old_blocks_time=1000
innodb_open_files=300
innodb_stats_on_metadata=0
innodb_file_per_table=1
innodb_checksum_algorithm=0

back_log = 80
flush_time = 0
join_buffer_size = 128M
max_allowed_packet = 1024M
max_connect_errors = 2000
open_files_limit = 4161
query_cache_type = 0
sort_buffer_size = 32M
table_definition_cache = 1400
binlog_row_event_max_size = 8K
sync_master_info = 10000
sync_relay_log = 10000
sync_relay_log_info = 10000
bulk_insert_buffer_size = 64M
interactive_timeout = 120
wait_timeout = 120
log-bin-trust-function-creators=1
sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 

[mysqld_safe]
log-error = /opt/mysql_data/error.log
pid-file = /opt/mysql_data/mysqld.pid
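
The 60%-80% guideline for innodb_buffer_pool_size in the configuration above can be turned into a concrete number from the host's total memory. The helper below is a sketch with an invented name; it takes the total in kB (the unit used by the MemTotal line of /proc/meminfo on Linux) and a percentage, and prints a size in MB using integer arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: suggest innodb_buffer_pool_size in MB.
#   $1  total memory in kB (the MemTotal value in /proc/meminfo)
#   $2  percentage of memory to give to the buffer pool
buffer_pool_mb() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# On the database host:
#   total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   echo "innodb_buffer_pool_size=$(buffer_pool_mb "$total_kb" 70)M"
```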

Initialize MySQL

node01

[root@node01 mysql-5.7]# mysqld --initialize  --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.947482Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.056859Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.076218Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.129463Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae29152-8db1-11e9-9d54-005056990727.
2019-06-13T07:59:01.129873Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.130247Z 1 [Note] A temporary password is generated for root@localhost: 1qGoEiI7ga#U

node02

[root@node02 mysql-5.7]# mysqld --initialize  --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.952176Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.092736Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.116696Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.171324Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae8f47b-8db1-11e9-b8bb-0050569972c0.
2019-06-13T07:59:01.171711Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.172126Z 1 [Note] A temporary password is generated for root@localhost: qTwtKAOue7:o

node03

[root@node03 mysql-5.7]# mysqld --initialize  --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.949924Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.090890Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.116166Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.171335Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae8f4ef-8db1-11e9-b6ae-0050569975f7.
2019-06-13T07:59:01.171753Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.172159Z 1 [Note] A temporary password is generated for root@localhost: XIu,h#*HQ5&M

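node04

[root@node04 mysql-5.7]# mysqld --initialize  --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.955598Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).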
2019-06-13T07:59:01.090420Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.113972Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.166754Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae84210-8db1-11e9-b6fe-005056992c6b.
2019-06-13T07:59:01.167145Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.167537Z 1 [Note] A temporary password is generated for root@localhost: 26jvaV)XAy>G

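When initialization finishes, the last line gives a temporary password for root. After logging in with it, change the root password immediately; otherwise no database operations are permitted.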
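
If the initialization output is saved to a file, the temporary password can be pulled out of it instead of being copied by hand. This is a small sketch: the function name and the file /tmp/mysql_init.log are assumptions, and the pattern is taken from the log lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the temporary root password found in
# `mysqld --initialize` output read from stdin.
temp_root_password() {
    sed -n 's/.*A temporary password is generated for root@localhost: //p' | tail -n 1
}

# Example (the log file name is an assumption):
#   mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 \
#          --datadir=/opt/mysql_data 2>&1 | tee /tmp/mysql_init.log
#   temp_root_password < /tmp/mysql_init.log
```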

Start MySQL and do the basic configuration

# /etc/init.d/mysqld start
Starting MySQL.Logging to '/opt/mysql_data/error.log'.
.. SUCCESS!

Log in to MySQL and change the password

[root@node01 mysql-5.7]# mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.18

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> alter user user() identified by "123456";
Query OK, 0 rows affected (0.00 sec)

Add the replication user on all MySQL instances

mysql> grant replication slave on *.* to 'repl'@'10.0.20.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all on *.* to 'root'@'%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Set up one master and three slaves

Run on the MySQL instance on node01

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

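Run the following statements on node02, node03 and node04, substituting the File and Position values that show master status reports on your own master:

change master to master_host='10.0.20.201',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=463;

start slave;

show slave status\G   # check that Slave_IO_Running and Slave_SQL_Running are both Yes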
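
The check on the two replication threads can also be scripted. The function below is a sketch with an invented name: it reads the output of show slave status\G on stdin and succeeds only when both Slave_IO_Running and Slave_SQL_Running are Yes.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if both replication threads are running.
# stdin: output of `show slave status\G`
replication_ok() {
    awk -F': ' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example call on a slave:
#   mysql -uroot -p -e 'show slave status\G' | replication_ok && echo "replication OK"
```

The `$` anchors keep the Slave_SQL_Running_State line from being mistaken for Slave_SQL_Running.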

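Special note

The Manager machine is configured next. All of my machines use bonded NICs, so the interface is named bond0 everywhere; adjust this to your own interface name. The mailbox used for sending mail and the WeChat settings must likewise be changed to your own.

The VIP used here is 10.0.20.199.

Adjust these to suit your own environment.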

MHA Manager configuration

All of the following is done on the manager machine.

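# Create the MHA configuration directory
mkdir /etc/mha
# Create the MHA scripts directory
mkdir /etc/mha/scripts
# Create the MHA log directory
mkdir /var/log/mha/
# Create the per-application log directory
mkdir /var/log/mha/app1 -p
# Create the log file
touch /var/log/mha/app1/manager.log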

MHA configuration files

[root@manager mha]# cat /etc/masterha_default.cnf 

[server default]
user=root
password=123456

repl_user=repl
repl_password=123456

ssh_user=root

ping_interval=1
master_binlog_dir=/opt/mysql_data

manager_workdir=/var/log/mha/app1.log
manager_log=/var/log/mha/manager.log
master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
report_script="/etc/mha/scripts/send_report"
remote_workdir=/tmp
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 10.0.20.201 -s 10.0.20.202 -s 10.0.20.203 -s 10.0.20.204
shutdown_script=""

[root@manager ~]# cat /etc/mha/app1.cnf 
[server1]
hostname=10.0.20.201
port=3306

[server2]
hostname=10.0.20.202
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=10.0.20.203
port=3306

[server4]
hostname=10.0.20.204
port=3306

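Configuration file reference

Main MHA configuration options (the values shown are examples):

manager_workdir=/var/log/masterha/app1.log: the manager's working directory
manager_log=/var/log/masterha/app1/manager.log: the manager's log file
master_binlog_dir=/data/mysql: where the master stores its binlogs, so that MHA can find the master's logs
master_ip_failover_script=/usr/local/bin/master_ip_failover: the switch script run during automatic failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change: the switch script run during a manual switchover
user=root: the MySQL user used for monitoring
password=dayi123: the password of the monitoring user; this user must be granted remote login from the manager node
ping_interval=1: the interval, in seconds, between pings sent to the master (default 3); failover starts automatically after three attempts go unanswered
remote_workdir=/tmp: where binlogs are saved on the remote MySQL hosts during a switch
repl_user=repl: the MySQL user used for replication
repl_password=replication: the password of the replication user
report_script=/usr/local/send_report: the script that sends an alert after a switch
shutdown_script="": the script that shuts down the failed host after a failure (its main purpose is to power the host off to prevent split-brain; not used here)
ssh_user=root: the SSH login user
candidate_master=1: set per server; marks the server as a candidate master
check_repl_delay=0: set per server; by default MHA will not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, and setting this to 0 makes MHA ignore replication delay when choosing. This is very useful on hosts with candidate_master=1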

Script configuration

Automatic VIP management

# To prevent split-brain, managing the virtual IP with a script is recommended in production rather than using keepalived

vim /etc/mha/scripts/master_ip_failover

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '10.0.20.199/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig bond0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig bond0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Configure the email and WeChat alert script

# Install the mail-sending tool
yum install mailx -y

The mail program needs the sender information configured first

vim /etc/mail.rc

set from=*****@163.com
set smtp=smtp.163.com
set smtp-auth-user=*****
# For a 163 mailbox this is not the account password but the authorization code
set smtp-auth-password=*****
set smtp-auth=login

This is the actual email and WeChat sending script

vim /etc/mha/scripts/send_report

#!/bin/bash
source /root/.bash_profile
# Parse arguments
orig_master_host=`echo "$1" | awk -F = '{print $2}'`
new_master_host=`echo "$2" | awk -F = '{print $2}'`
new_slave_hosts=`echo "$3" | awk -F = '{print $2}'`
subject=`echo "$4" | awk -F = '{print $2}'`
body=`echo "$5" | awk -F = '{print $2}'`
# Recipient address
email="***@***.com"

# Obtain these two values yourself from your WeChat Work account
CropID='******************'
Secret='***************************************'

GURL="https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=$CropID&corpsecret=$Secret"
Gtoken=$(/usr/bin/curl -s -G $GURL | awk -F\" '{print $10}')

PURL="https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=$Gtoken"

function body() {
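        # Application (agent) ID in the WeChat Work account
        local AppID=1000002
        # Member ID within the department
        local UserID=$1
        # Department ID; defines the scope, all members of the group receive the message
        local PartyID='2|3'
        # Everything from the third argument onward is the message text
        local Msg=$(echo "$@" | cut -d" " -f3-)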
        printf '{\n'
        printf '\t"touser": "'"$UserID"\"",\n"
        printf '\t"toparty": "'"$PartyID"\"",\n"
        printf '\t"msgtype": "text",\n'
        printf '\t"agentid": "'" $AppID "\"",\n"
        printf '\t"text": {\n'
        printf '\t\t"content": "'"$Msg"\""\n"
        printf '\t},\n'
        printf '\t"safe":"0"\n'
        printf '}\n'
}






# Check the log file that manager_log in /etc/masterha_default.cnf points to
tac /var/log/mha/manager.log | sed -n 2p | grep 'successfully' > /dev/null
if [ $? -eq 0 ]
    then
    messages=`echo -e "MHA $subject master-slave switch succeeded\n master:$orig_master_host --> $new_master_host \n $body \n current slaves:$new_slave_hosts"`
    echo "$messages" | mail -s "MySQL instance down, MHA $subject switch succeeded" $email >>/tmp/mailx.log 2>&1
    /usr/bin/curl --data-ascii "$(body 1 1 ${messages})" ${PURL}
    else
    messages=`echo -e "MHA $subject master-slave switch failed\n master:$orig_master_host --> $new_master_host \n $body"`
    echo "$messages" | mail -s "MySQL instance down, MHA $subject switch failed" $email >>/tmp/mailx.log 2>&1
    /usr/bin/curl --data-ascii "$(body 1 1 ${messages})" ${PURL}
fi
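
The script above reads its arguments by position and splits each on "=", which truncates any value that itself contains "=". A more tolerant approach is to look options up by name. The helper below is a sketch of that idea with an invented name; the option names in the example are the ones this script already parses.

```shell
#!/bin/sh
# Hypothetical helper: print the value of --<name>=<value> from an argument list.
#   arg_value new_master_host "$@"
arg_value() {
    name=$1
    shift
    for arg in "$@"; do
        case $arg in
            --"$name"=*)
                # Strip everything up to the first "=" so values may contain "=".
                printf '%s\n' "${arg#*=}"
                return 0
                ;;
        esac
    done
    return 1   # option not present
}

# Example inside a report script:
#   subject=$(arg_value subject "$@")
#   new_master_host=$(arg_value new_master_host "$@")
```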

Manual VIP management script

vim /etc/mha/scripts/master_ip_online_change

#!/bin/bash
source /root/.bash_profile

vip='10.0.20.199/24'  # the VIP
key='1'

command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`

# The network interface name must be the same on all servers
stop_vip=`echo "ssh root@$orig_master_host /usr/sbin/ifconfig bond0:$key down"`
start_vip=`echo "ssh root@$new_master_host /usr/sbin/ifconfig bond0:$key $vip"`

if [ "$command" = 'stop' ]
  then
    echo -e "\n\n\n****************************\n"
    echo -e "Disabling the VIP - $vip on old master: $orig_master_host \n"
    $stop_vip
    if [ $? -eq 0 ]
      then
    echo "Disabled the VIP successfully"
      else
    echo "Disabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
  fi

if [ $command = 'start' -o $command = 'status' ]
  then
    echo -e "\n\n\n*************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
      then
    echo "Enabled the VIP successfully"
      else
    echo "Enabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi

Permissions

Finally, make the three scripts just created executable

chmod +x /etc/mha/scripts/master_ip_failover 
chmod +x /etc/mha/scripts/master_ip_online_change 
chmod +x /etc/mha/scripts/send_report

Verify the MHA setup

Verify that SSH trust login works

Verify with the masterha_check_ssh command

[root@manager scripts]# masterha_check_ssh --conf=/etc/mha/app1.cnf
# The check passes if the following message appears at the end
Thu Jun 13 17:19:34 2019 - [info] All SSH connection tests passed successfully.

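Verify that MySQL master-slave replication works

Verify with the masterha_check_repl command

[root@manager mha]# masterha_check_repl --conf=/etc/mha/app1.cnf
# The check passes if the following message appears at the end
MySQL Replication Health is OK.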

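Start MHA

Add the VIP manually the first time

This step is done on node01.

Bind the VIP on the MySQL master on node01 first. It only needs to be bound on the master this once; afterwards it is moved automatically.

[root@node01 mysql-5.7]# /sbin/ifconfig bond0:1 10.0.20.199/24
[root@node01 mysql-5.7]# ip a | grep 20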
    inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
    inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1

Start

This step is done on the manager.

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

Check the MHA status

[root@manager mha]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4745) is running(0:PING_OK), master:10.0.20.201
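
For monitoring scripts it can help to extract the current master from this status line. The function below is a sketch with an invented name; it only recognizes the "is running(0:PING_OK)" form shown above and treats anything else as a failure.

```shell
#!/bin/sh
# Hypothetical helper: read masterha_check_status output on stdin and
# print the monitored master; exit non-zero unless the state is PING_OK.
mha_current_master() {
    master=$(sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p')
    [ -n "$master" ] && echo "$master"
}

# Example call on the manager:
#   masterha_check_status --conf=/etc/mha/app1.cnf | mha_current_master
```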

The MHA log is written to /var/log/mha/manager.log

[root@manager mha]# tailf /var/log/mha/manager.log
# If the last lines look like the following, MHA started successfully
Thu Jun 13 17:31:41 2019 - [info] Starting ping health check on 10.0.20.201(10.0.20.201:3306)..
Thu Jun 13 17:31:41 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Stop

If MHA is already monitoring, it can be stopped with:

masterha_stop --conf=/etc/mha/app1.cnf

Simulated failure test

Stop the MySQL master on node01 manually, then check the state of the other nodes.

[root@node01 ~]# /etc/init.d/mysqld  stop
Shutting down MySQL............ SUCCESS!
[root@node01 ~]# ip a | grep 20
    inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0

Check the VIP on node02

[root@node02 ~]# ip a | grep 20
    inet 10.0.20.202/24 brd 10.0.20.255 scope global bond0
    inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
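
Filtering with grep 20 matches any line containing "20", so a stricter test for the VIP can be useful in scripts. The function below is a sketch with an invented name; it reads ip addr output on stdin and succeeds only if the exact address is present.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given address appears as an IPv4
# address in `ip addr` output read from stdin.
has_vip() {
    # -F treats the address as a fixed string, so "." is not a wildcard;
    # the trailing "/" keeps 10.0.20.19 from matching 10.0.20.199.
    grep -qF "inet $1/"
}

# Example call on a node:
#   ip -4 addr show | has_vip 10.0.20.199 && echo "VIP is bound here"
```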

Check the replication status and master address on node03

[root@node03 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
                  Master_Host: 10.0.20.202
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Check the replication status and master address on node04

[root@node04 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
                  Master_Host: 10.0.20.202
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

View the Manager log

[root@manager mha]# tailf manager.log
Fri Jun 14 10:01:03 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Jun 14 10:01:03 2019 - [info] Executing SSH check script: exit 0
Fri Jun 14 10:01:03 2019 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 10.0.20.201 -s 10.0.20.202 -s 10.0.20.203 -s 10.0.20.204  --user=root  --master_host=10.0.20.201  --master_ip=10.0.20.201  --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT
Fri Jun 14 10:01:03 2019 - [info] HealthCheck: SSH to 10.0.20.201 is reachable.
Monitoring server 10.0.20.201 is reachable, Master is not reachable from 10.0.20.201. OK.
Monitoring server 10.0.20.202 is reachable, Master is not reachable from 10.0.20.202. OK.
Monitoring server 10.0.20.203 is reachable, Master is not reachable from 10.0.20.203. OK.
Fri Jun 14 10:01:04 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111))
Fri Jun 14 10:01:04 2019 - [warning] Connection failed 2 time(s)..
Monitoring server 10.0.20.204 is reachable, Master is not reachable from 10.0.20.204. OK.
Fri Jun 14 10:01:04 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Jun 14 10:01:05 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111))
Fri Jun 14 10:01:05 2019 - [warning] Connection failed 3 time(s)..
Fri Jun 14 10:01:06 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111))
Fri Jun 14 10:01:06 2019 - [warning] Connection failed 4 time(s)..
Fri Jun 14 10:01:06 2019 - [warning] Master is not reachable from health checker!
Fri Jun 14 10:01:06 2019 - [warning] Master 10.0.20.201(10.0.20.201:3306) is not reachable!
Fri Jun 14 10:01:06 2019 - [warning] SSH is reachable.
Fri Jun 14 10:01:06 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Fri Jun 14 10:01:06 2019 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jun 14 10:01:06 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jun 14 10:01:06 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jun 14 10:01:07 2019 - [info] GTID failover mode = 1
Fri Jun 14 10:01:07 2019 - [info] Dead Servers:
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:07 2019 - [info] Alive Servers:
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.202(10.0.20.202:3306)
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.203(10.0.20.203:3306)
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.204(10.0.20.204:3306)
Fri Jun 14 10:01:07 2019 - [info] Alive Slaves:
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.202(10.0.20.202:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:07 2019 - [info]     GTID ON
Fri Jun 14 10:01:07 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:07 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.203(10.0.20.203:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:07 2019 - [info]     GTID ON
Fri Jun 14 10:01:07 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:07 2019 - [info]   10.0.20.204(10.0.20.204:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:07 2019 - [info]     GTID ON
Fri Jun 14 10:01:07 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:07 2019 - [info] Checking slave configurations..
Fri Jun 14 10:01:07 2019 - [info]  read_only=1 is not set on slave 10.0.20.202(10.0.20.202:3306).
Fri Jun 14 10:01:07 2019 - [info]  read_only=1 is not set on slave 10.0.20.203(10.0.20.203:3306).
Fri Jun 14 10:01:07 2019 - [info]  read_only=1 is not set on slave 10.0.20.204(10.0.20.204:3306).
Fri Jun 14 10:01:07 2019 - [info] Checking replication filtering settings..
Fri Jun 14 10:01:07 2019 - [info]  Replication filtering check ok.
Fri Jun 14 10:01:07 2019 - [info] Master is down!
Fri Jun 14 10:01:07 2019 - [info] Terminating monitoring script.
Fri Jun 14 10:01:07 2019 - [info] Got exit code 20 (Master dead).
Fri Jun 14 10:01:07 2019 - [info] MHA::MasterFailover version 0.58.
Fri Jun 14 10:01:07 2019 - [info] Starting master failover.
Fri Jun 14 10:01:07 2019 - [info] 
Fri Jun 14 10:01:07 2019 - [info] * Phase 1: Configuration Check Phase..
Fri Jun 14 10:01:07 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] GTID failover mode = 1
Fri Jun 14 10:01:08 2019 - [info] Dead Servers:
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info] Checking master reachability via MySQL(double check)...
Fri Jun 14 10:01:08 2019 - [info]  ok.
Fri Jun 14 10:01:08 2019 - [info] Alive Servers:
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.202(10.0.20.202:3306)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.203(10.0.20.203:3306)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.204(10.0.20.204:3306)
Fri Jun 14 10:01:08 2019 - [info] Alive Slaves:
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.202(10.0.20.202:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.203(10.0.20.203:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.204(10.0.20.204:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info] Starting GTID based failover.
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Jun 14 10:01:08 2019 - [info] Executing master IP deactivation script:
Fri Jun 14 10:01:08 2019 - [info]   /etc/mha/scripts/master_ip_failover --orig_master_host=10.0.20.201 --orig_master_ip=10.0.20.201 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig bond0:1 down==/sbin/ifconfig bond0:1 10.0.20.199/24===

Disabling the VIP on old master: 10.0.20.201 
Fri Jun 14 10:01:08 2019 - [info]  done.
Fri Jun 14 10:01:08 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Jun 14 10:01:08 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 3: Master Recovery Phase..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:194
Fri Jun 14 10:01:08 2019 - [info] Retrieved Gtid Set: 6211616e-8db3-11e9-be15-005056990727:3-5
Fri Jun 14 10:01:08 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.202(10.0.20.202:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.203(10.0.20.203:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.204(10.0.20.204:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:194
Fri Jun 14 10:01:08 2019 - [info] Retrieved Gtid Set: 6211616e-8db3-11e9-be15-005056990727:3-5
Fri Jun 14 10:01:08 2019 - [info] Oldest slaves:
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.202(10.0.20.202:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.203(10.0.20.203:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.204(10.0.20.204:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 3.3: Determining New Master Phase..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] Searching new master from slaves..
Fri Jun 14 10:01:08 2019 - [info]  Candidate masters from the configuration file:
Fri Jun 14 10:01:08 2019 - [info]   10.0.20.202(10.0.20.202:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 10:01:08 2019 - [info]     GTID ON
Fri Jun 14 10:01:08 2019 - [info]     Replicating from 10.0.20.201(10.0.20.201:3306)
Fri Jun 14 10:01:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 10:01:08 2019 - [info]  Non-candidate masters:
Fri Jun 14 10:01:08 2019 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Jun 14 10:01:08 2019 - [info] New master is 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 10:01:08 2019 - [info] Starting master failover..
Fri Jun 14 10:01:08 2019 - [info] 
From:
10.0.20.201(10.0.20.201:3306) (current master)
 +--10.0.20.202(10.0.20.202:3306)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)

To:
10.0.20.202(10.0.20.202:3306) (new master)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info]  Waiting all logs to be applied.. 
Fri Jun 14 10:01:08 2019 - [info]   done.
Fri Jun 14 10:01:08 2019 - [info] Getting new master's binlog name and position..
Fri Jun 14 10:01:08 2019 - [info]  mysql-bin.000002:194
Fri Jun 14 10:01:08 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.202', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Jun 14 10:01:08 2019 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 194, 6211616e-8db3-11e9-be15-005056990727:4-5
Fri Jun 14 10:01:08 2019 - [info] Executing master IP activate script:
Fri Jun 14 10:01:08 2019 - [info]   /etc/mha/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.0.20.201 --orig_master_ip=10.0.20.201 --orig_master_port=3306 --new_master_host=10.0.20.202 --new_master_ip=10.0.20.202 --new_master_port=3306 --new_master_user='root'   --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/sbin/ifconfig bond0:1 down==/sbin/ifconfig bond0:1 10.0.20.199/24===

Enabling the VIP - 10.0.20.199/24 on the new master - 10.0.20.202 
Fri Jun 14 10:01:08 2019 - [info]  OK.
Fri Jun 14 10:01:08 2019 - [info] ** Finished master recovery successfully.
Fri Jun 14 10:01:08 2019 - [info] * Phase 3: Master Recovery Phase completed.
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 4: Slaves Recovery Phase..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Jun 14 10:01:08 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info] -- Slave recovery on host 10.0.20.203(10.0.20.203:3306) started, pid: 2838. Check tmp log /var/log/mha/10.0.20.203_3306_20190614100107.log if it takes time..
Fri Jun 14 10:01:08 2019 - [info] -- Slave recovery on host 10.0.20.204(10.0.20.204:3306) started, pid: 2839. Check tmp log /var/log/mha/10.0.20.204_3306_20190614100107.log if it takes time..
Fri Jun 14 10:01:09 2019 - [info] 
Fri Jun 14 10:01:09 2019 - [info] Log messages from 10.0.20.204 ...
Fri Jun 14 10:01:09 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info]  Resetting slave 10.0.20.204(10.0.20.204:3306) and starting replication from the new master 10.0.20.202(10.0.20.202:3306)..
Fri Jun 14 10:01:08 2019 - [info]  Executed CHANGE MASTER.
Fri Jun 14 10:01:08 2019 - [info]  Slave started.
Fri Jun 14 10:01:08 2019 - [info]  gtid_wait(6211616e-8db3-11e9-be15-005056990727:4-5) completed on 10.0.20.204(10.0.20.204:3306). Executed 0 events.
Fri Jun 14 10:01:09 2019 - [info] End of log messages from 10.0.20.204.
Fri Jun 14 10:01:09 2019 - [info] -- Slave on host 10.0.20.204(10.0.20.204:3306) started.
Fri Jun 14 10:01:10 2019 - [info] 
Fri Jun 14 10:01:10 2019 - [info] Log messages from 10.0.20.203 ...
Fri Jun 14 10:01:10 2019 - [info] 
Fri Jun 14 10:01:08 2019 - [info]  Resetting slave 10.0.20.203(10.0.20.203:3306) and starting replication from the new master 10.0.20.202(10.0.20.202:3306)..
Fri Jun 14 10:01:08 2019 - [info]  Executed CHANGE MASTER.
Fri Jun 14 10:01:09 2019 - [info]  Slave started.
Fri Jun 14 10:01:09 2019 - [info]  gtid_wait(6211616e-8db3-11e9-be15-005056990727:4-5) completed on 10.0.20.203(10.0.20.203:3306). Executed 0 events.
Fri Jun 14 10:01:10 2019 - [info] End of log messages from 10.0.20.203.
Fri Jun 14 10:01:10 2019 - [info] -- Slave on host 10.0.20.203(10.0.20.203:3306) started.
Fri Jun 14 10:01:10 2019 - [info] All new slave servers recovered successfully.
Fri Jun 14 10:01:10 2019 - [info] 
Fri Jun 14 10:01:10 2019 - [info] * Phase 5: New master cleanup phase..
Fri Jun 14 10:01:10 2019 - [info] 
Fri Jun 14 10:01:10 2019 - [info] Resetting slave info on the new master..
Fri Jun 14 10:01:10 2019 - [info]  10.0.20.202: Resetting slave info succeeded.
Fri Jun 14 10:01:10 2019 - [info] Master failover to 10.0.20.202(10.0.20.202:3306) completed successfully.
Fri Jun 14 10:01:10 2019 - [info] Deleted server1 entry from /etc/mha/app1.cnf .
Fri Jun 14 10:01:10 2019 - [info] 

----- Failover Report -----

app1: MySQL Master failover 10.0.20.201(10.0.20.201:3306) to 10.0.20.202(10.0.20.202:3306) succeeded

Master 10.0.20.201(10.0.20.201:3306) is down!

Check MHA Manager logs at manager.mha:/var/log/mha/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.0.20.201(10.0.20.201:3306)
Selected 10.0.20.202(10.0.20.202:3306) as a new master.
10.0.20.202(10.0.20.202:3306): OK: Applying all logs succeeded.
10.0.20.202(10.0.20.202:3306): OK: Activated master IP address.
10.0.20.204(10.0.20.204:3306): OK: Slave started, replicating from 10.0.20.202(10.0.20.202:3306)
10.0.20.203(10.0.20.203:3306): OK: Slave started, replicating from 10.0.20.202(10.0.20.202:3306)
10.0.20.202(10.0.20.202:3306): Resetting slave info succeeded.
Master failover to 10.0.20.202(10.0.20.202:3306) completed successfully.
Fri Jun 14 10:01:10 2019 - [info] Sending mail..
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   347  100    45  100   302    133    897 --:--:-- --:--:-- --:--:--   898

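The log above and the state of each node show that the VIP has moved automatically to node02, that node02 has been promoted to master, and that node03 and node04 now replicate from node02.

The WeChat and email alerts were also received.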

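Automatic failover steps

The output above shows the whole MHA failover process, which consists of the following steps:

Configuration check phase: checks the configuration of the entire cluster.
Dead master handling: removes the virtual IP and powers off the host (the host is not powered off here because shutdown_script is not set, as the warning in the log shows).
Copy the binary log events by which the dead master is ahead of the latest slave and save them in a directory on the MHA Manager.
Identify the slave that holds the most recent updates.
Apply the binary log events saved from the master (this information matters when the repaired master rejoins the cluster).
Promote one slave to be the new master.
Point the other slaves at the new master and resume replication.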
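
The "identify the latest slave" and "promote a slave" steps can be illustrated with a short sketch. This is a simplified model written for this article, not MHA's own code: the Slave class, its fields, and pick_new_master are hypothetical names, and real MHA applies further rules (for example no_master, binary logging, versions, and check_repl_delay) that are left out here. The GTID end value 5 mirrors the "Retrieved Gtid Set ...:3-5" line in the log.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    # Hypothetical record for illustration; not an MHA data structure.
    host: str
    received_gtid_end: int          # last transaction number received from the dead master
    candidate_master: bool = False  # mirrors candidate_master=1 in app1.cnf


def pick_new_master(slaves):
    """Pick among the most up-to-date slaves, preferring candidate_master ones."""
    if not slaves:
        raise ValueError("no alive slaves to promote")
    newest = max(s.received_gtid_end for s in slaves)
    latest = [s for s in slaves if s.received_gtid_end == newest]
    preferred = [s for s in latest if s.candidate_master]
    # Fall back to any latest slave when no candidate_master is up to date.
    return (preferred or latest)[0]


slaves = [
    Slave("10.0.20.202", 5, candidate_master=True),
    Slave("10.0.20.203", 5),
    Slave("10.0.20.204", 5),
]
print(pick_new_master(slaves).host)  # 10.0.20.202
```

With all three slaves equally up to date, the candidate_master flag decides the outcome, which matches the "New master is 10.0.20.202" line in the log.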

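Rejoining the cluster after repair

After the failover completes, note the following changes:

The VIP moves automatically from the old master to the new master, and the monitoring process on the manager node exits.
An app1.failover.complete file is created in the log directory (/var/log/mha/app1).
The old master's entry is removed from the /etc/mha/app1.cnf configuration file.

The outage was simulated by stopping the MySQL process. Now restart MySQL on node01 and attach it to node02 as a slave.

On node02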

[root@node02 ~]# mysql -uroot -p123456 -e 'show master status\G'
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 194
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 6211616e-8db3-11e9-be15-005056990727:4-5

On node01

mysql> change master to master_host='10.0.20.202',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=194;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[root@node01 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
                  Master_Host: 10.0.20.202
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
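
The transcript above points node01 at an explicit binlog file and position. Because the cluster runs with GTID on, the failover log also records a GTID-based statement (with MASTER_AUTO_POSITION=1) that other slaves are expected to use. The sketch below only extracts that statement from the manager log text; the function name and the regular expression are assumptions made for illustration, and the password in the log is masked as xxx, so it still has to be replaced by hand.

```python
import re

# Matches the statement MHA prints after "Statement should be:" on one log line.
STATEMENT_RE = re.compile(r"Statement should be: (CHANGE MASTER TO .*;)")


def suggested_change_master(log_text):
    """Return the last CHANGE MASTER statement suggested in the log, or None."""
    matches = STATEMENT_RE.findall(log_text)
    return matches[-1] if matches else None


sample = (
    "Fri Jun 14 10:01:08 2019 - [info]  All other slaves should start replication "
    "from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.202', "
    "MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', "
    "MASTER_PASSWORD='xxx';\n"
)
print(suggested_change_master(sample))
```

Taking the last match means that a log containing several failovers yields the most recent topology.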

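On the manager

Note that after a failover the MHA process on the manager stops automatically; once the repair is done it has to be started again by hand.

During a failover MHA also deletes the failed server's entry from app1.cnf (the manager was started with --remove_dead_master_conf), so after the machine is repaired its entry must be written back into app1.cnf.

Before the change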

[root@manager mha]# pwd
/etc/mha
[root@manager mha]# cat app1.cnf 
[server2]
candidate_master=1
check_repl_delay=0
hostname=10.0.20.202
port=3306

[server3]
hostname=10.0.20.203
port=3306

[server4]
hostname=10.0.20.204
port=3306

After the change

[root@manager mha]# cat app1.cnf 
[server1]
candidate_master=1
check_repl_delay=0
hostname=10.0.20.201

[server2]
hostname=10.0.20.202
port=3306

[server3]
hostname=10.0.20.203
port=3306

[server4]
hostname=10.0.20.204
port=3306
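
Writing the server block back can also be scripted. The following is a minimal sketch that uses Python's standard configparser on the configuration text; add_server is a hypothetical helper, and the option values are the ones shown in the listing above. Note that configparser drops comments when it rewrites a file, so this is an assumption-laden convenience rather than a recommended tool.

```python
import configparser
import io


def add_server(conf_text, section, options):
    """Return conf_text with a [section] block added (or replaced) from options."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    parser[section] = options
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


before = """[server2]
hostname=10.0.20.202
port=3306
"""
after = add_server(
    before,
    "server1",
    {"candidate_master": "1", "check_repl_delay": "0", "hostname": "10.0.20.201"},
)
print(after)
```

The new section is appended after the existing ones. This sketch assumes that the order of the server blocks does not matter for the setup described here.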

Restarting MHA

Once the configuration file has been updated, start MHA again:

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &

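The repair is now complete.

Online master switch

In many cases an existing master has to be moved to another server, for example when the master has a hardware fault, the RAID controller needs rebuilding, or the master is being moved to a more powerful machine. Maintaining the master degrades performance and causes downtime during which, at the very least, no data can be written. In addition, blocking or killing running sessions can leave the old and new masters with inconsistent data. MHA provides a fast switch with graceful write blocking: the switch takes only about 0.5-2 s, and writes are blocked during that time. In many cases a 0.5-2 s write block is acceptable, so switching the master does not require a scheduled maintenance window.

The MHA online switch roughly proceeds as follows: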

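Check the replication settings and identify the current master
Determine the new master
Block writes on the current master
Wait for all slaves to catch up with replication
Allow writes on the new master
Repoint the slaves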

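Note that for an online switch the application architecture has to take two issues into account:

Automatically identifying the master and the slaves (the master may move to another machine). Using a VIP largely solves this.
Load balancing (define an approximate read/write ratio and the share of load each machine can take; this has to be reconsidered whenever a machine leaves the cluster).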

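To keep the data fully consistent and finish the switch as quickly as possible, the MHA online switch succeeds only if all of the following conditions hold; otherwise it fails.

The IO thread is running on every slave.
The SQL thread is running on every slave.
Seconds_Behind_Master in the SHOW SLAVE STATUS output of every slave is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.
On the master, SHOW PROCESSLIST shows no update that has been running for longer than running_updates_limit seconds.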
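
The four conditions above can be written as one check. The sketch below is an illustration written for this article, not MHA's implementation: online_switch_blockers is a hypothetical function, the slave statuses are plain dicts keyed by the SHOW SLAVE STATUS field names, and the run times of the master's updates are passed in as a list of seconds.

```python
def online_switch_blockers(slaves, master_update_seconds, running_updates_limit=1):
    """Return the reasons an online switch would be refused (empty list means OK)."""
    reasons = []
    for host, status in slaves.items():
        if status["Slave_IO_Running"] != "Yes":
            reasons.append(f"{host}: IO thread is not running")
        if status["Slave_SQL_Running"] != "Yes":
            reasons.append(f"{host}: SQL thread is not running")
        lag = status["Seconds_Behind_Master"]
        # A lag of None (NULL in MySQL) means replication is not healthy.
        if lag is None or lag > running_updates_limit:
            reasons.append(f"{host}: Seconds_Behind_Master={lag} is over the limit")
    for seconds in master_update_seconds:
        if seconds > running_updates_limit:
            reasons.append(f"master: an update has been running for {seconds}s")
    return reasons


healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0}
lagging = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 30}

print(online_switch_blockers({"10.0.20.203": healthy}, [0]))  # []
print(online_switch_blockers({"10.0.20.204": lagging}, [5]))  # two reasons with the default limit
```

Raising running_updates_limit, as the switch command in this article does with --running_updates_limit=10000, makes both the lag and the long-running-update conditions far more permissive.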

Stop the MHA manager monitoring

[root@manager mha]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1

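Run the switch command

Perform the online switch

This simulates an online master switch: the current master 10.0.20.202 becomes a slave and 10.0.20.201 is promoted to be the new master.

The earlier simulated outage moved the master from the original 201 to 202.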

[root@manager mha]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=10.0.20.201 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0

The command produces the following log:

Fri Jun 14 11:30:26 2019 - [info] MHA::MasterRotate version 0.58.
Fri Jun 14 11:30:26 2019 - [info] Starting online master switch..
Fri Jun 14 11:30:26 2019 - [info] 
Fri Jun 14 11:30:26 2019 - [info] * Phase 1: Configuration Check Phase..
Fri Jun 14 11:30:26 2019 - [info] 
Fri Jun 14 11:30:26 2019 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Jun 14 11:30:26 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jun 14 11:30:26 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jun 14 11:30:27 2019 - [info] GTID failover mode = 1
Fri Jun 14 11:30:27 2019 - [info] Current Alive Master: 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info] Alive Slaves:
Fri Jun 14 11:30:27 2019 - [info]   10.0.20.201(10.0.20.201:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 11:30:27 2019 - [info]     GTID ON
Fri Jun 14 11:30:27 2019 - [info]     Replicating from 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Jun 14 11:30:27 2019 - [info]   10.0.20.203(10.0.20.203:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 11:30:27 2019 - [info]     GTID ON
Fri Jun 14 11:30:27 2019 - [info]     Replicating from 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info]   10.0.20.204(10.0.20.204:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Fri Jun 14 11:30:27 2019 - [info]     GTID ON
Fri Jun 14 11:30:27 2019 - [info]     Replicating from 10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] Checking MHA is not monitoring or doing failover..
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.201..
Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.203..
Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.204..
Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] 10.0.20.201 can be new master.
Fri Jun 14 11:30:27 2019 - [info] 
From:
10.0.20.202(10.0.20.202:3306) (current master)
 +--10.0.20.201(10.0.20.201:3306)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)

To:
10.0.20.201(10.0.20.201:3306) (new master)
 +--10.0.20.203(10.0.20.203:3306)
 +--10.0.20.204(10.0.20.204:3306)
 +--10.0.20.202(10.0.20.202:3306)
Fri Jun 14 11:30:27 2019 - [info] Checking whether 10.0.20.201(10.0.20.201:3306) is ok for the new master..
Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] 10.0.20.202(10.0.20.202:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Jun 14 11:30:27 2019 - [info] 10.0.20.202(10.0.20.202:3306): Resetting slave pointing to the dummy host.
Fri Jun 14 11:30:27 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Jun 14 11:30:27 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info] * Phase 2: Rejecting updates Phase..
Fri Jun 14 11:30:27 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info] Executing master ip online change script to disable write on the current master:
Fri Jun 14 11:30:27 2019 - [info]   /etc/mha/scripts/master_ip_online_change --command=stop --orig_master_host=10.0.20.202 --orig_master_ip=10.0.20.202 --orig_master_port=3306 --orig_master_user='root' --new_master_host=10.0.20.201 --new_master_ip=10.0.20.201 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx



****************************

Disabled thi VIP - 10.0.20.199/24 on old master: 10.0.20.202 

Disabled the VIP successfully
***************************



Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] Orig master binlog:pos is mysql-bin.000002:194.
Fri Jun 14 11:30:27 2019 - [info]  Waiting to execute all relay logs on 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:27 2019 - [info]  master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.201(10.0.20.201:3306). Executed 0 events.
Fri Jun 14 11:30:27 2019 - [info]   done.
Fri Jun 14 11:30:27 2019 - [info] Getting new master's binlog name and position..
Fri Jun 14 11:30:27 2019 - [info]  mysql-bin.000005:194
Fri Jun 14 11:30:27 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.201', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Jun 14 11:30:27 2019 - [info] Executing master ip online change script to allow write on the new master:
Fri Jun 14 11:30:27 2019 - [info]   /etc/mha/scripts/master_ip_online_change --command=start --orig_master_host=10.0.20.202 --orig_master_ip=10.0.20.202 --orig_master_port=3306 --orig_master_user='root' --new_master_host=10.0.20.201 --new_master_ip=10.0.20.201 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx



*************************

Enabling the VIP - 10.0.20.199/24 on new master: 10.0.20.201 

Enabled the VIP successfully
***************************



Fri Jun 14 11:30:27 2019 - [info]  ok.
Fri Jun 14 11:30:27 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info] * Switching slaves in parallel..
Fri Jun 14 11:30:27 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info] -- Slave switch on host 10.0.20.203(10.0.20.203:3306) started, pid: 7081
Fri Jun 14 11:30:27 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info] -- Slave switch on host 10.0.20.204(10.0.20.204:3306) started, pid: 7082
Fri Jun 14 11:30:27 2019 - [info] 
Fri Jun 14 11:30:29 2019 - [info] Log messages from 10.0.20.203 ...
Fri Jun 14 11:30:29 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info]  Waiting to execute all relay logs on 10.0.20.203(10.0.20.203:3306)..
Fri Jun 14 11:30:27 2019 - [info]  master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.203(10.0.20.203:3306). Executed 0 events.
Fri Jun 14 11:30:27 2019 - [info]   done.
Fri Jun 14 11:30:27 2019 - [info]  Resetting slave 10.0.20.203(10.0.20.203:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:27 2019 - [info]  Executed CHANGE MASTER.
Fri Jun 14 11:30:28 2019 - [info]  Slave started.
Fri Jun 14 11:30:29 2019 - [info] End of log messages from 10.0.20.203 ...
Fri Jun 14 11:30:29 2019 - [info] 
Fri Jun 14 11:30:29 2019 - [info] -- Slave switch on host 10.0.20.203(10.0.20.203:3306) succeeded.
Fri Jun 14 11:30:29 2019 - [info] Log messages from 10.0.20.204 ...
Fri Jun 14 11:30:29 2019 - [info] 
Fri Jun 14 11:30:27 2019 - [info]  Waiting to execute all relay logs on 10.0.20.204(10.0.20.204:3306)..
Fri Jun 14 11:30:27 2019 - [info]  master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.204(10.0.20.204:3306). Executed 0 events.
Fri Jun 14 11:30:27 2019 - [info]   done.
Fri Jun 14 11:30:27 2019 - [info]  Resetting slave 10.0.20.204(10.0.20.204:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:27 2019 - [info]  Executed CHANGE MASTER.
Fri Jun 14 11:30:28 2019 - [info]  Slave started.
Fri Jun 14 11:30:29 2019 - [info] End of log messages from 10.0.20.204 ...
Fri Jun 14 11:30:29 2019 - [info] 
Fri Jun 14 11:30:29 2019 - [info] -- Slave switch on host 10.0.20.204(10.0.20.204:3306) succeeded.
Fri Jun 14 11:30:29 2019 - [info] Unlocking all tables on the orig master:
Fri Jun 14 11:30:29 2019 - [info] Executing UNLOCK TABLES..
Fri Jun 14 11:30:29 2019 - [info]  ok.
Fri Jun 14 11:30:29 2019 - [info] Starting orig master as a new slave..
Fri Jun 14 11:30:29 2019 - [info]  Resetting slave 10.0.20.202(10.0.20.202:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306)..
Fri Jun 14 11:30:29 2019 - [info]  Executed CHANGE MASTER.
Fri Jun 14 11:30:30 2019 - [info]  Slave started.
Fri Jun 14 11:30:30 2019 - [info] All new slave servers switched successfully.
Fri Jun 14 11:30:30 2019 - [info] 
Fri Jun 14 11:30:30 2019 - [info] * Phase 5: New master cleanup phase..
Fri Jun 14 11:30:30 2019 - [info] 
Fri Jun 14 11:30:30 2019 - [info]  10.0.20.201: Resetting slave info succeeded.
Fri Jun 14 11:30:30 2019 - [info] Switching master to 10.0.20.201(10.0.20.201:3306) completed successfully.
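
How long the old master stayed read-locked during the switch can be estimated from the log. The sketch below is a rough helper under stated assumptions: read_lock_seconds is a hypothetical function, it relies on the timestamp format and the two messages seen in the log above, and the log only has one-second resolution. It measures the lock on the old master, not the time until the new master accepts writes.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Jun 14 11:30:27 2019"


def _timestamp(line):
    # The timestamp is everything before " - [info]" (or another level).
    return datetime.strptime(line.split(" - [", 1)[0], TS_FORMAT)


def read_lock_seconds(log_lines):
    """Seconds between FLUSH TABLES WITH READ LOCK and UNLOCK TABLES, or None."""
    locked = unlocked = None
    for line in log_lines:
        if "Executing FLUSH TABLES WITH READ LOCK" in line:
            locked = _timestamp(line)
        elif "Executing UNLOCK TABLES" in line:
            unlocked = _timestamp(line)
    if locked is None or unlocked is None:
        return None
    return (unlocked - locked).total_seconds()


log_lines = [
    "Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..",
    "Fri Jun 14 11:30:29 2019 - [info] Executing UNLOCK TABLES..",
]
print(read_lock_seconds(log_lines))  # 2.0
```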

Checking the status

node01

[root@node01 ~]# mysql -uroot -p123456 -e 'show slave status\G'
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@node01 ~]# ip a | grep 20
    inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
    inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1

node02

[root@node02 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
                  Master_Host: 10.0.20.201
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
[root@node02 ~]# ip a | grep 20
    inet 10.0.20.202/24 brd 10.0.20.255 scope global bond0

node03

[root@node03 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
                  Master_Host: 10.0.20.201
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

node04

[root@node04 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
                  Master_Host: 10.0.20.201
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

The state of each database shows that node01 is now the master and that the VIP has moved to node01.
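
The per-node checks above can be automated by parsing the vertical output of the mysql client. This is a minimal sketch: parse_vertical and replicates_from are hypothetical helpers, and the sample text is copied from the node02 output above.

```python
def parse_vertical(output):
    """Parse vertical-format mysql client output into a dict of field to value."""
    fields = {}
    for line in output.splitlines():
        # Skip row separators such as "***** 1. row *****" and lines without a field.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replicates_from(output, master_host):
    """True if the output shows healthy replication from master_host."""
    status = parse_vertical(output)
    return (
        status.get("Master_Host") == master_host
        and status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


node02_output = """\
                  Master_Host: 10.0.20.201
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
"""
print(replicates_from(node02_output, "10.0.20.201"))  # True
print(replicates_from("", "10.0.20.201"))             # False (a master prints no slave status)
```

An empty result, as on node01 above, is reported as "not replicating", which is the expected state for the new master.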