MySQL High Availability -- MHA Deployment and Failover

Architecture Design and Required Configuration

Host Environment

IP               Hostname       Role
192.168.192.128  node_master    MySQL-Master | MHA-Node
192.168.192.129  node_slave     MySQL-Slave  | MHA-Node (candidate master)
192.168.192.130  manager_slave  MySQL-Slave  | MHA-Node + MHA-Manager
.....................................................................................................................................................
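To save machines, the read-only slave 192.168.192.129 (it does not serve reads to clients) is used as the candidate master; it could also be dedicated to backups.
Likewise, to save machines, the slave 192.168.192.130 doubles as the manager server (in production, when enough machines are available, a dedicated host is usually used as the manager server).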

Tip:
Changing the hostnames quickly (my hosts had to be renamed):
[root@master1 ~]# hostnamectl set-hostname node_master
[root@node1 ~]# hostnamectl set-hostname node_slave
[root@node2 ~]# hostnamectl set-hostname manager_slave
.....................................................................................................................................................

Required Configuration

1. Add the hostnames to each machine's hosts file so the hosts can be reached by name (on all 3 machines; this step is optional)
[root@node_master ~]# vim /etc/hosts
..............
192.168.192.128 node_master
192.168.192.129 node_slave
192.168.192.130 manager_slave
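Alternatively, edit the file on one machine and copy it to the others with scp (only if the other machines have no hosts entries of their own that would be overwritten):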
[root@node_master ~]# scp /etc/hosts root@192.168.192.129:/etc/hosts
[root@node_master ~]# scp /etc/hosts root@192.168.192.130:/etc/hosts
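If you would rather not overwrite /etc/hosts with scp, a minimal sketch like the one below appends each mapping only when it is missing, so running it twice is harmless. The function name add_host_entry is hypothetical, and it only recognizes lines whose first two fields are exactly "IP hostname".

```bash
#!/usr/bin/env bash
# add_host_entry FILE IP NAME
# Hypothetical helper: append "IP NAME" to a hosts-format FILE unless a line
# with exactly these first two fields is already present.
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # awk exits 0 if some line starts with this IP followed by this name
  if awk -v ip="$ip" -v name="$name" \
       '$1 == ip && $2 == name { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

For example, run add_host_entry /etc/hosts 192.168.192.129 node_slave (and likewise for the other two hosts) on each machine.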


2. Set up passwordless SSH between the servers (on all 3 machines; this step is mandatory and critical)
[root@node_master ~]# ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa
[root@node_master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub "root@192.168.192.128"
[root@node_master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub "root@192.168.192.129"
[root@node_master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub "root@192.168.192.130"

Test passwordless login (you should be logged in directly, without a password prompt):
[root@node_master ~]# ssh 192.168.192.128
Last login: Tue Dec 18 15:56:41 2018 from 192.168.192.1
[root@node_master ~]#

[root@node_master ~]# ssh 192.168.192.129
Last login: Tue Dec 18 16:03:56 2018 from 192.168.192.128
[root@node_slave ~]#

[root@node_master ~]# ssh 192.168.192.130
Last login: Tue Dec 18 16:04:02 2018 from 192.168.192.128
[root@manager_slave ~]# 
To be safe, repeat the passwordless login test from the other two machines as well.
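Typing the ssh commands by hand on each machine works; a small loop makes the check repeatable. The sketch below assumes the root account used in this article; check_ssh_all and ssh_ok are hypothetical helper names. ssh's BatchMode=yes option makes it fail instead of prompting when key-based login is not set up, so a missing key shows up as a failure rather than a password prompt.

```bash
#!/usr/bin/env bash
# ssh_ok HOST: succeed only if root@HOST accepts key-based login.
ssh_ok() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true </dev/null >/dev/null 2>&1
}

# check_ssh_all HOST...
# Print OK/FAIL per host; return 1 if any host failed.
check_ssh_all() {
  local host failed=0
  for host in "$@"; do
    if ssh_ok "$host"; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

Run check_ssh_all 192.168.192.128 192.168.192.129 192.168.192.130 on each of the three machines; a FAIL line points at the direction whose key has not been copied with ssh-copy-id.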

Installing MySQL and Setting Up Replication

Replication layout and installation (MySQL 5.7)

Layout (one master, two slaves):
192.168.192.128 MySQL-Master (master)
192.168.192.129 MySQL-Slave  (slave)
192.168.192.130 MySQL-Slave  (slave)
Installation guide: http://www.javashuo.com/article/p-zsycgvbg-cq.html (installed with yum)

Configuring Replication

1. my.cnf settings on the master and slaves
    Change the configuration file on each server (for a yum-installed MySQL it is /etc/my.cnf) as follows:

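    ------Master (192.168.192.128) configuration------
    server-id=1                     #unique server ID; must never repeat across master and slaves
    log-bin=mysql-bin               #enable the binary log with the file name prefix mysql-bin
    binlog-ignore-db=mysql          #do not log the mysql system database; to exclude several databases, repeat this option once per database (a comma-separated list is read as a single database name)
    sync_binlog=1                   #sync the binary log to disk at every transaction commit
    binlog_checksum=crc32           #write a CRC32 checksum for each binlog event (CRC32 is the default from MySQL 5.6.6; earlier releases default to NONE)
    binlog_format=mixed             #binary log format; MIXED logs statements by default and switches to row-based logging for statements that are unsafe to replicate
    validate_password_policy=0      #password policy level (0 = LOW)
    validate_password = off         #disable the password validation plugin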

    Save the file and restart MySQL:
    [root@node_master ~]# systemctl restart mysqld

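    ------Slave 1 (192.168.192.129) configuration------
    server-id=2                     #unique server ID; must never repeat across master and slaves
    log-bin=mysql-bin               #enable the binary log with the file name prefix mysql-bin
    binlog-ignore-db=mysql          #do not log the mysql system database (important: the replication filter options must be the same on master and slaves, or masterha_check_repl will fail later!)
    slave-skip-errors=all           #skip all replication errors and keep replicating (convenient in a lab, but it can hide data drift between master and slaves)
    validate_password_policy=0      #password policy level (0 = LOW)
    validate_password = off         #disable the password validation plugin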

    Save the file and restart MySQL:
    [root@node_slave ~]# systemctl restart mysqld

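    ------Slave 2 (192.168.192.130) configuration------
    server-id=3                     #unique server ID; must never repeat across master and slaves
    log-bin=mysql-bin               #enable the binary log with the file name prefix mysql-bin
    binlog-ignore-db=mysql          #do not log the mysql system database (important: the replication filter options must be the same on master and slaves, or masterha_check_repl will fail later!)
    slave-skip-errors=all           #skip all replication errors and keep replicating (convenient in a lab, but it can hide data drift between master and slaves)
    validate_password_policy=0      #password policy level (0 = LOW)
    validate_password = off         #disable the password validation plugin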

    Save the file and restart MySQL:
    [root@manager_slave ~]# systemctl restart mysqld
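Because a duplicated server-id breaks replication in confusing ways, it can be worth checking the three my.cnf files side by side. The sketch below assumes copies of each server's my.cnf have been collected on one machine (the paths are up to you); check_server_ids is a hypothetical helper, and it only looks at server-id / server_id lines.

```bash
#!/usr/bin/env bash
# check_server_ids CNF...
# Report files without a server-id and any server-id used more than once.
# Returns 1 if a problem is found, 0 otherwise.
check_server_ids() {
  local f id status=0
  declare -A seen=()   # server-id -> first file that used it (local to the function)
  for f in "$@"; do
    # accept server-id or server_id, optional spaces; take the last occurrence
    id=$(sed -nE 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*([0-9]+).*/\1/p' "$f" | tail -n 1)
    if [ -z "$id" ]; then
      echo "MISSING server-id in $f"
      status=1
      continue
    fi
    if [ -n "${seen[$id]:-}" ]; then
      echo "DUPLICATE server-id=$id in $f and ${seen[$id]}"
      status=1
    else
      seen[$id]=$f
    fi
  done
  return "$status"
}
```

No output and exit status 0 means every file has its own ID.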

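    Note:
    If you use the binlog-ignore-db or replicate-ignore-db filter options, master and slaves must use the same ones: if you filter with binlog-ignore-db, use binlog-ignore-db everywhere;
    if you filter with replicate-ignore-db, use replicate-ignore-db everywhere. Never mix different filter options between master and slaves! MHA checks the filter rules at startup, and if they differ it will not start monitoring or failover.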
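Since MHA refuses to start when the filter rules differ, comparing the filter options of two my.cnf files can catch the mistake before masterha_check_repl does. This is a sketch that assumes the filters are set in my.cnf (not changed at runtime); replication_filters and same_filters are hypothetical names, and only the *-do-db / *-ignore-db options are compared.

```bash
#!/usr/bin/env bash
# replication_filters CNF
# Print the binlog-*-db / replicate-*-db settings of a my.cnf, normalized:
# comments and whitespace removed, option names written with dashes, sorted.
replication_filters() {
  sed -nE 's/[[:space:]]*#.*$//; s/[[:space:]]+//g;
           s/^(binlog|replicate)[-_](do|ignore)[-_]db=/\1-\2-db=/p' "$1" | sort
}

# same_filters CNF_A CNF_B: return 0 if both files use identical filters.
same_filters() {
  local a b
  a=$(replication_filters "$1")
  b=$(replication_filters "$2")
  if [ "$a" = "$b" ]; then
    echo "filters match"
  else
    echo "filters differ:"
    diff <(printf '%s\n' "$a") <(printf '%s\n' "$b")
    return 1
  fi
}
```

For example, same_filters master.cnf slave1.cnf should print "filters match" for the configuration above.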

2. Create the MySQL account MHA uses for management (run on all three nodes)
    mysql> grant super,reload,replication client,select on *.* to manager@'192.168.192.%' identified by 'Manager_1234';
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    mysql> grant create,insert,update,delete,drop on *.* to manager@'192.168.192.%';
    Query OK, 0 rows affected (0.00 sec)

3. Create the replication account (run on all three nodes)
    mysql> grant reload,super,replication slave on *.*  to 'slave'@'192.168.192.%' identified by 'Slave_1234';
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    mysql> flush privileges;
    Query OK, 0 rows affected (0.01 sec)

4. Set up replication
    On the master (192.168.192.128), run:
    mysql> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000001 |     1169 |              | mysql            |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)

    On both slaves (192.168.192.129 and 192.168.192.130), run the following.
    Stop the slave threads before changing the settings:
    mysql> stop slave;
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    Point the slave at the master:
    mysql> change master to master_host='192.168.192.128',master_port=3306,master_user='slave',master_password='Slave_1234',master_log_file='mysql-bin.000001',master_log_pos=1169;
    Query OK, 0 rows affected, 2 warnings (0.01 sec)

    Start replication:
    mysql> start slave;
    Query OK, 0 rows affected (0.00 sec)

    Check the replication status (Slave_IO_Running and Slave_SQL_Running both Yes means replication is working):
    mysql> show slave status\G
    *************************** 1. row ***************************
                    Slave_IO_State: Waiting for master to send event
                        Master_Host: 192.168.192.128
                        Master_User: slave
                        Master_Port: 3306
                    Connect_Retry: 60
                    Master_Log_File: mysql-bin.000001
                Read_Master_Log_Pos: 1169
                    Relay_Log_File: node_slave-relay-bin.000002
                    Relay_Log_Pos: 320
            Relay_Master_Log_File: mysql-bin.000001
                Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
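Reading the \G output by eye is fine for two slaves; for scripting (for example a periodic check), a small filter can pull out the fields that matter. check_slave_status is a hypothetical helper that reads SHOW SLAVE STATUS\G text on stdin; it assumes the MySQL 5.7 field names shown above.

```bash
#!/usr/bin/env bash
# check_slave_status < "SHOW SLAVE STATUS\G" output
# Print the IO/SQL thread state and the lag; return 0 only if both threads are Yes.
check_slave_status() {
  awk '
    $1 == "Slave_IO_Running:"      { io  = $2 }
    $1 == "Slave_SQL_Running:"     { sql = $2 }
    $1 == "Seconds_Behind_Master:" { lag = $2 }
    END {
      printf "IO=%s SQL=%s lag=%s\n", io, sql, (lag == "" ? "unknown" : lag)
      exit !(io == "Yes" && sql == "Yes")
    }'
}
```

Usage on a slave: mysql -uroot -p -e 'show slave status\G' | check_slave_status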

For how replication works, see: http://www.javashuo.com/article/p-qkhuonyf-bx.html

For a detailed replication setup guide, see: http://www.javashuo.com/article/p-yfrxquwc-k.html

Installing and Configuring MHA

Downloading MHA

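----------------------------------------------MHA downloads----------------------------------------------
MHA consists of a manager node and data nodes:
Data nodes are the hosts of the existing MySQL replication setup. There should be at least 3 (1 master, 2 slaves) so that a master-slave structure remains after a master failover. Data nodes only need the node package.
Manager server: runs the monitoring scripts and is responsible for monitoring and automatic failover. It needs both the node package and the manager package.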

Downloads:
MHA Node:    https://coding.net/u/brian_zhu/p/mha4/git/raw/master/mha4mysql-node-0.58.tar.gz
MHA Manager: https://coding.net/u/brian_zhu/p/mha4/git/raw/master/mha4mysql-manager-0.58.tar.gz

Installing MHA

1. Installing MHA Node
----------------------------------------------MHA Node installation----------------------------------------------

Install MHA Node on every data node (all three machines need MHA Node)

First install the required Perl modules
[root@node_master ~]# yum -y install perl perl-DBD-MySQL perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN

Download and extract
[root@node_master ~]# wget https://coding.net/u/brian_zhu/p/mha4/git/raw/master/mha4mysql-node-0.58.tar.gz
[root@node_master ~]# tar zxf mha4mysql-node-0.58.tar.gz
[root@node_master ~]# cd mha4mysql-node-0.58

Build and install
[root@node_master mha4mysql-node-0.58]# perl Makefile.PL
[root@node_master mha4mysql-node-0.58]# make && make install


2. Installing MHA Manager
----------------------------------------------MHA Manager installation----------------------------------------------

Install MHA Manager on the manager node (192.168.192.130); note that the manager node also needs MHA Node

Install the epel-release repository
[root@manager_slave ~]# yum -y install epel-release

Install the Perl modules MHA Manager depends on
[root@manager_slave ~]# yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes

Download and extract
[root@manager_slave ~]# wget https://coding.net/u/brian_zhu/p/mha4/git/raw/master/mha4mysql-manager-0.58.tar.gz
[root@manager_slave ~]# tar zxf mha4mysql-manager-0.58.tar.gz 

Build and install
[root@manager_slave ~]# cd mha4mysql-manager-0.58/
[root@manager_slave mha4mysql-manager-0.58]# perl Makefile.PL
[root@manager_slave mha4mysql-manager-0.58]# make && make install
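If perl Makefile.PL complains about missing dependencies, it can help to test the Perl modules directly. A minimal sketch: check_perl_modules (a hypothetical name) tries to load each module with perl -M and reports the ones that fail. The module names in the usage line are an assumption, mapped from the yum packages installed above.

```bash
#!/usr/bin/env bash
# check_perl_modules MODULE...
# Try to load each Perl module; print MISSING for any that cannot be loaded.
check_perl_modules() {
  local m missing=0
  for m in "$@"; do
    if perl -M"$m" -e 1 >/dev/null 2>&1; then
      echo "ok      $m"
    else
      echo "MISSING $m"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_perl_modules DBD::mysql Config::Tiny Log::Dispatch Parallel::ForkManager Config::IniFiles Time::HiRes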

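After MHA Manager is installed, the following scripts are present in /usr/local/bin (docker-compose and docker-machine were already on this host and are unrelated to MHA):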
[root@manager_slave mha4mysql-manager-0.58]# ll /usr/local/bin/
total 39060
-r-xr-xr-x 1 root root    17639 Dec 18 16:56 apply_diff_relay_logs
-rwxr-xr-x 1 root root 11739376 Oct 18 09:41 docker-compose
-rwxr-xr-x 1 root root 28160480 Oct 23 16:42 docker-machine
-r-xr-xr-x 1 root root     4807 Dec 18 16:56 filter_mysqlbinlog
-r-xr-xr-x 1 root root     1995 Dec 18 17:00 masterha_check_repl
-r-xr-xr-x 1 root root     1779 Dec 18 17:00 masterha_check_ssh
-r-xr-xr-x 1 root root     1865 Dec 18 17:00 masterha_check_status
-r-xr-xr-x 1 root root     3201 Dec 18 17:00 masterha_conf_host
-r-xr-xr-x 1 root root     2517 Dec 18 17:00 masterha_manager
-r-xr-xr-x 1 root root     2165 Dec 18 17:00 masterha_master_monitor
-r-xr-xr-x 1 root root     2373 Dec 18 17:00 masterha_master_switch
-r-xr-xr-x 1 root root     5172 Dec 18 17:00 masterha_secondary_check
-r-xr-xr-x 1 root root     1739 Dec 18 17:00 masterha_stop
-r-xr-xr-x 1 root root     8337 Dec 18 16:56 purge_relay_logs
-r-xr-xr-x 1 root root     7525 Dec 18 16:56 save_binary_logs

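Where:
masterha_check_repl             check the MySQL replication status
masterha_check_ssh              check MHA's SSH configuration
masterha_check_status           check whether MHA is currently running
masterha_conf_host              add or remove server entries in the configuration
masterha_manager                start MHA
masterha_stop                   stop MHA
masterha_master_monitor         detect whether the master is down
masterha_master_switch          control failover (automatic or manual)
masterha_secondary_check        check whether the master is alive over multiple network routes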

In addition:
The following sample scripts live in ../mha4mysql-manager-0.58/samples/scripts/ and need to be copied to /usr/local/bin:

[root@manager_slave mha4mysql-manager-0.58]# ll ../mha4mysql-manager-0.58/samples/scripts/
total 32
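-rwxr-xr-x 1 1000 1000  3648 Mar 23  2018 master_ip_failover        #VIP management script for automatic failover; optional. With keepalived we can write our own VIP handling instead, e.g. monitor MySQL and stop keepalived when it fails, so the VIP floats to the other node automatically
-rwxr-xr-x 1 1000 1000  9870 Mar 23  2018 master_ip_online_change   #VIP script for online (manual) switchover; optional, can also be a simple shell script
-rwxr-xr-x 1 1000 1000 11867 Mar 23  2018 power_manager             #script that shuts down the failed master after a failure; optional
-rwxr-xr-x 1 1000 1000  1360 Mar 23  2018 send_report               #script that sends an alert after failover; optional, can be a simple shell script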

[root@manager_slave mha4mysql-manager-0.58]# cp ../mha4mysql-manager-0.58/samples/scripts/* /usr/local/bin/

Configuring MHA

1. MHA Manager configuration (the MHA configuration file)
----------------------------------------------MHA Manager configuration----------------------------------------------
Do the following on the manager node (192.168.192.130):
[root@manager_slave mha4mysql-manager-0.58]# mkdir -p /etc/masterha
[root@manager_slave mha4mysql-manager-0.58]# cp samples/conf/app1.cnf /etc/masterha/
[root@manager_slave mha4mysql-manager-0.58]# vim /etc/masterha/app1.cnf
[server default]
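manager_workdir=/var/log/masterha/app1                                      #manager working directory
manager_log=/var/log/masterha/app1/manager.log                              #manager log file
    
ssh_user=root                                                               #account used for passwordless SSH
user=manager                                                                #MySQL user MHA uses to manage the servers
password=Manager_1234                                                       #password of the manager user
repl_user=slave                                                             #MySQL replication account, used to replicate binary logs between master and slaves
repl_password=Slave_1234                                                    #password of the replication account (slave) created earlier
ping_interval=1                                                             #interval in seconds between pings of the master to check it is alive (default 3); the master is treated as dead after three missed pings, which triggers failover
master_ip_failover_script= /usr/local/bin/master_ip_failover                #script that moves the VIP during automatic failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change      #script that moves the VIP during manual (online) switchover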
    
[server1]
hostname=192.168.192.128
port=3306
master_binlog_dir=/var/lib/mysql/                                           #where the master stores its binlogs, so MHA can find them; here it is the MySQL data directory
    
[server2]
hostname=192.168.192.129
port=3306
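candidate_master=1                                                          #candidate master: when the master goes down, this host is preferred as the new master and is promoted even if it is not the slave with the most recent events
check_repl_delay=0                                                          #by default MHA will not choose a slave that is more than 100MB of relay logs behind the master as the new master, because recovering it would take a long time; with check_repl_delay=0 MHA ignores replication delay when choosing the new master, which is very useful together with candidate_master=1 because it ensures the candidate becomes the new master during failover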
master_binlog_dir=/var/lib/mysql/
    
[server3]
hostname=192.168.192.130
port=3306
#candidate_master=1
master_binlog_dir=/var/lib/mysql/
    
#[server4]
#hostname=host4
#no_master=1
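Before running the checks it can help to print what the file actually defines. The awk sketch below (list_mha_servers is a hypothetical name) lists each [serverN] section with its hostname and whether candidate_master=1 is set; commented-out lines such as #candidate_master=1 are ignored. It is a plain-text reading of the file, not MHA's own configuration parser.

```bash
#!/usr/bin/env bash
# list_mha_servers APP_CNF
# Print "section hostname candidate=yes|no" for every server section,
# skipping [server default].
list_mha_servers() {
  awk '
    function emit() {
      if (sec != "") printf "%s %s candidate=%s\n", sec, host, (cand ? "yes" : "no")
    }
    { sub(/[ \t]*#.*/, "") }                  # strip whole-line and inline comments
    /^[ \t]*\[/ {                             # section header, e.g. [server1]
      emit()
      sec = $0
      sub(/^[ \t]*\[[ \t]*/, "", sec)
      sub(/[ \t]*][ \t]*$/, "", sec)
      if (sec == "server default") sec = ""   # global section, not a server
      host = ""; cand = 0
      next
    }
    sec != "" && /^[ \t]*hostname[ \t]*=/ {
      host = $0
      sub(/^[^=]*=[ \t]*/, "", host)
      sub(/[ \t]+$/, "", host)
    }
    sec != "" && /^[ \t]*candidate_master[ \t]*=[ \t]*1[ \t]*$/ { cand = 1 }
    END { emit() }
  ' "$1"
}
```

For the file above, list_mha_servers /etc/masterha/app1.cnf should report server2 (192.168.192.129) as the only candidate.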



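2. Configure how relay logs are purged (on both slave nodes)
Tip:
During a failover, MHA relies on the relay logs to recover the slaves, so automatic relay log purging must be turned OFF and the relay logs purged manually instead.
By default, a slave deletes each relay log automatically once the SQL thread has executed it. In an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled.
Purging them periodically then has to take replication delay into account: on an ext3 file system, deleting a large file takes a noticeable amount of time and can cause serious replication delay. To avoid that,
hard links to the relay logs are created temporarily, because on Linux removing a large file through a hard link is fast. (The same hard-link technique is commonly used when dropping large tables in MySQL.)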

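MHA Node includes the purge_relay_logs tool. It creates hard links to the relay logs, executes SET GLOBAL relay_log_purge=1 and FLUSH LOGS, waits a few seconds so that the SQL thread can switch to a new relay log,
and then executes SET GLOBAL relay_log_purge=0.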
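The hard-link trick works because deleting one name of a file only removes a directory entry while another link to the same inode still exists; the data blocks are released only when the last link goes away. So the unlink MySQL performs is cheap, and the expensive release happens later when the tool removes its own link. The demonstration below uses a scratch directory and a tiny fake "relay log" (demo_hardlink_purge is a hypothetical name; it needs GNU stat), so it shows the mechanism rather than the speed benefit.

```bash
#!/usr/bin/env bash
# demo_hardlink_purge DIR
# Create a fake relay log, hard-link it into a work directory, delete the
# original name (as MySQL would), then remove the last link.
demo_hardlink_purge() {
  local dir=$1
  mkdir -p "$dir/workdir"
  printf 'relay events\n' > "$dir/relay-bin.000001"
  ln "$dir/relay-bin.000001" "$dir/workdir/relay-bin.000001"   # same inode, 2 links
  echo "links before: $(stat -c %h "$dir/relay-bin.000001")"
  rm "$dir/relay-bin.000001"          # only removes a name; the data is still referenced
  echo "still readable: $(cat "$dir/workdir/relay-bin.000001")"
  rm "$dir/workdir/relay-bin.000001"  # last link gone: the blocks are freed now
}
```

Running demo_hardlink_purge "$(mktemp -d)" prints a link count of 2 and then the file contents, read through the remaining link after the original name was deleted.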

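purge_relay_logs options:
--user                          MySQL user name
--password                      MySQL password
--host                          MySQL server host
--port                          port number
--workdir                       where to create the hard links to the relay logs; default /var/tmp. Creating a hard link across partitions fails, so point it at a directory on the same partition as the relay logs. After a successful run, the hard-linked relay log files are deleted.
--disable_relay_log_purge       by default, if relay_log_purge=1 the script cleans nothing and exits. With this option, when relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and leaves the parameter OFF.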

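Disable automatic relay log purging (on both slave nodes). SET GLOBAL does not survive a restart, so also add relay_log_purge=0 to the [mysqld] section of my.cnf to make it permanent: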
[root@node_slave ~]# mysql -uroot -p12345 -e 'set global relay_log_purge=0'
[root@manager_slave ~]# mysql -uroot -p12345 -e 'set global relay_log_purge=0'

Next, set up a script that purges the relay logs periodically.
Create the purge script (on both slave nodes):
[root@node_slave ~]# vim /root/purge_relay_log.sh
#!/bin/bash
user=root
passwd=12345
port=3306
host=localhost
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
    mkdir -p $log_dir
fi

"$purge" --user="$user" --password="$passwd" --host="$host" --disable_relay_log_purge --port="$port" --workdir="$work_dir" >> "$log_dir/purge_relay_logs.log" 2>&1

Make the script executable
[root@node_slave ~]# chmod 755 /root/purge_relay_log.sh 

Add it to crontab so that it runs every day at 05:00
[root@node_slave ~]# crontab -e
0 5 * * * /bin/bash /root/purge_relay_log.sh

Test the script
purge_relay_logs does not block the SQL thread while it deletes relay logs.
Run purge_relay_logs manually first to see what it does:
[root@node_slave ~]# /usr/local/bin/purge_relay_logs --user=root --host=localhost --password=12345 --disable_relay_log_purge --port=3306 --workdir=/data
2018-12-18 17:48:25: purge_relay_logs script started.
Found relay_log.info: /var/lib/mysql/relay-log.info
Opening /var/lib/mysql/node_slave-relay-bin.000001 ..
Opening /var/lib/mysql/node_slave-relay-bin.000002 ..
Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2018-12-18 17:48:28: All relay log purging operations succeeded.

Now run the wrapper script:
[root@node_slave ~]# sh purge_relay_log.sh 

It writes a log file:
[root@node_slave ~]# ll /data/masterha/log/
total 4
-rw-r--r-- 1 root root 234 Dec 18 17:49 purge_relay_logs.log 

Testing MHA

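Check the state of each part of the MHA cluster
1. Check SSH for the MHA cluster
------------------------------Check the SSH configuration------------------------------
Check the SSH configuration (run on the manager node, 192.168.192.130).
This tests SSH connectivity between all MHA nodes (every node to every other node, as the output shows):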
[root@manager_slave ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Tue Dec 18 17:53:34 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Dec 18 17:53:34 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Dec 18 17:53:34 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Dec 18 17:53:34 2018 - [info] Starting SSH connection tests..
Tue Dec 18 17:53:35 2018 - [debug] 
Tue Dec 18 17:53:34 2018 - [debug]  Connecting via SSH from root@192.168.192.128(192.168.192.128:22) to root@192.168.192.129(192.168.192.129:22)..
Tue Dec 18 17:53:34 2018 - [debug]   ok.
Tue Dec 18 17:53:34 2018 - [debug]  Connecting via SSH from root@192.168.192.128(192.168.192.128:22) to root@192.168.192.130(192.168.192.130:22)..
Tue Dec 18 17:53:35 2018 - [debug]   ok.
Tue Dec 18 17:53:36 2018 - [debug] 
Tue Dec 18 17:53:35 2018 - [debug]  Connecting via SSH from root@192.168.192.129(192.168.192.129:22) to root@192.168.192.128(192.168.192.128:22)..
Tue Dec 18 17:53:35 2018 - [debug]   ok.
Tue Dec 18 17:53:35 2018 - [debug]  Connecting via SSH from root@192.168.192.129(192.168.192.129:22) to root@192.168.192.130(192.168.192.130:22)..
Tue Dec 18 17:53:35 2018 - [debug]   ok.
Tue Dec 18 17:53:36 2018 - [debug] 
Tue Dec 18 17:53:35 2018 - [debug]  Connecting via SSH from root@192.168.192.130(192.168.192.130:22) to root@192.168.192.128(192.168.192.128:22)..
Tue Dec 18 17:53:35 2018 - [debug]   ok.
Tue Dec 18 17:53:35 2018 - [debug]  Connecting via SSH from root@192.168.192.130(192.168.192.130:22) to root@192.168.192.129(192.168.192.129:22)..
Tue Dec 18 17:53:36 2018 - [debug]   ok.
Tue Dec 18 17:53:36 2018 - [info] All SSH connection tests passed successfully.
Output like the above means SSH is configured correctly.

2. Check MySQL replication
------------------------------Check MySQL replication------------------------------
Check the replication environment with the MHA tool (run on the manager node, 192.168.192.130):
[root@manager_slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

I ran into a small hiccup here:
the command failed with the following error:
......................
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 93.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Tue Dec 18 18:28:28 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln229]  Failed to get master_ip_failover_script status with return code 255:0.
Tue Dec 18 18:28:28 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/local/bin/masterha_check_repl line 48.
Tue Dec 18 18:28:28 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Tue Dec 18 18:28:28 2018 - [info] Got exit code 1 (Not master dead).

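The cause of this error:
MHA supports two ways of redirecting clients during failover: a virtual IP address or a global configuration file. MHA does not force either one and leaves the choice to the user. The virtual IP approach involves other software such as keepalived and requires customizing the master_ip_failover script. The sample master_ip_failover copied earlier still contains the placeholder FIXME_xxx, which Perl refuses to compile under "strict subs", so the script status check fails.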

Fix:
Temporarily comment out the master_ip_failover_script= /usr/local/bin/master_ip_failover option in /etc/masterha/app1.cnf on the manager node.
Re-enable it later, once keepalived is in place and the script has been modified.
[root@manager_slave /]# cat /etc/masterha/app1.cnf | grep master_ip_failover_script
#master_ip_failover_script= /usr/local/bin/master_ip_failover   
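Because the failure comes from a placeholder left in a sample script, a quick scan of the copied samples before enabling them can save a round of masterha_check_repl. find_placeholders is a hypothetical helper; the FIXME pattern is taken from the error message above, and the samples may contain other things that need editing, so running perl -c on the script afterwards is still worthwhile.

```bash
#!/usr/bin/env bash
# find_placeholders FILE...
# Print lines containing FIXME markers.
# Returns 1 if any are found, 0 if none, 2 if a file could not be read.
find_placeholders() {
  local rc=0
  grep -Hn 'FIXME' "$@" || rc=$?
  case $rc in
    0) return 1 ;;   # placeholders found
    1) return 0 ;;   # clean
    *) return 2 ;;   # grep error, e.g. missing file
  esac
}
```

Usage: find_placeholders /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_online_change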

Then run masterha_check_repl again to check replication across the whole MySQL cluster:
[root@manager_slave /]# masterha_check_repl --conf=/etc/masterha/app1.cnf          
Tue Dec 18 18:32:43 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Dec 18 18:32:43 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Dec 18 18:32:43 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Dec 18 18:32:43 2018 - [info] MHA::MasterMonitor version 0.58.
Tue Dec 18 18:32:45 2018 - [info] GTID failover mode = 0
Tue Dec 18 18:32:45 2018 - [info] Dead Servers:
Tue Dec 18 18:32:45 2018 - [info] Alive Servers:
Tue Dec 18 18:32:45 2018 - [info]   192.168.192.128(192.168.192.128:3306)
Tue Dec 18 18:32:45 2018 - [info]   192.168.192.129(192.168.192.129:3306)
Tue Dec 18 18:32:45 2018 - [info]   192.168.192.130(192.168.192.130:3306)
......................
Tue Dec 18 18:32:48 2018 - [info] Checking replication health on 192.168.192.129..
Tue Dec 18 18:32:48 2018 - [info]  ok.
Tue Dec 18 18:32:48 2018 - [info] Checking replication health on 192.168.192.130..
Tue Dec 18 18:32:48 2018 - [info]  ok.
Tue Dec 18 18:32:48 2018 - [warning] master_ip_failover_script is not defined.
Tue Dec 18 18:32:48 2018 - [warning] shutdown_script is not defined.
Tue Dec 18 18:32:48 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
The replication environment is now healthy.

3. Check the status of MHA Manager
------------------------------Check the status of MHA Manager------------------------------
[root@manager_slave /]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

Note: when MHA is running normally this shows "PING_OK"; otherwise it shows "NOT_RUNNING", which means MHA monitoring is not running.

Start MHA Manager monitoring
Run it in the background with the following command:
[root@manager_slave /]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
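Startup options:
--remove_dead_master_conf      after a failover, remove the old master's entry from the configuration file.
manager_log                    log location (set in app1.cnf; the command above also redirects output to the same file).
--ignore_last_failover         by default, if MHA detects consecutive failures less than 8 hours apart, it will not fail over again; this limit avoids a
                                ping-pong effect. This option ignores the marker file left by the previous failover: after a failover, MHA creates an
                                app1.failover.complete file in its working directory (manager_workdir, here /var/log/masterha/app1), and if that file still
                                exists at the next failover, failover is refused until the file is deleted manually. For convenience, --ignore_last_failover is used here.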
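If you prefer not to pass --ignore_last_failover every time, you can check for the marker yourself before restarting the manager after a failover. A sketch: last_failover_marker (a hypothetical name) looks for APP.failover.complete in the given working directory; the application name defaults to app1, matching this article's configuration file, which is an assumption for other setups.

```bash
#!/usr/bin/env bash
# last_failover_marker WORKDIR [APP]
# Report whether a previous failover left its marker file behind.
# Returns 0 if the marker exists, 1 if it does not.
last_failover_marker() {
  local workdir=$1 app=${2:-app1}
  local marker="$workdir/$app.failover.complete"
  if [ -e "$marker" ]; then
    echo "present: $marker (remove it after reviewing the last failover)"
    return 0
  fi
  echo "absent: $marker"
  return 1
}
```

Usage: last_failover_marker /var/log/masterha/app1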
Check MHA Manager monitoring again:
[root@manager_slave /]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:8162) is running(0:PING_OK), master:192.168.192.128

Monitoring is now running, and the master is 192.168.192.128.
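For monitoring scripts, it can be convenient to turn the masterha_check_status line into a yes/no answer plus the current master. mha_status_master is a hypothetical helper that reads that line on stdin; it relies only on the PING_OK marker and the "master:" field seen in the output above.

```bash
#!/usr/bin/env bash
# mha_status_master < "masterha_check_status" output
# If the line contains PING_OK, print the master address and return 0;
# otherwise print nothing and return 1.
mha_status_master() {
  local line
  line=$(cat)
  case $line in
    *PING_OK*) ;;
    *) return 1 ;;
  esac
  printf '%s\n' "${line##*master:}"   # text after the last "master:"
}
```

Usage: masterha_check_status --conf=/etc/masterha/app1.cnf | mha_status_master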

Check the startup log:
[root@manager_slave /]# tail -n20 /var/log/masterha/app1/manager.log
    Relay log found at /var/lib/mysql, up to manager_slave-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/manager_slave-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Dec 18 18:35:11 2018 - [info] Slaves settings check done.
Tue Dec 18 18:35:11 2018 - [info] 
192.168.192.128(192.168.192.128:3306) (current master)
+--192.168.192.129(192.168.192.129:3306)
+--192.168.192.130(192.168.192.130:3306)

Tue Dec 18 18:35:11 2018 - [warning] master_ip_failover_script is not defined.
Tue Dec 18 18:35:11 2018 - [warning] shutdown_script is not defined.
Tue Dec 18 18:35:11 2018 - [info] Set master ping interval 1 seconds.
Tue Dec 18 18:35:11 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Dec 18 18:35:11 2018 - [info] Starting ping health check on 192.168.192.128(192.168.192.128:3306)..
Tue Dec 18 18:35:11 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
The line "Ping(SELECT) succeeded, waiting until MySQL doesn't respond.." shows that monitoring has started.

4. Stop MHA Manager monitoring
------------------------------Stop MHA Manager monitoring------------------------------
Stopping is simple: just use masterha_stop:
[root@manager_slave ~]# masterha_stop --conf=/etc/masterha/app1.cnf
MHA Manager is not running on app1(2:NOT_RUNNING).
Check MHA Manager monitoring again; it is now stopped:
[root@manager_slave ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

Configuring the VIP

Implementing the VIP with keepalived

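The VIP can be managed in two ways: with keepalived managing the floating virtual IP, or with a script that brings the virtual IP up itself (that is, without keepalived, heartbeat or similar software).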

Managing the VIP with keepalived

---------------------------------------------------------Method 1: managing the VIP with keepalived---------------------------------------------------------
Download and install the software on both masters: strictly speaking, one is the master (192.168.192.128) and the other is the candidate master (192.168.192.129), which is a slave until a failover happens.
[root@node_master ~]# yum -y install openssl-devel
[root@node_master ~]# wget http://www.keepalived.org/software/keepalived-2.0.10.tar.gz
[root@node_master ~]# tar zxf keepalived-2.0.10.tar.gz
[root@node_master ~]# cd keepalived-2.0.10/
[root@node_master keepalived-2.0.10]# ./configure --prefix=/usr/local/keepalived
[root@node_master keepalived-2.0.10]# make && make install
[root@node_master keepalived-2.0.10]# cp keepalived/etc/init.d/keepalived /etc/init.d/
[root@node_master keepalived-2.0.10]# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
[root@node_master keepalived-2.0.10]# mkdir /etc/keepalived
[root@node_master keepalived-2.0.10]# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
[root@node_master keepalived-2.0.10]# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

keepalived configuration
------------Configuration on the master (192.168.192.128)------------------
[root@node_master keepalived-2.0.10]# cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
[root@node_master keepalived-2.0.10]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
    1024331014@qq.com
    }
    notification_email_from 1024331014@qq.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens37
    virtual_router_id 51
    priority 150
    advert_int 1
    nopreempt

    authentication {
    auth_type PASS
    auth_pass 1111
    }

    virtual_ipaddress {
        192.168.192.131
    }
}
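Here router_id MySQL-HA sets the name of the keepalived group, and the virtual IP 192.168.192.131 is bound to this host's ens37 interface. The state is BACKUP,
nopreempt puts keepalived in non-preemptive mode, and priority 150 sets this node's priority to 150.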

------------Configuration on the candidate master (192.168.192.129)------------------
[root@node_slave keepalived-2.0.10]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
    1024331014@qq.com
    }
    notification_email_from 1024331014@qq.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens37
    virtual_router_id 51
    priority 120
    advert_int 1
    nopreempt

    authentication {
    auth_type PASS
    auth_pass 1111
    }

    virtual_ipaddress {
        192.168.192.131
    }
}
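The two keepalived.conf files must agree on the VRRP group (virtual_router_id, auth_pass and the VIP) and should differ only in priority. A rough awk sketch for comparing them, assuming one setting per line as in the files above; vrrp_summary and check_vrrp_pair are hypothetical names, and everything except the listed keys is ignored.

```bash
#!/usr/bin/env bash
# vrrp_summary KEEPALIVED_CONF
# Print "virtual_router_id=N auth_pass=X vip=IP priority=N" for the file.
vrrp_summary() {
  awk '
    $1 == "virtual_router_id" { vrid = $2 }
    $1 == "priority"          { prio = $2 }
    $1 == "auth_pass"         { pass = $2 }
    $1 == "virtual_ipaddress" { in_vip = 1; next }
    in_vip && $1 == "}"       { in_vip = 0 }
    in_vip && NF              { vip = $1 }
    END { printf "virtual_router_id=%s auth_pass=%s vip=%s priority=%s\n", vrid, pass, vip, prio }
  ' "$1"
}

# check_vrrp_pair CONF_A CONF_B: same group and VIP, different priority.
check_vrrp_pair() {
  local a b
  a=$(vrrp_summary "$1")
  b=$(vrrp_summary "$2")
  if [ "${a% priority=*}" != "${b% priority=*}" ]; then
    printf 'group settings differ:\n  %s\n  %s\n' "$a" "$b"
    return 1
  fi
  if [ "${a##*priority=}" = "${b##*priority=}" ]; then
    echo "priorities are equal"
    return 1
  fi
  echo "ok"
}
```

For the two files above (copied to one machine), check_vrrp_pair should print "ok".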

Start the keepalived service
--------------Start it on the master and check the log (192.168.192.128)------------------------------
[root@node_master keepalived-2.0.10]# /etc/init.d/keepalived start
Starting keepalived (via systemctl):                       [  OK  ]

Check that the VIP is configured (192.168.192.131 should appear below):
[root@node_master keepalived-2.0.10]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:19:14:db brd ff:ff:ff:ff:ff:ff
    inet 192.168.52.129/24 brd 192.168.52.255 scope global noprefixroute dynamic ens33
        valid_lft 1250sec preferred_lft 1250sec
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:19:14:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.192.128/24 brd 192.168.192.255 scope global noprefixroute ens37
        valid_lft forever preferred_lft forever
    inet 192.168.192.131/32 scope global ens37
        valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:4b:9c:ed:04 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
        valid_lft forever preferred_lft forever

[root@node_master keepalived-2.0.10]# tail -10 /var/log/messages 
Dec 19 10:43:05 node_master Keepalived_vrrp[6856]: Sending gratuitous ARP on ens37 for 192.168.192.131
Dec 19 10:43:05 node_master Keepalived_vrrp[6856]: (VI_1) Sending/queueing gratuitous ARPs on ens37 for 192.168.192.131
Dec 19 10:43:05 node_master Keepalived_vrrp[6856]: Sending gratuitous ARP on ens37 for 192.168.192.131
Dec 19 10:43:05 node_master Keepalived_vrrp[6856]: Sending gratuitous ARP on ens37 for 192.168.192.131
Dec 19 10:43:05 node_master Keepalived_vrrp[6856]: Sending gratuitous ARP on ens37 for 192.168.192.131
Dec 19 10:43:05 node_master Keepalived_vrrp[6856]: Sending gratuitous ARP on ens37 for 192.168.192.131
Dec 19 10:43:07 node_master crond: sendmail: fatal: parameter inet_interfaces: no local interface found for ::1
Dec 19 10:44:02 node_master systemd: Started Session 90 of user root.
Dec 19 10:44:02 node_master systemd: Starting Session 90 of user root.
Dec 19 10:44:08 node_master crond: sendmail: fatal: parameter inet_interfaces: no local interface found for ::1

The VIP is now bound to the master node 192.168.192.128.
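To check from a script (rather than by eye) whether a node currently holds the VIP, you can look for the address in the ip command's output. has_vip is a hypothetical helper that reads that output on stdin, so it can be tested without a real interface; 192.168.192.131 is this article's VIP.

```bash
#!/usr/bin/env bash
# has_vip VIP < "ip -o -4 addr show" (or "ip addr") output
# Return 0 if VIP appears as an IPv4 address (any prefix length) on some interface.
has_vip() {
  awk -v vip="$1" '
    {
      for (i = 1; i < NF; i++)
        if ($i == "inet") {
          split($(i + 1), a, "/")    # "192.168.192.131/32" -> address, prefix
          if (a[1] == vip) found = 1
        }
    }
    END { exit !found }'
}
```

Usage: ip -o -4 addr show | has_vip 192.168.192.131 && echo "VIP is on this node"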

--------------Start it on the candidate master (192.168.192.129)----------------------------
[root@node_slave keepalived-2.0.10]# /etc/init.d/keepalived start
Starting keepalived (via systemctl):                       [  OK  ]

[root@node_slave keepalived-2.0.10]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:f8:41:0f brd ff:ff:ff:ff:ff:ff
    inet 192.168.52.130/24 brd 192.168.52.255 scope global noprefixroute dynamic ens33
        valid_lft 1083sec preferred_lft 1083sec
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:f8:41:19 brd ff:ff:ff:ff:ff:ff
    inet 192.168.192.129/24 brd 192.168.192.255 scope global noprefixroute ens37
        valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:b0:14:23:57 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
        valid_lft forever preferred_lft forever

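The output above shows that keepalived is configured correctly (the VIP stays on the master, so it does not appear on this node).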

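Note:
Both servers run keepalived in BACKUP state. keepalived can be deployed in two modes, master->backup and backup->backup, and they behave very differently.
In master->backup mode, when the master fails the virtual IP floats to the slave automatically; once the master is repaired and keepalived is started again, it takes the virtual IP back, and this preemption happens even if nopreempt is set.
In backup->backup mode, when the master fails the virtual IP floats to the slave automatically; when the original master recovers and keepalived is started, it does not take the virtual IP back from the new master, even if its priority is higher.
To reduce the number of IP moves, the repaired master is usually made the new standby.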

Integrating keepalived with MHA

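Integrating keepalived with MHA (MHA stops keepalived when the MySQL service dies)
To bring keepalived into MHA, you only need to modify master_ip_failover, the script triggered during failover, so that it handles keepalived when the master goes down.
Edit /usr/local/bin/master_ip_failover so that it reads as follows: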
[root@manager_slave ~]# cp /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_failover.bak
[root@manager_slave ~]# vim /usr/local/bin/master_ip_failover       #note: change the VIP address in the script to your own
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.192.131';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

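Now that the script has been modified, uncomment the master_ip_failover_script line in /etc/masterha/app1.cnf. Then check the cluster status again to see whether any errors are reported.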
[root@manager_slave ~]# grep 'master_ip_failover_script' /etc/masterha/app1.cnf     
master_ip_failover_script= /usr/local/bin/master_ip_failover

[root@manager_slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 
....................
Checking the Status of the script.. OK 
Wed Dec 19 10:55:37 2018 - [info]  OK.
Wed Dec 19 10:55:37 2018 - [warning] shutdown_script is not defined.
Wed Dec 19 10:55:37 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
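Replication is healthy.
Here is what the changes to /usr/local/bin/master_ip_failover do. When the master database fails, MHA triggers a switchover. MHA Manager then stops the keepalived service on the master, which makes the virtual IP float to the candidate slave and completes the switchover.
Alternatively, keepalived itself can run a script that monitors whether MySQL is running normally and kills the keepalived process if it is not.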
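The keepalived-side alternative just mentioned could look roughly like the sketch below. It assumes keepalived invokes the script periodically and that `mysqladmin` is installed on the database host. The function name `check_mysql_or_stop_keepalived` and the variables `MYSQL_PING_CMD`, `STOP_KEEPALIVED_CMD` and `PING_TRIES` are hypothetical, added only so the commands can be swapped out. The check relies on `mysqladmin ping` exiting with 0 whenever the server is running (even if access is denied) and non-zero when it is not.

```shell
#!/usr/bin/env bash
# Hypothetical health check: if the local MySQL stops answering pings,
# stop keepalived so that the VIP floats to the other node.
# MYSQL_PING_CMD / STOP_KEEPALIVED_CMD / PING_TRIES are assumptions for this sketch.
check_mysql_or_stop_keepalived() {
    local ping_cmd="${MYSQL_PING_CMD:-mysqladmin -uroot ping}"
    local stop_cmd="${STOP_KEEPALIVED_CMD:-systemctl stop keepalived}"
    local tries="${PING_TRIES:-3}" i=1

    # retry a few times so that a single slow ping does not move the VIP
    while [ "$i" -le "$tries" ]; do
        if $ping_cmd >/dev/null 2>&1; then
            echo "mysql alive"
            return 0
        fi
        [ "$i" -lt "$tries" ] && sleep 1
        i=$((i + 1))
    done

    echo "mysql down, stopping keepalived"
    $stop_cmd
    return 1
}
```

To use it, call the function at the end of the script and reference the script from a `vrrp_script` block in keepalived.conf. Stopping keepalived outright, rather than only lowering its priority, mirrors what the MHA-driven variant above does.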

Managing the VIP with a script

---------------------------------------------------------Second approach: managing the VIP with a script---------------------------------------------------------
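To test the second approach, I stopped the keepalived service set up above.
This approach also modifies /usr/local/bin/master_ip_failover (the modified contents are shown below). In addition, a VIP has to be bound on the master server by hand.

First, bind the VIP on the master node (192.168.192.128)
[root@node_master ~]# ifconfig ens37:0 192.168.192.131/24         #Mind the NIC name and the address here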
[root@node_master ~]# ifconfig 
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.52.129  netmask 255.255.255.0  broadcast 192.168.52.255
        ether 00:0c:29:19:14:db  txqueuelen 1000  (Ethernet)
        RX packets 7850  bytes 8852518 (8.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2478  bytes 176378 (172.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens37: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.192.128  netmask 255.255.255.0  broadcast 192.168.192.255
        ether 00:0c:29:19:14:e5  txqueuelen 1000  (Ethernet)
        RX packets 4200  bytes 448286 (437.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5620  bytes 2671664 (2.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens37:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.192.131  netmask 255.255.255.0  broadcast 192.168.192.255
        ether 00:0c:29:19:14:e5  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
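`ifconfig` belongs to the legacy net-tools package. On hosts that only ship iproute2, roughly the same alias can be managed with `ip addr`. The sketch below only prints the commands (a dry run) so they can be reviewed before being piped to `sh`. The function name `vip_cmds` is hypothetical. The `arping -U` step announces the VIP to neighbours' ARP caches; it is an addition that is not part of the original procedure and assumes the iputils `arping`.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the iproute2 commands that add or remove
# a VIP as a labelled alias (e.g. ens37:0), like the ifconfig call above.
# Usage: vip_cmds add|del <nic> <vip/prefix> <alias-key>
vip_cmds() {
    local action="$1" nic="$2" vip="$3" key="$4"
    case "$action" in
        add)
            echo "ip addr add $vip dev $nic label $nic:$key"
            # unsolicited ARP so that neighbours learn the new MAC for the VIP
            echo "arping -q -U -c 3 -I $nic ${vip%/*}"
            ;;
        del)
            # ip addr del matches on the address, so the label is not needed
            echo "ip addr del $vip dev $nic"
            ;;
        *)
            echo "usage: vip_cmds add|del <nic> <vip/prefix> <alias-key>" >&2
            return 1
            ;;
    esac
}
```

For example, `vip_cmds add ens37 192.168.192.131/24 0 | sh` would recreate the `ens37:0` alias shown above.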

On the manager node (192.168.192.130), modify /usr/local/bin/master_ip_failover
[root@manager_slave ~]# cp /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_failover.bak.keep
[root@manager_slave ~]# vim /usr/local/bin/master_ip_failover     #Note: the VIP address (and NIC name) in the script must be adjusted to your environment
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

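my $vip = '192.168.192.131/24';
# NIC name and alias key must match the interface that carries the VIP on each
# candidate master; ens37:0 matches the manual binding done on the master above
my $key = '0';
my $ssh_start_vip = "/sbin/ifconfig ens37:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens37:$key down";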

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Check whether MHA replication is healthy
[root@manager_slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Checking the Status of the script.. OK 
Wed Dec 19 11:06:36 2018 - [info]  OK.
Wed Dec 19 11:06:36 2018 - [warning] shutdown_script is not defined.
Wed Dec 19 11:06:36 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

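Note:
The master_ip_failover_script line in /etc/masterha/app1.cnf must be uncommented.
To prevent split-brain, production environments should manage the virtual IP with a script rather than with keepalived. At this point the basic MHA cluster is fully configured.
Next come the actual tests, which show how MHA really works.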
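Both approaches only take effect once `master_ip_failover_script` is active in /etc/masterha/app1.cnf. The small sketch below checks that the line exists and is not commented out. The function name `failover_script_enabled` is hypothetical, and the sketch assumes the `key= value` layout shown by the earlier grep.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed and print the script path if the MHA config
# has an active (not commented-out) master_ip_failover_script setting.
failover_script_enabled() {
    local conf="${1:-/etc/masterha/app1.cnf}"
    local line
    # allow leading whitespace and optional spaces before '='
    line=$(grep -E '^[[:space:]]*master_ip_failover_script[[:space:]]*=' "$conf" | tail -n 1)
    [ -n "$line" ] || return 1
    # drop the key, then trim surrounding whitespace from the value
    line="${line#*=}"
    printf '%s\n' "$line" | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'
}
```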

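Failover

Automatic failover

Automatic failover (MHA Manager must be started first, otherwise automatic failover cannot happen; manual failover, of course, does not require MHA Manager monitoring to be running)

Start MHA Manager monitoring (run on 192.168.192.130; skip this if it is already running)
Run the following command to start it in the background: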
[root@manager_slave /]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

Stop the MySQL service on the master (192.168.192.128) to simulate a master failure and trigger automatic failover
[root@node_master ~]# systemctl stop mysqld


Check the MHA switchover log on the manager node (192.168.192.130) to follow the whole failover process
[root@manager_slave ~]# cat /var/log/masterha/app1/manager.log
................
................
----- Failover Report -----

app1: MySQL Master failover 192.168.192.128(192.168.192.128:3306) to 192.168.192.129(192.168.192.129:3306) succeeded

Master 192.168.192.128(192.168.192.128:3306) is down!

Check MHA Manager logs at manager_slave:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.192.128(192.168.192.128:3306)
The latest slave 192.168.192.129(192.168.192.129:3306) has all relay logs for recovery.
Selected 192.168.192.129(192.168.192.129:3306) as a new master.
192.168.192.129(192.168.192.129:3306): OK: Applying all logs succeeded.
192.168.192.129(192.168.192.129:3306): OK: Activated master IP address.
192.168.192.130(192.168.192.130:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.192.130(192.168.192.130:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.192.129(192.168.192.129:3306)
192.168.192.129(192.168.192.129:3306): Resetting slave info succeeded.
Master failover to 192.168.192.129(192.168.192.129:3306) completed successfully.
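The final line, "Master failover to 192.168.192.129(192.168.192.129:3306) completed successfully.", shows that the candidate master has taken over.
The output above shows the whole MHA failover process, which consists of the following steps:
1) Configuration check: the configuration of the whole cluster is checked
2) Handling the dead master: the virtual IP is removed and the host is shut down (I have not implemented the shutdown part yet; it needs further research)
3) The log differences between the dead master and the latest slave are copied and saved to a directory on the MHA Manager
4) The slave with the latest updates is identified
5) The binary log events (binlog events) saved from the master are applied
6) One slave is promoted to be the new master
7) The other slaves are pointed at the new master for replication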
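Scrolling through the whole manager.log is not necessary to find the result. The sketch below extracts the new master from the final report line. It assumes the log contains a line of the form `Master failover to HOST(IP:PORT) completed successfully.`, as in the output above. The function name `failover_new_master` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "IP:PORT" of the new master from the last
# successful failover recorded in an MHA manager log; fail if none is found.
failover_new_master() {
    local log="${1:-/var/log/masterha/app1/manager.log}"
    local line
    line=$(grep -E 'Master failover to .* completed successfully\.' "$log" | tail -n 1)
    [ -n "$line" ] || return 1
    # "Master failover to 192.168.192.129(192.168.192.129:3306) completed ..."
    # -> keep only the text inside the parentheses
    printf '%s\n' "$line" | sed -E 's/.*Master failover to [^(]*\(([^)]*)\).*/\1/'
}
```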

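Finally, start MHA Manager monitoring again (masterha_manager exits once a failover completes) and check which node is now the master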
[root@manager_slave ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[root@manager_slave ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:7036) is running(0:PING_OK), master:192.168.192.129 
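For scripted health checks, the current master can be pulled out of the `masterha_check_status` line shown above. The sketch below assumes the line ends with the `..., master:IP` suffix seen here. The function name `current_master` is hypothetical. It reads the status text on stdin so it can be tried without a running manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read masterha_check_status output on stdin and print
# the master address; return non-zero unless the manager reports PING_OK.
current_master() {
    local status
    status=$(cat)
    case "$status" in
        *PING_OK*master:*) ;;
        *) return 1 ;;
    esac
    # keep the first word after "master:" (drops any trailing whitespace)
    printf '%s\n' "${status##*master:}" | awk '{print $1}'
}
```

Typical use would be `masterha_check_status --conf=/etc/masterha/app1.cnf | current_master`.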

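Manual failover

Manual failover (MHA Manager must not be running)
Manual failover is for setups where automatic MHA failover is not enabled. When the master fails, an operator invokes MHA by hand to perform the failover, using the following commands:

Make sure MHA Manager is stopped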
[root@manager_slave ~]# masterha_stop --conf=/etc/masterha/app1.cnf

Note: if MHA Manager finds that no server is dead, it reports an error and aborts the failover
[root@manager_slave ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.192.128 --dead_master_port=3306 --new_master_host=192.168.192.129 --new_master_port=3306 --ignore_last_failover
The output asks whether to proceed with the switchover (yes/NO): enter yes
.............
.............
----- Failover Report -----

app1: MySQL Master failover 192.168.192.128(192.168.192.128:3306) to 192.168.192.129(192.168.192.129:3306) succeeded

Master 192.168.192.128(192.168.192.128:3306) is down!

Check MHA Manager logs at manager_slave for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.192.128(192.168.192.128:3306)
The latest slave 192.168.192.129(192.168.192.129:3306) has all relay logs for recovery.
Selected 192.168.192.129(192.168.192.129:3306) as a new master.
192.168.192.129(192.168.192.129:3306): OK: Applying all logs succeeded.
192.168.192.129(192.168.192.129:3306): OK: Activated master IP address.
192.168.192.130(192.168.192.130:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.192.130(192.168.192.130:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.192.129(192.168.192.129:3306)
192.168.192.129(192.168.192.129:3306): Resetting slave info succeeded.
Master failover to 192.168.192.129(192.168.192.129:3306) completed successfully.

The output above shows that the switchover succeeded. This simulates manually promoting 192.168.192.129 to master after the master (192.168.192.128) has gone down.
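`masterha_master_switch --master_state=dead` aborts if the master is still alive, so it can help to confirm that the old master really is unreachable before running it. The sketch below assumes `mysqladmin` can reach the database port from the manager node. `manual_failover` and `MYSQL_PING_CMD` are hypothetical names. The function only prints the switch command (a dry run) rather than executing it, and it assumes both masters listen on the same port.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: only emit the manual-failover command when the
# dead master really does not answer a ping.
# Usage: manual_failover <dead_host> <new_host> [port]
manual_failover() {
    local dead="$1" new="$2" port="${3:-3306}"
    local ping_cmd="${MYSQL_PING_CMD:-mysqladmin --connect-timeout=3}"

    # mysqladmin ping exits 0 whenever the server is up, even without credentials
    if $ping_cmd -h "$dead" -P "$port" ping >/dev/null 2>&1; then
        echo "refusing: $dead:$port still answers ping" >&2
        return 1
    fi

    echo "masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf" \
         "--dead_master_host=$dead --dead_master_port=$port" \
         "--new_master_host=$new --new_master_port=$port --ignore_last_failover"
}
```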

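Online switchover

In many situations an existing master has to be migrated to another server: for example, the master's hardware is failing, the RAID controller needs to be rebuilt, or the master is being moved to a more powerful machine. Maintenance on the master degrades performance and causes downtime during which, at the very least, no data can be written. In addition, blocking or killing currently running sessions can leave the old and new masters with inconsistent data.
MHA offers fast switchover with graceful write blocking. The switch takes only 0.5-2 s, during which no data can be written. In many cases a 0.5-2 s write block is acceptable, so switching masters does not require a scheduled maintenance window.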

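Rough flow of an MHA online switchover:
1) Check the replication setup and identify the current master
2) Identify the new master
3) Block writes to the current master
4) Wait for all slaves to catch up with replication
5) Allow writes on the new master
6) Reconfigure the slaves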

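Note that with online switchover, the application architecture must account for two issues:
1) Automatically identifying which node is master and which are slaves (the master machine may change). Using a VIP largely solves this.
2) Load balancing. You can define an approximate read/write ratio and the share of load each machine can handle, and this has to be revisited whenever a machine leaves the cluster.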

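To guarantee full data consistency and complete the switch as quickly as possible, an MHA online switchover succeeds only if all of the following conditions are met; otherwise it fails:
1) The IO thread is running on every slave
2) The SQL thread is running on every slave
3) In every slave's SHOW SLAVE STATUS output, Seconds_Behind_Master is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.
4) On the master, SHOW PROCESSLIST shows no update that has been running longer than running_updates_limit seconds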
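Conditions 1) to 3) can be checked up front from each slave's `SHOW SLAVE STATUS\G` output. The sketch below assumes that vertical-format output is piped in on stdin. The function name `slave_ready_for_switch` is hypothetical, and its default limit of 1 second mirrors the default running_updates_limit described above. It does not check condition 4), which needs the master's process list.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for an online switch: read "SHOW SLAVE STATUS\G"
# output on stdin and succeed only if both replication threads are running
# and Seconds_Behind_Master <= limit (default 1 s, like running_updates_limit).
slave_ready_for_switch() {
    local limit="${1:-1}"
    local status io sql lag
    status=$(cat)
    # fields look like "  Slave_IO_Running: Yes"; keep the text after ": "
    io=$(printf '%s\n' "$status" | sed -n 's/^ *Slave_IO_Running: *//p')
    sql=$(printf '%s\n' "$status" | sed -n 's/^ *Slave_SQL_Running: *//p')
    lag=$(printf '%s\n' "$status" | sed -n 's/^ *Seconds_Behind_Master: *//p')

    [ "$io" = "Yes" ] || { echo "IO thread not running ($io)"; return 1; }
    [ "$sql" = "Yes" ] || { echo "SQL thread not running ($sql)"; return 1; }
    # NULL or empty lag means the delay is unknown, so treat it as not ready
    case "$lag" in
        ''|*[!0-9]*) echo "Seconds_Behind_Master unknown ($lag)"; return 1 ;;
    esac
    [ "$lag" -le "$limit" ] || { echo "lag ${lag}s > ${limit}s"; return 1; }
    echo "ready (lag ${lag}s)"
}
```

On a slave this might be run as `mysql -e 'SHOW SLAVE STATUS\G' | slave_ready_for_switch 1`.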

The online switchover steps are as follows:

First, stop MHA monitoring on the manager node:
[root@manager_slave ~]# masterha_stop --conf=/etc/masterha/app1.cnf

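Next, perform the online switchover. This simulates an online master switch: the original master 192.168.192.128 becomes a slave and 192.168.192.129 is promoted to the new master.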
[root@manager_slave ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.192.129 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Running it produced the following error
..........
Starting master switch from 192.168.192.128(192.168.192.128:3306) to 192.168.192.129(192.168.192.129:3306)? (yes/NO): yes
Thu Dec 20 12:04:47 2018 - [info] Checking whether 192.168.192.129(192.168.192.129:3306) is ok for the new master..
Thu Dec 20 12:04:47 2018 - [info]  ok.
Thu Dec 20 12:04:47 2018 - [info] 192.168.192.128(192.168.192.128:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Thu Dec 20 12:04:47 2018 - [info] 192.168.192.128(192.168.192.128:3306): Resetting slave pointing to the dummy host.
Thu Dec 20 12:04:47 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Dec 20 12:04:47 2018 - [info] 
Thu Dec 20 12:04:47 2018 - [info] * Phase 2: Rejecting updates Phase..
Thu Dec 20 12:04:47 2018 - [info] 
Thu Dec 20 12:04:47 2018 - [info] Executing master ip online change script to disable write on the current master:
Thu Dec 20 12:04:47 2018 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.192.128 --orig_master_ip=192.168.192.128 --orig_master_port=3306 --orig_master_user='manager' --new_master_host=192.168.192.129 --new_master_ip=192.168.192.129 --new_master_port=3306 --new_master_user='manager' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Thu Dec 20 12:04:47 2018 811566 Set read_only on the new master.. ok.
Thu Dec 20 12:04:47 2018 815337 Drpping app user on the orig master..
Got Error: Undefined subroutine &main::FIXME_xxx_drop_app_user called at /usr/local/bin/master_ip_online_change line 152.

Thu Dec 20 12:04:47 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53.

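Cause: the script cannot find a definition for FIXME_xxx_drop_app_user. The FIXME_xxx calls are placeholders in the sample script. Since I am not familiar with Perl, I simply commented out the lines that drop/create the application user (the FIXME_xxx calls); this does not affect the rest of the process.
Solution:
[root@manager_slave ~]# cp /usr/local/bin/master_ip_online_change /usr/local/bin/master_ip_online_change.bak  #Back up the file first, since we are going to modify it
[root@manager_slave ~]# vim /usr/local/bin/master_ip_online_change  #Edit the file
Find the following two lines and comment them out with #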
FIXME_xxx_drop_app_user($orig_master_handler);
FIXME_xxx_create_app_user($new_master_handler);
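The same edit can be made non-interactively. The sketch below uses GNU `sed -i -E` (BSD/macOS sed treats `-i` differently). The function name `comment_fixme_calls` is hypothetical. It does not make its own backup, since the `cp` above already does. It is safe to run twice, because lines that are already commented out no longer match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix every FIXME_xxx_*(...) call line with '#' so that
# master_ip_online_change no longer dies on the undefined placeholder subs.
comment_fixme_calls() {
    local script="${1:-/usr/local/bin/master_ip_online_change}"
    # only touch lines whose first non-blank text is a FIXME_xxx_ call
    sed -i -E 's/^([[:space:]]*)(FIXME_xxx_[a-z_]+\()/\1#\2/' "$script"
}
```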

Run the online switchover again
[root@manager_slave ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.192.129 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
........
Thu Dec 20 12:11:06 2018 - [info] -- Slave switch on host 192.168.192.130(192.168.192.130:3306) succeeded.
Thu Dec 20 12:11:06 2018 - [info] Unlocking all tables on the orig master:
Thu Dec 20 12:11:06 2018 - [info] Executing UNLOCK TABLES..
Thu Dec 20 12:11:06 2018 - [info]  ok.
Thu Dec 20 12:11:06 2018 - [info] Starting orig master as a new slave..
Thu Dec 20 12:11:06 2018 - [info]  Resetting slave 192.168.192.128(192.168.192.128:3306) and starting replication from the new master 192.168.192.129(192.168.192.129:3306)..
Thu Dec 20 12:11:06 2018 - [info]  Executed CHANGE MASTER.
Thu Dec 20 12:11:06 2018 - [info]  Slave started.
Thu Dec 20 12:11:06 2018 - [info] All new slave servers switched successfully.
Thu Dec 20 12:11:06 2018 - [info] 
Thu Dec 20 12:11:06 2018 - [info] * Phase 5: New master cleanup phase..
Thu Dec 20 12:11:06 2018 - [info] 
Thu Dec 20 12:11:06 2018 - [info]  192.168.192.129: Resetting slave info succeeded.
Thu Dec 20 12:11:06 2018 - [info] Switching master to 192.168.192.129(192.168.192.129:3306) completed successfully.

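The output above shows that the online switchover succeeded.
Parameter meanings:
--orig_master_is_new_slave       turns the original master into a slave of the new master during the switch; without this parameter the original master does not start replicating from the new master
--running_updates_limit=10000    if the candidate master lags behind, the MHA switch cannot succeed; with this parameter the switch can proceed as long as the delay is within this limit (in seconds), although how long the switch takes depends on the size of the relay logs during recovery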

This completes the deployment of MHA as a MySQL high-availability solution!
