MySQL High Availability with MHA: Principles and Testing


Source: 陶老師運維筆記 (Teacher Tao's Ops Notes), WeChat official account

1. MHA Overview

[Figure: MySQL high availability with MHA]

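MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu (yoshinorim) in Japan, and is a well-regarded tool for failover and slave-to-master promotion in MySQL high-availability environments. When a MySQL master fails, MHA can complete the database failover automatically within 0 to 30 seconds, and while doing so it preserves data consistency as far as possible, which is what makes the setup highly available in a real sense.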

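Advantages of MHA:

  1. Master failover and slave promotion are very fast.
  2. Failure detection is automatic and uses multiple checks, and the failover process provides hooks for calling other scripts.
  3. A master crash does not lead to inconsistent data: MHA fills in the missing events automatically and keeps the data consistent.
  4. No replication settings have to change; it is simple to deploy and does not affect the existing architecture.
  5. It does not need many extra machines, and one manager can centrally manage multiple instances.
  6. It has no performance impact.
  7. It is storage-engine independent and works with any engine.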

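Scenarios MHA does not support:

  • Multi-tier replication (M1->M2->Slave)
  • MySQL 5.0.45 or earlier
  • Mismatched replication filtering rules (binlog-do-db, replicate-ignore-db, etc.); the rules must be identical on all servers
  • LOAD DATA [LOCAL] INFILE with statement-based replication (SBR)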

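2. How MHA Works

MHA's operation can be summarized in the following steps:

  (1) Save the binary log events (binlog events) from the crashed master.
  (2) Identify the slave that holds the latest updates.
  (3) Apply the differential relay log events to the other slaves.
  (4) Apply the binlog events saved from the master.
  (5) Promote one slave to be the new master.
  (6) Point the other slaves at the new master and resume replication.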

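During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it then performs the failover anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces the risk of data loss, and MHA can be combined with it. Even if only one slave has received the latest binary log events, MHA can apply them to all the other slaves, which keeps the data on all nodes consistent.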
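
The recovery steps described in this section can be illustrated with a toy model. The sketch below is not MHA code: it represents binlog events as plain integers, and the function name `simulate_failover` and its arguments are hypothetical names made up for this illustration. It assumes every slave holds a prefix of the master's event stream.

```python
def simulate_failover(master_binlog, slaves, master_reachable=True):
    """Toy model of the MHA recovery steps; events are plain integers.

    master_binlog: events written on the crashed master, in order.
    slaves: dict of slave name -> events that slave has received
            (assumed to be a prefix of master_binlog).
    Returns (new_master_name, dict of slave name -> recovered events).
    """
    # Step (2): the latest slave is the one that received the most events.
    latest = max(slaves, key=lambda name: len(slaves[name]))
    latest_events = slaves[latest]

    # Step (1): save from the dead master the events no slave received.
    # This only works if the master host is still reachable over SSH.
    saved = master_binlog[len(latest_events):] if master_reachable else []

    recovered = {}
    for name, events in slaves.items():
        # Step (3): apply the relay log difference taken from the latest slave.
        diff = latest_events[len(events):]
        # Step (4): apply the binlog events saved from the master.
        recovered[name] = events + diff + saved

    # Step (5): promote one slave; this sketch simply picks the latest one.
    # Step (6): the remaining slaves would then replicate from it.
    return latest, recovered
```

Calling `simulate_failover([1, 2, 3, 4, 5], {"s1": [1, 2, 3], "s2": [1, 2, 3, 4]})` brings both slaves to events 1 through 5. With `master_reachable=False` both end at events 1 through 4, which is the case where the newest transaction is lost.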

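2.1 Master Failure Scenarios

Master failure:

[Figure: Master Failover]

  • Scenario 1: every slave has received every binlog event.

This is the ideal case, but things are rarely that lucky.

[Figure: Failure Example]

  • Scenario 2: the master has transactions that were never replicated to the slaves. Semi-synchronous replication mitigates this risk.

    [Figure: the master has transactions not replicated to the slaves]

  • Scenario 3: some slaves are missing binlog events.

    [Figure: some slaves are missing binlog events]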

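2.2 Problems and Difficulties

The hard part of a master failover is that even the most up-to-date slave may still be missing binlog events from the master.

[Figure: problems and difficulties]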

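2.3 Solving the Problem

Goal:

[Figure: image.png]

Saving the binlog events:

[Figure: Saving binlog events from (crashed) master]

Finding the slave closest to the master:

[Figure: Understanding SHOW SLAVE STATUS]

[Figure: Identifying the latest slave]

Identifying the events each slave is missing:

[Figure: Next issue: Applying diffs to other slaves]

[Figure: Identifying what events need to be applied]

[Figure: Relay log internals: "at" and "end_log_pos"]

[Figure: Relay log internals: How to identify diffs]

Performing the recovery:

[Figure: Steps for recovery]

[Figure: Recovery procedure]

[Figure: Automating failover]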
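
As a rough illustration of "finding the slave closest to the master" and "identifying the events each slave is missing", the sketch below compares the I/O thread coordinates reported by SHOW SLAVE STATUS (`Master_Log_File` and `Read_Master_Log_Pos`). The function names and the sample binlog file names are assumptions made for this example; it is a simplified sketch, not MHA's actual implementation.

```python
def binlog_coords(status):
    """Return (file sequence number, position) read from the master."""
    # Master_Log_File looks like "mysql-bin.000012"; the numeric suffix
    # orders the files, Read_Master_Log_Pos orders events within a file.
    seq = int(status["Master_Log_File"].rsplit(".", 1)[1])
    return (seq, int(status["Read_Master_Log_Pos"]))


def find_latest_slave(statuses):
    """statuses: dict of slave name -> SHOW SLAVE STATUS fields."""
    return max(statuses, key=lambda name: binlog_coords(statuses[name]))


def missing_ranges(statuses):
    """For each slave, the master binlog range it still lacks as
    (own_coords, latest_coords), or None if it is already up to date."""
    latest = binlog_coords(statuses[find_latest_slave(statuses)])
    result = {}
    for name, status in statuses.items():
        own = binlog_coords(status)
        result[name] = None if own == latest else (own, latest)
    return result


# Hypothetical sample values, not taken from the test environment.
sample = {
    "slave1": {"Master_Log_File": "mysql-bin.000003", "Read_Master_Log_Pos": "1500"},
    "slave2": {"Master_Log_File": "mysql-bin.000003", "Read_Master_Log_Pos": "2200"},
}
```

With the sample above, `find_latest_slave(sample)` picks `slave2`, and `missing_ranges(sample)` reports that `slave1` lacks the events between position 1500 and 2200 of file 3, which is the difference that would have to be extracted from the latest slave's relay log.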

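3. MHA Components and Architecture

3.1 MHA Component Architecture

  • MHA Manager: the management node. It is usually deployed on a dedicated server, from which it can manage multiple master/slave clusters, although it can also run on one of the slave nodes. Each master/slave cluster is called an application. MHA Manager probes the master of each cluster at regular intervals; when it finds that the master has failed, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves at the new master.

  • MHA Node: the data node. It runs on every MySQL (or MariaDB) server, that is, on the master, the slaves, and the manager host, and speeds up failover by providing scripts that can parse and purge logs.

[Figure: MHA components]

[Figure: One Manager per Datacenter]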

3.2 What the Packages Provide

Manager tools:

masterha_manager            start MHA
masterha_check_ssh          check MHA's SSH configuration
masterha_check_repl         check MySQL replication
masterha_master_monitor     detect whether the master is down
masterha_check_status       check the current MHA running state
masterha_master_switch      control failover (automatic or manual)
masterha_conf_host          add or remove configured server entries

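Node tools (on all cluster nodes):

These tools are normally invoked by MHA Manager's scripts and need no manual operation.
save_binary_logs            save and copy the master's binary logs
apply_diff_relay_logs       identify differential relay log events and apply them to the other slaves
purge_relay_logs            purge relay logs (without blocking the SQL thread)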

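3.3 MHA Processing Flow

====== monitor node ======
(1) Monitor all nodes, with the focus on the master.
(2) Detect that the master is down: either only the instance is down (SSH still works) or the whole host is down (SSH cannot connect).
(3) Monitor replication status.
====== failover ======
(4) Compare the GTIDs of the nodes.
(5) Data compensation 1: if SSH works, the slaves immediately save the part of the binary log they are missing.
(6) Master election: comparing the GTIDs of the nodes is enough. Choose the slave whose data is closest to the master's, recover its missing logs, and switch it to master (stop slave; reset slave all).
(7) Data compensation 2: if SSH does not work, compute the relay log difference between the two slaves and apply it to the slave that has less data.
(8) The second slave runs CHANGE MASTER TO against the new master, starting the new replication relationship.
====== application transparency ======
(9) Use a VIP mechanism to make the switch transparent to applications.
====== additional features ======
(10) Automatic repair of the old master (rejoining the cluster): yet to be developed...
(11) The problem of secondary data compensation (binlog server)
(12) Notification (send_report)
(13) The question of weights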
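
The "compare the GTIDs" step can be sketched as follows. This is a deliberately simplified illustration, not MHA's actual election logic: it counts the transactions in each slave's `Executed_Gtid_Set` string and picks the slave with the most, which is only meaningful under the assumption that all slaves replicate from the same single master. The function names are hypothetical.

```python
def gtid_count(gtid_set):
    """Count the transactions in a GTID set such as 'uuid:1-5:7,uuid2:1-3'."""
    total = 0
    # SHOW SLAVE STATUS may wrap long sets across lines after each comma.
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        # The first field is the server UUID, the rest are intervals.
        _uuid, *intervals = part.split(":")
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single transaction is written as "7", a range as "1-5".
            total += int(last or first) - int(first) + 1
    return total


def elect_new_master(executed):
    """executed: dict of slave name -> its Executed_Gtid_Set string.
    Assumption: every slave replicates from the same single master."""
    return max(executed, key=lambda name: gtid_count(executed[name]))
```

For example, `elect_new_master({"db2": "u:1-100", "db3": "u:1-98"})` returns `"db2"`, where `u` stands in for a real server UUID. A production check would compare the sets themselves rather than their sizes.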

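4. Setting Up the MHA Environment

4.1 Environment Plan

Three machines are used to build a simple MHA environment; the MHA version is mha-0.56.

IP              Port  DB role       MHA role               Software version
192.124.64.212  3307  DB1 (master)  mha-node               centos6, mha-0.56
192.124.64.213  3307  DB2 (slave)   mha-node               centos6, mha-0.56
192.124.64.214  3307  DB3 (slave)   mha-node, mha-manager  centos6, mha-0.56

Installation advice:

  1. The manager can be installed on its own on any machine.
  2. One manager can manage multiple MySQL clusters.
  3. Avoid installing the manager on the master, so that a power or network failure on the master does not take the manager down with it.
  4. Every database server must have the node package installed.
  5. The manager package depends on the node package.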

4.2 SSH Trust Between Nodes

# Run the following on every node
ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa
# Copy the public key to all three nodes
ssh-copy-id -i /root/.ssh/id_rsa.pub  root@192.124.64.213
ssh-copy-id -i /root/.ssh/id_rsa.pub  root@192.124.64.212
ssh-copy-id -i /root/.ssh/id_rsa.pub  root@192.124.64.214

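4.3 Installing MySQL with Replication

1. Install MySQL

# Install MySQL with the author's own script
mysql_install -P 3307 -r m -b 2G -v  5.6.27 

2. Set up replication

# DB1 is the master; DB2 and DB3 are slaves
# Grant privileges (the '10.%' host pattern must cover the addresses of the slaves and the MHA manager)
grant replication client,replication slave on *.* to 'repl'@'10.%' IDENTIFIED BY 'repl123';
grant all privileges on *.* to mha@'10.%' identified by 'mha123';

# On DB2 and DB3: point replication at the master
CHANGE MASTER TO
MASTER_HOST='192.124.64.212',
MASTER_PORT=3307,
MASTER_USER='repl',
MASTER_PASSWORD='repl123',
MASTER_AUTO_POSITION = 1;
# Start replication and check its status
start slave ;
show slave status\G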

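Notes:

  • Set the slaves read-only at runtime with set global read_only=1 rather than writing the parameter into the configuration file, because any slave may later be promoted to master.
  • Turn off automatic relay log purging on the slaves with set global relay_log_purge=0. During a failover, MHA relies on the relay logs to recover the slaves, so automatic purging must be OFF and the relay logs purged manually instead.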

4.4 Installing the MHA Software

Download and install the software. Every node (database master, slaves, and the MHA manager node) needs MHA Node installed, because MHA Manager itself depends on MHA Node.

# Software downloads
# MHA project site: https://code.google.com/archive/p/mysql-master-ha/
# GitHub downloads: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

# Install the Node package and its dependency on all nodes
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

# Install mha-manager on the DB3 node
yum install mha4mysql-manager-0.56-0.el6.noarch.rpm

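5. MHA Configuration

For MHA to work properly, its configuration file has to be set up with sensible, correct values for parameters such as the server IPs, the database user name and password, the working directory, and the log file. If MHA was installed from source, two configuration templates are provided under $MHA_BASE/samples/conf/: app1.cnf and masterha_default.cnf.

Create the directories:

mkdir /etc/mha/script -p
# Create the log directory (also the manager_workdir set in the configuration file)
mkdir -p /var/log/mha/mysql3307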

Edit the MHA configuration file:

vim /etc/mha/mysql3307.cnf
[server default]
manager_log=/var/log/mha/mysql3307/manager        
manager_workdir=/var/log/mha/mysql3307            
master_binlog_dir=/data1/mysql_3307/
user=mha                                   
password=mha123                               
ping_interval=2
repl_user=repl
repl_password=repl123
ssh_user=root
#master_ip_failover_script=/etc/mha/script/master_ip_failover
#shutdown_script= /etc/mha/script/power_manager
#report_script= /etc/mha/script/send_master_failover_mail

[server1]                                   
hostname=192.124.64.212
port=3307                                  
[server2]            
hostname=192.124.64.213
port=3307
[server3]
hostname=192.124.64.214
port=3307
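
Because the configuration file is INI-shaped, it can be inspected with a few lines of Python, for example to list which servers an application covers. The sketch below is an assumption-laden convenience: MHA reads the file with its own Perl code, and `configparser` is used here only because the format happens to be compatible for simple files like this one. `SAMPLE` is a shortened copy of the configuration above, and `load_servers` is a hypothetical helper name.

```python
import configparser

# Shortened copy of the configuration shown above.
SAMPLE = """\
[server default]
user=mha
ssh_user=root
ping_interval=2

[server1]
hostname=192.124.64.212
port=3307

[server2]
hostname=192.124.64.213
port=3307
"""


def load_servers(text):
    """Return (defaults, servers) parsed from MHA-style INI text."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    defaults = dict(parser["server default"])
    servers = []
    for section in parser.sections():
        if section == "server default":
            continue
        # Per-server values override the [server default] block.
        merged = {**defaults, **parser[section]}
        merged["section"] = section
        servers.append(merged)
    return defaults, servers


defaults, servers = load_servers(SAMPLE)
for server in servers:
    print(server["section"], server["hostname"], server["port"], server["ssh_user"])
```

Merging the defaults into each server entry mirrors how the [server default] block supplies values that individual [serverN] blocks do not set themselves.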

6. Checking Status

1. Check SSH trust

$ masterha_check_ssh --conf=/etc/mha/mysql3307.cnf
Sat Mar 21 23:14:28 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 21 23:14:28 2020 - [info] Reading application default configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:14:28 2020 - [info] Reading server configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:14:28 2020 - [info] Starting SSH connection tests..
...
Sat Mar 21 23:14:29 2020 - [info] All SSH connection tests passed successfully.

2. Check replication

$ masterha_check_repl --conf=/etc/mha/mysql3307.cnf
Sat Mar 21 23:17:00 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 21 23:17:00 2020 - [info] Reading application default configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:17:00 2020 - [info] Reading server configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:17:00 2020 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 21 23:17:01 2020 - [info] GTID failover mode = 1
Sat Mar 21 23:17:01 2020 - [info] Dead Servers:
Sat Mar 21 23:17:01 2020 - [info] Alive Servers:
Sat Mar 21 23:17:01 2020 - [info]   192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info]   192.124.64.213(192.124.64.213:3307)
Sat Mar 21 23:17:01 2020 - [info]   192.124.64.214(192.124.64.214:3307)
Sat Mar 21 23:17:01 2020 - [info] Alive Slaves:
Sat Mar 21 23:17:01 2020 - [info]   192.124.64.213(192.124.64.213:3307)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:17:01 2020 - [info]     GTID ON
Sat Mar 21 23:17:01 2020 - [info]     Replicating from 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info]   192.124.64.214(192.124.64.214:3307)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:17:01 2020 - [info]     GTID ON
Sat Mar 21 23:17:01 2020 - [info]     Replicating from 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] Current Alive Master: 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] Checking slave configurations..
Sat Mar 21 23:17:01 2020 - [info]  read_only=1 is not set on slave 192.124.64.213(192.124.64.213:3307).
Sat Mar 21 23:17:01 2020 - [info]  read_only=1 is not set on slave 192.124.64.214(192.124.64.214:3307).
Sat Mar 21 23:17:01 2020 - [info] Checking replication filtering settings..
Sat Mar 21 23:17:01 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Mar 21 23:17:01 2020 - [info]  Replication filtering check ok.
Sat Mar 21 23:17:01 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Mar 21 23:17:01 2020 - [info] Checking SSH publickey authentication settings on the current master..
Warning: Permanently added '192.124.64.212' (RSA) to the list of known hosts.
Sat Mar 21 23:17:01 2020 - [info] HealthCheck: SSH to 192.124.64.212 is reachable.
Sat Mar 21 23:17:01 2020 - [info] 
192.124.64.212(192.124.64.212:3307) (current master)
 +--192.124.64.213(192.124.64.213:3307)
 +--192.124.64.214(192.124.64.214:3307)

Sat Mar 21 23:17:01 2020 - [info] Checking replication health on 192.124.64.213..
Sat Mar 21 23:17:01 2020 - [info]  ok.
Sat Mar 21 23:17:01 2020 - [info] Checking replication health on 192.124.64.214..
Sat Mar 21 23:17:01 2020 - [info]  ok.
Sat Mar 21 23:17:01 2020 - [warning] master_ip_failover_script is not defined.
Sat Mar 21 23:17:01 2020 - [warning] shutdown_script is not defined.
Sat Mar 21 23:17:01 2020 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
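
When this check is run from a script, the captured output can be reduced to the few facts that matter. The sketch below is one possible way to do that, based only on the message formats visible in the output above; `summarize_check_repl` is a hypothetical helper, and other MHA versions may word their messages differently.

```python
import re


def summarize_check_repl(output):
    """Summarize masterha_check_repl output captured as text."""
    lines = output.splitlines()
    # Collect the text that follows each "[warning]" tag.
    warnings = [
        line.split("[warning]", 1)[1].strip()
        for line in lines
        if "[warning]" in line
    ]
    # The final verdict line as it appears in the output above.
    healthy = any(
        line.strip() == "MySQL Replication Health is OK." for line in lines
    )
    master = None
    for line in lines:
        match = re.search(r"Current Alive Master: (\S+?)\(", line)
        if match:
            master = match.group(1)
    return {"healthy": healthy, "master": master, "warnings": warnings}
```

Applied to the output above, the summary would report a healthy topology with master 192.124.64.212 and three warnings: the missing global configuration file and the two undefined scripts.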

7. Starting MHA

Start MHA:

# Check the MHA manager monitoring status; here it is not running
$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 is stopped(2:NOT_RUNNING).

# Start MHA monitoring.
# --remove_dead_master_conf removes the dead master's section from the configuration file after a failover.
# --ignore_last_failover lets a failover proceed even if a previous one happened recently.
$ nohup masterha_manager --conf=/etc/mha/mysql3307.cnf --remove_dead_master_conf --ignore_last_failover >> /var/log/mha/mysql3307/mha-3307.log 2>&1 &

# Check the status
$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212

Stop MHA monitoring:

$ masterha_stop --conf=/etc/mha/mysql3307.cnf
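
For monitoring scripts, the single line printed by masterha_check_status can be parsed into its parts. The regular expression below is a sketch written against the two sample lines shown in this section only; the helper name `parse_check_status` is hypothetical, and other status messages may not match.

```python
import re

# Matches e.g. "mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212"
# and "mysql3307 is stopped(2:NOT_RUNNING)."
STATUS_RE = re.compile(
    r"^(?P<app>\S+) (?:\(pid:(?P<pid>\d+)\) )?is (?P<state>running|stopped)"
    r"\((?P<code>\d+):(?P<label>\w+)\)"
    r"(?:, master:(?P<master>\S+))?"
)


def parse_check_status(line):
    """Return a dict describing the status line, or None if it is unrecognized."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["code"] = int(info["code"])
    info["pid"] = int(info["pid"]) if info["pid"] else None
    return info
```

A caller could then alert whenever `state` is not `running` or `label` is not `PING_OK`.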

8. Testing MHA

8.1 Check the Current State

$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212

$ mysql -h 192.124.64.214 -P 3307 -e "set global relay_log_purge=0"
$ mysql -h 192.124.64.214 -P 3307 -e "show global variables like '%relay_log_purge%'"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| relay_log_purge | OFF   |
+-----------------+-------+

$ mysql -h 192.124.64.214 -P 3307 -e "show slave status\G" | grep -Ei 'Master_Host|Master_Port|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
                  Master_Host: 192.124.64.212
                  Master_Port: 3307
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
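
The same health check can be scripted. The sketch below parses the vertical `\G` output into a dictionary and applies the three conditions visible above: both replication threads report Yes and the lag is within a threshold. The function names and the `max_lag` parameter are assumptions made for this example.

```python
def parse_slave_status(text):
    """Parse 'show slave status\\G' output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines and the '*** 1. row ***' separator.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        # Split on the first colon only; values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_ok(fields, max_lag=0):
    """True if both replication threads run and lag is at most max_lag seconds."""
    lag = fields.get("Seconds_Behind_Master")
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
        # Seconds_Behind_Master is "NULL" when replication is not running.
        and lag is not None
        and lag.isdigit()
        and int(lag) <= max_lag
    )
```

Treating a non-numeric `Seconds_Behind_Master` as unhealthy is a deliberate choice: MySQL prints NULL there when the lag cannot be determined.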

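8.2 Testing Automatic Failover

1. Simulate a master failure and let MHA fail over automatically.

# Kill the mysqld process on the master DB1 (mysql_pid stands for its process ID)
$ kill mysql_pid

2. Check the detailed log:

Watch the manager log. The failover only counts as successful if the log ends with "successfully". To follow the log live: tail -f /var/log/mha/mysql3307/manager

$ cat /var/log/mha/mysql3307/manager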
Sat Mar 21 23:24:59 2020 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 21 23:25:01 2020 - [info] GTID failover mode = 1
Sat Mar 21 23:25:01 2020 - [info] Dead Servers:
Sat Mar 21 23:25:01 2020 - [info] Alive Servers:
Sat Mar 21 23:25:01 2020 - [info]   192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:25:01 2020 - [info]   192.124.64.213(192.124.64.213:3307)
Sat Mar 21 23:25:01 2020 - [info]   192.124.64.214(192.124.64.214:3307)
Sat Mar 21 23:25:01 2020 - [info] Alive Slaves:
Sat Mar 21 23:25:01 2020 - [info]   192.124.64.213(192.124.64.213:3307)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:25:01 2020 - [info]     GTID ON
...
----- Failover Report -----

mysql3307: MySQL Master failover 192.124.64.212(192.124.64.212:3307) to 192.124.64.213(192.124.64.213:3307) succeeded

Master 192.124.64.212(192.124.64.212:3307) is down!

Check MHA Manager logs at LeDB-VM-124064214:/var/log/mha/mysql3307/manager for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.124.64.212(192.124.64.212:3307)
Selected 192.124.64.213(192.124.64.213:3307) as a new master.
192.124.64.213(192.124.64.213:3307): OK: Applying all logs succeeded.
192.124.64.213(192.124.64.213:3307): OK: Activated master IP address.
192.124.64.214(192.124.64.214:3307): OK: Slave started, replicating from 192.124.64.213(192.124.64.213:3307)
192.124.64.213(192.124.64.213:3307): Resetting slave info succeeded.
Master failover to 192.124.64.213(192.124.64.213:3307) completed successfully.
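
The "log must end with successfully" rule can be checked mechanically. The sketch below relies only on the two message formats visible in the failover report above; `failover_result` is a hypothetical helper name, and the patterns may need adjusting for other MHA versions.

```python
import re


def failover_result(log_text):
    """Return (old_master, new_master) if the log reports success, else None."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    # The last non-empty line must be the "completed successfully." message.
    if not lines or not lines[-1].endswith("completed successfully."):
        return None
    match = re.search(
        r"MySQL Master failover (\S+?)\(\S+\) to (\S+?)\(\S+\) succeeded",
        log_text,
    )
    return (match.group(1), match.group(2)) if match else None
```

Fed the report above, it would return the pair 192.124.64.212 and 192.124.64.213; a log that stops partway through a failover yields None.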

3. Check the result

DB2 has become the master and DB3 is now a slave of DB2. The manager process exits once a failover completes, which is why masterha_check_status reports NOT_RUNNING.

$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 is stopped(2:NOT_RUNNING).
$ mysql -h 192.124.64.213 -P 3307 -e "show slave status\G"
$ mysql -h 192.124.64.214 -P 3307 -e "show slave status\G" | grep -Ei 'Master_Host|Master_Port|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
                  Master_Host: 192.124.64.213
                  Master_Port: 3307
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0

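8.3 Manual Failover

MHA Manager must not be running. Manual failover is for setups where MHA's automatic failover is not enabled: when the master fails, an operator invokes MHA by hand to perform the failover, using the command below. Note: if MHA Manager finds no dead server, it reports an error and aborts the failover.

# DB2 is currently the master. Stop MHA, then kill the master 192.124.64.213:3307.
# Manual failover
$ masterha_master_switch --master_state=dead --conf=/etc/mha/mysql3307.cnf --dead_master_host=192.124.64.213 --dead_master_port=3307 --new_master_host=192.124.64.212 --new_master_port=3307 --ignore_last_failover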
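
If the manual switch is wrapped in tooling, building the argument list programmatically avoids typos in the long command line. The sketch below uses only the options that appear in the command above; `build_manual_switch` is a hypothetical helper, and it assumes the dead and new master listen on the same port, as they do in this environment.

```python
import shlex


def build_manual_switch(conf, dead_host, new_host, port=3307):
    """Build the masterha_master_switch argument list for a dead master.

    Assumption: the dead master and the new master use the same port.
    """
    return [
        "masterha_master_switch",
        "--master_state=dead",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={port}",
        f"--new_master_host={new_host}",
        f"--new_master_port={port}",
        "--ignore_last_failover",
    ]


command = build_manual_switch(
    "/etc/mha/mysql3307.cnf", "192.124.64.213", "192.124.64.212"
)
# Print the command for review instead of running it.
print(shlex.join(command))
```

The list form can be handed to `subprocess.run` unchanged; since the command is interactive, it should run attached to a terminal.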

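The command is interactive and asks you to confirm the switch. Reading through the output is a good way to understand the manual failover process.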

Sat Mar 21 23:55:43 2020 - [info] MHA::MasterFailover version 0.56.
Sat Mar 21 23:55:43 2020 - [info] Starting master failover.
Sat Mar 21 23:55:43 2020 - [info] 
Sat Mar 21 23:55:43 2020 - [info] * Phase 1: Configuration Check Phase..
...
Sat Mar 21 23:55:46 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Mar 21 23:55:46 2020 - [info] 
Sat Mar 21 23:55:46 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
----- Failover Report -----

mysql3307: MySQL Master failover 192.124.64.213(192.124.64.213:3307) to 192.124.64.212(192.124.64.212:3307) succeeded

Master 192.124.64.213(192.124.64.213:3307) is down!

Check MHA Manager logs at LeDB-VM-124064214 for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.124.64.213(192.124.64.213:3307)
Selected 192.124.64.212(192.124.64.212:3307) as a new master.
192.124.64.212(192.124.64.212:3307): OK: Applying all logs succeeded.
192.124.64.212(192.124.64.212:3307): OK: Activated master IP address.
192.124.64.214(192.124.64.214:3307): OK: Slave started, replicating from 192.124.64.212(192.124.64.212:3307)
192.124.64.212(192.124.64.212:3307): Resetting slave info succeeded.
Master failover to 192.124.64.212(192.124.64.212:3307) completed successfully.