1. Principle Analysis
1) Introduction to MHA
MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by youshimaton of Japan's DeNA (now at Facebook) and is an excellent piece of high-availability software for failover and master-slave promotion in MySQL environments. During a MySQL failover, MHA can complete the database failover operation automatically within 0-30 seconds, and while doing so it preserves data consistency to the greatest extent possible, achieving high availability in the true sense.
2) MHA Components
The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage multiple master-slave clusters, or it can be deployed on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager probes the cluster's master at regular intervals; when the master fails, it automatically promotes the slave holding the latest data to be the new master, and then re-points all the other slaves at the new master. The entire failover process is completely transparent to the application.
The Manager package mainly includes the following tools:
> masterha_check_ssh        # check MHA's SSH configuration
masterha_check_repl         # check the MySQL replication status
masterha_manager            # start MHA
masterha_check_status       # check the current MHA running state
masterha_master_monitor     # detect whether the master is down
masterha_master_switch      # control failover (automatic or manual)
masterha_conf_host          # add or remove configured server entries
The Node package (these tools are normally triggered by MHA Manager scripts and need no manual invocation) mainly includes the following tools:
> save_binary_logs          # save and copy the master's binary logs
apply_diff_relay_logs       # identify differing relay log events and apply the differences to the other slaves
filter_mysqlbinlog          # strip unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs            # purge relay logs (without blocking the SQL thread)
Note:
To minimize data loss when the master goes down due to a hardware failure, it is recommended to configure MySQL 5.5 semi-synchronous replication alongside MHA.
Asynchronous replication
MySQL's default replication is asynchronous: the master returns the result to the client as soon as it has executed the client's transaction, without caring whether any slave has received and processed it. This creates a problem: if the master crashes, transactions already committed on the master may not have reached any slave yet, and forcibly promoting a slave to master at that point can leave the new master with incomplete data.
Fully synchronous replication
The master returns to the client only after every slave has executed the transaction. Because it must wait for all slaves to finish, the performance of fully synchronous replication inevitably suffers badly. A timeout is also required.
Semi-synchronous replication
This sits between asynchronous and fully synchronous replication: after executing the client's transaction, the master does not return to the client immediately, but waits until at least one slave has received the events and written them to its relay log. Compared with asynchronous replication, semi-synchronous replication improves data safety, but it also introduces some latency, at minimum one TCP/IP round trip. Semi-synchronous replication is therefore best used on low-latency networks.
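As a sketch of what the note above recommends: in MySQL 5.5 semi-synchronous replication is provided by plugins, so this assumes the semisync plugins have already been installed (INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so' on the master and the matching rpl_semi_sync_slave plugin on each slave). The my.cnf settings might then look like:

```ini
# On the master (assumes semisync_master.so is installed):
[mysqld]
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 1000   # ms; fall back to async if no slave ACKs within 1s

# On each slave (assumes semisync_slave.so is installed):
[mysqld]
rpl_semi_sync_slave_enabled = 1
```

The timeout matters: when it expires the master silently degrades to asynchronous replication, so the data-loss protection only holds while at least one slave keeps up.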
3) How MHA Works
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave with the most recent updates;
(3) Apply the differing relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.
During automatic failover, MHA tries to save the binary logs from the crashed master, protecting against data loss as far as possible, but this is not always feasible. For example, if the master's hardware has failed or it cannot be reached over SSH, MHA cannot save the binary logs, and failover proceeds at the cost of losing the most recent data. Using MySQL 5.5 semi-synchronous replication greatly reduces this risk, and MHA can be combined with it: as long as even one slave has received the latest binary log events, MHA can apply them to all the other slaves, keeping every node's data consistent.
MHA currently supports mainly one-master, multi-slave architectures. Building an MHA cluster requires at least three database servers in one replication group, one master and two slaves: one server acts as the master, one slave acts as a standby master, and the other acts as a plain slave, hence the minimum of three servers.
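The six steps above can be summed up in pseudocode (the function names are illustrative, not actual MHA APIs, though save_binary_logs and apply_diff_relay_logs do correspond to the Node tools listed earlier):

```
# pseudocode sketch of MHA's failover sequence (names are illustrative)
failover(dead_master, slaves):
    binlogs = save_binary_logs(dead_master)          # step 1, if SSH still works
    latest  = slave with max(relay log position)     # step 2
    for s in slaves except latest:
        apply_diff_relay_logs(from=latest, to=s)     # step 3
    apply(binlogs, to=all slaves)                    # step 4
    promote(latest)                                  # step 5: becomes new master
    for s in slaves except latest:
        change_master(s, to=latest)                  # step 6
```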
2. Lab Environment
1) OS version
cat /etc/redhat-release
Use one uniform version across all hosts.
2) Kernel version: uname -r
3) Host layout: prepare 4 clean hosts, node{1,2,3,4}
The hosts must be able to resolve one another's hostnames. Since most of the configuration files on the nodes are largely identical, you only need to edit one copy and then push it to the other nodes with a for loop, which is simple and convenient; that is why hostname resolution is set up here.
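The for-loop push mentioned above might look like the following sketch. The file path and hostnames are the ones used in this lab; echo is prepended so the loop prints what it would do instead of copying, and removing it runs the copy for real:

```shell
# Dry run: print the scp commands that would push the finished
# /etc/hosts file to the other nodes (drop "echo" to actually copy).
for host in node2 node3 node4; do
  echo scp /etc/hosts "root@$host:/etc/hosts"
done
```

/etc/hosts is a good candidate for this because, unlike my.cnf with its per-node server-id, it is identical on every host.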
Role          IP address        Hostname  server_id  Type
Master        192.168.159.11    node1     1          writes
Slave         192.168.159.151   node2     2          reads
Slave         192.168.159.120   node3     3          reads
MHA-Manager   192.168.159.121   node4     -          monitors the replication group
4) Make hostnames resolvable everywhere
[root@vin ~]# cat /etc/hosts
192.168.159.11 node1.com node1
192.168.159.151 node2.com node2
192.168.159.120 node3.com node3
192.168.159.121 node4.com node4
5) Passwordless SSH between hosts
Because MHA Manager has to verify SSH connectivity between the nodes, passwordless communication between them is required. A simple shortcut is to generate an SSH key pair on one node, authorize it for that host itself, and then copy the authorized_keys file together with the key pair to every other node; that way there is no need to create a key pair and set up authorization on each node separately.
> On node4 (the Manager node):
[root@node4 ~]# ssh-keygen -t rsa
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.11
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.151
[root@node4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.120
On the master node:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.120
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.121
On slave1:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.151
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.121
On slave2:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.120
ssh-copy-id -i /root/.ssh/id_rsa.pub 192.168.159.151
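Since MHA expects every node to reach every other node over SSH, instead of maintaining the per-host lists above it may be easier to run one loop on each host that pushes the key to all peers. A dry-run sketch (echo prints the commands instead of running them; the hostnames are this lab's):

```shell
# Run on each node: print (dry run) one ssh-copy-id per peer,
# skipping the local host. Drop "echo" to actually push the key.
me=node1   # set to the local hostname, e.g. me=$(hostname -s)
for host in node1 node2 node3 node4; do
  if [ "$host" = "$me" ]; then continue; fi
  echo ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$host"
done
```

Afterwards, masterha_check_ssh (used later in this article) will verify that every pairwise connection actually works.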
3. Building the Master-Slave Replication Cluster
1) Initial master node configuration:
vim /etc/my.cnf
[mysqld]
server-id = 1
log-bin = master-log
relay-log = relay-log
skip_name_resolve = ON
2) Configuration needed on every slave node:
[mysqld]
server-id = 2        # every node's server-id in the replication cluster must be unique
relay-log = relay-log
log-bin = master-log
read_only = ON
relay_log_purge = 0  # do not automatically purge relay logs that are no longer needed
skip_name_resolve = ON
3) After configuring the master and slave nodes as above, finish the setup following the standard MySQL replication procedure:
start the master node and each slave node, start the IO and SQL threads on every slave, and make sure replication runs without errors. The steps are as follows:
On the master node, create the user the slaves will use to replicate data:
> MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO slave@'192.168.%.%' IDENTIFIED BY 'magedu';
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> SHOW MASTER STATUS;
On each slave node:
[root@node3 ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
MASTER_HOST='192.168.159.151',
MASTER_USER='slave',
MASTER_PASSWORD='magedu',
MASTER_LOG_FILE='master-log.000003',
MASTER_LOG_POS=415;
MariaDB [(none)]> START SLAVE;
MariaDB [(none)]> SHOW SLAVE STATUS\G
4) Start the slaves
After logging into the database, run: START SLAVE;
Check the slave status with: SHOW SLAVE STATUS\G
If it shows:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
the slave has started successfully and is running normally.
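A quick sanity check is to count those two Running: Yes lines. The sketch below feeds grep from a canned sample so it can be run anywhere; in practice you would pipe `mysql -e 'SHOW SLAVE STATUS\G'` into the same pipeline:

```shell
# Count "Running: Yes" lines; a healthy slave has exactly two
# (Slave_IO_Running and Slave_SQL_Running).
sample='Slave_IO_Running: Yes
Slave_SQL_Running: Yes'
running=$(printf '%s\n' "$sample" | grep -c 'Running: Yes')
if [ "$running" -eq 2 ]; then
  echo "replication threads OK"
else
  echo "replication broken" >&2
fi
```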
5) Create the replication account on every slave as well
This account is needed because once the master goes down, any slave may become the new master, and the other slaves will then need an authorized user to replicate from it. So every MySQL server must be given, in advance, an account that the other slaves can use to synchronize data. During a master switch this account is what MHA Manager hands to the slaves, which is why the MHA Manager configuration file has a repl_user=xxx option naming the replication account; the same username must therefore exist on every machine. In this lab, that user is named slave.
Create it on master, slave1 and slave2:
grant replication slave,replication client on *.* to slave@'%' identified by 'magedu';
4. Installing and Configuring MHA
1) Install the MHA packages
On the Manager node:
#yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-devel
#yum install mha4mysql-manager-0.56-0.el6.noarch.rpm
#yum install mha4mysql-node-0.56-0.el6.noarch.rpm
On all nodes:
#yum install perl-DBD-MySQL
#yum install mha4mysql-node-0.56-0.el6.noarch.rpm
2) Initialize and configure MHA
The Manager node needs a dedicated configuration file for each monitored master/slave cluster; all master/slave clusters can also share a global configuration. The global configuration file defaults to /etc/masterha_default.cnf and is optional. If only one master/slave cluster is being monitored, the per-server defaults can be supplied directly in the application configuration. The path of each application configuration file is up to you.
3) Define the MHA management configuration file. Create a dedicated management user for MHA for later convenience; do this on the MySQL master node, and it will replicate to the three nodes automatically.
mkdir -p /etc/masterha            # create a directory for the configuration file (any path you like)
vim /etc/masterha/app1.cnf        # with the following content:
[server default]                      # settings shared by server1, server2 and server3
user=admin                            # MHA management user
password=magedu                       # MHA management password
manager_workdir=/etc/masterha/app1    # masterha's own working directory
manager_log=/etc/masterha/manager.log # masterha's own log file
remote_workdir=/tmp/masterha/app1     # working directory on each remote host
ssh_user=root                         # user for SSH key-based authentication
repl_user=slave                       # replication user name
repl_password=magedu                  # replication user password
ping_interval=1                       # ping interval in seconds
[server1]                             # node 1
hostname=192.168.159.151              # node 1 host address
ssh_port=22                           # node 1 SSH port
candidate_master=1                    # whether this node may become a candidate master
[server2]
hostname=192.168.159.120
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.159.121
ssh_port=22
candidate_master=1
4) Check that the SSH trust between the nodes is OK:
[root@node4 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
If the last line of the output looks like the following, the check passed:
[info] All SSH connection tests passed successfully.
Then check that the replication settings of the managed MySQL cluster are OK:
[root@node4 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
It should print: MySQL Replication Health is OK.
If this check fails, the likely cause is a missing account on the slave nodes: in this architecture any slave can become the master, so the account must be created there too.
To fix it, simply run the following on the master node once more:
MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO slave@'192.168.%.%' IDENTIFIED BY 'magedu';
MariaDB [(none)]> FLUSH PRIVILEGES;
Run the check again on the Manager node and it will report OK.
5. Starting MHA
[root@node4 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &> /etc/masterha/manager.log &
# After a successful start, check the state of the master node with:
[root@node4 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:4978) is running(0:PING_OK), master:192.168.159.11
The line "app1 (pid:4978) is running(0:PING_OK)" means the MHA service is running OK;
otherwise it prints something like "app1 is stopped(1:NOT_RUNNING)."
To stop MHA, use the masterha_stop command:
[root@node4 ~]# masterha_stop --conf=/etc/masterha/app1.cnf
6. Testing MHA Failover
(1) Stop the mariadb service on the master node to simulate a crash that destroys the master's data:
#killall -9 mysqld mysqld_safe
#rm -rf /var/lib/mysql/*
(2) Watch the log on the Manager node:
When entries like the following appear in /etc/masterha/manager.log, the manager has detected the failure of node 192.168.159.151 and has automatically performed a failover, promoting 192.168.159.121 to be the new master.
Note that after the failover completes, the manager stops automatically, so masterha_check_status now reports an error, as shown below:
#masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
7. Troubleshooting
Error 1:
[root@data01 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Tue Apr 7 22:31:06 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 7 22:31:07 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr 7 22:31:07 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr 7 22:31:07 2015 - [info] MHA::MasterMonitor version 0.56.
Tue Apr 7 22:31:07 2015 - [error][/usr/local/share/perl5/MHA/Server.pm, ln303] Getting relay log directory or current relay logfile from replication table failed on 192.168.52.130(192.168.52.130:3306)!
Tue Apr 7 22:31:07 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/share/perl5/MHA/ServerManager.pm line 315
Tue Apr 7 22:31:07 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Tue Apr 7 22:31:07 2015 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
[root@centos7 ~]#
Fix: on 192.168.159.151, edit /etc/my.cnf (vim /etc/my.cnf) and add:
relay-log=/home/data/mysql/binlog/mysql-relay-bin
Then restart MySQL and set up the slave connection again:
STOP SLAVE;
RESET SLAVE;
CHANGE MASTER TO
MASTER_HOST='192.168.159.151',
MASTER_USER='slave',
MASTER_PASSWORD='magedu',
MASTER_LOG_FILE='master-log.000003', MASTER_LOG_POS=415;
START SLAVE;
OK, that fixed it.
Error 2:
[root@data01 perl]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Apr 9 00:54:32 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 9 00:54:32 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 9 00:54:32 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 9 00:54:32 2015 - [info] MHA::MasterMonitor version 0.56.
Thu Apr 9 00:54:32 2015 - [error][/usr/local/share/perl5/MHA/Server.pm, ln306] Getting relay log directory or current relay logfile from replication table failed on 192.168.52.130(192.168.52.130:3306)!
Thu Apr 9 00:54:32 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/share/perl5/MHA/ServerManager.pm line 315
Thu Apr 9 00:54:32 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Apr 9 00:54:32 2015 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
[root@data01 perl]#
Fix:
Both the user and repl_user accounts named in /etc/masterha/app1.cnf are MySQL accounts and must exist; here only repl_user had been created, while the user account was missing:
user=root
password=magedu
repl_user=repl
repl_password=magedu
On the MySQL nodes, create the account that lets the Manager access the databases, mainly for SHOW SLAVE STATUS and RESET SLAVE, by running:
GRANT SUPER,RELOAD,REPLICATION CLIENT,SELECT ON *.* TO slave@'192.168.%.%' IDENTIFIED BY 'magedu';
Error 3:
[root@oraclem1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Apr 9 23:09:05 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 9 23:09:05 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 9 23:09:05 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 9 23:09:05 2015 - [info] MHA::MasterMonitor version 0.56.
Thu Apr 9 23:09:05 2015 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 192.168.52.130(192.168.52.130:3306), replicating from 192.168.52.129(192.168.52.129:3306)
Master 192.168.52.129(192.168.52.129:3306), replicating from 192.168.52.130(192.168.52.130:3306)
Thu Apr 9 23:09:05 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 326
Thu Apr 9 23:09:05 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Apr 9 23:09:05 2015 - [info] Got exit code 1 (Not master dead)
MySQL Replication Health is NOT OK!
[root@oraclem1 ~]
Fix: on the node that should act as a slave, enable read-only:
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
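SET GLOBAL only lasts until the next restart; to make the setting stick, it can also go into my.cnf on every candidate slave (MHA clears read-only on whichever node it promotes):

```ini
[mysqld]
read_only = ON
```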
Error 4:
Thu Apr 9 23:54:32 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Apr 9 23:54:32 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='manager' --slave_host=192.168.52.130 --slave_ip=192.168.52.130 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.12-log --manager_version=0.56 --relay_dir=/home/data/mysql/data --current_relay_log=mysqld-relay-bin.000011 --slave_pass=xxx
Thu Apr 9 23:54:32 2015 - [info] Connecting to root@192.168.52.130(192.168.52.130:22)..
Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
at /usr/local/bin/apply_diff_relay_logs line 493
Thu Apr 9 23:54:32 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln205] Slaves settings check failed!
Thu Apr 9 23:54:32 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln413] Slave configuration failed.
Thu Apr 9 23:54:32 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48
Thu Apr 9 23:54:32 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Apr 9 23:54:32 2015 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
[root@oraclem1 ~]#
Fix: mysqlbinlog is installed but not on the PATH that MHA uses, so create a symlink:
[root@data02 ~]# type mysqlbinlog
mysqlbinlog is /usr/local/mysql/bin/mysqlbinlog
[root@data02 ~]#
[root@data02 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
Error 5:
Thu Apr 9 23:57:24 2015 - [info] Connecting to root@192.168.52.130(192.168.52.130:22)..
Checking slave recovery environment settings..
Relay log found at /home/data/mysql/data, up to mysqld-relay-bin.000013
Temporary relay log file is /home/data/mysql/data/mysqld-relay-bin.000013
Testing mysql connection and privileges.. sh: mysql: command not found
mysql command failed with rc 127:0!
at /usr/local/bin/apply_diff_relay_logs line 375
main::check() called at /usr/local/bin/apply_diff_relay_logs line 497
eval {...} called at /usr/local/bin/apply_diff_relay_logs line 475
main::main() called at /usr/local/bin/apply_diff_relay_logs line 120
Thu Apr 9 23:57:24 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln205] Slaves settings check failed!
Thu Apr 9 23:57:24 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln413] Slave configuration failed.
Thu Apr 9 23:57:24 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48
Thu Apr 9 23:57:24 2015 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Apr 9 23:57:24 2015 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Fix: the mysql client is likewise missing from the PATH, so symlink it too:
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql