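MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu (yoshinorim) at DeNA in Japan (he later joined Facebook), and is an excellent piece of software for failover and master promotion in MySQL high-availability environments. When the master fails, MHA can complete the database failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it high availability in a real sense.
The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager periodically probes the master of each cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to it. The failover is designed to be transparent to the application.
During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Using the semi-synchronous replication introduced in MySQL 5.5 greatly reduces this risk, and MHA can be combined with it. As long as one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep the data on all nodes consistent.
At present MHA mainly supports a one-master, multi-slave architecture. Building MHA requires at least three database servers in a replication cluster: one master and two slaves, that is, one acting as master, one as the standby (candidate) master, and one as a plain slave. Because at least three servers are needed, Taobao modified MHA for cost reasons, and Taobao's TMHA now supports one master and one slave. (Source: 《深入淺出MySQL》, 2nd edition)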
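Official introduction: https://code.google.com/p/mysql-master-ha/
The figure below shows how MHA Manager manages several master-slave replication groups.
How MHA works can be summarized as follows:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that holds the most recent updates;
(3) Apply the differential relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.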
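Step (2) comes down to comparing how far each slave has read into the master's binary log. The following is a minimal sketch of that comparison, not MHA's own code: `latest_slave` is a hypothetical helper, and the three-column input format is an assumption made for the example (the two values correspond to Master_Log_File and Read_Master_Log_Pos in SHOW SLAVE STATUS).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MHA: choose the slave that has read the
# furthest into the master's binary log.
# Input (stdin): one line per slave, "host Master_Log_File Read_Master_Log_Pos".
latest_slave() {
  # Sort by binlog file name, then numerically by position; the last line wins.
  sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

printf '%s\n' \
  '192.168.1.12 mysql-bin.000002 407' \
  '192.168.1.13 mysql-bin.000002 120' | latest_slave   # prints 192.168.1.12
```

The plain text sort on the file name relies on the zero-padded numeric suffix that MySQL gives its binlog files.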
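MHA consists of two packages, the Manager toolkit and the Node toolkit, described below.
The Manager toolkit mainly includes the following tools:
masterha_check_ssh        checks MHA's SSH configuration
masterha_check_repl       checks the MySQL replication status
masterha_manager          starts MHA
masterha_check_status     reports the current MHA running state
masterha_master_monitor   detects whether the master is down
masterha_master_switch    controls failover (automatic or manual)
masterha_conf_host        adds or removes server entries in the configuration
The Node toolkit (these tools are normally triggered by MHA Manager's scripts and need no manual operation) mainly includes the following tools:
save_binary_logs          saves and copies the master's binary logs
apply_diff_relay_logs     identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog        strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs          purges relay logs (without blocking the SQL thread)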
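Note: to minimize the data lost when the master goes down because of hardware damage, it is recommended (but not required) to configure MySQL 5.5 semi-synchronous replication together with MHA. How semi-synchronous replication works is left for you to look up.
Time synchronization (after syncing, confirm that all servers show the same time; if not, fix the time zone)
Disable the firewall
Install MySQL (MySQL 5.6 is used in this lab environment)
Software packages: https://pan.baidu.com/s/1o934VZc
On master-db1 (192.168.1.11):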
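[root@master-db1 ~]# echo -e "\n" |ssh-keygen -t dsa -N ""
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.1.12
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.1.13
[root@master-db1 ~]# ssh-copy-id -i .ssh/id_dsa.pub root@192.168.1.14
Configure the other three machines the same way, so that every host can log in to every other host without a password. OpenSSH 7.0 and later disable DSA keys by default; on such systems generate an rsa or ed25519 key instead.
Note: the binlog-do-db and replicate-ignore-db settings must be identical on all servers. MHA checks the replication filtering rules at startup; if they differ, it will not start monitoring or failover.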
1. Back up the master's data
[root@master-db1 ~]# mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql
2. Create the replication user on the master (192.168.1.11) and the candidate master (192.168.1.12). A slave configured with no_master can skip this; otherwise create the replication user there as well:
mysql> grant replication slave on *.* to 'repl'@'192.168.1.%' identified by '123456';
mysql> flush privileges;
3. Check the binlog file name and position at the time of the backup:
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 407 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
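One caveat with reading the coordinates from SHOW MASTER STATUS: if the master has accepted writes since the dump was taken, the values shown will be later than the dump's snapshot. Because the backup was taken with --master-data=2, the dump itself records the matching coordinates as a commented-out CHANGE MASTER TO line near the top of all.sql. Below is a small sketch for pulling them out; `binlog_coords_from_dump` is a hypothetical helper name, and the demo writes a stand-in file rather than reading a real dump.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file position" from the commented CHANGE MASTER TO
# line that mysqldump --master-data=2 writes into the dump header.
binlog_coords_from_dump() {
  sed -n "s/^-- CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" "$1" | head -n 1
}

# Demo with a stand-in for the header of all.sql.
demo_dump=$(mktemp)
printf '%s\n' \
  '-- MySQL dump (header shortened for the example)' \
  "-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=407;" \
  > "$demo_dump"
binlog_coords_from_dump "$demo_dump"   # prints: mysql-bin.000002 407
rm -f "$demo_dump"
```

On the real backup the call would be `binlog_coords_from_dump all.sql`, and the two values go into MASTER_LOG_FILE and MASTER_LOG_POS in step 6.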
4. Copy the backup to 192.168.1.12 and 192.168.1.13
[root@master-db1 ~]# scp all.sql 192.168.1.12:/root
[root@master-db1 ~]# scp all.sql 192.168.1.13:/root
5. Import the backup on both servers
[root@slave-db1 ~]# mysql < all.sql
[root@slave-db2 ~]# mysql < all.sql
6. Run the replication commands on both servers
mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.11',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000002',MASTER_LOG_POS=407;
Query OK, 0 rows affected, 2 warnings (0.12 sec)
mysql> start slave;
Query OK, 0 rows affected (0.08 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.11
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 407
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
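The two lines that matter in that output are Slave_IO_Running and Slave_SQL_Running. As a rough sketch of how the check could be scripted, the hypothetical helper `replication_ok` below reads the text of SHOW SLAVE STATUS\G on standard input and succeeds only when both threads report Yes; the name and the calling convention are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 only if both replication threads say "Yes".
# Reads the text output of SHOW SLAVE STATUS\G on stdin.
replication_ok() {
  awk -F': *' '
    /^ *Slave_IO_Running:/  { io  = $2 }
    /^ *Slave_SQL_Running:/ { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Demo on a shortened copy of the output above.
printf '%s\n' \
  '             Slave_IO_Running: Yes' \
  '            Slave_SQL_Running: Yes' | replication_ok && echo "replication healthy"
```

On a slave it could be fed directly, for example `mysql -e 'SHOW SLAVE STATUS\G' | replication_ok`.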
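7. Create the account MHA uses for management; run this on every MySQL server:
mysql> grant all privileges on *.* to 'root'@'192.168.1.%' identified by '123456';
mysql> flush privileges;
If the manager is installed on a slave server, you also need an account that can connect using that machine's own hostname, otherwise the masterha_check_repl test will not pass.
GRANT ALL PRIVILEGES ON *.* TO 'root'@'master(hostname)' IDENTIFIED BY '123456';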
1. Install the Perl packages MHA depends on
Install on all MySQL servers (192.168.1.11-13):
[root@master-db1 ~]# yum install perl-DBD-MySQL -y
[root@slave-db1 ~]# yum install perl-DBD-MySQL -y
[root@slave-db2 ~]# yum install perl-DBD-MySQL -y
On mha-monitor (192.168.1.14), install the Perl modules MHA Manager depends on:
[root@mha-monitor ~]# yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
2. Install the MHA Node package on all servers (192.168.1.11-14)
[root@master-db1 ~]# tar xf mha4mysql-node-0.56.tar.gz
[root@master-db1 ~]# cd mha4mysql-node-0.56
[root@master-db1 mha4mysql-node-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
[root@master-db1 mha4mysql-node-0.56]# make && make install
3. Install the MHA Manager package on mha-monitor (192.168.1.14)
[root@mha-monitor ~]# tar xf mha4mysql-manager-0.56.tar.gz
[root@mha-monitor ~]# cd mha4mysql-manager-0.56
[root@mha-monitor mha4mysql-manager-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...loaded. (1.9721)
- Config::Tiny          ...loaded. (2.12)
- Log::Dispatch         ...loaded. (2.26)
- Parallel::ForkManager ...loaded. (0.7.9)
- MHA::NodeConst        ...missing.
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Warning: prerequisite MHA::NodeConst 0 not found.
Writing Makefile for mha4mysql::manager
[root@mha-monitor mha4mysql-manager-0.56]# make && make install
After installation the following scripts are generated under /usr/local/bin. Their purposes were described above and are not repeated here.
[root@mha-monitor mha4mysql-manager-0.56]# ll /usr/local/bin
total 124
-r-xr-xr-x 1 root root 16367 Jan 17 22:28 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jan 17 22:28 filter_mysqlbinlog
-r-xr-xr-x 1 root root  1995 Jan 17 22:29 masterha_check_repl
-r-xr-xr-x 1 root root  1779 Jan 17 22:29 masterha_check_ssh
-r-xr-xr-x 1 root root  1865 Jan 17 22:29 masterha_check_status
-r-xr-xr-x 1 root root  3201 Jan 17 22:29 masterha_conf_host
-r-xr-xr-x 1 root root  2517 Jan 17 22:29 masterha_manager
-r-xr-xr-x 1 root root  2165 Jan 17 22:29 masterha_master_monitor
-r-xr-xr-x 1 root root  2373 Jan 17 22:29 masterha_master_switch
-r-xr-xr-x 1 root root  5171 Jan 17 22:29 masterha_secondary_check
-r-xr-xr-x 1 root root  1739 Jan 17 22:29 masterha_stop
-r-xr-xr-x 1 root root  8261 Jan 17 22:28 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jan 17 22:28 save_binary_logs
There are also some sample scripts under /root/mha4mysql-manager-0.56/samples/scripts/; copy them to /usr/local/bin/. These scripts are incomplete and need to be adapted, which is room the developer left for us. If you enable the configuration parameter that corresponds to any of them without having adapted the script itself, MHA will throw an error.
[root@mha-monitor mha4mysql-manager-0.56]# ll /root/mha4mysql-manager-0.56/samples/scripts/
total 32
-rwxr-xr-x 1 4984 users  3648 Apr  1  2014 master_ip_failover        # manages the VIP during automatic failover; optional. With keepalived you can write your own script to manage the VIP, e.g. monitor MySQL and stop keepalived when MySQL misbehaves, so that the VIP floats automatically
-rwxr-xr-x 1 4984 users  9870 Apr  1  2014 master_ip_online_change   # manages the VIP during an online switch; optional, a simple shell script of your own also works
-rwxr-xr-x 1 4984 users 11867 Apr  1  2014 power_manager             # powers off the host after a failure; optional
-rwxr-xr-x 1 4984 users  1360 Apr  1  2014 send_report               # sends an alert after a failover; optional, a simple shell script of your own also works
[root@mha-monitor scripts]# cp /root/mha4mysql-manager-0.56/samples/scripts/* /usr/local/bin/
1. Create MHA's working directory and the related configuration file (a sample configuration file is included in the unpacked package directory).
[root@mha-monitor ~]# mkdir -p /etc/masterha
[root@mha-monitor ~]# cp /root/mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
[root@mha-monitor ~]# ll /etc/masterha/
total 4
-rw-r--r-- 1 root root 257 Jan 17 22:40 app1.cnf
2. Edit app1.cnf. The file after editing looks like this:
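[server default]
# manager log file
manager_log=/var/log/masterha/app1/manager.log
# manager working directory
manager_workdir=/var/log/masterha/app1
# where the master keeps its binlogs, so that MHA can find them; here this is the MySQL data directory
master_binlog_dir=/Data/apps/mysql-5.6.36/data/
# switch script used during automatic failover
master_ip_failover_script=/usr/local/bin/master_ip_failover
# switch script used during a manual (online) switch
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
# password of the MySQL root user, i.e. the password of the monitoring user created earlier
password=123456
# monitoring user
user=root
# interval in seconds between pings to the master; the default is 3, and failover starts automatically after three attempts without a response
ping_interval=1
# where binlogs are saved on the remote MySQL host when a switch happens
remote_workdir=/tmp
# password of the replication user
repl_password=123456
# replication user
repl_user=repl
# script that sends an alert after a switch (copied to /usr/local/bin in the previous section)
report_script=/usr/local/bin/send_report
# checks the master's availability over more than one network route
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.1.11 -s 192.168.1.12
# script that shuts down the failed host to prevent split-brain (not used here)
shutdown_script=""
# SSH login user
ssh_user=root

[server1]
hostname=192.168.1.11
port=3306

[server2]
# candidate master: after a failover this slave is promoted to master even if it is not the slave with the latest events in the cluster
candidate_master=1
# By default MHA will not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering it would take a long time. With check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful for a host with candidate_master=1, because that host is then certain to become the new master.
check_repl_delay=0
hostname=192.168.1.12
port=3306

[server3]
hostname=192.168.1.13
port=3306
no_master=1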
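Before handing the file to MHA it can help to read back which hosts and roles it actually declares. The following is a minimal sketch under stated assumptions: `list_mha_servers` is a hypothetical helper, it expects one key=value pair per line as in the listing above, and the labels normal, candidate and no_master are names made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "section hostname role" for every [serverN] block.
# Assumes one key=value per line; [server default] is skipped.
list_mha_servers() {
  awk -F'=' '
    function emit() { if (sec != "") print sec, host, role }
    /^\[server[0-9]+\]/ { emit(); sec = substr($0, 2, length($0) - 2); host = ""; role = "normal"; next }
    /^\[/               { emit(); sec = ""; next }
    sec != "" && $1 == "hostname"                    { host = $2 }
    sec != "" && $1 == "candidate_master" && $2 == 1 { role = "candidate" }
    sec != "" && $1 == "no_master" && $2 == 1        { role = "no_master" }
    END { emit() }
  ' "$1"
}

# Demo on a trimmed copy of the configuration above.
demo_cnf=$(mktemp)
cat > "$demo_cnf" <<'EOF'
[server default]
user=root
[server1]
hostname=192.168.1.11
[server2]
candidate_master=1
hostname=192.168.1.12
[server3]
hostname=192.168.1.13
no_master=1
EOF
list_mha_servers "$demo_cnf"
rm -f "$demo_cnf"
```

On the manager host the call would be `list_mha_servers /etc/masterha/app1.cnf`.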
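3. Set how relay logs are purged (on every slave node):
[root@slave-db1 ~]# mysql -e 'set global relay_log_purge=0'
[root@slave-db2 ~]# mysql -e 'set global relay_log_purge=0'
Note:
During a switch, MHA relies on relay log information to recover the slaves, so automatic relay log purging must be set to OFF and the relay logs purged manually. By default, relay logs on a slave are deleted automatically once the SQL thread has executed them. In an MHA environment those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging has to take replication lag into account: on an ext3 filesystem, deleting a large file takes time and can cause serious replication delay. To avoid this, a hard link to the relay log is created first, because on Linux deleting a large file that still has a hard link is fast. (In MySQL, the same hard-link approach is commonly used when dropping a large table.)
MHA Node includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds so that the SQL thread can switch to a new relay log, and then executes SET GLOBAL relay_log_purge=0.
The purge_relay_logs options are:
--user                     MySQL user name
--password                 MySQL password
--port                     port number
--workdir                  where the relay log hard links are created; the default is /var/tmp. Creating a hard link across different partitions fails, so specify a location on the same partition as the relay logs. After the script succeeds, the hard-linked relay log files are deleted.
--disable_relay_log_purge  By default, if relay_log_purge=1 the script purges nothing and exits. With this option, when relay_log_purge=1 the script sets it to 0, purges the relay logs, and leaves the parameter OFF at the end.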
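The hard-link trick itself is plain filesystem behaviour and can be tried without MySQL. The sketch below uses a tiny temp file in place of a relay log; the file names are made up for the example and this is not what purge_relay_logs literally runs.

```shell
#!/usr/bin/env bash
# Sketch of the hard-link trick with a small temp file standing in for a relay log.
workdir=$(mktemp -d)
printf 'relay log data\n' > "$workdir/relay-log.000001"

# 1. Give the data a second name. Both names must live on the same filesystem,
#    which is why --workdir has to be on the relay logs' partition.
ln "$workdir/relay-log.000001" "$workdir/relay-log.000001.hardlink"

# 2. Removing the original name now only drops a directory entry; the data
#    blocks stay allocated because the second name still references them.
rm "$workdir/relay-log.000001"
survived=$(cat "$workdir/relay-log.000001.hardlink")
echo "still readable through the link: $survived"

# 3. The blocks are freed when the last name is removed, which can happen
#    later, outside the SQL thread's path.
rm "$workdir/relay-log.000001.hardlink"
rmdir "$workdir"
```

This is why the expensive part of deleting a large relay log can be moved out of the path that would otherwise delay replication.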
Set up a script that purges relay logs periodically (on both slave servers)
[root@192.168.1.12 ~]# cat purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/data/masterha/log'
work_dir='/Data/apps'
purge='/usr/local/bin/purge_relay_logs'

# create the log directory on first run
if [ ! -d "$log_dir" ]
then
    mkdir -p "$log_dir"
fi

$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> "$log_dir/purge_relay_logs.log" 2>&1
Make it executable and add it to crontab so that it runs periodically; do the same on the other slave.
[root@slave-db1 ~]# chmod +x purge_relay_log.sh
[root@slave-db1 ~]# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
The purge_relay_logs script deletes relay logs without blocking the SQL thread. Let's run it by hand and see what happens.
[root@slave-db1 ~]# purge_relay_logs --user=root --password=123456 --port=3306 --host=192.168.1.12 --disable_relay_log_purge --workdir=/Data/apps/
2018-01-17 23:07:59: purge_relay_logs script started.
 Found relay_log.info: /Data/apps/mysql-5.6.36/data/relay-log.info
 Opening /Data/apps/mysql-5.6.36/data/relay-log.000001 ..
 Opening /Data/apps/mysql-5.6.36/data/relay-log.000002 ..
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2018-01-17 23:08:02: All relay log purging operations succeeded.
4. The bundled master_ip_failover script has some problems and needs to be modified. The modified version is:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip       = '192.168.1.250/24';    # Virtual IP
my $gateway   = '192.168.1.1';         # Gateway IP
my $interface = 'eth0';
my $key       = "1";

# arping -s takes a bare source address, so strip the /24 prefix length for it
( my $vip_addr = $vip ) =~ s{/\d+$}{};
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip_addr $gateway >/dev/null 2>&1";
my $ssh_stop_vip  = "/sbin/ifconfig $interface:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";

        # the status check also brings the VIP up on the current master
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
5. Check the SSH configuration
[root@mha-monitor ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Jan 17 23:13:30 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jan 17 23:13:30 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jan 17 23:13:30 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jan 17 23:13:30 2018 - [info] Starting SSH connection tests..
Wed Jan 17 23:13:33 2018 - [debug]
Wed Jan 17 23:13:30 2018 - [debug]  Connecting via SSH from root@192.168.1.11(192.168.1.11:22) to root@192.168.1.12(192.168.1.12:22)..
Wed Jan 17 23:13:32 2018 - [debug]   ok.
Wed Jan 17 23:13:32 2018 - [debug]  Connecting via SSH from root@192.168.1.11(192.168.1.11:22) to root@192.168.1.13(192.168.1.13:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [debug]
Wed Jan 17 23:13:31 2018 - [debug]  Connecting via SSH from root@192.168.1.12(192.168.1.12:22) to root@192.168.1.11(192.168.1.11:22)..
Wed Jan 17 23:13:32 2018 - [debug]   ok.
Wed Jan 17 23:13:32 2018 - [debug]  Connecting via SSH from root@192.168.1.12(192.168.1.12:22) to root@192.168.1.13(192.168.1.13:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [debug]
Wed Jan 17 23:13:31 2018 - [debug]  Connecting via SSH from root@192.168.1.13(192.168.1.13:22) to root@192.168.1.11(192.168.1.11:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [debug]  Connecting via SSH from root@192.168.1.13(192.168.1.13:22) to root@192.168.1.12(192.168.1.12:22)..
Wed Jan 17 23:13:33 2018 - [debug]   ok.
Wed Jan 17 23:13:33 2018 - [info] All SSH connection tests passed successfully.
The SSH checks between all nodes are OK.
6. Check the state of the whole replication environment.
[root@mha-monitor ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
.....
Checking the Status of the script.. OK
Wed Jan 17 23:18:04 2018 - [info]  OK.
Wed Jan 17 23:18:04 2018 - [warning] shutdown_script is not defined.
Wed Jan 17 23:18:04 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
7. Start MHA Manager monitoring
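[root@mha-monitor ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover /var/log/masterha/app1/manager.log 2>&1 &
[1] 7191
[root@mha-monitor ~]# nohup: ignoring input and appending output to "nohup.out"
[root@mha-monitor ~]# jobs
[1]+  Running    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover /var/log/masterha/app1/manager.log 2>&1 &
As captured, the log path is passed as a bare argument with no redirection in front of it, which is why nohup reports that output goes to nohup.out; the manager still writes to the manager_log set in app1.cnf. The form normally used is:
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
Startup options:
--remove_dead_master_conf   after a failover, the old master's entry is removed from the configuration file.
--manager_log               location of the log file.
--ignore_last_failover      by default, if MHA detects consecutive outages less than 8 hours apart, it does not fail over; the restriction is there to avoid a ping-pong effect. This option tells MHA to ignore the file left behind by the previous switch. By default, after a switch MHA creates app1.failover.complete in the manager working directory (manager_workdir, /var/log/masterha/app1 in the configuration above). On the next switch, if that file is found there, the switch is not allowed unless the file was deleted by hand after the first switch. For convenience, --ignore_last_failover is set here.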
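The guard that --ignore_last_failover switches off can be pictured as a check on that marker file. The following is only a sketch of the idea, assuming the file's modification time is what gets compared against the 8-hour window; `failover_allowed` is a hypothetical helper, not an MHA command.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the guard described above: exit status 0 means
# a new automatic failover would be allowed.
# $1 = manager working directory, $2 = application name (e.g. app1)
failover_allowed() {
  # no marker file means no previous failover was recorded
  [ -e "$1/$2.failover.complete" ] || return 0
  # find prints the path only if it was modified less than 480 minutes (8 h) ago
  [ -z "$(find "$1/$2.failover.complete" -mmin -480 2>/dev/null)" ]
}

# Demo in a scratch directory.
demo_dir=$(mktemp -d)
failover_allowed "$demo_dir" app1 && echo "no marker: failover allowed"
touch "$demo_dir/app1.failover.complete"
failover_allowed "$demo_dir" app1 || echo "fresh marker: failover refused"
rm -rf "$demo_dir"
```

Removing the marker by hand after investigating a failover keeps the guard in place for the next incident, whereas --ignore_last_failover disables it for every run.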
8. Check the MHA Manager monitoring status:
[root@mha-monitor ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:7191) is running(0:PING_OK), master:192.168.1.11
Monitoring is running, and the current master is 192.168.1.11.
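If the status line needs to be consumed by a monitoring script, it can be split into its state and master parts. A minimal sketch, assuming the line has exactly the shape shown above; `parse_mha_status` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a masterha_check_status line into "STATE master".
# Prints nothing when the line has no "(code:STATE) ... master:host" shape,
# for example when the manager is stopped.
parse_mha_status() {
  sed -n 's/.*(\([0-9]*\):\([A-Z_]*\)).*master:\([^ ]*\).*/\2 \3/p'
}

echo 'app1 (pid:7191) is running(0:PING_OK), master:192.168.1.11' | parse_mha_status
# prints: PING_OK 192.168.1.11
```

A wrapper could alert whenever the output is empty or the state is not PING_OK.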
1. Simulate a MySQL failure and watch the VIP move and MySQL switch over automatically
Note: after a switch the MHA service stops by itself. The reason given in the official documentation is:
Running MHA Manager from daemontools
Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was killed by accident, the manager stops working. To run as a daemon, daemontools or any external daemon program can be used. Here is an example to run from daemontools.
Stop the MySQL server on the master
[root@master-db1 ~]# service mysqld stop
Shutting down MySQL.....                                   [  OK  ]
On the manager, check the MHA service and the failover log
[root@mha-monitor ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[1]+  Done    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover /var/log/masterha/app1/manager.log 2>&1
[root@mha-monitor ~]# tail -20 /var/log/masterha/app1/manager.log
----- Failover Report -----

app1: MySQL Master failover 192.168.1.11(192.168.1.11:3306) to 192.168.1.12(192.168.1.12:3306) succeeded

Master 192.168.1.11(192.168.1.11:3306) is down!

Check MHA Manager logs at mha-monitor:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.1.11(192.168.1.11:3306)
The latest slave 192.168.1.12(192.168.1.12:3306) has all relay logs for recovery.
Selected 192.168.1.12(192.168.1.12:3306) as a new master.
192.168.1.12(192.168.1.12:3306): OK: Applying all logs succeeded.
192.168.1.12(192.168.1.12:3306): OK: Activated master IP address.
192.168.1.13(192.168.1.13:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.13(192.168.1.13:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.12(192.168.1.12:3306)
192.168.1.12(192.168.1.12:3306): Resetting slave info succeeded.
Master failover to 192.168.1.12(192.168.1.12:3306) completed successfully.
The final line, "Master failover to 192.168.1.12(192.168.1.12:3306) completed successfully.", shows that the candidate master has now taken over.
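The output above shows the whole MHA switch process, which consists of the following steps:
1. Configuration check phase: the configuration of the whole cluster is checked
2. Handling of the dead master: this includes removing the virtual IP and powering off the host (I have not implemented the power-off part yet; it needs more study)
3. Copy the relay log difference between the dead master and the latest slave, and save it to the designated directory on MHA Manager
4. Identify the slave that holds the most recent updates
5. Apply the binary log events (binlog events) saved from the master
6. Promote one slave to be the new master
7. Point the other slaves at the new master and resume replication
Check replication on slave-db2 (192.168.1.13)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 635947
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
Start MHA Manager monitoring and check which server in the cluster is now the master
2. Steps for bringing the failed MySQL server back into the MHA environment
1. Make the failed server a new slave
2. Restart MHA Manager
3. Check the MHA status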
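For the first of these steps the old master needs a CHANGE MASTER TO statement pointing at the new master. In the MHA versions I have seen, the manager log from the failover contains a line of the form "All other slaves should start replication from here. Statement should be: CHANGE MASTER TO ...", with the password masked; treat that as an assumption and check your own manager.log. Under that assumption, a sketch for pulling the statement out (`change_master_from_log` is a hypothetical helper, and the demo log line is illustrative, not captured output):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last CHANGE MASTER TO statement found in a
# manager log. Assumes the statement sits on one line and ends with ";".
change_master_from_log() {
  grep -o 'CHANGE MASTER TO .*;' "$1" | tail -n 1
}

# Demo with an illustrative log line (values made up for the example).
demo_log=$(mktemp)
printf '%s\n' \
  "[info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';" \
  > "$demo_log"
change_master_from_log "$demo_log"
rm -f "$demo_log"
```

After replacing the masked password with the real one, the statement is run on the repaired server, followed by start slave; the server's entry then has to be added back to app1.cnf, since --remove_dead_master_conf removed it.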
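3. Switching master and slave manually online
In many situations the current master needs to be moved to another server, for example when the master has a hardware fault, the RAID controller needs rebuilding, or the master is being moved to a more powerful machine. Maintaining the master degrades performance and means downtime during which, at the very least, data cannot be written. In addition, blocking or killing currently running sessions can lead to data inconsistency between the old and the new master. MHA provides fast switching with graceful write blocking: the switch takes only 0.5 to 2 seconds, during which no data can be written. In many cases a 0.5 to 2 second write block is acceptable, so switching the master does not require a scheduled maintenance window.
The rough sequence of an MHA online switch:
1. Check the replication settings and determine the current master
2. Determine the new master
3. Block writes to the current master
4. Wait for all slaves to catch up with replication
5. Grant writes on the new master
6. Repoint the slaves
Note that for online switching the application architecture has to consider two issues:
1. Automatically telling master and slave apart (the machine acting as master may change). Using a VIP basically solves this.
2. Load balancing (you can define a rough read/write ratio and the share of load each machine can carry; this has to be reconsidered when a machine leaves the cluster)
The switch steps are as follows:
1. If the original master has failed (stop the manager first, then fail over manually):
masterha_stop --conf=/etc/masterha/app1.cnf
masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.11 --dead_master_port=3306 --new_master_host=192.168.1.12 --new_master_port=3306 --ignore_last_failover
2. Online switch that turns the original master into a slave:
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.12 --new_master_port=3306 --orig_master_is_new_slave
Note: an online switch calls the master_ip_online_change script, but the bundled script is incomplete and has to be adapted. The version I found via Google still had a problem: the variable new_master_password was not being populated, so the online switch failed. I therefore hard-coded it, assigning the MySQL root password directly to new_master_password. If anyone knows the cause, please let me know. The script can also manage the VIP. Here it is: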
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,
  $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user,
  $new_master_host,  $new_master_ip,  $new_master_port,  $new_master_user,
);

# Virtual IP. As posted, this value and eth1 below differ from the
# 192.168.1.250/24 on eth0 used in master_ip_failover; adjust both scripts
# to the same VIP and interface for your environment.
my $vip = '192.168.0.88/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth1:$key down";
my $ssh_user = "root";

# hard-coded because the passwords were not being passed in (see the note above)
my $new_master_password  = '123456';
my $orig_master_password = '123456';

GetOptions(
  'command=s'              => \$command,
  #'ssh_user=s'             => \$ssh_user,
  'orig_master_host=s'     => \$orig_master_host,
  'orig_master_ip=s'       => \$orig_master_ip,
  'orig_master_port=i'     => \$orig_master_port,
  'orig_master_user=s'     => \$orig_master_user,
  #'orig_master_password=s' => \$orig_master_password,
  'new_master_host=s'      => \$new_master_host,
  'new_master_ip=s'        => \$new_master_ip,
  'new_master_port=i'      => \$new_master_port,
  'new_master_user=s'      => \$new_master_user,
  #'new_master_password=s'  => \$new_master_password,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      #$orig_master_handler->disable_log_bin_local();
      #print current_time_us() . " Drpping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      #$orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      #$new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      #$new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
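To guarantee full data consistency and complete the switch as quickly as possible, an MHA online switch only succeeds if all of the following conditions are met; otherwise it fails.
1. The IO thread is running on every slave
2. The SQL thread is running on every slave
3. Seconds_Behind_Master in the show slave status output of every slave is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.
4. On the master, show processlist shows no update that has been running for longer than running_updates_limit seconds.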
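Conditions 1 to 3 can be checked per slave before attempting a switch. The sketch below is one way to express them, assuming the three values have already been read from show slave status; `slave_ready_for_switch` is a hypothetical helper and does not cover condition 4 on the master.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for one slave.
# $1 = Slave_IO_Running, $2 = Slave_SQL_Running, $3 = Seconds_Behind_Master,
# $4 = running_updates_limit in seconds (defaults to 1, as described above)
slave_ready_for_switch() {
  [ "$1" = "Yes" ] && [ "$2" = "Yes" ] || return 1
  case "$3" in
    ''|*[!0-9]*) return 1 ;;   # NULL or non-numeric lag: replication is not healthy
  esac
  [ "$3" -le "${4:-1}" ]
}

slave_ready_for_switch Yes Yes 0 && echo "slave is ready"
slave_ready_for_switch Yes Yes 30 || echo "slave is lagging too far behind"
```

Running it for every slave and refusing to call masterha_master_switch unless all of them pass avoids starting a switch that is bound to be rejected.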
Finally, here is the mail-sending script send_report:
#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );

# SMTP settings: replace the xxxx placeholders with your own account details
my $smtp      = 'smtp.163.com';
my $mail_from = 'xxxx';
my $mail_user = 'xxxxx';
my $mail_pass = 'xxxxx';
my $mail_to   = [ 'xxxx', 'xxxx' ];

GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts( $smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $body );

sub mailToContacts {
  my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;

  # SMTP debug output goes to this file
  open my $DEBUG, "> /tmp/monitormail.log"
    or die "Can't open the debug file:$!\n";
  my $sender = new Mail::Sender {
    ctype       => 'text/plain; charset=utf-8',
    encoding    => 'utf-8',
    smtp        => $smtp,
    from        => $mail_from,
    auth        => 'LOGIN',
    TLS_allowed => '0',
    authid      => $user,
    authpwd     => $passwd,
    to          => $mail_to,
    subject     => $subject,
    debug       => $DEBUG
  };

  $sender->MailMsg(
    {
      msg   => $msg,
      debug => $DEBUG
    }
  ) or print $Mail::Sender::Error;
  return 1;
}

# Do whatever you want here
exit 0;
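Summary:
The high-availability solutions available today can each provide database high availability to some degree, for example MMM, heartbeat+DRBD and Cluster, which were covered in earlier articles, as well as Percona's Galera Cluster. Each has its strengths and weaknesses. The choice mainly depends on the business and on its requirements for data consistency. Where both high availability and data consistency matter, the MHA architecture is the one I recommend.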