MySQL High Availability Architecture - MHA Deployment Notes

 

1. MHA Introduction

MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by youshimaton of the Japanese company DeNA (now at Facebook), a Japanese MySQL expert, as a set of management scripts written in Perl. The tool works only in (two-tier) MySQL replication environments, and its goal is to keep the master highly available. It is an excellent piece of high-availability software for failover and slave promotion in MySQL environments. During a failover, MHA can complete the database switch automatically within 0 to 30 seconds, and while switching it preserves data consistency as far as possible, achieving true high availability.
 
MHA is a software package for automatic master failover and slave promotion, built on standard MySQL replication (asynchronous or semi-synchronous). It consists of two parts: MHA Manager (the management node) and MHA Node (the data node).
1) MHA Manager can be deployed on a dedicated machine to manage multiple master-slave clusters, or it can run on one of the slave nodes. The Manager periodically probes the nodes in the cluster; when the master fails, it automatically promotes the slave with the most recent data to the new master and repoints all the other slaves to it. The whole failover is transparent to the application.
2) MHA Node runs on every MySQL server; it speeds up failover with scripts that can parse and purge logs.
 
During an automatic failover, MHA tries to save the binary logs from the crashed master to minimize data loss, but this is not always possible. For example, if the master has a hardware failure or is unreachable over SSH, MHA cannot save the binary logs and fails over anyway, losing the most recent data. With MySQL 5.5 semi-synchronous replication the risk of data loss is greatly reduced, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves and thereby keep every node consistent.
 
MHA currently targets one-master, multi-slave topologies. To build MHA, a replication cluster needs at least three database servers: one master, one standby master, and one slave. Because at least three servers are required, Taobao adapted MHA to cut machine cost; their TMHA now supports one master with a single slave.

2. MHA Architecture Overview

展現瞭如何經過MHA Manager管理多組主從複製。能夠將MHA工做原理總結爲以下:mysql

Compared with other HA software, MHA aims to keep the master in a MySQL replication setup highly available. Its biggest strength is that it can reconcile the differential logs between slaves so that all slaves end up with consistent data, then pick one of them as the new master and point the remaining slaves at it. The workflow is roughly as follows (a hedged command-level sketch follows the list):
1) save the binary log events (binlog events) from the crashed master;
2) identify the slave with the most recent data;
3) apply the differential relay logs to the other slaves;
4) apply the binary log events saved from the master;
5) promote one slave to be the new master;
6) repoint the other slaves to replicate from the new master.
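These six steps are exactly what the MHA Node tools described later carry out. Purely as an illustration (a hedged sketch, not taken from the original text: the binlog names, positions, hosts, and paths are placeholders, and the exact flags should be double-checked against your MHA version), the core of the process could be driven by hand roughly like this:

# 1-2) On the crashed master (if still reachable over SSH), save the binlog
#      events past the last position a slave received:
save_binary_logs --command=save --binlog_dir=/data/mysql/data \
  --start_file=mysql-bin.000010 --start_pos=107 \
  --output_file=/tmp/saved_master_binlog.binlog

# 3-4) On each slave, apply the differential relay logs plus the saved
#      master binlog before repointing replication:
apply_diff_relay_logs --command=apply --slave_user=root --slave_pass=xxx \
  --slave_host=182.48.115.237 --slave_ip=182.48.115.237 --slave_port=3306 \
  --apply_files=/tmp/saved_master_binlog.binlog --workdir=/var/tmp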

MHA工做原理linux

When the master fails, MHA compares the position up to which each slave's I/O thread has read the master's binlog and picks the closest one as the latest slave. The other slaves generate differential relay logs by comparing themselves against the latest slave. The binlog saved from the master is applied on the latest slave, which is then promoted to master. Finally the corresponding differential relay logs are applied on the other slaves and they start replicating from the new master.

During the failover, MHA Node tries to reach the failed master over SSH; if it can (i.e. it is not a hardware failure -- for example only the InnoDB data files are corrupted), it saves the binary logs to minimize data loss. Using MHA together with semi-synchronous replication greatly reduces the risk of data loss.
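Since pairing MHA with semi-synchronous replication is recommended here, a minimal sketch of turning it on (MySQL 5.5+, using the stock semisync plugins; passwords left to the interactive prompt):

# On the master: load and enable the semisync master plugin
mysql -uroot -p -e "
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;"

# On each slave: load and enable the semisync slave plugin, then restart
# the I/O thread so the setting takes effect
mysql -uroot -p -e "
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"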

MHA's software architecture: it consists of two parts, the Manager tool set and the Node tool set, described below.
The Manager tool set mainly includes the following tools:

masterha_check_ssh              checks MHA's SSH configuration
masterha_check_repl             checks the MySQL replication status
masterha_manager                starts MHA
masterha_check_status           checks the current MHA running state
masterha_master_monitor         detects whether the master is down
masterha_master_switch          controls failover (automatic or manual)
masterha_conf_host              adds or removes a configured server entry

The Node tool set (these tools are usually invoked by MHA Manager scripts and need no manual operation) mainly includes the following tools:

save_binary_logs (save binary logs)                saves and copies the master's binary logs
apply_diff_relay_logs (apply differential relay logs)   identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog                          strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs (purge relay logs)                removes relay logs (without blocking the SQL thread)
.....................................................................................................
How does MHA keep data consistent? Mainly through the following MHA Node tools, which are triggered by MHA Manager:
save_binary_logs         if the master's binary logs are accessible, saves and copies them to minimize data loss
apply_diff_relay_logs    generates the relay-log difference relative to the latest slave and applies all differential events to the other slaves

Note:
What gets compared is the relay log; the newer a slave's relay log, the closer it is to the master, which is what guarantees the freshest data.
purge_relay_logs deletes relay logs without blocking the SQL thread.

Advantages of MHA

1) Fast failover
In a replication cluster, as long as the slaves are not lagging in replication, MHA can usually fail over within seconds: it detects the master failure in 9-10 seconds, can optionally power off the master within 7-10 seconds to avoid split-brain, and applies the differential relay logs to the new master in a few more seconds, so total downtime is typically 10-30 seconds. Once the new master is recovered, MHA recovers the remaining slaves in parallel; even with a large number of slaves, the master recovery time is unaffected.

DeNA runs MHA across more than 150 MySQL (mostly 5.0/5.1) master-slave environments. When a master failed, MHA completed the failover within 4 seconds -- something a traditional active/passive cluster solution cannot do.

2) A master failure does not cause data inconsistency
When the current master fails, MHA automatically identifies the relay log differences between slaves and applies them to all slaves, so all slaves stay in sync as long as they are alive. Used together with semi-synchronous replication, this (almost) guarantees no data loss.

3) No changes to the current MySQL setup are required
One of MHA's key design principles is to be as simple and easy to use as possible. MHA works with conventional master-slave replication on MySQL 5.0 and later. Compared with other HA solutions, MHA does not require changing the MySQL deployment, and it works with both asynchronous and semi-synchronous replication.

Starting/stopping/upgrading/downgrading/installing/uninstalling MHA does not require touching (including starting or stopping) MySQL replication. To upgrade MHA to a new version, there is no need to stop MySQL -- just replace it with the new MHA version and restart MHA Manager.

MHA runs on stock MySQL from version 5.0 onward. Some other MySQL HA solutions require particular versions (e.g. MySQL Cluster, or MySQL with global transaction IDs), but nobody migrates an application just for master high availability. In most cases an older MySQL deployment is already in place, and nobody wants to spend the time migrating to a different storage engine or a bleeding-edge release just to make the master highly available. MHA works with stock MySQL 5.0/5.1/5.5, so no migration is needed.

4) No need to add lots of servers
MHA consists of MHA Manager and MHA Node. MHA Node runs on the MySQL servers that need failover/recovery, so it requires no extra servers. MHA Manager runs on a dedicated machine, so one extra server is needed (two for a highly available Manager), but a single Manager can monitor a large number (even hundreds) of independent masters, so the total server count barely grows. Running MHA Manager on one of the slaves is also fine. In short, deploying MHA adds little or no hardware.

5) No performance penalty
MHA works with asynchronous or semi-synchronous MySQL replication. While monitoring the master, MHA only sends a ping every few seconds (default 3 seconds) and issues no heavy queries, so performance is as fast as native MySQL replication.

6) Works with any storage engine
MHA works with any storage engine that MySQL replication supports; it is not limited to InnoDB, and it works fine even in legacy MyISAM environments that are hard to migrate.

3. MHA High-Availability Deployment Walkthrough

1) Machine environment

IP address         Hostname        Role
182.48.115.236    Node_Master     writes; data node
182.48.115.237    Node_Slave      reads; data node; candidate master
182.48.115.238    Manager_Slave   reads; data node; also the Manager server (i.e. the manager node)
........................................................................................................
To save machines, the read-only slave 182.48.115.237 (a slave that does not serve reads to applications) is chosen as the candidate master; alternatively it could be dedicated to backups.
Likewise, to save machines, the slave 182.48.115.238 doubles as the Manager server (in production, with enough hardware, a dedicated machine is normally used as the Manager server).
........................................................................................................
 
Disable iptables and SELinux on all three machines.

Set up passwordless SSH trust between the nodes (i.e. passwordless SSH on every node, including trusting the local host; if a node has no key pair yet, generate one first with ssh-keygen):
[root@Node_Master ~]# ssh-copy-id 182.48.115.236
[root@Node_Master ~]# ssh-copy-id 182.48.115.237
[root@Node_Master ~]# ssh-copy-id 182.48.115.238

[root@Node_Slave ~]# ssh-copy-id 182.48.115.236
[root@Node_Slave ~]# ssh-copy-id 182.48.115.237
[root@Node_Slave ~]# ssh-copy-id 182.48.115.238

[root@Manager_Slave ~]# ssh-copy-id 182.48.115.236
[root@Manager_Slave ~]# ssh-copy-id 182.48.115.237
[root@Manager_Slave ~]# ssh-copy-id 182.48.115.238

Now the three nodes can SSH to each other in both directions without a password. If any pair of hosts cannot log in to each other passwordlessly, the later steps may fail. (A pairwise check loop is sketched below.)
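A quick sanity loop (a hedged addition, run from any one node) that exercises every src->dst pair non-interactively:

for src in 182.48.115.236 182.48.115.237 182.48.115.238; do
  for dst in 182.48.115.236 182.48.115.237 182.48.115.238; do
    # BatchMode makes ssh fail instead of prompting if trust is missing
    ssh -o BatchMode=yes root@$src "ssh -o BatchMode=yes root@$dst hostname" \
      && echo "OK: $src -> $dst"
  done
done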

2) Enable login by hostname (run on all three nodes; this step is optional)

Set the hostname on each of the three nodes (the hostnames are listed above) and bind them in hosts.
The /etc/hosts entries on all three machines are:
[root@Node_Master ~]# vim /etc/hosts
.......
182.48.115.236    Node_Master
182.48.115.237    Node_Slave
182.48.115.238    Manager_Slave

Verify that the hosts can log in to each other by hostname, i.e. passwordless SSH by hostname works in both directions.

3) Prepare the MySQL master-slave environment

The topology is one master with two slaves:
master: 182.48.115.236    slave: 182.48.115.237
master: 182.48.115.236    slave: 182.48.115.238

For MySQL master-slave setup, see: http://www.cnblogs.com/kevingrace/p/6256603.html
.......................................................................................
------ master config ------
server-id=1
log-bin=mysql-bin
binlog-ignore-db=mysql
sync_binlog = 1
binlog_checksum = none
binlog_format = mixed
------ slave 1 config ------
server-id=2
log-bin=mysql-bin
binlog-ignore-db=mysql        // Important: the replication filter settings must be identical on master and slaves, otherwise masterha_check_repl will fail later!
slave-skip-errors = all
------ slave 2 config ------
server-id=3
log-bin=mysql-bin
binlog-ignore-db=mysql
slave-skip-errors = all

Then grant the slaves permission to connect to the master; afterwards it is best to verify from the slaves that the granted account can actually reach the master.
Then, on each slave, run CHANGE MASTER ... based on the master's "show master status;" output (a hedged sketch follows this block).

Note:
If binlog-ignore-db or replicate-ignore-db filter rules are set, they must be identical on master and slaves. That is, if you filter with binlog-ignore-db, use it in both the master and slave configs; if you filter with replicate-ignore-db, use that in both. Never let master and slaves use different filter settings: MHA checks the filter rules at startup, and if they differ it will not start monitoring or failover.
.......................................................................................
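A hedged sketch of the CHANGE MASTER step on each slave, using the repl account created in step 4 below; the binlog file and position are placeholders -- take the real values from "show master status;" on the master:

mysql -uroot -p -e "
CHANGE MASTER TO
  MASTER_HOST='182.48.115.236',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_1234',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=120;
START SLAVE;
SHOW SLAVE STATUS\G"

Both Slave_IO_Running and Slave_SQL_Running should show Yes afterwards.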

4) Create the MHA management account (run on all three nodes)

mysql> GRANT SUPER,RELOAD,REPLICATION CLIENT,SELECT ON *.* TO manager@'182.48.115.%' IDENTIFIED BY 'manager_1234';
Query OK, 0 rows affected (0.06 sec)

mysql> GRANT CREATE,INSERT,UPDATE,DELETE,DROP ON *.* TO manager@'182.48.115.%';
Query OK, 0 rows affected (0.05 sec)

Create the replication account (run on all three nodes):
mysql> GRANT RELOAD, SUPER, REPLICATION SLAVE ON *.* TO 'repl'@'182.48.115.%' IDENTIFIED BY 'repl_1234';
Query OK, 0 rows affected (0.09 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.06 sec)
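A quick check (a hedged addition, run from the manager node) that the account actually works remotely:

# Each node's MySQL should answer with its hostname and server_id
for h in 182.48.115.236 182.48.115.237 182.48.115.238; do
  mysql -umanager -pmanager_1234 -h$h -e "SELECT @@hostname, @@server_id;"
done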

5) Install MHA
MHA comprises a manager node and data nodes:
data nodes: the hosts in the existing MySQL replication topology -- at least 3, i.e. one master and two slaves, so that the master-slave structure survives a master failover; they only need the node package installed.
manager server: runs the monitoring scripts and is responsible for monitoring and auto-failover; it needs both the node package and the manager package installed.

5.1) Install MHA Node on all data nodes

Download mha4mysql-node-0.56.tar.gz
Download link: http://pan.baidu.com/s/1cphgLo
Extraction code: 7674

[root@Node_Master ~]# yum -y install perl-DBD-MySQL      // install the required perl module first
[root@Node_Master ~]# tar -zvxf mha4mysql-node-0.56.tar.gz
[root@Node_Master ~]# cd mha4mysql-node-0.56
[root@Node_Master mha4mysql-node-0.56]# perl Makefile.PL
................................................................................................................
This step may fail with:
1) Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 ......
Fix:
[root@Node_Master mha4mysql-node-0.56]# yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker

2) Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5 ....
Fix:
[root@Node_Master mha4mysql-node-0.56]# yum install -y perl-CPAN
................................................................................................................
[root@Node_Master mha4mysql-node-0.56]# make && make install
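After make install the node utilities should be on the PATH; a quick hedged check (run on every data node):

which save_binary_logs apply_diff_relay_logs purge_relay_logs filter_mysqlbinlog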

5.2) Install MHA Manager on the manager node (182.48.115.238); note the manager node also needs MHA Node installed

First install a third-party yum repo:
[root@Manager_Slave ~]# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Install the perl MySQL packages:
[root@Manager_Slave ~]# yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes

Install the MHA Manager package:
Download link: https://pan.baidu.com/s/1slyfXN3
Extraction code: 86wb
[root@Manager_Slave ~]# tar -vxf mha4mysql-manager-0.56.tar
[root@Manager_Slave ~]# cd mha4mysql-manager-0.56
[root@Manager_Slave mha4mysql-manager-0.56]# perl Makefile.PL
[root@Manager_Slave mha4mysql-manager-0.56]# make && make install

After installing MHA Manager, the following scripts appear under /usr/local/bin:
[root@Manager_Slave mha4mysql-manager-0.56]# ll /usr/local/bin/
total 84
-r-xr-xr-x. 1 root root 16367 May 31 21:37 apply_diff_relay_logs
-r-xr-xr-x. 1 root root  4807 May 31 21:37 filter_mysqlbinlog
-r-xr-xr-x. 1 root root  1995 May 31 22:23 masterha_check_repl
-r-xr-xr-x. 1 root root  1779 May 31 22:23 masterha_check_ssh
-r-xr-xr-x. 1 root root  1865 May 31 22:23 masterha_check_status
-r-xr-xr-x. 1 root root  3201 May 31 22:23 masterha_conf_host
-r-xr-xr-x. 1 root root  2517 May 31 22:23 masterha_manager
-r-xr-xr-x. 1 root root  2165 May 31 22:23 masterha_master_monitor
-r-xr-xr-x. 1 root root  2373 May 31 22:23 masterha_master_switch
-r-xr-xr-x. 1 root root  5171 May 31 22:23 masterha_secondary_check
-r-xr-xr-x. 1 root root  1739 May 31 22:23 masterha_stop
-r-xr-xr-x. 1 root root  8261 May 31 21:37 purge_relay_logs
-r-xr-xr-x. 1 root root  7525 May 31 21:37 save_binary_logs

Where:
masterha_check_repl             checks the MySQL replication status
masterha_check_ssh              checks MHA's SSH configuration
masterha_check_status           checks the current MHA running state
masterha_conf_host              adds or removes a configured server entry
masterha_manager                starts MHA
masterha_stop                   stops MHA
masterha_master_monitor         detects whether the master is down
masterha_master_switch          controls failover (automatic or manual)
masterha_secondary_check        checks master reachability over multiple routes

Also:
under ../mha4mysql-manager-0.56/samples/scripts there are the following scripts, which need to be copied to /usr/local/bin:
[root@Manager_Slave mha4mysql-manager-0.56]# cd samples/scripts/
[root@Manager_Slave scripts]# ll
total 32
-rwxr-xr-x. 1 4984  users   3648 Apr  1 2014 master_ip_failover             // manages the VIP during automatic failover; not mandatory -- with keepalived you can write your own script (e.g. monitor mysql and stop keepalived when mysql misbehaves, so the VIP floats automatically)
-rwxr-xr-x. 1 4984  users   9870 Apr  1 2014 master_ip_online_change        // VIP script for online switchover; not mandatory, a simple shell script will also do
-rwxr-xr-x. 1 4984  users  11867 Apr  1 2014 power_manager                  // powers off the master after a failure; not mandatory
-rwxr-xr-x. 1 4984  users   1360 Apr  1 2014 send_report                    // sends an alert after a failover; not mandatory, a simple shell script will also do
[root@Manager_Slave scripts]# cp ./* /usr/local/bin/

5.3) Configure the manager node (182.48.115.238)

[root@Manager_Slave mha4mysql-manager-0.56]# mkdir -p /etc/masterha
[root@Manager_Slave mha4mysql-manager-0.56]# cp samples/conf/app1.cnf /etc/masterha/
[root@Manager_Slave mha4mysql-manager-0.56]# vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1             // the manager's working directory
manager_log=/var/log/masterha/app1/manager.log     // the manager's log file

ssh_user=root                                      // account used for passwordless SSH
repl_user=repl                                     // MySQL replication account, used to sync binary logs between master and slaves
repl_password=repl_1234                            // password of the replication account created earlier
ping_interval=1                                    // interval between pings sent to the master to check whether it is alive; default is 3 seconds; failover is triggered after three missed replies
master_ip_failover_script= /usr/local/bin/master_ip_failover                // switch script for automatic failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change      // switch script for manual/online switchover

[server1]
hostname=182.48.115.236
port=3306
master_binlog_dir=/data/mysql/data/        // where the master stores its binlog, so MHA can find it; here it is the MySQL data directory

[server2]
hostname=182.48.115.237
port=3306
candidate_master=1           // make this host the candidate master: when the master dies, this slave is promoted first, even if it is not the slave with the newest events in the cluster
check_repl_delay=0          // by default MHA will not pick a slave as the new master if it is more than 100MB of relay logs behind, because recovery would take long; check_repl_delay=0 makes MHA ignore replication delay when choosing the new master. This is very useful together with candidate_master=1, since the candidate must become the new master during a switch
master_binlog_dir=/data/mysql/data/

[server3]
hostname=182.48.115.238
port=3306
#candidate_master=1
master_binlog_dir=/data/mysql/data/

#[server4]
#hostname=host4
#no_master=1

5.4) Configure relay log purging (on both slave nodes)

[root@Node_Slave ~]# mysql -p123456 -e 'set global relay_log_purge=0'
[root@Manager_Slave ~]# mysql -p123456 -e 'set global relay_log_purge=0'
..................................................................................................
Note:
During a switchover, slave recovery relies on the relay logs, so automatic relay log purging must be set to OFF and relay logs purged manually instead.
By default, relay logs on a slave are deleted automatically once the SQL thread has executed them. But in an MHA setup those relay logs may be needed to recover other slaves, so automatic deletion must be disabled. Purging relay logs periodically then has to take replication delay into account: on an ext3 filesystem, deleting large files takes a while and can cause serious replication delay. To avoid that, a hard link to the relay log is created first, because deleting a large file through a hard link is fast on Linux. (The same hard-link trick is commonly used when dropping large tables in MySQL.)

The MHA node package includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds for the SQL thread to switch to a new relay log, and then executes SET GLOBAL relay_log_purge=0.

purge_relay_logs parameters:
--user mysql             user name
--password mysql         password
--port                   port number
--workdir                where to create the relay-log hard links; default is /var/tmp. Creating a hard link across filesystems fails, so specify a location on the same filesystem; after the script succeeds, the hard-linked relay log files are deleted
--disable_relay_log_purge     by default, if relay_log_purge=1 the script does nothing and exits; with this flag, when relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and finally leaves the parameter OFF

Set up a periodic relay-purge script (on both slave nodes):
[root@Node_Slave ~]# vim /root/purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
host=localhost
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]
then
    mkdir $log_dir -p
fi

$purge --user=$user --host=$host --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

[root@Node_Slave ~]# chmod 755 /root/purge_relay_log.sh

Add it to crontab for periodic execution:
[root@Node_Slave ~]# crontab -e
0 4 * * *  /bin/bash /root/purge_relay_log.sh

purge_relay_logs deletes relay logs without blocking the SQL thread. A manual run looks like this:
[root@Node_Slave ~]# /usr/local/bin/purge_relay_logs --user=root --host=localhost --password=123456 --disable_relay_log_purge --port=3306 --workdir=/data
2017-05-31 23:27:13: purge_relay_logs script started.
  Found relay_log.info: /data/mysql/data/relay-log.info
  Opening /data/mysql/data/mysql-relay-bin.000002 ..
  Opening /data/mysql/data/mysql-relay-bin.000003 ..
  Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2017-05-31 23:27:17: All relay log purging operations succeeded.

[root@Node_Slave ~]# ll /data/masterha/log/
total 4
-rw-r--r--. 1 root root 905 May 31 23:26 purge_relay_logs.log

5.5) Check the SSH configuration

Check the SSH connectivity from MHA Manager to all MHA Node hosts:
[root@Manager_Slave ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed May 31 23:06:01 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed May 31 23:06:01 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed May 31 23:06:01 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed May 31 23:06:01 2017 - [info] Starting SSH connection tests..
Wed May 31 23:06:04 2017 - [debug]
Wed May 31 23:06:01 2017 - [debug]  Connecting via SSH from root@182.48.115.236(182.48.115.236:22) to root@182.48.115.237(182.48.115.237:22)..
Wed May 31 23:06:02 2017 - [debug]   ok.
Wed May 31 23:06:02 2017 - [debug]  Connecting via SSH from root@182.48.115.236(182.48.115.236:22) to root@182.48.115.238(182.48.115.238:22)..
Wed May 31 23:06:03 2017 - [debug]   ok.
Wed May 31 23:06:04 2017 - [debug]
Wed May 31 23:06:01 2017 - [debug]  Connecting via SSH from root@182.48.115.237(182.48.115.237:22) to root@182.48.115.236(182.48.115.236:22)..
Wed May 31 23:06:03 2017 - [debug]   ok.
Wed May 31 23:06:03 2017 - [debug]  Connecting via SSH from root@182.48.115.237(182.48.115.237:22) to root@182.48.115.238(182.48.115.238:22)..
Wed May 31 23:06:04 2017 - [debug]   ok.
Wed May 31 23:06:04 2017 - [debug]
Wed May 31 23:06:02 2017 - [debug]  Connecting via SSH from root@182.48.115.238(182.48.115.238:22) to root@182.48.115.236(182.48.115.236:22)..
Wed May 31 23:06:03 2017 - [debug]   ok.
Wed May 31 23:06:03 2017 - [debug]  Connecting via SSH from root@182.48.115.238(182.48.115.238:22) to root@182.48.115.237(182.48.115.237:22)..
Wed May 31 23:06:04 2017 - [debug]   ok.
Wed May 31 23:06:04 2017 - [info] All SSH connection tests passed successfully.

SSH verification between all nodes passes.

5.6) Check the replication environment with the MHA tools

Use the masterha_check_repl script to check the replication status of the whole MySQL cluster:
[root@Manager_Slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed May 31 23:43:43 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed May 31 23:43:43 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed May 31 23:43:43 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed May 31 23:43:43 2017 - [info] MHA::MasterMonitor version 0.56.
Wed May 31 23:43:43 2017 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 182.48.115.237(182.48.115.237:3306) :1045:Access denied for user 'root'@'182.48.115.238' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
  at /usr/local/share/perl5/MHA/ServerManager.pm line 297
Wed May 31 23:43:43 2017 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 182.48.115.236(182.48.115.236:3306) :1045:Access denied for user 'root'@'182.48.115.238' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
  at /usr/local/share/perl5/MHA/ServerManager.pm line 297
Wed May 31 23:43:43 2017 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 182.48.115.238(182.48.115.238:3306) :1045:Access denied for user 'root'@'182.48.115.238' (using password: NO), but this is not a MySQL crash. Check MySQL server settings.
  at /usr/local/share/perl5/MHA/ServerManager.pm line 297
Wed May 31 23:43:43 2017 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Wed May 31 23:43:43 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 326
Wed May 31 23:43:43 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Wed May 31 23:43:43 2017 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

So the replication check is NOT ok!
The cause: the nodes' MySQL cannot be reached remotely as root.
..............................................................................................................
Fix: grant, in MySQL on all three nodes, permission for root to log in from 182.48.115.% without a password, i.e.
mysql> update mysql.user set password=password("") where user="root" and host="182.48.115.%";    // if this user/privilege does not exist yet, create it with a GRANT statement
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> select user,host,password from mysql.user;
+---------+--------------+-------------------------------------------+
| user    | host         | password                                  |
+---------+--------------+-------------------------------------------+
.........
| root    | 182.48.115.% |                                           |
+---------+--------------+-------------------------------------------+
11 rows in set (0.00 sec)
..............................................................................................................

Then run masterha_check_repl again:
[root@Manager_Slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
..............................
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 93.

It still fails, this time because failover can be handled in two ways: with a virtual IP address or with a global configuration file. MHA does not force either -- the user chooses. The virtual-IP approach pulls in other software, typically keepalived, and also requires editing the master_ip_failover script.

Workaround:
Add symlinks (on all nodes):
[root@Manager_Slave ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@Manager_Slave ~]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql

Temporarily comment out master_ip_failover_script= /usr/local/bin/master_ip_failover in /etc/masterha/app1.cnf on the manager node.
It will be re-enabled later, once keepalived is introduced and the script has been fixed.
[root@Manager_Slave ~]# cat /etc/masterha/app1.cnf
.........
#master_ip_failover_script= /usr/local/bin/master_ip_failover

Finally run masterha_check_repl once more:
[root@Manager_Slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Jun  1 00:20:58 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jun  1 00:20:58 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Thu Jun  1 00:20:58 2017 - [info]  read_only=1 is not set on slave 182.48.115.237(182.48.115.237:3306).
Thu Jun  1 00:20:58 2017 - [warning]  relay_log_purge=0 is not set on slave 182.48.115.237(182.48.115.237:3306).
Thu Jun  1 00:20:58 2017 - [info]  read_only=1 is not set on slave 182.48.115.238(182.48.115.238:3306).
Thu Jun  1 00:20:58 2017 - [warning]  relay_log_purge=0 is not set on slave 182.48.115.238(182.48.115.238:3306).
Thu Jun  1 00:20:58 2017 - [info] Checking replication filtering settings..
Thu Jun  1 00:20:58 2017 - [info]  binlog_do_db= , binlog_ignore_db= mysql
Thu Jun  1 00:20:58 2017 - [info]  Replication filtering check ok.
Thu Jun  1 00:20:58 2017 - [info] GTID (with auto-pos) is not supported
Thu Jun  1 00:20:58 2017 - [info] Starting SSH connection tests..
Thu Jun  1 00:21:02 2017 - [info] All SSH connection tests passed successfully.
...........

Thu Jun  1 00:21:07 2017 - [info] Checking replication health on 182.48.115.237..
Thu Jun  1 00:21:07 2017 - [info]  ok.
Thu Jun  1 00:21:07 2017 - [info] Checking replication health on 182.48.115.238..
Thu Jun  1 00:21:07 2017 - [info]  ok.
Thu Jun  1 00:21:07 2017 - [warning] master_ip_failover_script is not defined.
Thu Jun  1 00:21:07 2017 - [warning] shutdown_script is not defined.
Thu Jun  1 00:21:07 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

Now the whole replication environment checks out OK!

6) Operating MHA
6.1) Check MHA Manager status

Check the Manager's status with the masterha_check_status script:
[root@Manager_Slave ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

Note: when running normally it prints "PING_OK"; otherwise it prints "NOT_RUNNING", which means MHA monitoring is not started.

6.2) Start MHA Manager monitoring

Start the manager in the background with the following command:
[root@Manager_Slave ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

Startup parameters:
--remove_dead_master_conf      after a master switch, remove the old master's entry from the configuration file.
--manager_log                  log file location.
--ignore_last_failover         by default, if MHA detects two consecutive failures less than 8 hours apart it refuses to fail over, to avoid ping-pong flapping. After a failover, MHA leaves an app1.failover.complete file in its working directory (manager_workdir, here /var/log/masterha/app1); while that file exists a subsequent failover is refused unless the file is deleted after the first switch. --ignore_last_failover tells MHA to ignore that marker; it is set here for convenience (see the one-liner sketched below).
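Without --ignore_last_failover, the marker file has to be removed by hand before MHA will fail over again; a hedged one-liner for this setup:

# Allow the next automatic failover by removing the marker file
rm -f /var/log/masterha/app1/app1.failover.complete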
Check MHA Manager monitoring again:
[root@Manager_Slave ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:2542) is running(0:PING_OK), master:182.48.115.236

Monitoring is now active, and the master host is 182.48.115.236.

Inspect the startup log:
[root@Manager_Slave ~]# tail -n20 /var/log/masterha/app1/manager.log
   Checking slave recovery environment settings..
     Opening /data/mysql/data/relay-log.info ... ok.
     Relay log found at /data/mysql/data, up to mysql-relay-bin.000006
     Temporary relay log file is /data/mysql/data/mysql-relay-bin.000006
     Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  done.
     Testing mysqlbinlog output.. done.
     Cleaning up test file(s).. done.
Thu Jun  1 00:37:29 2017 - [info] Slaves settings check done.
Thu Jun  1 00:37:29 2017 - [info]
182.48.115.236(182.48.115.236:3306) (current master)
  +--182.48.115.237(182.48.115.237:3306)
  +--182.48.115.238(182.48.115.238:3306)

Thu Jun  1 00:37:29 2017 - [warning] master_ip_failover_script is not defined.
Thu Jun  1 00:37:29 2017 - [warning] shutdown_script is not defined.
Thu Jun  1 00:37:29 2017 - [info] Set master ping interval 1 seconds.
Thu Jun  1 00:37:29 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Jun  1 00:37:29 2017 - [info] Starting ping health check on 182.48.115.236(182.48.115.236:3306)..
Thu Jun  1 00:37:29 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

The line "Ping(SELECT) succeeded, waiting until MySQL doesn't respond.." indicates the whole system is now being monitored.

6.3) Stop MHA Manager monitoring

Stopping is simple -- use the masterha_stop command:
[root@Manager_Slave ~]# masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                   nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
[root@Manager_Slave ~]#

Check MHA Manager monitoring; it is now off:
[root@Manager_Slave ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

7) Configure the VIP
The VIP can be managed in two ways: with keepalived floating the virtual IP, or with a script that brings the virtual IP up and down (i.e. without keepalived/heartbeat-style software).

Option 1: manage the VIP with keepalived

7.1) Download and install keepalived (on both masters -- strictly speaking, one is the master (182.48.115.236) and the other is the candidate master (182.48.115.237), which remains a slave until a switchover)

[root@Node_Master ~]# yum install -y openssl-devel
[root@Node_Master ~]# wget http://www.keepalived.org/software/keepalived-1.3.5.tar.gz
[root@Node_Master ~]# tar -zvxf keepalived-1.3.5.tar.gz
[root@Node_Master ~]# cd keepalived-1.3.5
[root@Node_Master keepalived-1.3.5]# ./configure --prefix=/usr/local/keepalived
[root@Node_Master keepalived-1.3.5]# make && make install

[root@Node_Master keepalived-1.3.5]# cp keepalived/etc/init.d/keepalived /etc/init.d/
[root@Node_Master keepalived-1.3.5]# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
[root@Node_Master keepalived-1.3.5]# mkdir /etc/keepalived
[root@Node_Master keepalived-1.3.5]# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
[root@Node_Master keepalived-1.3.5]# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
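Optionally (a hedged addition for this SysV-style setup), register keepalived to start at boot:

chkconfig --add keepalived
chkconfig keepalived on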

7.2) keepalived configuration

------------ configuration on the master (182.48.115.236) ------------------
[root@Node_Master ~]# cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
[root@Node_Master ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
      notification_email {
      wangshibo@huanqiu.cn
    }
    notification_email_from ops@huanqiu.cn
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id MySQL-HA
}

vrrp_instance VI_1 {
     state BACKUP
     interface eth1
     virtual_router_id 51
     priority 150
     advert_int 1
     nopreempt

     authentication {
     auth_type PASS
     auth_pass 1111
     }

     virtual_ipaddress {
         182.48.115.239
     }
}

router_id MySQL-HA names the keepalived group; the virtual IP 182.48.115.239 is bound to this host's eth1 NIC; the state is set to BACKUP, keepalived runs in non-preemptive mode (nopreempt), and priority 150 sets the priority.

------------ configuration on the candidate master (182.48.115.237) ------------------
[root@Node_Slave ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
      notification_email {
      wangshibo@huanqiu.cn
    }
    notification_email_from ops@huanqiu.cn
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id MySQL-HA
}

vrrp_instance VI_1 {
     state BACKUP
     interface eth1
     virtual_router_id 51
     priority 120
     advert_int 1
     nopreempt

     authentication {
     auth_type PASS
     auth_pass 1111
     }

     virtual_ipaddress {
         182.48.115.239
     }
}

7.3) Start the keepalived service

-------------- start keepalived on the master and check the logs ----------------
[root@Node_Master ~]# /etc/init.d/keepalived start
Starting keepalived:                                      [ OK ]
[root@Node_Master ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 52:54:00:5f:58:dc brd ff:ff:ff:ff:ff:ff
     inet 182.48.115.236/27 brd 182.48.115.255 scope global eth0
     inet 182.48.115.239/32 scope global eth0
     inet6 fe80::5054:ff:fe5f:58dc/64 scope link
        valid_lft forever preferred_lft forever

[root@Node_Master ~]# tail -100 /var/log/messages
..........
Jun  1 02:12:10 percona1 Keepalived_vrrp[10329]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 182.48.115.239
Jun  1 02:12:10 percona1 Keepalived_vrrp[10329]: Sending gratuitous ARP on eth0 for 182.48.115.239
Jun  1 02:12:10 percona1 Keepalived_vrrp[10329]: Sending gratuitous ARP on eth0 for 182.48.115.239
Jun  1 02:12:10 percona1 Keepalived_vrrp[10329]: Sending gratuitous ARP on eth0 for 182.48.115.239
Jun  1 02:12:10 percona1 Keepalived_vrrp[10329]: Sending gratuitous ARP on eth0 for 182.48.115.239

The VIP is now bound to the master node 182.48.115.236.

-------------- start keepalived on the candidate master ----------------
[root@Node_Slave ~]# /etc/init.d/keepalived start
Starting keepalived:                                      [ OK ]
[root@Node_Slave ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 52:54:00:1b:6e:53 brd ff:ff:ff:ff:ff:ff
     inet 182.48.115.237/27 brd 182.48.115.255 scope global eth0
     inet6 fe80::5054:ff:fe1b:6e53/64 scope link
        valid_lft forever preferred_lft forever

.....................................................................
The output above shows keepalived is configured correctly.

Note:
Both servers run keepalived in BACKUP state. keepalived supports two modes, master->backup and backup->backup, and they differ significantly. In master->backup mode, once the master dies the VIP floats to the slave automatically, but when the original master recovers and keepalived starts, it grabs the VIP back -- the preemption happens even with non-preemptive mode (nopreempt) set. In backup->backup mode, when the master dies the VIP floats to the slave, and when the original master and its keepalived come back they do not preempt the new master's VIP, even with a higher priority. To reduce the number of VIP moves, the repaired master is usually kept as the new standby.

7.4) Hook keepalived into MHA (MHA stops keepalived when the MySQL server process dies)

To bring keepalived under MHA's control, only the switch-trigger script master_ip_failover needs to be modified, adding handling that deals with keepalived when the master goes down.
Edit /usr/local/bin/master_ip_failover; after the change it looks like this:

[root@Manager_Slave ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '182.48.115.239';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        #`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master (starts keepalived there)
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disables the VIP on the old master (stops keepalived there)
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
 
 
Now that the script is fixed, re-enable the option mentioned earlier and re-check the cluster status to see whether it still errors out:
[root@Manager_Slave ~]# grep 'master_ip_failover_script' /etc/masterha/app1.cnf
master_ip_failover_script= /usr/local/bin/master_ip_failover

[root@Manager_Slave ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
.......
Checking the Status of the script.. OK
Thu Jun  1 03:31:57 2017 - [info]  OK.
Thu Jun  1 03:31:57 2017 - [warning] shutdown_script is not defined.
Thu Jun  1 03:31:57 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

Replication checks out fine!
The change to /usr/local/bin/master_ip_failover means that when the master database fails, MHA triggers a switchover: MHA Manager stops keepalived on the old master, the virtual IP floats to the candidate slave, and the switch completes. You can also hook a script into keepalived itself that monitors whether mysql is running and kills the keepalived process when it is not (a minimal sketch follows).
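A minimal sketch of that keepalived-side health check, assuming the root password 123456 used elsewhere in this walkthrough:

#!/bin/bash
# If mysqld no longer answers pings, stop keepalived so the VIP floats away
if ! mysqladmin -uroot -p123456 ping >/dev/null 2>&1; then
    /etc/init.d/keepalived stop
fi

Run it from cron or a loop on both masters; how aggressively to react is a policy choice.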

Option 2: manage the VIP with a script
Here /usr/local/bin/master_ip_failover is modified as shown below; in addition, a VIP has to be bound on the master server by hand first.

1) Bind the VIP on the current master node

[root@Master_node ~]# ifconfig eth0:0 182.48.115.239/27            // this host's netmask is /27; usually it is /24
[root@Master_node ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:5F:58:DC
           inet addr:182.48.115.236  Bcast:182.48.115.255  Mask:255.255.255.224
           inet6 addr: fe80::5054:ff:fe5f:58dc/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:25505 errors:0 dropped:0 overruns:0 frame:0
           TX packets:3358 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:3254957 (3.1 MiB)  TX bytes:482420 (471.1 KiB)

eth0:0    Link encap:Ethernet  HWaddr 52:54:00:5F:58:DC
           inet addr:182.48.115.239  Bcast:182.48.115.255  Mask:255.255.255.224
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           inet6 addr: ::1/128 Scope:Host
           UP LOOPBACK RUNNING  MTU:65536  Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

2) On the manager node, modify /usr/local/bin/master_ip_failover

[root@Manager_Slave ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '182.48.115.239/27';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# Bring the VIP up on the new master over SSH
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# Take the VIP down on the old master over SSH
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
 
Remember to uncomment the master_ip_failover_script entry in /etc/masterha/app1.cnf.
To avoid split-brain, the script-based VIP management is recommended for production rather than keepalived. At this point the basic MHA cluster is configured.
Next comes the actual testing, to see how MHA really works (a quick manual VIP rehearsal is sketched below).
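The VIP movement itself can be rehearsed by hand before trusting failover (a hedged sketch; eth0:1 matches the $key = '1' used in the script above):

# On the host that should take the VIP:
/sbin/ifconfig eth0:1 182.48.115.239/27

# On the host that should give it up:
/sbin/ifconfig eth0:1 down

# From any client, confirm reachability:
ping -c 2 182.48.115.239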

8) Failover tests
1) Automatic failover (MHA Manager must be started first, otherwise automatic failover cannot happen; manual switchover, of course, does not need Manager monitoring running)

1.1) Generate test data on the master with sysbench
[root@Master_node ~]# yum install sysbench -y

On the master (182.48.115.236), run sysbench to generate data: a sbtest table with 1,000,000 rows in the sbtest database.
[root@Master_node ~]# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/local/mysql/var/mysql.sock --mysql-password=123456 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex prepare

1.2) On the candidate master (182.48.115.237), stop the slave I/O thread to simulate replication lag.

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.08 sec)

Note: the other slave keeps its I/O thread running, so it continues to receive logs.

1.3) Run a sysbench load test

Run a load test on the master (182.48.115.236) for 3 minutes to produce a large amount of binlog:
[root@Master_node ~]# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=16 --max-requests=0 --oltp-dist-type=uniform --max-time=180 --mysql-user=root --mysql-socket=/local/mysql/var/mysql.sock --mysql-password=123456 --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex run

1.4) Restart the I/O thread on the candidate master (182.48.115.237) so it catches up with the master's binlog.

mysql> start slave io_thread;    
Query OK, 0 rows affected (0.00 sec)

1.5) Kill the mysql process on the master (182.48.115.236) to simulate a master crash and trigger automatic failover.

[root@Master_node ~] # pkill -9 mysqld

1.6) Examine the MHA switch log on the manager node (182.48.115.238) to follow the whole failover.

[root@Manager_Slave ~]# cat /var/log/masterha/app1/manager.log
........
........
----- Failover Report -----

app1: MySQL Master failover 182.48.115.236 to 182.48.115.237 succeeded

Master 182.48.115.236 is down!

Check MHA Manager logs at server01:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 182.48.115.236.
The latest slave 182.48.115.237(182.48.115.237:3306) has all relay logs for recovery.
Selected 182.48.115.237 as a new master.
182.48.115.237: OK: Applying all logs succeeded.
182.48.115.237: OK: Activated master IP address.
192.168.0.70: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.0.70: OK: Applying all logs succeeded. Slave started, replicating from 182.48.115.237.
182.48.115.237: Resetting slave info succeeded.
Master failover to 182.48.115.237(182.48.115.237:3306) completed successfully.

The final line, "Master failover to 182.48.115.237(182.48.115.237:3306) completed successfully.", shows the candidate master has taken over.

The output shows the whole MHA switch process, which consists of these steps:
1) configuration check phase, which validates the whole cluster configuration;
2) handling of the dead master, including removing the virtual IP and, optionally, powering the host off (not implemented here yet);
3) copying the relay logs by which the dead master and the latest slave differ, and saving them into MHA Manager's working directory;
4) identifying the slave with the most recent data;
5) applying the binary log events (binlog events) saved from the master;
6) promoting one slave to the new master;
7) repointing the other slaves to replicate from the new master.

Finally start MHA Manager monitoring again and check who the master is now:
[root@Manager_Slave ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:13301) is running(0:PING_OK), master:182.48.115.237

2) Manual failover (MHA Manager must not be running)

A manual failover is for setups that do not enable MHA's automatic switching: when the master fails, an operator invokes MHA by hand to perform the failover, with this command:

Make sure the MHA manager is stopped:
[root@Manager_Slave ~]# masterha_stop --conf=/etc/masterha/app1.cnf

Note: if MHA Manager detects no dead server, it reports an error and aborts the failover:
[root@Manager_Slave ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=182.48.115.236 --dead_master_port=3306 --new_master_host=182.48.115.237 --new_master_port=3306 --ignore_last_failover
The output asks you to confirm the switch:
........
----- Failover Report -----

app1: MySQL Master failover 182.48.115.236 to 182.48.115.237 succeeded

Master 182.48.115.236 is down!

Check MHA Manager logs at server01 for details.

Started manual(interactive) failover.
Invalidated master IP address on 182.48.115.236.
The latest slave 182.48.115.237(182.48.115.237:3306) has all relay logs for recovery.
Selected 182.48.115.237 as a new master.
182.48.115.237: OK: Applying all logs succeeded.
182.48.115.237: OK: Activated master IP address.
192.168.0.70: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.0.70: OK: Applying all logs succeeded. Slave started, replicating from 182.48.115.237.
182.48.115.237: Resetting slave info succeeded.
Master failover to 182.48.115.237(182.48.115.237:3306) completed successfully.

This simulates manually promoting the candidate (182.48.115.237 here; the 192.168.0.x lines come from a sample log taken in a different environment) when the master is down.

9) Online master switchover

In many situations the current master has to be migrated to another server: the master's hardware is failing, a RAID controller card needs rebuilding, the master should move to better-performing hardware, and so on. Maintaining the master degrades performance and means downtime during which data at least cannot be written; moreover, blocking or killing currently running sessions can leave the old and new masters inconsistent. MHA provides fast switching with graceful write blocking: the switch only takes about 0.5-2 seconds, during which data cannot be written. In many cases a 0.5-2 second write block is acceptable, so the master can be switched without scheduling a maintenance window.

MHA's online switchover roughly proceeds as follows:
1) verify the replication settings and identify the current master;
2) identify the new master;
3) block writes to the current master;
4) wait for all slaves to catch up with replication;
5) grant writes on the new master;
6) repoint the slaves.

Note two application-architecture questions for online switching:
1) how the application identifies the master and slaves (the master machine can change); a VIP basically solves this;
2) load balancing (define rough read/write ratios and per-machine load shares, and plan for machines leaving the cluster).

To guarantee full data consistency and finish the switch in the shortest possible time, MHA's online switchover succeeds only when all of the following conditions hold, otherwise it fails (a quick pre-check is sketched after this list):
1) the I/O threads of all slaves are running;
2) the SQL threads of all slaves are running;
3) Seconds_Behind_Master in the show slave status output is at most running_updates_limit seconds on all slaves; if running_updates_limit is not specified, it defaults to 1 second;
4) on the master, show processlist shows no update taking longer than running_updates_limit seconds.
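A hedged pre-switch sanity check run against each current slave (adjust the host list as needed):

# Both replication threads must be running and the lag small on every slave
for h in 182.48.115.237 182.48.115.238; do
  mysql -uroot -p -h$h -e "SHOW SLAVE STATUS\G" | \
    egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
done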
 
The online switchover steps:
First, stop MHA monitoring on the manager node:
[root@Manager_Slave ~]# masterha_stop --conf=/etc/masterha/app1.cnf

Then run the online switch (simulating an online master change: the original master 182.48.115.236 becomes a slave and 182.48.115.237 is promoted to the new master):
[root@Manager_Slave ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=182.48.115.237 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
.........
Thu Jun  1 00:28:02 2014 - [info]  Executed CHANGE MASTER.
Thu Jun  1 00:28:02 2014 - [info]  Slave started.
Thu Jun  1 00:28:02 2014 - [info] All new slave servers switched successfully.
Thu Jun  1 00:28:02 2014 - [info]
Thu Jun  1 00:28:02 2014 - [info] * Phase 5: New master cleanup phease..
Thu Jun  1 00:28:02 2014 - [info]
Thu Jun  1 00:28:02 2014 - [info]  192.168.0.60: Resetting slave info succeeded.
Thu Jun  1 00:28:02 2014 - [info] Switching master to 192.168.0.60(192.168.0.60:3306) completed successfully.

Parameter meanings:
--orig_master_is_new_slave       turn the original master into a slave after the switch; without this flag, the original master is left stopped
--running_updates_limit=10000    if the candidate master lags, the switch fails; with this parameter the switch is allowed as long as the delay is within the given time (in seconds). How long the switch actually takes depends on the size of the relay logs to recover.
Note:
The online switch calls the master_ip_online_change script, but the shipped sample is incomplete and needs adjusting, as follows:
[root@Manager_Slave ~]# vim /usr/local/bin/master_ip_online_change
#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,          $orig_master_host, $orig_master_ip,
  $orig_master_port, $orig_master_user,
  $new_master_host,  $new_master_ip,    $new_master_port,
  $new_master_user,
);

my $vip = '182.48.115.239/27';   # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";
my $ssh_user = "root";
my $new_master_password='123456';
my $orig_master_password='123456';
GetOptions(
  'command=s'               => \$command,
  #'ssh_user=s'             => \$ssh_user,
  'orig_master_host=s'      => \$orig_master_host,
  'orig_master_ip=s'        => \$orig_master_ip,
  'orig_master_port=i'      => \$orig_master_port,
  'orig_master_user=s'      => \$orig_master_user,
  #'orig_master_password=s' => \$orig_master_password,
  'new_master_host=s'       => \$new_master_host,
  'new_master_ip=s'         => \$new_master_ip,
  'new_master_port=i'       => \$new_master_port,
  'new_master_user=s'       => \$new_master_user,
  #'new_master_password=s'  => \$new_master_password,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      #$orig_master_handler->disable_log_bin_local();
      #print current_time_us() . " Dropping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      # Take the VIP down on the old master before killing remaining threads
      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      #$orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      #$new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      #$new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

10) Repairing the master node after it went down
After a failover, the repaired old master is normally brought back as a slave of the new master (its entry was removed from app1.cnf by --remove_dead_master_conf, so it also has to be re-added); a hedged sketch of that step follows. The sample Perl script after the sketch is a Mail::Sender-based send_report-style alert script, which can be hooked into MHA for failover notifications (typically via the report_script setting).
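A minimal sketch of the rejoin, assuming the repaired host is 182.48.115.236 and the new master is 182.48.115.237; the binlog file and position are placeholders (in practice, read them from the failover report in manager.log):

# On the repaired old master: point it at the new master and start replication
mysql -uroot -p -e "
CHANGE MASTER TO
  MASTER_HOST='182.48.115.237',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_1234',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=120;
START SLAVE;"

# On the manager: re-add the host to app1.cnf (flags hedged) and restart monitoring
masterha_conf_host --command=add --conf=/etc/masterha/app1.cnf --hostname=182.48.115.236
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf \
  --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &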

#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}

# Do whatever you want here

exit 0;

Today's high-availability solutions can ensure database availability to a certain degree; given the requirements on both availability and data consistency, the MHA architecture is recommended.
