MySQL高可用方案MHA的部署和原理

MHA(Master High Availability)是一套相對成熟的MySQL高可用方案,能作到在0~30s內自動完成數據庫的故障切換操做,在master服務器不宕機的狀況下,基本能保證數據的一致性。html

 

它由兩部分組成:MHA Manager(管理節點)和MHA Node(數據節點)。其中,MHA Manager能夠單獨部署在一臺獨立的機器上管理多個master-slave集羣,也能夠部署在一臺slave上。MHA Node則運行在每一個mysql節點上,MHA Manager會定時探測集羣中的master節點,當master出現故障時,它自動將最新數據的slave提高爲master,而後將其它全部的slave指向新的master。node

 

在MHA自動故障切換過程當中,MHA試圖保存master的二進制日誌,從而最大程度地保證數據不丟失,當這並不老是可行的,譬如,主服務器硬件故障或沒法經過ssh訪問,MHA就無法保存二進制日誌,這樣就只進行了故障轉移但丟失了最新數據。可結合MySQL 5.5中推出的半同步複製來下降數據丟失的風險。python

 

MHA軟件由兩部分組成:Manager工具包和Node工具包,具體說明以下:mysql

MHA Manager:redis

1. masterha_check_ssh:檢查MHA的SSH配置情況sql

2. masterha_check_repl:檢查MySQL的複製情況shell

3. masterha_manager:啓動MHA數據庫

4. masterha_check_status:檢測當前MHA運行狀態vim

5. masterha_master_monitor:檢測master是否宕機服務器

6. masterha_master_switch:控制故障轉移(自動或手動)

7. masterha_conf_host:添加或刪除配置的server信息

8. masterha_stop:關閉MHA

 

MHA Node:

save_binary_logs:保存或複製master的二進制日誌

apply_diff_relay_logs:識別差別的relay log並將差別的event應用到其它slave中

filter_mysqlbinlog:去除沒必要要的ROLLBACK事件(MHA已再也不使用這個工具)

purge_relay_logs:消除中繼日誌(不會堵塞SQL線程)

 

另有以下幾個腳本需自定義:

1. master_ip_failover:管理VIP

2. master_ip_online_change:

3. masterha_secondary_check:當MHA manager檢測到master不可用時,經過masterha_secondary_check腳原本進一步確認,減低誤切的風險。

4. send_report:當發生故障切換時,可經過send_report腳本發送告警信息。

 

集羣信息

角色                             IP地址                 ServerID      類型

Master                         192.168.244.10   1                 寫入

Candicate master          192.168.244.20   2                 讀

Slave                           192.168.244.30   3                 讀

Monitor host                 192.168.244.40                      監控集羣組

注:操做系統均爲RHEL 6.7

其中,master對外提供寫服務,備選master提供讀服務,slave也提供相關的讀服務,一旦master宕機,將會把備選master提高爲新的master,slave指向新的master

 

1、在全部節點上安裝MHA node

    1. 在MySQL服務器上安裝MHA node所需的perl模塊(DBD:mysql)

      # yum install perl-DBD-MySQL -y

    2. 在全部的節點上安裝mha node

      下載地址爲:https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2

      因爲該網址在國內被牆,相關文件下載後,放到了我的網盤中,http://pan.baidu.com/s/1boS31vT,有須要的童鞋可自行下載。

      # tar xvf mha4mysql-node-0.56.tar.gz

      # cd mha4mysql-node-0.56

      # perl Makefile.PL  

Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 6.
BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 6.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
Compilation failed in require at inc/Module/Install.pm line 283.
Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 349.
View Code

     經過報錯能夠看出,是相關依賴包沒有安裝。

     # yum install perl-ExtUtils-MakeMaker -y

     # perl Makefile.PL  

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/AutoInstall.pm line 277.

    # yum install perl-CPAN -y

    # perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
View Code

     # make 

     # make install

    至此,MHA node節點安裝完畢,會在/usr/local/bin下生成如下腳本文件

# ll /usr/local/bin/
total 44
-r-xr-xr-x 1 root root 16367 Jul 20 07:00 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jul 20 07:00 filter_mysqlbinlog
-r-xr-xr-x 1 root root  8261 Jul 20 07:00 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jul 20 07:00 save_binary_logs

    

2、在Monitor host節點上部署MHA Manager

     # tar xvf mha4mysql-manager-0.56.tar.gz 

     # cd mha4mysql-manager-0.56

     # perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...missing.
- Config::Tiny          ...missing.
- Log::Dispatch         ...missing.
- Parallel::ForkManager ...missing.
- MHA::NodeConst        ...missing.
==> Auto-install the 5 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Warning: prerequisite Config::Tiny 0 not found.
Warning: prerequisite Log::Dispatch 0 not found.
Warning: prerequisite MHA::NodeConst 0 not found.
Warning: prerequisite Parallel::ForkManager 0 not found.
Warning: prerequisite Time::HiRes 0 not found.
Writing Makefile for mha4mysql::manager
View Code

     # make

     # make install

    執行完畢後,會在/usr/local/bin下新增如下幾個文件  

# ll /usr/local/bin/
total 40
-r-xr-xr-x 1 root root 1991 Jul 20 00:50 masterha_check_repl
-r-xr-xr-x 1 root root 1775 Jul 20 00:50 masterha_check_ssh
-r-xr-xr-x 1 root root 1861 Jul 20 00:50 masterha_check_status
-r-xr-xr-x 1 root root 3197 Jul 20 00:50 masterha_conf_host
-r-xr-xr-x 1 root root 2513 Jul 20 00:50 masterha_manager
-r-xr-xr-x 1 root root 2161 Jul 20 00:50 masterha_master_monitor
-r-xr-xr-x 1 root root 2369 Jul 20 00:50 masterha_master_switch
-r-xr-xr-x 1 root root 5167 Jul 20 00:50 masterha_secondary_check
-r-xr-xr-x 1 root root 1735 Jul 20 00:50 masterha_stop

 

3、配置SSH登陸無密碼驗證

    1. 在manager上配置到全部Node節點的無密碼驗證

      # ssh-keygen

      一路按「Enter」

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

     # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

    2. 在Master(192.168.244.10)上配置

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

    3. 在Candicate master(192.168.244.20)上配置     

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.30

     4. 在Slave(192.168.244.30)上配置     

    # ssh-keygen

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.10

    # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.244.20

 

4、搭建主從複製環境

     1. 在Master上執行備份

     # mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql

     其中,-R是備份存儲過程,--triggers是備份觸發器 -A表明全庫

     2. 在Master上建立複製用戶

mysql> grant replication slave on *.* to 'repl'@'192.168.244.%' identified by 'repl';
Query OK, 0 rows affected (0.09 sec)

    3. 查看備份文件all.sql中的CHANGE MASTER語句

      # head -n 30 all.sql

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=120;

     4. 將備份文件複製到Candicate master和Slave上

     # scp all.sql 192.168.244.20:/root/

     # scp all.sql 192.168.244.30:/root/

     5. 在Candicate master上搭建從庫

     # mysql < all.sql 

     設置複製信息

mysql> CHANGE MASTER TO
    -> MASTER_HOST='192.168.244.10',
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='repl',
    -> MASTER_LOG_FILE='mysql-bin.000002',
    -> MASTER_LOG_POS=120;
Query OK, 0 rows affected, 2 warnings (0.19 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G

       6. 在Slave上搭建從庫

       7. slave服務器設置爲read only

mysql> set global read_only=1;
Query OK, 0 rows affected (0.04 sec)

       8. 在Master中建立監控用戶

mysql> grant all privileges on *.* to 'monitor'@'%' identified by 'monitor123';
Query OK, 0 rows affected (0.07 sec)

 

5、 配置MHA

     1. 在Monitor host(192.168.244.40)上建立MHA工做目錄,而且建立相關配置文件

     # mkdir -p /etc/masterha

     # vim /etc/masterha/app1.cnf

[server default]
manager_log=/masterha/app1/manager.log          //設置manager的日誌
manager_workdir=/masterha/app1           //設置manager的工做目錄
master_binlog_dir=/var/lib/mysql                  //設置master默認保存binlog的位置,以便MHA能夠找到master的日誌
master_ip_failover_script= /usr/local/bin/master_ip_failover    //設置自動failover時候的切換腳本
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  //設置手動切換時候的切換腳本
user=monitor               // 設置監控用戶
password=monitor123         //設置監控用戶的密碼
ping_interval=1         //設置監控主庫,發送ping包的時間間隔,默認是3秒,嘗試三次沒有迴應的時候進行自動failover
remote_workdir=/tmp     //設置遠端mysql在發生切換時binlog的保存位置
repl_user=repl          //設置複製環境中的複製用戶名
repl_password=repl    //設置複製用戶的密碼
report_script=/usr/local/bin/send_report    //設置發生切換後發送的報警的腳本
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306  //一旦MHA到master的監控之間出現問題,MHA Manager將會判斷其它兩個slave是否能創建到master_ip 3306端口的鏈接
shutdown_script=""      //設置故障發生後關閉故障主機腳本(該腳本的主要做用是關閉主機防止發生腦裂)
ssh_user=root           //設置ssh的登陸用戶名

[server1]
hostname=192.168.244.10
port=3306

[server2]
hostname=192.168.244.20
port=3306
candidate_master=1   //設置爲候選master,若是設置該參數之後,發生主從切換之後將會將此從庫提高爲主庫,即便這個主庫不是集羣中最新的slave
check_repl_delay=0   //默認狀況下若是一個slave落後master 100M的relay logs的話,MHA將不會選擇該slave做爲一個新的master,由於對於這個slave的恢復須要花費很長時間,經過設置check_repl_delay=0,MHA觸發切換在選擇一個新的master的時候將會忽略複製延時,這個參數對於設置了candidate_master=1的主機很是有用,由於它保證了這個候選主在切換過程當中必定是最新的master

[server3]
hostname=192.168.244.30
port=3306

      注意:

      1> 在編輯該文件時,後面的註釋切記要去掉,MHA並不會將後面的內容識別爲註釋。

      2> 配置文件中設置了master_ip_failover_script,secondary_check_script,master_ip_online_change_script,report_script,對應的文件見文章末 尾。

      2. 設置relay log清除方式(在每一個Slave上)

mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)

      MHA在發生切換過程當中,從庫在恢復的過程當中,依賴於relay log的相關信息,因此咱們這裏要將relay log的自動清楚設置爲OFF,採用手動清楚relay log的方式。

      在默認狀況下,從服務器上的中繼日誌會在SQL線程執行完後被自動刪除。可是在MHA環境中,這些中繼日誌在恢復其它從服務器時可能會被用到,所以須要禁用中繼日誌的自動清除。改成按期手動清除SQL線程應用完的中繼日誌。

      在ext3文件系統下,刪除大的文件須要必定的時間,這樣會致使嚴重的複製延遲,因此在Linux中,通常都是經過硬連接的方式來刪除大文件。

      3. 設置按期清理relay腳本

         MHA節點中包含了purge_relay_logs腳本,它能夠爲relay log建立硬連接,執行set global relay_log_purge=1,等待幾秒鐘以便SQL線程切換到新的中繼日誌,再執行set global relay_log_purge=0。

         下面看看腳本的使用方法:

         # purge_relay_logs --user=monitor --password=monitor123 -disable_relay_log_purge --workdir=/tmp/

2017-04-24 20:27:46: purge_relay_logs script started.
 Found relay_log.info: /var/lib/mysql/relay-log.info
 Opening /var/lib/mysql/mysqld-relay-bin.000001 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000002 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000003 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000004 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000005 ..
 Opening /var/lib/mysql/mysqld-relay-bin.000006 ..
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if i
t keeps up); SET GLOBAL relay_log_purge=0; .. ok.2017-04-24 20:27:50: All relay log purging operations succeeded.

        其中,

        --user:mysql用戶名

        --password:mysql用戶的密碼

        --host: mysqlserver地址

        --workdir:指定建立relay log的硬連接的位置,默認的是/var/tmp。因爲系統不一樣分區建立硬連接文件會失敗,故須要指定具體的硬連接的位置。

        --disable_relay_log_purge:默認狀況下,若是relay_log_purge=1,則腳本會直接退出。經過設置這個參數,該腳本會首先將relay_log_purge設置爲1,清除掉relay log後,再將該參數設置爲0。

        設置crontab來按期清理relay log

        MHA在切換的過程當中會直接調用mysqlbinlog命令,故須要在環境變量中指定mysqlbinlog的具體路徑。

        # vim /etc/cron.d/purge_relay_logs

0 4 * * * /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 -disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge
_relay_logs.log 2>&1

       注意:最好是每臺slave服務器在不一樣時間點執行該計劃任務。

      4. 將mysqlbinlog的路徑添加到環境變量中

 

6、 檢查SSH的配置

       在Monitor host上執行

       # masterha_check_ssh --conf=/etc/masterha/app1.cnf

Wed Jul 20 14:33:36 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 20 14:33:36 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:33:36 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:33:36 2016 - [info] Starting SSH connection tests..
Wed Jul 20 14:33:51 2016 - [debug] 
Wed Jul 20 14:33:36 2016 - [debug]  Connecting via SSH from root@192.168.244.10(192.168.244.10:22) to root@192.168.244.20(192.168.244.20:22)..
Wed Jul 20 14:33:48 2016 - [debug]   ok.
Wed Jul 20 14:33:48 2016 - [debug]  Connecting via SSH from root@192.168.244.10(192.168.244.10:22) to root@192.168.244.30(192.168.244.30:22)..
Wed Jul 20 14:33:50 2016 - [debug]   ok.
Wed Jul 20 14:33:55 2016 - [debug] 
Wed Jul 20 14:33:37 2016 - [debug]  Connecting via SSH from root@192.168.244.30(192.168.244.30:22) to root@192.168.244.10(192.168.244.10:22)..
Wed Jul 20 14:33:49 2016 - [debug]   ok.
Wed Jul 20 14:33:49 2016 - [debug]  Connecting via SSH from root@192.168.244.30(192.168.244.30:22) to root@192.168.244.20(192.168.244.20:22)..
Wed Jul 20 14:33:54 2016 - [debug]   ok.
Wed Jul 20 14:33:55 2016 - [debug] 
Wed Jul 20 14:33:36 2016 - [debug]  Connecting via SSH from root@192.168.244.20(192.168.244.20:22) to root@192.168.244.10(192.168.244.10:22)..
Wed Jul 20 14:33:49 2016 - [debug]   ok.
Wed Jul 20 14:33:49 2016 - [debug]  Connecting via SSH from root@192.168.244.20(192.168.244.20:22) to root@192.168.244.30(192.168.244.30:22)..
Wed Jul 20 14:33:54 2016 - [debug]   ok.
Wed Jul 20 14:33:55 2016 - [info] All SSH connection tests passed successfully.
View Code

 

7、查看整個集羣的狀態

     在Monitor host上執行

     # masterha_check_repl --conf=/etc/masterha/app1.cnf

Wed Jul 20 14:44:30 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 20 14:44:30 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:44:30 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jul 20 14:44:30 2016 - [info] MHA::MasterMonitor version 0.56.
Wed Jul 20 14:44:31 2016 - [info] GTID failover mode = 0
Wed Jul 20 14:44:31 2016 - [info] Dead Servers:
Wed Jul 20 14:44:31 2016 - [info] Alive Servers:
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.20(192.168.244.20:3306)
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.30(192.168.244.30:3306)
Wed Jul 20 14:44:31 2016 - [info] Alive Slaves:
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.20(192.168.244.20:3306)  Version=5.6.31 (oldest major version between slaves) log-bin:disabled
Wed Jul 20 14:44:31 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jul 20 14:44:31 2016 - [info]   192.168.244.30(192.168.244.30:3306)  Version=5.6.31 (oldest major version between slaves) log-bin:disabled
Wed Jul 20 14:44:31 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 14:44:31 2016 - [info] Checking slave configurations..
Wed Jul 20 14:44:31 2016 - [warning]  log-bin is not set on slave 192.168.244.20(192.168.244.20:3306). This host cannot be a master.
Wed Jul 20 14:44:31 2016 - [warning]  log-bin is not set on slave 192.168.244.30(192.168.244.30:3306). This host cannot be a master.
Wed Jul 20 14:44:31 2016 - [info] Checking replication filtering settings..
Wed Jul 20 14:44:31 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 20 14:44:31 2016 - [info]  Replication filtering check ok.
Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln361] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf
Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/local/bin/masterha_check_repl line 48.
Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Wed Jul 20 14:44:31 2016 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
View Code

    報錯很明顯,Candicate master和Slave都沒有啓動log-bin,若是沒有啓動的話,後續就沒法提高爲主

    設置log-bin後,從新執行:

Wed Jul 20 15:49:58 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 20 15:49:58 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Jul 20 15:49:58 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Jul 20 15:49:58 2016 - [info] MHA::MasterMonitor version 0.56.
Wed Jul 20 15:49:59 2016 - [info] GTID failover mode = 0
Wed Jul 20 15:49:59 2016 - [info] Dead Servers:
Wed Jul 20 15:49:59 2016 - [info] Alive Servers:
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.20(192.168.244.20:3306)
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.30(192.168.244.30:3306)
Wed Jul 20 15:49:59 2016 - [info] Alive Slaves:
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.20(192.168.244.20:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Wed Jul 20 15:49:59 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Jul 20 15:49:59 2016 - [info]   192.168.244.30(192.168.244.30:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Wed Jul 20 15:49:59 2016 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Wed Jul 20 15:49:59 2016 - [info] Checking slave configurations..
Wed Jul 20 15:49:59 2016 - [info] Checking replication filtering settings..
Wed Jul 20 15:49:59 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 20 15:49:59 2016 - [info]  Replication filtering check ok.
Wed Jul 20 15:49:59 2016 - [info] GTID (with auto-pos) is not supported
Wed Jul 20 15:49:59 2016 - [info] Starting SSH connection tests..
Wed Jul 20 15:50:17 2016 - [info] All SSH connection tests passed successfully.
Wed Jul 20 15:50:17 2016 - [info] Checking MHA Node version..
Wed Jul 20 15:50:18 2016 - [info]  Version check ok.
Wed Jul 20 15:50:18 2016 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jul 20 15:50:20 2016 - [info] HealthCheck: SSH to 192.168.244.10 is reachable.
Wed Jul 20 15:50:21 2016 - [info] Master MHA Node version is 0.56.
Wed Jul 20 15:50:21 2016 - [info] Checking recovery script configurations on 192.168.244.10(192.168.244.10:3306)..
Wed Jul 20 15:50:21 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysqld-bin.000002 
Wed Jul 20 15:50:21 2016 - [info]   Connecting to root@192.168.244.10(192.168.244.10:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysqld-bin.000002
Wed Jul 20 15:50:23 2016 - [info] Binlog setting check done.
Wed Jul 20 15:50:23 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 20 15:50:23 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.244.20 --slave_ip=192.168.244.20 --slave_port=3306 --workdir=/tmp --target_version=5.6.31-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jul 20 15:50:23 2016 - [info]   Connecting to root@192.168.244.20(192.168.244.20:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 20 15:50:28 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.244.30 --slave_ip=192.168.244.30 --slave_port=3306 --workdir=/tmp --target_version=5.6.31-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Wed Jul 20 15:50:28 2016 - [info]   Connecting to root@192.168.244.30(192.168.244.30:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000008
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000008
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 20 15:50:32 2016 - [info] Slaves settings check done.
Wed Jul 20 15:50:32 2016 - [info] 
192.168.244.10(192.168.244.10:3306) (current master)
 +--192.168.244.20(192.168.244.20:3306)
 +--192.168.244.30(192.168.244.30:3306)

Wed Jul 20 15:50:32 2016 - [info] Checking replication health on 192.168.244.20..
Wed Jul 20 15:50:32 2016 - [info]  ok.
Wed Jul 20 15:50:32 2016 - [info] Checking replication health on 192.168.244.30..
Wed Jul 20 15:50:32 2016 - [info]  ok.
Wed Jul 20 15:50:32 2016 - [info] Checking master_ip_failover_script status:
Wed Jul 20 15:50:32 2016 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 
Wed Jul 20 15:50:32 2016 - [info]  OK.
Wed Jul 20 15:50:32 2016 - [warning] shutdown_script is not defined.
Wed Jul 20 15:50:32 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
View Code

    檢查經過~

 

8、 檢查MHA Manager的狀態

# masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 is stopped(2:NOT_RUNNING).  

        若是正常,會顯示「PING_OK」,不然會顯示「NOT_RUNNING」,表明MHA監控尚未開啓。

 

9、開啓MHA Manager監控

      # nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1 &

      其中,

      remove_dead_master_conf:該參數表明當發生主從切換後,老的主庫的IP將會從配置文件中移除。

      ignore_last_failover:在默認狀況下,MHA發生切換後將會在/masterha/app1下產生app1.failover.complete文件,下次再次切換的時候若是發現該目錄下存在該文件且兩次切換的時間間隔不足8小時的話,將不容許觸發切換。除非在第一次切換後手動rm -rf /masterha/app1/app1.failover.complete。該參數表明忽略上次MHA觸發切換產生的文件。

     查看MHA Manager監控是否正常

# masterha_check_status --conf=/etc/masterha/app1.cnf 
app1 (pid:1873) is running(0:PING_OK), master:192.168.244.10

 

10、 關閉MHA Manager監控

# masterha_stop --conf=/etc/masterha/app1.cnf 
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1

至此,MHA部分配置完畢,下面,來配置VIP。

 

11、VIP配置

VIP配置能夠採用兩種方式,一是經過引入Keepalived來管理VIP,另外一種是在腳本中手動管理。

對於keepalived管理VIP,存在腦裂狀況,即當主從網絡出現問題時,slave會搶佔VIP,這樣會致使主從數據庫都持有VIP,形成IP衝突,因此在網絡不是很好的狀況下,不建議採用keepalived服務。

在實際生產中使用較多的也是第二種,即在腳本中手動管理VIP,因此,對keepalived不感興趣的童鞋可直接跳過第一種方式。

1. keepalived管理VIP

1> 安裝keepalived

    由於我這裏設置了Candicate master,故只在Master和Candicate master上安裝。

    若是沒有Candicate master,兩個Slave的地位平等,則兩個Slave上都需安裝keepalived。

    # wget http://www.keepalived.org/software/keepalived-1.2.24.tar.gz

    # tar xvf keepalived-1.2.24.tar.gz

    # cd keepalived-1.2.24

    # ./configure --prefix=/usr/local/keepalived

    # make

    # make install

    # cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/rc.d/init.d/

    # cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/

    # mkdir /etc/keepalived

    # cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/

    # cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

2> 爲keepalived設置單獨的日誌文件(非必需)

     keepalived的日誌默認是輸出到/var/log/message中

     # vim /etc/sysconfig/keepalived

KEEPALIVED_OPTIONS="-D -d -S 0"

     設置syslog

     # vim /etc/rsyslog.conf

     添加以下內容:

local0.*           /var/log/keepalived.log

    # service rsyslog restart 

2> 配置keepalived

     在Master上修改

     # vim /etc/keepalived/keepalived.conf

global_defs {
   notification_email {
     slowtech@qq.com 
   }
   notification_email_from root@localhost.localdomain 
   smtp_server 127.0.0.1 
   smtp_connect_timeout 30
   router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.244.188/24
    }
}
View Code

    關於keepalived的參數的詳細介紹,可參考:

    LVS+Keepalived搭建MyCAT高可用負載均衡集羣

    keepalived工做原理和配置說明

    將配置文件scp到Candicate master上

    # scp /etc/keepalived/keepalived.conf 192.168.244.20:/etc/keepalived/

    只需將配置文件中的priority設置爲90

    注意:咱們爲何在這裏設置keepalived爲backup模式呢?

    在master-backup模式下,若是主庫宕掉,VIP會自動漂移到Slave上,當主庫修復,keepalived啓動後,還會將VIP搶過來,即便設置了nopreempt(不搶佔)的方

    式,該動做仍會發生。但在backup-backup模式下,當主庫修改,並啓動keepalived後,並不會搶佔新主的VIP,即使原主的priority高於新主的。

3> 啓動keepalived

     先在Master上啓動

     # service keepalived start

env: /etc/init.d/keepalived: Permission denied

     # chmod +x /etc/init.d/keepalived

     # service keepalived start

     查看綁定狀況

     # ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:c6:47:04 brd ff:ff:ff:ff:ff:ff
    inet 192.168.244.10/24 brd 192.168.244.255 scope global eth0
    inet 192.168.244.188/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fec6:4704/64 scope link 
       valid_lft forever preferred_lft forever
View Code

     可見,VIP(192168.244.188)已經綁定到Master的eth0網卡上了。

     啓動Candicate master的keepalived

     # service keepalived start

4> MHA中引入keepalived

     編輯/usr/local/bin/master_ip_failover

     相對於原文件,修改地方爲93-95行

  1 #!/usr/bin/env perl
  2 
  3 #  Copyright (C) 2011 DeNA Co.,Ltd.
  4 #
  5 #  This program is free software; you can redistribute it and/or modify
  6 #  it under the terms of the GNU General Public License as published by
  7 #  the Free Software Foundation; either version 2 of the License, or
  8 #  (at your option) any later version.
  9 #
 10 #  This program is distributed in the hope that it will be useful,
 11 #  but WITHOUT ANY WARRANTY; without even the implied warranty of
 12 #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 13 #  GNU General Public License for more details.
 14 #
 15 #  You should have received a copy of the GNU General Public License
 16 #   along with this program; if not, write to the Free Software
 17 #  Foundation, Inc.,
 18 #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 19 
 20 ## Note: This is a sample script and is not complete. Modify the script based on your environment.
 21 
 22 use strict;
 23 use warnings FATAL => 'all';
 24 
 25 use Getopt::Long;
 26 use MHA::DBHelper;
 27 my (
 28   $command,        $ssh_user,         $orig_master_host,
 29   $orig_master_ip, $orig_master_port, $new_master_host,
 30   $new_master_ip,  $new_master_port,  $new_master_user,
 31   $new_master_password
 32 );
 33 
 34 GetOptions(
 35   'command=s'             => \$command,
 36   'ssh_user=s'            => \$ssh_user,
 37   'orig_master_host=s'    => \$orig_master_host,
 38   'orig_master_ip=s'      => \$orig_master_ip,
 39   'orig_master_port=i'    => \$orig_master_port,
 40   'new_master_host=s'     => \$new_master_host,
 41   'new_master_ip=s'       => \$new_master_ip,
 42   'new_master_port=i'     => \$new_master_port,
 43   'new_master_user=s'     => \$new_master_user,
 44   'new_master_password=s' => \$new_master_password,
 45 );
 46 
 47 exit &main();
 48 
 49 sub main {
 50   if ( $command eq "stop" || $command eq "stopssh" ) {
 51 
 52     # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
 53     # If you manage master ip address at global catalog database,
 54     # invalidate orig_master_ip here.
 55     my $exit_code = 1;
 56     eval {
 57 
 58       # updating global catalog, etc
 59       $exit_code = 0;
 60     };
 61     if ($@) {
 62       warn "Got Error: $@\n";
 63       exit $exit_code;
 64     }
 65     exit $exit_code;
 66   }
 67   elsif ( $command eq "start" ) {
 68 
 69     # all arguments are passed.
 70     # If you manage master ip address at global catalog database,
 71     # activate new_master_ip here.
 72     # You can also grant write access (create user, set read_only=0, etc) here.
 73     my $exit_code = 10;
 74     eval {
 75       my $new_master_handler = new MHA::DBHelper();
 76 
 77       # args: hostname, port, user, password, raise_error_or_not
 78       $new_master_handler->connect( $new_master_ip, $new_master_port,
 79         $new_master_user, $new_master_password, 1 );
 80 
 81       ## Set read_only=0 on the new master
 82       $new_master_handler->disable_log_bin_local();
 83       print "Set read_only=0 on the new master.\n";
 84       $new_master_handler->disable_read_only();
 85 
 86       ## Creating an app user on the new master
 87       #print "Creating app user on the new master..\n";
 88       #FIXME_xxx_create_user( $new_master_handler->{dbh} );
 89       $new_master_handler->enable_log_bin_local();
 90       $new_master_handler->disconnect();
 91 
 92       ## Update master ip on the catalog database, etc
 93       my $cmd;
 94       $cmd = 'ssh '.$ssh_user.'@'.$orig_master_ip.' service keepalived stop';
 95       system($cmd);
 96       $exit_code = 0;
 97     };
 98     if ($@) {
 99       warn $@;
100 
101       # If you want to continue failover, exit 10.
102       exit $exit_code;
103     }
104     exit $exit_code;
105   }
106   elsif ( $command eq "status" ) {
107 
108     # do nothing
109     exit 0;
110   }
111   else {
112     &usage();
113     exit 1;
114   }
115 }
116 
117 sub usage {
118   print
119 "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
120 }
View Code

 

2. 經過腳本的方式管理VIP

   編輯/usr/local/bin/master_ip_failover

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '192.168.244.188';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";


GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP an old master: $orig_master_host \n";
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      # print "Creating app user on the new master..\n";
      # FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip(){
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh $ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
return 0  unless  ($ssh_user); `
ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }

 

實際生產環境中,推薦這種方式來管理VIP,可有效防止腦裂狀況的發生。

 

至此,MHA高可用環境基本搭建完畢。

 

關於MHA的常見操做,包括自動Failover,手動Failover,在線切換,可參考另外一篇博客:

MHA在線切換的步驟和原理

MHA自動Failover與手動Failover的實踐及原理

 

總結:

1. 可單獨調試master_ip_failover,master_ip_online_change,send_report等腳本

 /usr/local/bin/master_ip_online_change --command=stop --orig_master_ip=192.168.244.10 --orig_master_host=192.168.244.10 --orig_master_port=3306 --orig_master_user=monitor --orig_master_password=monitor123 --orig_master_ssh_user=root --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user=monitor --new_master_password=monitor123 --new_master_ssh_user=root

 

/usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123'

 

 2. 官方對於master_ip_failover,master_ip_online_change,send_report腳本,給出的只是sample,切換的邏輯須要本身定義。

     不少童鞋對perl並不熟悉,以爲無從下手,其實,徹底能夠調用其它腳本,譬如python,shell等。

     如:

[root@node4 ~]# cat test.pl
#!/usr/bin/perl
use strict;
my $cmd='python /root/test.py';
system($cmd);

[root@node4 ~]# cat test.py
#!/usr/bin/python
print "hello,python"

[root@node4 ~]# perl test.pl
hello,python

 

參考:

《深刻淺出MySQL》 

 

附:

master_ip_online_change

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;

my $vip = '192.168.244.188';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub start_vip(){
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
    `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      # print current_time_us() . " Drpping app user on the orig master..\n";
      #drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## Droping the VIP     
      print "Disabling the VIP an old master: $orig_master_host \n";
      &stop_vip();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      # create_app_user($new_master_handler);
      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
View Code

 

master_ip_failover

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '192.168.244.188';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";


GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP an old master: $orig_master_host \n";
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      # print "Creating app user on the new master..\n";
      # FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip(){
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh $ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
    return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
View Code

 

masterha_secondary_check

#!/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

use strict;
use warnings FATAL => 'all';

use English qw(-no_match_vars);
use Getopt::Long;
use Pod::Usage;
use MHA::ManagerConst;

my @monitoring_servers;
my (
  $help,        $version,         $ssh_user,  $ssh_port,
  $ssh_options, $master_host,     $master_ip, $master_port,
  $master_user, $master_password, $ping_type
);
my $timeout = 5;

$| = 1;
GetOptions(
  'help'              => \$help,
  'version'           => \$version,
  'secondary_host=s'  => \@monitoring_servers,
  'user=s'            => \$ssh_user,
  'port=s'            => \$ssh_port,
  'options=s'         => \$ssh_options,
  'master_host=s'     => \$master_host,
  'master_ip=s'       => \$master_ip,
  'master_port=i'     => \$master_port,
  'master_user=s'     => \$master_user,
  'master_password=s' => \$master_password,
  'ping_type=s'       => \$ping_type,
  'timeout=i'         => \$timeout,
);

if ($version) {
  print "masterha_secondary_check version $MHA::ManagerConst::VERSION.\n";
  exit 0;
}

if ($help) {
  pod2usage(0);
}

unless ($master_host) {
  pod2usage(1);
}

sub exit_by_signal {
  exit 1;
}
local $SIG{INT} = $SIG{HUP} = $SIG{QUIT} = $SIG{TERM} = \&exit_by_signal;

$ssh_user    = "root" unless ($ssh_user);
$ssh_port    = 22     unless ($ssh_port);
$master_port = 3306   unless ($master_port);

if ($ssh_options) {
  $MHA::ManagerConst::SSH_OPT_CHECK = $ssh_options;
}
$MHA::ManagerConst::SSH_OPT_CHECK =~ s/VAR_CONNECT_TIMEOUT/$timeout/;

# 0: master is not reachable from all monotoring servers
# 1: unknown errors
# 2: at least one of monitoring servers is not reachable from this script
# 3: master is reachable from at least one of monitoring servers
my $exit_code = 0;

foreach my $monitoring_server (@monitoring_servers) {
  my $ssh_user_host = $ssh_user . '@' . $monitoring_server;
  my $command =
"ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \"perl -e "
    . "\\\"use IO::Socket::INET; my \\\\\\\$sock = IO::Socket::INET->new"
    . "(PeerAddr => \\\\\\\"$master_host\\\\\\\", PeerPort=> $master_port, "
    . "Proto =>'tcp', Timeout => $timeout); if(\\\\\\\$sock) { close(\\\\\\\$sock); "
    . "exit 3; } exit 0;\\\" \"";
  my $ret = system($command);
  $ret = $ret >> 8;
  if ( $ret == 0 ) {
    print
"Monitoring server $monitoring_server is reachable, Master is not reachable from $monitoring_server. OK.\n";
    next;
  }
  if ( $ret == 3 ) {
    if ( defined $ping_type
      && $ping_type eq $MHA::ManagerConst::PING_TYPE_INSERT )
    {
      my $ret_insert;
      my $command_insert =
          "ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \'"
        . "/usr/bin/mysql -u$master_user -p$master_password -h$master_host "
        . "-e \"CREATE DATABASE IF NOT EXISTS infra; "
        . "CREATE TABLE IF NOT EXISTS infra.chk_masterha (\\`key\\` tinyint NOT NULL primary key,\\`val\\` int(10) unsigned NOT NULL DEFAULT '0'\) engine=MyISAM; "
        . "INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestamp()\"\'";
      my $sigalrm_timeout = 3;
      eval {
        local $SIG{ALRM} = sub {
          die "timeout.\n";
        };
        alarm $sigalrm_timeout;
        $ret_insert = system($command_insert);
        $ret_insert = $ret_insert >> 8;
        alarm 0;
      };
      if ( $@ || $ret_insert != 0 ) {
        print
"Monitoring server $monitoring_server is reachable, Master is not writable from $monitoring_server. OK.\n";
        next;
      }
    }
    print "Master is reachable from $monitoring_server!\n";
    $exit_code = 3;
    last;
  }
  else {
    print "Monitoring server $monitoring_server is NOT reachable!\n";
    $exit_code = 2;
    last;
  }
}

exit $exit_code;

# ############################################################################
# Documentation
# ############################################################################

=pod

=head1 NAME

masterha_secondary_check - Checking master availability from additional network routes

=head1 SYNOPSIS

masterha_secondary_check -s secondary_host1 -s secondary_host2 .. --user=ssh_username --master_host=host --master_ip=ip --master_port=port

See online reference (http://code.google.com/p/mysql-master-ha/wiki/Parameters#secondary_check_script) for details.

=head1 DESCRIPTION

See online reference (http://code.google.com/p/mysql-master-ha/wiki/Parameters#secondary_check_script) for details.
View Code

 

send_report

#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.126.com';
my $mail_from='slowtech@126.com';
my $mail_user='slowtech@126.com';
my $mail_pass='xxxxx';
my $mail_to=['slowtech@126.com'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}



# Do whatever you want here
exit 0;
View Code
相關文章
相關標籤/搜索