MHA Official Documentation (Translation)

Date: 2019-11-30

English official documentation:

http://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6

Please credit the source when reposting.

Overview

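MHA performs automatic failure detection and failover in a short time, typically within 10-30 seconds. Within a replication setup, MHA handles the data consistency problems that arise during replication well. It needs no additional servers in the existing replication topology apart from a single manager node, and one Manager can manage many replication groups, so it greatly reduces the number of servers required. It is also simple to install, imposes no performance penalty, and requires no changes to the existing replication deployment.

MHA also provides online master switching: it can safely switch the currently running master to a new master (by promoting a slave), usually within about 0.5-2 seconds.

These capabilities make MHA suitable for environments that demand high availability and data integrity, and for near non-stop master maintenance.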

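◎ Automatic failure detection and automatic failover

MHA can monitor MySQL in an existing replication environment and fail over automatically once it detects a master failure. It identifies the slave with the latest relay log events and applies the differences to all the other slaves, which keeps the slaves consistent with each other even when some of them had not received the latest relay log events at the time the master crashed. Typical figures are 9-12 seconds to detect the master failure, 7-10 seconds to shut down mysqld on the master so that the failure does not spread, and a few seconds to apply the differential relay logs to the new master. Total downtime is usually 10-30 seconds. Whether a slave may become the new master can be controlled through its priority in the configuration file. Because MHA keeps the slaves consistent with each other, any slave can be promoted. The data consistency problems that a broken replication stream so easily causes in an ordinary replication environment do not occur under MHA.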

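◎ Interactive (manual) failover

MHA can be used for manual failover without monitoring the master: once you have confirmed that a failure has occurred, you trigger the switch manually through MHA.

◎ Non-interactive failover

The master is likewise not monitored, but once a failure has occurred MHA can carry out the failover automatically, without prompting.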

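◎ Online switching of the master to a different host

It is often necessary to move the current master to another host, for example when a RAID controller or RAM fails or when the master server needs to be upgraded. This is not a master crash, but it does require a manual switch. The switch should be as fast as possible because writes to the master are blocked while it runs. You also need to block or kill in-flight sessions, because data inconsistency results if writes are not blocked. For example, the sequence updating master1, updating master 2, committing master1, getting error on committing master 2 leaves the data inconsistent. Both a fast switch and a graceful way of blocking writes are therefore required.

MHA can complete the switch in 0.5-2 seconds, and blocking writes for 0.5-2 seconds is usually acceptable, so you can switch masters online even outside a maintenance window. Tasks such as upgrading to a newer version or moving to a faster server become much easier.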

Architecture of MHA

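When the master crashes, MHA recovers the slaves so that they end up consistent with each other. For details on how MHA fixes consistency problems, see the link below; I do not go into them here.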

http://www.slideshare.net/matsunobu/automated-master-failover

MHA Components

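MHA consists of the Manager and Node components.

Manager package: can manage multiple master-slave replication groups

masterha_manager: provides automated master failure detection and failover

Other helper scripts: manual failover, online master switch, con checks, and so on

Node package: deployed on every MySQL server

save_binary_logs: copies the master's binary logs if necessary

apply_diff_relay_logs: generates the differential relay log events from the most up-to-date slave and applies them together with the differential binlog events

purge_relay_logs: purges relay logs

The MHA Manager node runs the programs that monitor MySQL state, fail over the master, and so on.

The MHA Node hosts carry the helper scripts that automated failover relies on: parsing MySQL binary and relay logs, identifying which relay log events should be applied to the other slaves and from which position, applying those events to the target slave, and so on. MHA Node should run on every MySQL server.

When the MHA Manager performs a failover, it connects to the Node hosts over SSH and runs the Node commands as needed.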

Advantages of MHA

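This section is brief; most of it has already been covered above.

1 Master failover and slave promotion can be done very quickly

Automated failover is fast.

2 Master crash does not result in data inconsistency

A master crash does not leave the slaves inconsistent.

3 No need to modify current MySQL settings (MHA works with regular MySQL (5.0 or later))

The existing MySQL environment needs no significant changes.

4 No need to increase lots of servers

No extra servers are needed (a single manager can manage a hundred or more replication groups).

5 No performance penalty

MHA works with both semi-synchronous and asynchronous replication. To monitor the master it only sends a ping every N seconds (3 by default), so it has no measurable effect on performance. You can think of an MHA setup as performing the same as plain master-slave replication.

6 Works with any storage engine

MHA supports every storage engine that replication supports; it is not limited to InnoDB.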

Typical Use cases

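How to deploy the Manager node

◎ A dedicated Manager server for multiple replication environments

The MHA Manager uses very little CPU and memory, so one manager can manage many replication groups, even more than 100.

◎ Running the Manager on one of the slaves

If you have only one replication environment and would rather not pay for dedicated manager hardware, you can run the manager on one of the slave hosts. Note that even though the manager and the slave are then on the same machine, the manager still connects to that slave over SSH, so you still need to configure passwordless SSH login.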

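Replication configurations (translated briefly)

Single master, multiple slaves

One master with several slaves. This is the most common case.

Single master, multiple slaves (one on remote datacenter)

One master with several slaves, one of which is in a remote data center and never becomes master.

Single master, multiple slaves, one candidate master

One master with several slaves, with exactly one slave configured as the candidate master.

Multiple masters, multiple slaves

Three tier replication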

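Managing the master IP address

In HA setups, people often bind a virtual IP (VIP) to the master. When the master crashes, software such as Keepalived moves the virtual IP to a healthy server.

Another common approach is to create a global catalog database that stores the mapping between every application and the IP address it writes to, instead of using a VIP. With this approach you have to update the catalog when the master crashes.

Both approaches have advantages and disadvantages, and MHA does not force either one. MHA can call an external script to deactivate and activate the write IP address; you name it in the master_ip_failover_script parameter, and the script can be found on the manager node. In that script you can update the catalog database, move the VIP, or do anything else you need. You can also let existing HA software such as Pacemaker handle IP failover, in which case MHA does not do it.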

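Using MHA with MySQL semi-synchronous replication

MHA tries to save the binary logs from the crashed master, but this is not always possible. For example, if the master is down because of a hardware failure, or cannot be reached over SSH, MHA cannot save the binlogs and therefore cannot apply events that existed only on the master, so the most recent data is lost.

Semi-synchronous replication greatly reduces this risk. Because it is built on the normal MySQL replication mechanism, MHA works with it. As long as at least one slave has received the latest binlog events, MHA applies them to all the other slaves, which keeps the data consistent.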

Tutorial

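Building a standard replication environment

MHA does not build the replication environment itself, so you have to set it up yourself. Put another way, you can deploy MHA on top of an existing replication environment. As an example, assume four hosts: host1, host2, host3 and host4. host1 is the master, host2 and host3 are slaves, and host4 is the manager.

Installing MHA Node on host1-host4

RHEL/CentOS

  ## If you have not installed DBD::mysql, install it like below, or install from source.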
  # yum install perl-DBD-MySQL

  ## Get MHA Node rpm package from "Downloads" section.
  # rpm -ivh mha4mysql-node-X.Y-0.noarch.rpm

Ubuntu/Debian

  ## If you have not installed DBD::mysql, install it like below, or install from source.
  # apt-get install libdbd-mysql-perl

  ## Get MHA Node deb package from "Downloads" section.
  # dpkg -i mha4mysql-node_X.Y_all.deb

Installing from source

  ## Install DBD::mysql if not installed
  $ tar -zxf mha4mysql-node-X.Y.tar.gz
  $ cd mha4mysql-node-X.Y
  $ perl Makefile.PL
  $ make
  $ sudo make install

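Installing MHA Manager on host4

MHA Manager provides command-line programs such as masterha_manager and masterha_master_switch, and depends on several Perl modules. Before installing the manager you need the modules listed below, and remember that MHA Node must also be installed on the manager host.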

MHA Node package
DBD::mysql
Config::Tiny
Log::Dispatch
Parallel::ForkManager
Time::HiRes (included from Perl v5.7.3)

RHEL/CentOS

  ## Install dependent Perl modules
  # yum install perl-DBD-MySQL
  # yum install perl-Config-Tiny
  # yum install perl-Log-Dispatch
  # yum install perl-Parallel-ForkManager

  ## Install MHA Node, since MHA Manager uses some modules provided by MHA Node.
  # rpm -ivh mha4mysql-node-X.Y-0.noarch.rpm

  ## Finally you can install MHA Manager
  # rpm -ivh mha4mysql-manager-X.Y-0.noarch.rpm

Ubuntu/Debian

  ## Install dependent Perl modules
  # apt-get install libdbd-mysql-perl
  # apt-get install libconfig-tiny-perl
  # apt-get install liblog-dispatch-perl
  # apt-get install libparallel-forkmanager-perl

  ## Install MHA Node, since MHA Manager uses some modules provided by MHA Node.
  # dpkg -i mha4mysql-node_X.Y_all.deb

  ## Finally you can install MHA Manager
  # dpkg -i mha4mysql-manager_X.Y_all.deb

Installing from source

  ## Install dependent Perl modules
  # MHA Node (See above)
  # Config::Tiny
  ## perl -MCPAN -e "install Config::Tiny"
  # Log::Dispatch
  ## perl -MCPAN -e "install Log::Dispatch"
  # Parallel::ForkManager 
  ## perl -MCPAN -e "install Parallel::ForkManager"
  ## Installing MHA Manager
  $ tar -zxf mha4mysql-manager-X.Y.tar.gz
  $ cd mha4mysql-manager-X.Y
  $ perl Makefile.PL
  $ make
  $ sudo make install

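Creating a configuration file

The next step is to create the manager's configuration file. Its parameters mainly cover the MySQL server user name and password, the replication user name and password, working directories, and so on. The full list is in the Parameters section.

  manager_host$ cat /etc/app1.cnf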
  
  [server default]
  # mysql user and password
  user=root
  password=mysqlpass
  ssh_user=root
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha/app1
  
  [server1]
  hostname=host1
  
  [server2]
  hostname=host2
  
  [server3]
  hostname=host3

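Note that host1 is the current master; MHA detects this automatically.

Checking SSH connections

The MHA Manager reaches every Node host over SSH, and the Node hosts also send differential relay log files to each other over SSH, so passwordless SSH login must be set up on the manager and on every node. The masterha_check_ssh script checks that SSH is configured correctly.

  # masterha_check_ssh --conf=/etc/app1.cnf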
  
  Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
  Sat May 14 14:42:19 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:20 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.

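If an error is reported, the SSH setup has a problem that will prevent MHA from working. Fix it and retry; the usual cause is SSH public key authentication that is not configured correctly.

Checking the replication configuration

For MHA to work, every master and slave must be set up correctly in the configuration file. The masterha_check_repl script checks whether replication is configured correctly.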

  manager_host$ masterha_check_repl --conf=/etc/app1.cnf
  ...
  MySQL Replication Health is OK.

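If an error is reported, check the log and fix it. The current master must not be a slave, and every slave must be replicating correctly from the master. Common errors are listed on the TypicalErrors page.

Starting the Manager

Once MySQL replication is configured correctly, the Manager and Node packages are installed, and SSH works, the next step is to start the manager with the masterha_manager command.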

  manager_host$ masterha_manager --conf=/etc/app1.cnf
  ....
  Sat May 14 15:58:29 2011 - [info] Connecting to the master host1(192.168.0.1:3306) and sleeping until it doesn't respond..

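If everything is configured correctly, masterha_manager keeps checking whether the master is available until the master crashes. If masterha_manager reports an error before it starts monitoring, check the logs and correct the configuration. All log output goes to standard error by default; you can set a log file in the manager configuration file instead. Typical problems are a broken replication configuration and an SSH user without permission to read the relay logs. By default masterha_manager does not run in the background, and pressing Ctrl+C terminates it.

Checking the manager status

Once the MHA Manager is monitoring, it prints nothing unless something goes wrong. Use the masterha_check_status command to check the manager's status, for example:

  manager_host$ masterha_check_status --conf=/etc/app1.cnf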
  app1 (pid:5057) is running(0:PING_OK), master:host1

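app1 is the application name MHA uses internally; it is determined through the manager configuration file. If the manager has stopped or is misconfigured, the output looks like this: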

  manager_host$ masterha_check_status --conf=/etc/app1.cnf
  app1 is stopped(1:NOT_RUNNING).

Stopping the manager

Use the masterha_stop command to stop the manager.

  manager_host$ masterha_stop --conf=/etc/app1.cnf
  Stopped app1 successfully.

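If the manager cannot be stopped this way, try adding the --abort argument. Now that you know how to stop the manager, start it again for the next step.

Testing automatic master failover

The master is now running normally and the manager is monitoring it. The next step is to stop the master and test automatic failover; the simplest way is to kill mysqld on the master.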

  host1$  killall -9 mysqld mysqld_safe

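Now check the manager log to confirm that host2 has become the new master and that host3 replicates from host2.

After one failover completes, the manager process exits. To run the manager in the background, use a command like the one below, or run it under daemontools (not covered here).

  manager_host$ nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &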

Writing an application configuration file

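For MHA to run, you need to create a configuration file and set its parameters. They mainly cover the user name and password on the servers that run MySQL, the MySQL user name and password, working directories, and so on. The full list is on the Parameters page.

Here is a sample configuration file.

  manager_host$ cat /etc/app1.cnf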

  [server default]
  # mysql user and password
user=root
password=mysqlpass
  # working directory on the manager
manager_workdir=/var/log/masterha/app1
  # manager log file
manager_log=/var/log/masterha/app1/app1.log
  # working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

  [server1]
hostname=host1

  [server2]
hostname=host2

  [server3]
hostname=host3

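Every parameter must be written as "param=value". For example, the following setting is wrong.

[server1]
hostname=host1
# incorrect: must be "no_master=1"
no_master

Application-scope parameters must be written under the [server default] block. Local-scope parameters go under a [serverN] block; hostname, for example, is a local-scope parameter, so it must be written there. Block names must begin with "server".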
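The format rules above can be checked mechanically before starting the manager. The following is a minimal sketch, not part of MHA: check_mha_conf is a hypothetical helper name, and the patterns encode only the rules stated here (blank lines, # comments, block headers that begin with "server", and param=value pairs).

```shell
# check_mha_conf: hypothetical helper, not shipped with MHA.
# Reads a configuration file on stdin and prints every line that is not
# blank, a "#" comment, a [server...] block header, or a param=value pair.
# Returns 1 if any such line was found, 0 otherwise.
check_mha_conf() {
    bad=$(grep -v -E '^[[:space:]]*($|#|\[server[^]]*\][[:space:]]*$|[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=)' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}

# Usage (path taken from the example above):
#   check_mha_conf < /etc/app1.cnf
```

Run against the incorrect example, the sketch prints the bare no_master line and returns 1. It does not check whether a parameter is placed in a block of the right scope.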

Writing a global configuration file

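If you plan to manage two or more master-slave groups from one manager, it is a good idea to create a global configuration file so that you do not have to repeat the same parameters for every application. A file created at /etc/masterha_default.cnf is used as the global configuration file by default.

You can set application-scope parameters in the global configuration file. For example, if the administrative account and password are the same on all MySQL servers, you can set user and password here.

Here is a sample global configuration file.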

Global configuration file (/etc/masterha_default.cnf)

[server default]
user=root
password=rootpass
ssh_user=root
master_binlog_dir=/var/lib/mysql
remote_workdir=/data/log/masterha
secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
master_ip_failover_script=/script/masterha/master_ip_failover
shutdown_script=/script/masterha/power_manager
report_script=/script/masterha/send_master_failover_mail

These parameters apply to every application.

Each application still needs its own configuration file. Below are examples for app1 (host1-4) and app2 (host11-14).

app1:

manager_host$ cat /etc/app1.cnf

  [server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log

  [server1]
hostname=host1
candidate_master=1

  [server2]
hostname=host2
candidate_master=1

  [server3]
hostname=host3

  [server4]
hostname=host4
no_master=1

app2:

manager_host$ cat /etc/app2.cnf

  [server default]
manager_workdir=/var/log/masterha/app2
manager_log=/var/log/masterha/app2/app2.log

  [server1]
hostname=host11
candidate_master=1

  [server2]
hostname=host12
candidate_master=1

  [server3]
hostname=host13

  [server4]
hostname=host14
no_master=1

Requirements and Limitations

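This part is translated briefly. MHA has the following requirements and limitations:

1 SSH public key authentication must be set up

2 Only Linux is supported

3 Only one master may run with read_only=0; the other servers should be read-only

4 For a three-tier topology such as Master1 -> Master2 -> Slave3, list only the two-tier structure (master1 and master2) in the configuration file and set multi_tier_slave=1 to allow the third tier

5 MHA supports MySQL 5.0 and later only

6 mysqlbinlog must be version 3.3 or later

7 log-bin must be enabled on every MySQL server that can become master

8 Replication filtering rules must be identical on all MySQL servers

9 A replication account must exist on every server that can become master

10 relay_log_purge=0 must be set on all MySQL servers so that relay logs are kept for a while

11 Do not use LOAD DATA INFILE with statement-based replication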

What MHA does on monitoring and failover

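Much of this part repeats what was said above, so it is translated briefly. During monitoring and failover, MHA does the following:

Verifying replication settings and identifying the current master
Monitoring the master server
Detecting the master server failure
Verifying slave configurations again
Shutting down failed master server (optional)
Recovering a new master
Activating the new master
Recovering the rest slaves
Notifications (optional)

Verifying replication settings and identifying the current master: MHA checks the replication configuration and works out which server is the current master.

Monitoring the master server: MHA monitors the master until it crashes. At this stage the manager no longer monitors the slaves, so if you add or remove a slave it is best to update the manager configuration file and restart MHA.

Detecting the master server failure: the manager detects that the master has failed.

Verifying slave configurations again: MHA rereads the configuration file, retries its connections, and confirms that the master really has crashed. If the most recent error is the same as the current one and occurred only a very short time ago, MHA stops reporting it again and moves on to the next step.

Shutting down the failed master server (optional): this prevents the failure from spreading.

Recovering a new master: a new master is chosen. If the crashed master can still be reached over SSH, its binlog events after the latest slave's end_log_pos are saved. The choice follows the manager configuration file: set candidate_master=1 on a slave that may become master, and no_master=1 on a slave that must never become master. MHA identifies the latest slave, meaning the one that has received the most recent relay log events, and promotes it.

Activating the new master.

Recovering the rest slaves: the remaining slaves are reconfigured to replicate from the newly promoted master.

Notifications (optional): for example sending mail or disabling backup jobs on the new master, configured through the report_script parameter.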

What MHA does on online(fast) master switch

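Translated briefly. During an online master switch, MHA does the following:

Verifying replication settings and identifying the current master
Identifying the new master
Rejecting writes on the current master
Waiting for all slaves to catch up replication
Granting writes on the new master
Switching replication on all the rest slaves

Verifying replication settings and identifying the current master. This step also checks that the following conditions hold:

The IO thread is running on every slave

The SQL thread is running on every slave

Replication delay on every slave is less than 2 seconds

No update on the master has been running for more than 2 seconds

Identifying the new master.

Rejecting writes on the current master: FLUSH TABLES WITH READ LOCK is executed on the current master to block writes and prevent data inconsistency.

Waiting for all slaves to catch up with the master.

Granting writes on the new master: SHOW MASTER STATUS is executed on the new master to record the binlog file name and position, then SET GLOBAL read_only=0 allows writes.

Switching replication on the remaining slaves: CHANGE MASTER and START SLAVE are executed on them in parallel so that they replicate from the new master.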

Parameters

The MHA Manager configuration parameters are listed below.

Parameter Name	Required?	Parameter Scope	Default Value	Example
hostname	Yes	Local Only	-	hostname=mysql_server1, hostname=192.168.0.1, etc
ip	No	Local Only	gethostbyname($hostname)	ip=192.168.1.3
port	No	Local/App/Global	3306	port=3306
ssh_host	No	Local Only	same as hostname	ssh_host=mysql_server1, ssh_host=192.168.0.1, etc
ssh_ip	No	Local Only	gethostbyname($ssh_host)	ssh_ip=192.168.1.3
ssh_port	No	Local/App/Global	22	ssh_port=22
ssh_connection_timeout	No	Local/App/Global	5	ssh_connection_timeout=20
ssh_options	No	Local/App/Global	""(empty string)	ssh_options="-i /root/.ssh/id_dsa2"
candidate_master	No	Local Only	0	candidate_master=1
no_master	No	Local Only	0	no_master=1
ignore_fail	No	Local Only	0	ignore_fail=1
skip_init_ssh_check	No	Local Only	0	skip_init_ssh_check=1
skip_reset_slave	No	Local/App/Global	0	skip_reset_slave=1
user	No	Local/App/Global	root	user=mysql_root
password	No	Local/App/Global	""(empty string)	password=rootpass
repl_user	No	Local/App/Global	Master_User value from SHOW SLAVE STATUS	repl_user=repl
repl_password	No	Local/App/Global	- (current replication password)	repl_password=replpass
disable_log_bin	No	Local/App/Global	0	disable_log_bin=1
master_pid_file	No	Local/App/Global	""(empty string)	master_pid_file=/var/lib/mysql/master1.pid
ssh_user	No	Local/App/Global	current OS user	ssh_user=root
remote_workdir	No	Local/App/Global	/var/tmp	remote_workdir=/var/log/masterha/app1
master_binlog_dir	No	Local/App/Global	/var/lib/mysql	master_binlog_dir=/data/mysql1,/data/mysql2
log_level	No	App/Global	info	log_level=debug
manager_workdir	No	App	/var/tmp	manager_workdir=/var/log/masterha
client_bindir	No	App	-	client_bindir=/usr/mysql/bin
client_libdir	No	App	-	client_libdir=/usr/lib/mysql
manager_log	No	App	STDERR	manager_log=/var/log/masterha/app1.log
check_repl_delay	No	App/Global	1	check_repl_delay=0
check_repl_filter	No	App/Global	1	check_repl_filter=0
latest_priority	No	App/Global	1	latest_priority=0
multi_tier_slave	No	App/Global	0	multi_tier_slave=1
ping_interval	No	App/Global	3	ping_interval=5
ping_type	No	App/Global	SELECT	ping_type=CONNECT
secondary_check_script	No	App/Global	null	secondary_check_script= masterha_secondary_check -s remote_dc1 -s remote_dc2
master_ip_failover_script	No	App/Global	null	master_ip_failover_script=/usr/local/custom_script/master_ip_failover
master_ip_online_change_script	No	App/Global	null	master_ip_online_change_script= /usr/local/custom_script/master_ip_online_change
shutdown_script	No	App/Global	null	shutdown_script= /usr/local/custom_script/master_shutdown
report_script	No	App/Global	null	report_script= /usr/local/custom_script/report
init_conf_load_script	No	App/Global	null	init_conf_load_script= /usr/local/custom_script/init_conf_loader

Local Scope: Per-server scope parameters. Local scope parameters should be set under [server_xxx] blocks within application configuration file.
App Scope: Parameters for each {master, slaves} pair. These parameters should be set under a [server default] block within application configuration file.
Global Scope: Parameters for all {master, slaves} pairs. Global scope parameters are useful only when you manage multiple {master, slaves} pairs from single manager server. These parameters should be set in a global configuration file.

hostname

Hostname or IP address of the target MySQL server. This parameter is mandatory, and must be configured under [server_xxx] blocks within the application configuration file (one block per MySQL server).

ip

IP address of the target MySQL server. Default is gethostbyname($hostname). MHA Manager and MHA Node internally use this IP address to connect via MySQL and SSH. Normally you don't need to configure this parameter because it's automatically resolved from the hostname parameter.

port

Port number of the target MySQL server. Default is 3306. MHA connects to MySQL servers by using IP address and port.

ssh_host

Host used for SSH. Default is the same as hostname, so it normally does not need to be set.

ssh_ip

(Supported from 0.53) IP address of the target MySQL server that is used via SSH. Default is gethostbyname($ssh_host). Normally it does not need to be set.

ssh_port

(Supported from 0.53) Port number of the target MySQL server used via SSH. Default is 22.

ssh_connection_timeout

(Supported from 0.54) Default is 5 seconds. Before adding this parameter timeout was hard coded.

ssh_options

(Supported from 0.53) Additional SSH command line options.

candidate_master

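Set under a [server_xxx] block. A value of 1 marks the server as a candidate to become master. If two or more servers are set to 1, the one written earlier in the configuration file has higher priority.

no_master

A value of 1 means the server never becomes master. It is typically set on a server that uses RAID0, on a server in a remote data center, or on the host where the manager shares a machine with a slave.

ignore_fail

By default the manager does not fail over when a slave has a problem, for example when it cannot be reached over SSH or its SQL thread has a problem. With ignore_fail=1 set on a slave, failover goes ahead even though that slave is failing.

skip_init_ssh_check

Skips the SSH check during initialization.

skip_reset_slave

(Supported from 0.56) Skips executing RESET SLAVE when the master crashes and failover runs.

user

The MySQL administrative account, preferably root. The default is root.

password

Password of the MySQL account given in user.

repl_user

The replication account. Normally it does not need to be set.

repl_password

Password of the replication account. Normally it does not need to be set.

disable_log_bin

When this option is set, the slaves do not write binlog events while the differential relay logs are applied to them.

master_pid_file

The master's pid file. Normally it does not need to be set.

ssh_user

Default is the OS user currently logged in on the manager. This user needs permission to read the MySQL binary logs and relay logs.

remote_workdir

Directory in which each MHA Node writes its log files. MHA creates it if it does not exist, so the user needs the corresponding directory permissions. Default is /var/tmp; it is better to set it explicitly.

master_binlog_dir

Directory in which the master writes its binlog files. It is better to set it explicitly, because when the master crashes but is still reachable over SSH, MHA copies its binlogs from this directory. Default is /var/lib/mysql.

log_level

The log level. Normally it does not need to be set.

manager_workdir

Directory in which the manager writes its own status files. Default is /var/tmp.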

client_bindir

If MySQL command line utilities are installed under a non-standard directory, use this option to set the directory.

client_libdir

If MySQL libraries are installed under a non-standard directory, use this option to set the directory.

manager_log

Full path of the file to which MHA Manager writes its log. If not set, MHA Manager prints to STDOUT/STDERR. When executing manual failover (interactive failover), MHA Manager ignores the manager_log setting and always prints to STDOUT/STDERR.

check_repl_delay

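By default, MHA does not promote a slave whose replication is more than 100MB behind, because recovery would take too long. With check_repl_delay=0, MHA ignores replication delay when it chooses the new master. This is useful together with candidate_master=1 on a specific host, when you want to make sure that host can become the new master.

check_repl_filter

Checks replication filtering. By default MHA reports an error if the master and the slaves have different filtering rules. Setting this to 0 skips the check; be very careful if you do, and make sure the differences cause no problems.

latest_priority

By default, after a master crash MHA promotes the slave with the least replication delay. To control the priority of each slave yourself, set this parameter to 0; the order in which servers with candidate_master=1 appear in the configuration file then decides.

multi_tier_slave

From version 0.52, MHA supports multi-tier replication. By default a configuration with three or more tiers is rejected: if h2 replicates from h1 and h3 replicates from h2, MHA reports an error. With multi_tier_slave set, when h1 crashes h2 is promoted to master and h3 keeps replicating from h2.

ping_interval

How often the MHA Manager runs its ping SQL statement against the master. When the manager fails to connect three times in a row, it decides that the master is dead. The default is one ping every 3 seconds, so the total detection time is about 12 seconds. Failures caused by errors such as too many connections are not counted toward declaring the master dead.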
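The figures quoted for ping_interval (a 3-second interval and roughly 12 seconds to detect a failure) correspond to about four ping intervals. Assuming that factor also holds for other settings, which the text does not state explicitly, a rough upper bound can be estimated as follows. The function name is hypothetical.

```shell
# max_detection_seconds [PING_INTERVAL]
# Rough estimate of ping-based failure detection time, assuming the
# "about 12 seconds at a 3-second interval" ratio (4x) applies generally.
max_detection_seconds() {
    ping_interval=${1:-3}   # default taken from the parameter table
    echo $(( ping_interval * 4 ))
}

# Example: max_detection_seconds 5
```

A shorter interval detects failures sooner at the cost of more frequent pings against the master.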

ping_type

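(Supported from 0.53) By default the manager connects to the master and executes SELECT 1 (ping_type=SELECT). In some cases it is better to check by opening and then closing a connection (ping_type=CONNECT), because this is stricter and detects TCP connection problems sooner. From version 0.56, ping_type=INSERT is also supported.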

secondary_check_script

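We generally recommend checking whether the master is alive over two or more network routes. By default the manager checks over a single route only, from the Manager to the master. MHA can check over additional routes by calling an external script set in the secondary_check_script parameter. For example:

  secondary_check_script = masterha_secondary_check -s remote_host1 -s remote_host2

The masterha_secondary_check script is located on the manager node and works well in most cases.

In this example, MHA checks the master over Manager-(A)->remote_host1-(B)->master_host

and Manager-(A)->remote_host2-(B)->master_host. If on both routes connection A succeeds and connection B fails, the script returns 0 and MHA concludes that the master really is dead and starts failover. If connection A fails, the script returns 2; MHA assumes the problem may be in its own network and does not fail over. If connection B succeeds, the master is in fact alive. In general, remote_host1 and remote_host2 should be placed on different network segments.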
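The decision rule for the two routes can be written down as a small function. This is a sketch of the logic as described here, not the masterha_secondary_check script itself: the function name and the printed labels are hypothetical, and checking the routes in order is an assumption.

```shell
# secondary_check_decision A1 B1 [A2 B2 ...]
# Arguments come in pairs, one pair per route, each "ok" or "fail":
#   A = manager -> remote host, B = remote host -> master.
secondary_check_decision() {
    while [ "$#" -ge 2 ]; do
        a=$1; b=$2; shift 2
        if [ "$a" != ok ]; then
            echo network_problem   # A failed: possibly our own network, no failover
            return 0
        fi
        if [ "$b" = ok ]; then
            echo master_alive      # B succeeded: the master is reachable
            return 0
        fi
    done
    echo master_dead               # every A succeeded and every B failed
}

# Example: secondary_check_decision ok fail ok fail
```

Only the master_dead outcome corresponds to the case in which the text says failover proceeds.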

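The script is suitable for most setups, and you can also write your own script for more functionality. Below is the list of arguments for the script.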

--user=(SSH username of the remote hosts. ssh_user parameter value will be passed)
--master_host=(master's hostname)
--master_ip=(master's ip address)
--master_port=(master's port number)

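Note that the script depends on the IO::Socket::INET Perl module, which is included by default from Perl v5.6.0. The script connects to each remote server over SSH, so SSH public key authentication must be configured for those servers as well. The script then tries to open a TCP connection from the remote server to the master, which means the check is not affected by the max_connections setting in the MySQL configuration, although each successful TCP connection increments Aborted_connects on the master by 1.

master_ip_failover_script

In HA setups, people often bind a virtual IP to the master. When the master crashes, software such as Keepalived moves the virtual IP to a healthy server.

Each approach to redirecting applications has advantages and disadvantages. MHA does not force any particular one and lets you use whatever IP failover technique you like; the master_ip_failover_script parameter exists for this purpose. In other words, you write a script that makes applications connect to the new master and name it in the master_ip_failover_script parameter. For example:

  master_ip_failover_script= /usr/local/sample/bin/master_ip_failover

MHA Manager calls this script three times: first before it starts monitoring the master (to check that the script is usable), second just before it calls shutdown_script, and third after the new master has applied all relay logs. MHA Manager passes the following arguments (you do not need to configure them yourself):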

Checking phase
- --command=status
- --ssh_user=(current master's ssh username)
- --orig_master_host=(current master's hostname)
- --orig_master_ip=(current master's ip address)
- --orig_master_port=(current master's port number)

Current master shutdown phase
- --command=stop or stopssh
- --ssh_user=(dead master's ssh username, if reachable via ssh)
- --orig_master_host=(current(dead) master's hostname)
- --orig_master_ip=(current(dead) master's ip address)
- --orig_master_port=(current(dead) master's port number)

New master activation phase
- --command=start
- --ssh_user=(new master's ssh username)
- --orig_master_host=(dead master's hostname)
- --orig_master_ip=(dead master's ip address)
- --orig_master_port=(dead master's port number)
- --new_master_host=(new master's hostname)
- --new_master_ip=(new master's ip address)
- --new_master_port=(new master's port number)
- --new_master_user=(new master's user)
- --new_master_password=(new master's password)

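If you use a VIP on the master, you may not need to do anything in the shutdown phase, as long as the VIP can be moved to the new master. If you use the catalog database approach, you may need to delete or update the master's record. In the new master activation phase, you can insert or update a record for the new master. You can also do whatever else is needed for applications to write to the new master, such as setting read_only=0 or creating users with write privileges.

MHA Manager checks the script's exit code. If it is 0 or 10, MHA Manager continues. Otherwise the manager aborts. The parameter is empty by default, so MHA Manager does not invoke anything.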
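The three call phases can be handled by dispatching on --command. The following is a hypothetical skeleton written as a shell function for illustration; it is not the sample script shipped with MHA, the echo lines are placeholders for real VIP or catalog updates, and only a subset of the arguments listed above is parsed.

```shell
# master_ip_failover: hypothetical skeleton, placeholders only.
# Returns 0 on success and 1 for an unknown --command value.
master_ip_failover() {
    cmd="" orig_master_host="" new_master_host=""
    for arg in "$@"; do
        case "$arg" in
            --command=*)          cmd=${arg#--command=} ;;
            --orig_master_host=*) orig_master_host=${arg#--orig_master_host=} ;;
            --new_master_host=*)  new_master_host=${arg#--new_master_host=} ;;
            --*) ;;  # other arguments are accepted and ignored in this sketch
        esac
    done
    case "$cmd" in
        status)       echo "check: script is usable for $orig_master_host" ;;
        stop|stopssh) echo "shutdown phase: release address from $orig_master_host" ;;
        start)        echo "activation phase: point applications at $new_master_host" ;;
        *)            echo "unknown command: $cmd" >&2; return 1 ;;
    esac
    return 0
}

# Example: master_ip_failover --command=start --new_master_host=host2
```

A real script would replace each echo with the action for that phase and return a non-zero code when the action fails, since the manager aborts on any code other than 0 or 10.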

master_ip_online_change_script

This is similar to the master_ip_failover_script parameter, but it is used by the online master switch command rather than by master failover. The following arguments are passed:

Current master write freezing phase
- --command=stop or stopssh
- --orig_master_host=(current master's hostname)
- --orig_master_ip=(current master's ip address)
- --orig_master_port=(current master's port number)
- --orig_master_user=(current master's user)
- --orig_master_password=(current master's password)
- --orig_master_ssh_user=(from 0.56, current master's ssh user)
- --orig_master_is_new_slave=(from 0.56, notifying whether the orig master will be new slave or not)

New master granting write phase
- --command=start
- --orig_master_host=(orig master's hostname)
- --orig_master_ip=(orig master's ip address)
- --orig_master_port=(orig master's port number)
- --new_master_host=(new master's hostname)
- --new_master_ip=(new master's ip address)
- --new_master_port=(new master's port number)
- --new_master_user=(new master's user)
- --new_master_password=(new master's password)
- --new_master_ssh_user=(from 0.56, new master's ssh user)

shutdown_script

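You may want to force the master's server to shut down so that the damage does not spread. For example:

  shutdown_script= /usr/local/sample/bin/power_manager

A sample script is included in the MHA Manager package. Before calling the script, MHA checks internally whether the master is reachable over SSH. If it is (the OS is alive but mysqld has stopped), MHA Manager passes the following arguments.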

--command=stopssh
--ssh_user=(ssh username so that you can connect to the master)
--host=(master's hostname)
--ip=(master's ip address)
--port=(master's port number)
--pid_file=(master's pid file)

If the master is not reachable via SSH, MHA Manager passes the following arguments.

--command=stop
--host=(master's hostname)
--ip=(master's ip address)

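The script works as follows. If --command=stopssh is passed, the script runs kill -9 on the mysqld and mysqld_safe processes over SSH. If --pid_file is also passed, the script tries to kill only the process named in the pid file rather than every MySQL process, which is useful when several instances run on the same master host. If the server is stopped successfully over SSH, the script returns 10, and the manager afterwards connects to the master over SSH and saves the necessary binlogs. If the script cannot reach the master over SSH, or if --command=stop is passed, it tries to power off the machine. How the power is cut depends on the hardware. If power-off succeeds the script returns 0, otherwise 1. MHA starts failover when it receives 0. If the exit code is neither 0 nor 10, MHA aborts the failover. The parameter is empty by default, so MHA does not invoke anything.

MHA also calls this script when it starts monitoring, passing the arguments below, so you can verify the script's settings at that point. Whether power can be controlled depends heavily on the hardware, so checking the power status here is strongly recommended. If something is misconfigured, take particular care when you start monitoring.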

--command=status
--host=(master's hostname)
--ip=(master's ip address)

report_script

You may want a report sent to you when a failover completes or ends in an error. The report_script parameter serves this purpose. MHA Manager passes the following arguments.

--orig_master_host=(dead master's hostname)
--new_master_host=(new master's hostname)
--new_slave_hosts=(new slaves' hostnames, delimited by commas)
--subject=(mail subject)
--body=(body)

The default is empty, so MHA Manager does not invoke anything by default. A sample script is included in the MHA Manager package under /samples/scripts/send_report.
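A report script mainly has to pick the values out of the arguments listed above and hand them to whatever delivers the message. The sketch below only formats the report on standard output; the function name is hypothetical and actual delivery (mail, chat, and so on) is deliberately left out.

```shell
# format_failover_report: hypothetical helper that turns the report_script
# arguments into a plain-text message on stdout.
format_failover_report() {
    orig="" new="" slaves="" subject="" body=""
    for arg in "$@"; do
        case "$arg" in
            --orig_master_host=*) orig=${arg#*=} ;;
            --new_master_host=*)  new=${arg#*=} ;;
            --new_slave_hosts=*)  slaves=${arg#*=} ;;
            --subject=*)          subject=${arg#*=} ;;
            --body=*)             body=${arg#*=} ;;
        esac
    done
    printf 'Subject: %s\n' "$subject"
    printf 'Old master: %s\nNew master: %s\nNew slaves: %s\n\n%s\n' \
        "$orig" "$new" "$slaves" "$body"
}

# Example:
#   format_failover_report --orig_master_host=host1 --new_master_host=host2 \
#       --new_slave_hosts=host3 --subject="failover" --body="completed"
```

The output can be piped into the mail command of your choice.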

init_conf_load_script

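This script is for values that you do not want to write in plain text in the configuration files, such as the password and the replication password. By returning name=value pairs from the script, you can override values from the global configuration file. For example: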

  #!/usr/bin/perl
  
  print "password=$ROOT_PASS\n";
  print "repl_password=$REPL_PASS\n";

The default is empty, so MHA does not invoke anything.
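The same idea can be sketched in shell, reading the secrets from environment variables instead of hard-coding them. The function name and the variable names MHA_ROOT_PASS and MHA_REPL_PASS are assumptions made for this example; where the values come from is up to you.

```shell
# print_mha_secrets: hypothetical equivalent of the Perl example.
# Emits name=value pairs; stops with an error if a variable is unset,
# so an empty password is never printed silently.
print_mha_secrets() {
    printf 'password=%s\n' "${MHA_ROOT_PASS:?MHA_ROOT_PASS is not set}"
    printf 'repl_password=%s\n' "${MHA_REPL_PASS:?MHA_REPL_PASS is not set}"
}

# Example:
#   MHA_ROOT_PASS=... MHA_REPL_PASS=... sh -c '...; print_mha_secrets'
```

Keeping the values out of the configuration files only helps if the script and its environment are themselves protected.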

Command reference

This part is not translated in detail. In most cases the example commands are all you need; see the official documentation for the full list of arguments.

masterha_manager: starts the MHA Manager

  # masterha_manager --conf=/etc/conf/masterha/app1.cnf

masterha_master_switch: switches the master, either after a master failure or online

Interactive failover of a dead master

  $ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1

Specifying the new master

  $ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1 --new_master_host=host5

Non-interactive

  $ masterha_master_switch --master_state=dead --conf=/etc/conf/masterha/app1.cnf --dead_master_host=host1 --new_master_host=host2 --interactive=0

Online master switch

  $ masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=host2

masterha_check_status: checks the MHA running status

  $ masterha_check_status --conf=/path/to/app1.cnf
  app1 (pid:8368) is running(0:PING_OK), master:host1
  $ echo $?
  0

Status Code(Exit code)	Status String	Description
0	PING_OK	Master is running and MHA Manager is monitoring. Master state is alive.
1	---	Unexpected error happened. For example, config file does not exist. If this error happens, check arguments are valid or not.
2	NOT_RUNNING	MHA Manager is not running. Master state is unknown.
3	PARTIALLY_RUNNING	MHA Manager main process is not running, but child processes are running. This should not happen and should be investigated. Master state is unknown.
10	INITIALIZING_MONITOR	MHA Manager is just after startup and initializing. Wait for a while and see how the status changes. Master state is unknown.
20	PING_FAILING	MHA Manager detects ping to master is failing. Master state is maybe down.
21	PING_FAILED	MHA Manager detects either a) ping to master failed three times, b) preparing for starting master failover. Master state is maybe down.
30	RETRYING_MONITOR	MHA Manager internal health check program detected that master was not reachable from manager, but after double check MHA Manager verified the master is alive, and currently waiting for retry. Master state is very likely alive.
31	CONFIG_ERROR	There are some configuration problems and MHA Manager can't monitor the target master. Check a logfile for detail. Master state is unknown.
32	TIMESTAMP_OLD	MHA Manager detects that ping to master is ok but status file is not updated for a long time. Check whether MHA Manager itself hangs or not. Master state is unknown.
50	FAILOVER_RUNNING	MHA Manager confirms that master is down and running failover. Master state is dead.
51	FAILOVER_ERROR	MHA Manager confirms that master is down and running failover, but failed during failover. Master state is dead.
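Because masterha_check_status reports its result through the exit code, a monitoring wrapper can translate the code back into the status string from the table above. This is a minimal sketch; mha_status_name is a hypothetical helper and UNKNOWN is a label introduced here for codes that are not in the table.

```shell
# mha_status_name CODE
# Maps a masterha_check_status exit code to the status string in the table.
mha_status_name() {
    case "$1" in
        0)  echo PING_OK ;;
        1)  echo "---" ;;               # unexpected error, no status string
        2)  echo NOT_RUNNING ;;
        3)  echo PARTIALLY_RUNNING ;;
        10) echo INITIALIZING_MONITOR ;;
        20) echo PING_FAILING ;;
        21) echo PING_FAILED ;;
        30) echo RETRYING_MONITOR ;;
        31) echo CONFIG_ERROR ;;
        32) echo TIMESTAMP_OLD ;;
        50) echo FAILOVER_RUNNING ;;
        51) echo FAILOVER_ERROR ;;
        *)  echo UNKNOWN ;;             # not listed in the table
    esac
}

# Usage:
#   masterha_check_status --conf=/etc/app1.cnf >/dev/null; mha_status_name $?
```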

masterha_check_repl: checks replication health

  manager_host$ masterha_check_repl --conf=/etc/app1.cnf
  ...
  MySQL Replication Health is OK.

masterha_stop: stops the MHA Manager

  manager_host$ masterha_stop --conf=/etc/app1.cnf
  Stopped app1 successfully.

masterha_conf_host: adds a host to, or removes a host from, the configuration file

  # masterha_conf_host --command=add --conf=/etc/conf/masterha/app1.cnf --hostname=db101

Then the following lines will be added to the conf file.

[server_db101]
hostname=db101

You can add several parameters in the config file by passing --params parameters, separated by semi-colon(;).

  # masterha_conf_host --command=add --conf=/etc/conf/masterha/app1.cnf --hostname=db101 --block=server100 --params="no_master=1;ignore_fail=1"

The following lines will be added to the conf file.

[server100]
hostname=db101
no_master=1
ignore_fail=1

You can also remove specified block. The below command will remove the entire block server100.

  # masterha_conf_host --command=delete --conf=/etc/conf/masterha/app1.cnf --block=server100

masterha_conf_host takes below arguments

masterha_check_ssh: checks SSH authentication

  # masterha_check_ssh --conf=/etc/app1.cnf
  
  Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
  Sat May 14 14:42:19 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:20 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.

purge_relay_logs script: removes old relay logs

  [app@slave_host1]$ cat /etc/cron.d/purge_relay_logs
  # purge relay logs at 5am
  0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1

Monitoring multiple applications

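You may want to monitor several master-slave replication groups from one machine. This is easy: create a new configuration file for application2 and start another manager.

  # masterha_manager --conf=/etc/conf/masterha/app1.cnf
  # masterha_manager --conf=/etc/conf/masterha/app2.cnf

Parameters shared by app1 and app2 can be set in the global configuration file.

Using with clustering software

If you use a virtual IP on the master, you may already be running clustering software such as Pacemaker. In that case you will probably want that software, rather than MHA, to manage the virtual IP address. MHA handles only the failover itself, so you combine it with the clustering tool to get high availability.

Below is a brief Pacemaker configuration (Heartbeat v1 mode).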

# /etc/ha.d/haresources on host2
host2 failover_start IPaddr::192.168.0.3

# failover_start script example

start)
  `masterha_master_switch --master_state=dead --interactive=0 --wait_on_failover_error=0 --dead_master_host=host1 --new_master_host=host2`
  exit

stop)
  # do nothing

# Application configuration file:

  [server1]
hostname=host1
candidate_master=1

  [server2]
hostname=host2
candidate_master=1

  [server3]
hostname=host3
no_master=1

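Because the data files are not shared, the data resource does not need to be managed by the clustering tool or DRBD. For this purpose the clustering tool only has to run the masterha_master_switch script and move the virtual IP; you can also implement this with your own scripts.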