MMM結合Semisync機制實現Mysql Master-Master高可用

時間 2019-11-11

標籤 mmm 結合 semisync 機制實現 mysql master 可用欄目 MySQL 简体版

原文原文鏈接

架構：node

兩個Master(主備模式)，一個或多個Slave(也能夠沒有Slave，只有主備Master)：mysql

一、Monitor運行MMM Daemon程序，實現全部Mysql服務器的監控和故障切換工做；sql

二、Master1和Master2互爲主備，同時只有一個主可用於寫操做(也可同時分擔讀操做)，另外一個做爲備用，能夠分擔讀操做，讀寫分離須要應用程序實現；數據庫

三、Slave機器與當前active Master同步，如當前active Master故障後，Master將切換到passive Master，同時MMM修改Slave與新的Master同步；服務器

四、Application經過write和read ip進行讀寫操做；網絡

環境：session

主機名	服務器IP地址	Write IP	Read IP	備註
mysql01	10.0.60.100	10.0.60.160	10.0.60.161	默認爲active Master，運行mmm agent
mysql02	10.0.60.101		10.0.60.162	默認爲passive Master，運行mmm agent
mysql03	10.0.60.102		10.0.60.163	Slave，由MMM維護，運行mmm agent
mysql04	10.0.60.103			監控機，運行MMM Daemon程序

軟件信息：架構

Mysql：5.6.17-log MySQL Community Server (GPL)併發

MMM：mysql-mmm-2.2.1app

OS：CentOS release 6.4 (Final)，kernel 2.6.32-358.el6.x86_64

1、配置複製環境

這裏是全新配置，若是是已經存在了單master和slave環境，將配置不同，能夠結合xtrabackup工具實現數據的備份和恢復，配置主備master環境。

前提要求：

一、全部mysql實例開啓read_only=1；

二、主備master須要開啓log_bin；

三、全部mysql實例配置不一樣的server_id以及不一樣的二進制日誌、relay日誌文件名；

參考配置參數：

mysql01的特殊配置參數：

mysql01>\! grep -E "log_bin|server_id|read_only" my.cnf log_bin = mysql01-bin server_id = 1 read_only mysql01>

mysql02的特殊配置參數：

mysql02>\! grep -E "log_bin|server_id|read_only" my.cnf log_bin = mysql02-bin server_id = 2 read_only mysql02>

mysql03的特殊配置參數：

mysql03>\! grep -E "log_bin|server_id|read_only" my.cnf log_bin = mysql_bin server_id = 3 read_only

配置主從：

一、配置mysql01和mysql02互爲主從：

在mysql01和mysql02上建立一樣的複製帳號：

grant replication slave on *.* to 'repl'@'10.0.60.%' identified by 'repl';

查看master狀態：

show master status;

在每一個節點執行CHANGE MASTER TO語句：

mysql01> change master to master_host = '10.0.60.101', master_user='repl', master_password='repl', master_log_file='mysql02-bin.000001', master_log_pos=545;

mysql02> change master to master_host = '10.0.60.100', master_user='repl', master_password='repl', master_log_file='mysql01-bin.000001', master_log_pos=545;

在兩個節點開啓slave：

start slave;

查看slave狀態是否正常：

mysql02>show slave status\G; Slave_IO_Running: Yes Slave_SQL_Running: Yes

二、配置mysql03爲mysql01的從服務器

mysql03> change master to master_host = '10.0.60.100', master_user='repl', master_password='repl', master_log_file='mysql01-bin.000001', master_log_pos=545;

驗證slave狀態正常後，開始下面的步驟。

2、配置半同步

使用半同步機制，能夠確保至少一臺slave收到master的二進制日誌，在必定程度上保證了數據的一致性，減小了當master當機時，形成數據丟失。

半同步機制由google貢獻，從mysql 5.5開始原生支持該特性。

前提要求：

一、主備master都要安裝並開啓semisync master和slave，因mmm不能進行semisync配置和管理；

二、slave須要安裝並開啓semisync slave；

配置步驟：

一、mysql01和mysql02安裝semisync master和slave插件：

mysql01>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; Query OK, 0 rows affected (0.01 sec) mysql01>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; Query OK, 0 rows affected (0.00 sec)

mysql02>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; Query OK, 0 rows affected (0.05 sec) mysql02>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; Query OK, 0 rows affected (0.01 sec)

二、mysql01和mysql02開啓semisync master和slave：

mysql01>SET GLOBAL rpl_semi_sync_master_enabled = 1; Query OK, 0 rows affected (0.00 sec) mysql01>SET GLOBAL rpl_semi_sync_slave_enabled = 1; Query OK, 0 rows affected (0.00 sec)

mysql02>SET GLOBAL rpl_semi_sync_master_enabled = 1; Query OK, 0 rows affected (0.00 sec) mysql02>SET GLOBAL rpl_semi_sync_slave_enabled = 1; Query OK, 0 rows affected (0.00 sec)

同時將參數寫入到配置文件，以mysql實例開啓時自動開啓半同步：

mysql02>\! cat my.cnf|grep semi rpl_semi_sync_master_enabled = 1 rpl_semi_sync_slave_enabled = 1

三、mysql03安裝並開啓semisync slave插件：

mysql03>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; Query OK, 0 rows affected (0.01 sec) mysql03>SET GLOBAL rpl_semi_sync_slave_enabled = 1; Query OK, 0 rows affected (0.00 sec)

同時將參數寫入到配置文件，以mysql實例開啓時自動開啓半同步：

mysql03>\! cat my.cnf|grep semi rpl_semi_sync_slave_enabled = 1 mysql03>

四、全部mysql實例中止slave並開啓slave，使半同步機制生效：

stop slave;start slave;

五、查看semisync狀態

mysql01>show status like '%emi%'; +--------------------------------------------+-------+ | Variable_name | Value | +--------------------------------------------+-------+ | Rpl_semi_sync_master_clients | 2 | | Rpl_semi_sync_master_net_avg_wait_time | 0 | | Rpl_semi_sync_master_net_wait_time | 0 | | Rpl_semi_sync_master_net_waits | 0 | | Rpl_semi_sync_master_no_times | 0 | | Rpl_semi_sync_master_no_tx | 0 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 0 | | Rpl_semi_sync_master_tx_wait_time | 0 | | Rpl_semi_sync_master_tx_waits | 0 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 0 | | Rpl_semi_sync_slave_status | ON | +--------------------------------------------+-------+ 15 rows in set (0.00 sec)

3、配置MMM

Multi Master Replication Manager for Mysql(MMM)是一套開源的perl腳本，對Mysql Master-Master複製環境(在任什麼時候刻只有一個節點可寫)進行監控、故障恢復以及管理。同時能根據複製的延時狀況管理讀負載均衡，經過遷移read虛擬IP地址。同時也能用於數據備份，以及節點之間重同步。

主要由三個腳本組成：

一、mmm_mod：監控daemon程序，進行監控工做，並決定讀、寫角色的遷移；最好運行在專用的監控服務器上，能夠管理多套Master-Slave集羣。

二、mmm_agentd：客戶端daemon程序，運行在全部mysql實例服務器，與監控節點進行簡單的遠程通訊。

三、mmm_control：用於管理mmm_mond進程的命令行腳本。

前提需求：

一、支持環境：

兩個節點的Master-Master環境，MMM須要5個IP地址(每一個節點一個固定IP地址，一個write IP地址，兩個read IP地址，write和read IP依據節點的可用性進行自動的遷移)，正常狀況下，active master有一個write IP和一個read IP地址，standby master有一個read IP地址，若是當前active master故障，write和read IP地址將遷移到standby master；

兩個節點的Master-Master，以及一個或多個slave的環境，同時也是大多數企業使用的方案(能夠更好的擴展讀，同時有冗餘的Slave可用於備份等工做，防止阻塞正常的事務)。

二、n+1個主機：n個運行mysql實例的服務器，一個機器用於運行MMM監控daemon程序；

三、2*(n+1) IP地址：每一個主機一個固定IP地址，同時每臺mysql實例一個read IP地址以及一個write IP地址；

四、monitor數據庫用戶：須要REPLICATION CLIENT權限，用於MMM監控(mmm_mond)；

五、agent數據庫用戶：須要SUPER、REPLICATION CLIENT、PROCESS權限，用於MMM 客戶端(mmm_agentd)，能夠只針對本機IP進行受權；

六、relication數據庫用戶：須要REPLICATION SLAVE權限，用於mysql複製；

七、tools數據庫用戶：須要SUPER、REPLICATION CLIENT、RELOAD權限，用於MMM tools(如mmm_backup、mmm_clone、mmm_restore)

一、在mysql實例服務器安裝依賴包和mmm

安裝依賴包：

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-ARP

若是在標準軟件倉庫和EPEL軟件倉庫沒有，須要單獨下載，能夠去如下網址下載：

http://rpm.pbone.net

http://search.cpan.org/

http://www.rpmfind.net/

安裝mmm：

tar xvf mysql-mmm-2.2.1.tar.gz cd mysql-mmm-2.2.1 make install

二、在mysql實例服務器配置mmm agent

mmm_agentd使用mmm_agent.conf配置文件：

# cat /etc/mysql-mmm/mmm_agent.conf include mmm_common.conf #包含這個公用配置文件 #Description: name of this host，能夠不是主機名，每臺mysql實例的host不一樣(如mysql01設置爲db1，mysql02設置爲db2，mysql03設置爲db3) this db1 #Description: Enable debug mode，設置1，打印日誌到前臺，按ctrl+c將結束進程 debug 0 #Description: Maximum number of retries when killing threads to prevent further #writes during the removal of the active_master_role. max_kill_retries 10

公用配置文件：mmm_common.conf，每一個實例同樣，並要拷貝到監控服務器供mmm_mond使用，進行網卡接口的定義，每一個主機的描述，複製和mmm agent的用戶名和密碼配置，以及讀寫規則等

# cat /etc/mysql-mmm/mmm_common.conf #Description: name of the role for which identifies the active master，定義活動master爲可寫 active_master_role writer <host default> #默認段 #Description: network interface on which the IPs of the roles should be configured，用於綁定ip的網絡接口 cluster_interface eth0 pid_path /var/run/mmm_agentd.pid bin_path /usr/lib/mysql-mmm/ #Description: Port on which mmm agentd listens agent_port 9989 #Description: Port on which mysqld is listening mysql_port 3306 #Description: mysql user used for replication replication_user repl #Description: mysql password used for replication replication_password repl #Description: mysql user for MMM Agent agent_user mmm_agent #Description: mysql password for MMM Agent agent_password mmm_agent </host> <host db1> #命名段，指定每一個mysql實例主機 #Description: IP of host ip 10.0.60.100 #Description: Mode of host. Either master or slave. mode master #Description: Name of peer host (if mode is master) peer db2 </host> <host db2> ip 10.0.60.101 mode master peer db1 </host> <host db3> ip 10.0.60.102 mode slave </host> <role writer> #定義write角色 #Description: Hosts which may take over the role hosts db1, db2 #Description: One or multiple IPs associated with the role，指定浮動write IP地址 ips 10.0.60.160 #Description: Mode of role. Either balanced or exclusive mode exclusive #Description: The preferred host for this role. Only allowed for exclusive roles. #prefer - </role> <role reader> #定義read角色 hosts db1, db2, db3 ips 10.0.60.161,10.0.60.162,10.0.60.163 #浮動read IP地址 mode balanced </role>

三、啓動mmm agent服務

/etc/init.d/mysql-mmm-agent start chkconfig --level 2345 mysql-mmm-agent on

四、在監控服務器(mysql04)安裝依賴包和mmm

安裝依賴包：

安裝mmm：

tar xvf mysql-mmm-2.2.1.tar.gz cd mysql-mmm-2.2.1 make install

五、配置mmm 監控配置文件

mmm_mond和mmm_control使用mmm_mon.conf或mmm_mon_CLUSTER.conf配置文件

mmm_mon.conf配置文件參考：

# cat /etc/mysql-mmm/mmm_mon.conf include mmm_common.conf #The monitor section is required by mmm_mond and mmm_control <monitor> #Description: IP on which mmm_mond listens ip 127.0.0.1 #Description: Port on which mmm mond listens port 9988 #Description: Location of pid-file pid_path /var/run/mmm_mond.pid #Description: Path to directory containing MMM binaries bin_path /usr/lib/mysql-mmm/ #Description: Location of of status file status_path /var/lib/misc/mmm_mond.status #Description: Break between network checks ping_interval 1 #Description: IPs used for network checks，指定全部mysql服務器IP，write和read IP地址，用於進行ping檢查 ping_ips 10.0.60.100, 10.0.60.101, 10.0.60.102, 10.0.60.160, 10.0.60.161, 10.0.60.162, 10.0.60.163 #Description: Duration in seconds for flap detection. See flap_count flap_duration 3600 #Description: Maximum number of downtimes within flap_duration seconds after #which a host is considered to be flapping. flap_count 3 #Description: How many seconds to wait before switching node status from #AWAITING_RECOVERY to ONLINE. 0 = disabled. auto_set_online 0 #Description: Binary used to kill hosts if roles couldn’t be removed because the agent #was not reachable. You have to provide a custom binary for this which #takes the hostname as first argument and the state of check ping (1 -ok; 0 - not ok) as second argument. kill_host_bin /usr/lib/mysql-mmm/monitor/kill_host #Description: Startup carefully i.e. switch into passive mode when writer role is #configured on multiple hosts careful_startup 0 #Description: Default mode of monitor. mode active #Description: How many seconds to wait for other master to become ONLINE before #switching from mode WAIT to mode ACTIVE. 0 = infinite. wait_for_other_master 120 </monitor> <host default> #Description: mysql user for MMM Monitor monitor_user mmm_agent #Description: mysql password for MMM Monitor monitor_password mmm_agent </host> <check mysql> #check段，mmm執行ping、mysql、rep_threads、rep_backlog四種檢查，能夠分別進行檢查間隔等參數配置。 #Description: Perform check every 5 seconds check_period 5 #Description: Check is considered as failed if it doesn’t succeed for at least #trap period seconds. trap_period 10 #Description: Check times out after timeout seconds timeout 2 #Description: Restart checker process after restart after checks restart_after 10000 #Description: Maximum backlog for check rep_backlog. max_backlog 60 </check> #設置爲1,開啓調試模式，打印日誌到前臺，ctrl+c將結束進程，對於調試有幫助 debug 0

六、開啓mmm monitor監控

/etc/init.d/mysql-mmm-monitor start chkconfig --level 2345 mysql-mmm-monitor on

七、使用mmm_control查看狀態

# mmm_control show db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163), writer(10.0.60.160) db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

注：

當節點第一次開啓，狀態爲等待恢復。

設置節點online：

# mmm_control set_online db1

MMM如何工做：

當故障發生時，mmm迅速的遷移故障節點的IP地址從一個節點到另外一個節點，並使用Net::ARP Perl模塊更新ARP表。

處理過程：

在故障active master節點：

一、mysql 設置爲read_only(set global read_only=1)，防止寫事務；

二、中斷活動鏈接；

三、移除寫ip；

在新master節點：

一、運行在passive master的mmm進程被通知即將成爲active write；

二、slave將嘗試從master的二進制日誌抓取任何剩餘事務；

三、關閉read_only(set global read_only=0)；

四、綁定write ip，併發生arp通告；

4、測試

一、測試mysql01 mysql實例故障

手動關閉mysql01服務器上的mysql實例，指望master將遷移到mysql02

中止mysql01的mysqld進程：也可使用"killall -15 mysqld"結束mysqld進程

mysql01>\! sh stop.sh

查看mmm_mond的日誌：總共通過10s時間完成遷移

# tail -f /var/log/mysql-mmm/mmm_mond.log 2014/05/27 14:19:13 WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2014/05/27 14:19:13 WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2014/05/27 14:19:22 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2014/05/27 14:19:23 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2014/05/27 14:19:23 INFO Removing all roles from host 'db1': 2014/05/27 14:19:23 INFO Removed role 'reader(10.0.60.163)' from host 'db1' 2014/05/27 14:19:23 INFO Removed role 'writer(10.0.60.160)' from host 'db1' 2014/05/27 14:19:23 INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2' #能夠看到寫IP已經遷移到mysql02 2014/05/27 14:19:23 INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'

使用mmm_control命令查看狀態：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/HARD_OFFLINE. Roles: #mysql01狀態已經變爲HARD_OFFLINE，意外着ping錯誤或mysql故障 db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

檢查mysql02的read_only變量是否改變：

檢查mysql03是否已經將mysql02做爲主：

mysql03>show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.60.101 #已經從Mysql02同步 Master_User: repl Master_Port: 3306 Connect_Retry: 10 Master_Log_File: mysql02-bin.000014 Read_Master_Log_Pos: 120 Relay_Log_File: mysql03-relay-bin.000002 Relay_Log_Pos: 285 Relay_Master_Log_File: mysql02-bin.000014 Slave_IO_Running: Yes Slave_SQL_Running: Yes

當再次啓動mysql01的mysql實例，db1的狀態將由HARD_OFFLINE改變爲AWAITING_RECOVERY：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

須要手動設置爲online，mmm纔會分配read ip給mysql01，並與mysql02同步：

[root@mysql04 ~]# mmm_control set_online db1 OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!

mysql01>show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.60.101 #mysql01已經從Mysql02同步 Master_User: repl Master_Port: 3306 Connect_Retry: 10 Master_Log_File: mysql02-bin.000015 Read_Master_Log_Pos: 120 Relay_Log_File: mysql01-relay.000027 Relay_Log_Pos: 285 Relay_Master_Log_File: mysql02-bin.000015 Slave_IO_Running: Yes Slave_SQL_Running: Yes

查看mysql02的semisync狀態：

mysql02>show status like 'Rpl_semi%'; +--------------------------------------------+-------+ | Variable_name | Value | +--------------------------------------------+-------+ | Rpl_semi_sync_master_clients | 2 | | Rpl_semi_sync_master_net_avg_wait_time | 1053 | | Rpl_semi_sync_master_net_wait_time | 2106 | | Rpl_semi_sync_master_net_waits | 2 | | Rpl_semi_sync_master_no_times | 0 | | Rpl_semi_sync_master_no_tx | 0 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 1015 | | Rpl_semi_sync_master_tx_wait_time | 1015 | | Rpl_semi_sync_master_tx_waits | 1 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 1 | | Rpl_semi_sync_slave_status | ON | +--------------------------------------------+-------+ 15 rows in set (0.00 sec)

二、模擬mysql02(Active Master服務器) kernel panic，指望進行遷移

執行上面的測試後，當前active master爲mysql02，使用下面命令模擬kernel panic：

mysql02>\! echo "c" > /proc/sysrq-trigger

查看mmm_mond日誌：總共通過了20s的時間完成遷移

# tail -f /var/log/mysql-mmm/mmm_mond.log 2014/05/27 14:44:42 WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4) 2014/05/27 14:44:42 WARN Check 'rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4) 2014/05/27 14:44:46 FATAL Can't reach agent on host 'db2' 2014/05/27 14:44:49 ERROR Check 'ping' on 'db2' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.101 #ping檢查錯誤 2014/05/27 14:44:55 ERROR Check 'mysql' on 'db2' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4) #mysql檢查錯誤，不能鏈接 2014/05/27 14:44:59 INFO Check 'ping' on 'db2' is ok! 2014/05/27 14:45:02 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) #改變mysql02的狀態爲HARD_OFFLINE 2014/05/27 14:45:02 INFO Removing all roles from host 'db2': #移除mysql02的角色 2014/05/27 14:45:02 INFO Removed role 'reader(10.0.60.161)' from host 'db2' 2014/05/27 14:45:02 INFO Removed role 'writer(10.0.60.160)' from host 'db2' 2014/05/27 14:45:02 FATAL Agent on host 'db2' is reachable again 2014/05/27 14:45:02 INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db1' #分配角色到其餘機器,write IP分配到mysql01，永遠不會分配到mysql03 2014/05/27 14:45:02 INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'

使用mmm_control查看狀態：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163) db2(10.0.60.101) master/HARD_OFFLINE. Roles: db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

三、模擬active master服務器網絡不通，指望進行遷移，可是網絡恢復後，將不會重啓slave；

當前active master爲mysql01，在mysql01上禁用網卡：

mysql01>\! cat down_net.sh ifdown eth0 sleep 600 ifup eth0 mysql01>\! sh down_net.sh

查看mmm_mond日誌：總共通過了9s完成遷移

2014/05/27 15:02:35 WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4) 2014/05/27 15:02:35 WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4) 2014/05/27 15:02:41 FATAL Can't reach agent on host 'db1' 2014/05/27 15:02:41 ERROR Check 'ping' on 'db1' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.100 #ping檢查錯誤 2014/05/27 15:02:44 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: not OK, mysql: OK) #改變狀態 2014/05/27 15:02:44 INFO Removing all roles from host 'db1': #移除角色 2014/05/27 15:02:44 INFO Removed role 'reader(10.0.60.163)' from host 'db1' 2014/05/27 15:02:44 INFO Removed role 'writer(10.0.60.160)' from host 'db1' 2014/05/27 15:02:44 ERROR Can't send offline status notification to 'db1' - killing it! 2014/05/27 15:02:44 FATAL Could not kill host 'db1' - there may be some duplicate ips now! (There's no binary configured for killing hosts.) 2014/05/27 15:02:44 INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2' 2014/05/27 15:02:44 INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3' 2014/05/27 15:02:48 ERROR Check 'mysql' on 'db1' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)

使用mmm_control查看狀態：

[root@mysql04 ~]# mmm_control show # Warning: agent on host db1 is not reachable db1(10.0.60.100) master/HARD_OFFLINE. Roles: db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

當網絡恢復後，mmm會修改mysql01的slave配置，修改主爲mysql02，可是沒有重啓slave，形成不能進行數據同步，須要手工從新開啓slave。

使用mmm_control檢查狀態：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163) [root@mysql04 ~]# mmm_control set_online db1 #網絡恢復後，手動設置爲online OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles! [root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163) db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)

檢查mysql01的slave狀態：看上去正常的

mysql01>show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.60.101 Master_User: repl Master_Port: 3306 Connect_Retry: 10 Master_Log_File: mysql02-bin.000015 Read_Master_Log_Pos: 328 Relay_Log_File: mysql01-relay.000027 Relay_Log_Pos: 493 Relay_Master_Log_File: mysql02-bin.000015 Slave_IO_Running: Yes Slave_SQL_Running: Yes

可是在mysql02(當前active master)插入數據，mysql01不能從mysql02同步：

mysql02> insert into t1 values(2); Query OK, 1 row affected (0.02 sec) mysql02>select * from t1; +----+ | id | +----+ | 1 | | 2 | +----+ 2 rows in set (0.00 sec)

mysql03已經同步了數據：

mysql03>select * from t1; +----+ | id | +----+ | 1 | | 2 | +----+ 2 rows in set (0.00 sec)

而mysql01沒有同步數據：

mysql01>select * from t1; +----+ | id | +----+ | 1 | +----+ 1 row in set (0.00 sec)

解決方法：先中止slave，而後啓動slave；

mysql01>stop slave;start slave; Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) mysql01>select * from t1; +----+ | id | +----+ | 1 | | 2 | +----+ 2 rows in set (0.00 sec)

截圖：

四、模擬slave線程故障，不論是io或sql線程故障，指望進行遷移，恢復時若是在flap_duration時間內超過了flap_count次數的故障，將不會自動恢復，狀態由REPLICATION_FAIL改成 AWAITING_RECOVERY (because it's flapping)

當前active master爲mysql02。

中止active master(mysql02)的slave，不會形成遷移：

mmm_mond的日誌：已經檢測到db2(mysql02)複製線程錯誤

2014/05/27 15:39:02 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken

使用mmm_control查看狀態：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.161) db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.163)

若是當其餘slave(mysql0一、mysql03)的線程(不論是io仍是sql線程)故障將會發生遷移：

手工中止io線程：

mysql01>stop slave io_thread; Query OK, 0 rows affected (0.01 sec)

查看mmm_mond日誌：

2014/05/27 15:43:28 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken 2014/05/27 15:43:31 FATAL State of host 'db1' changed from ONLINE to REPLICATION_FAIL 2014/05/27 15:43:31 INFO Removing all roles from host 'db1': 2014/05/27 15:43:31 INFO Removed role 'reader(10.0.60.161)' from host 'db1' #移除角色 2014/05/27 15:43:31 INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'

使用mmm_control查看狀態：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/REPLICATION_FAIL. Roles: db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161), reader(10.0.60.163)

當從新開啓io線程後，mmm將自動恢復db1，並從新遷移read IP到db1(mysql01)上，若是故障超過：

從新開啓線程：

mysql01>start slave io_thread; Query OK, 0 rows affected (0.00 sec)

查看mmm_mond日誌：

2014/05/27 15:45:23 INFO Check 'rep_threads' on 'db1' is ok! 2014/05/27 15:45:25 FATAL State of host 'db1' changed from REPLICATION_FAIL to ONLINE 2014/05/27 15:45:25 INFO Moving role 'reader(10.0.60.163)' from host 'db3' to host 'db1'

使用mmm_control查看狀態：

[root@mysql04 ~]# mmm_control show db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163) db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160) db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161)