MMM結合Semisync機制實現Mysql Master-Master高可用

架構:node

兩個Master(主備模式),一個或多個Slave(也能夠沒有Slave,只有主備Master):mysql

一、Monitor運行MMM Daemon程序,實現全部Mysql服務器的監控和故障切換工做;sql

二、Master1和Master2互爲主備,同時只有一個主可用於寫操做(也可同時分擔讀操做),另外一個做爲備用,能夠分擔讀操做,讀寫分離須要應用程序實現;數據庫

三、Slave機器與當前active Master同步,如當前active Master故障後,Master將切換到passive Master,同時MMM修改Slave與新的Master同步;服務器

四、Application經過write和read ip進行讀寫操做;網絡



環境:session

主機名 服務器IP地址 Write IP Read IP 備註
mysql01 10.0.60.100 10.0.60.160 10.0.60.161 默認爲active Master,運行mmm agent
mysql02 10.0.60.101   10.0.60.162 默認爲passive  Master,運行mmm agent
mysql03 10.0.60.102   10.0.60.163 Slave,由MMM維護,運行mmm agent
mysql04 10.0.60.103     監控機,運行MMM  Daemon程序


軟件信息:架構

Mysql:5.6.17-log MySQL Community Server (GPL)併發

MMM:mysql-mmm-2.2.1app

OS:CentOS release 6.4 (Final),kernel 2.6.32-358.el6.x86_64


1、配置複製環境

這裏是全新配置,若是是已經存在了單master和slave環境,將配置不同,能夠結合xtrabackup工具實現數據的備份和恢復,配置主備master環境。

前提要求:

一、全部mysql實例開啓read_only=1;

二、主備master須要開啓log_bin;

三、全部mysql實例配置不一樣的server_id以及不一樣的二進制日誌、relay日誌文件名;


參考配置參數:

mysql01的特殊配置參數:

mysql01>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql01-bin
server_id = 1
read_only
mysql01>


mysql02的特殊配置參數:

mysql02>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql02-bin
server_id = 2
read_only
mysql02>


mysql03的特殊配置參數:

mysql03>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql_bin
server_id = 3
read_only


配置主從:

一、配置mysql01和mysql02互爲主從:

在mysql01和mysql02上建立一樣的複製帳號:

grant replication slave on *.* to 'repl'@'10.0.60.%' identified by 'repl';


查看master狀態:

show master status;



在每一個節點執行CHANGE MASTER TO語句:

mysql01change master to master_host = '10.0.60.101'
master_user='repl'
master_password='repl'
master_log_file='mysql02-bin.000001'
master_log_pos=545;


mysql02> change master to master_host = '10.0.60.100'
master_user='repl'
master_password='repl'
master_log_file='mysql01-bin.000001'
master_log_pos=545;


在兩個節點開啓slave:

start slave;


查看slave狀態是否正常:

mysql02>show slave status\G;
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


二、配置mysql03爲mysql01的從服務器

mysql03> change master to master_host = '10.0.60.100'
master_user='repl'
master_password='repl'
master_log_file='mysql01-bin.000001'
master_log_pos=545;


驗證slave狀態正常後,開始下面的步驟。


2、配置半同步

使用半同步機制,能夠確保至少一臺slave收到master的二進制日誌,在必定程度上保證了數據的一致性,減小了當master當機時,形成數據丟失。

半同步機制由google貢獻,從mysql 5.5開始原生支持該特性。


前提要求:

一、主備master都要安裝並開啓semisync master和slave,因mmm不能進行semisync配置和管理;

二、slave須要安裝並開啓semisync slave;


配置步驟:

一、mysql01和mysql02安裝semisync master和slave插件:

mysql01>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.01 sec)

mysql01>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.00 sec)


mysql02>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.05 sec)

mysql02>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)


二、mysql01和mysql02開啓semisync master和slave:

mysql01>SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql01>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)


mysql02>SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql02>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)


同時將參數寫入到配置文件,以mysql實例開啓時自動開啓半同步:

mysql02>\! cat my.cnf|grep semi
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 1


三、mysql03安裝並開啓semisync slave插件:

mysql03>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)

mysql03>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)


同時將參數寫入到配置文件,以mysql實例開啓時自動開啓半同步:

mysql03>\! cat my.cnf|grep semi
rpl_semi_sync_slave_enabled = 1
mysql03>


四、全部mysql實例中止slave並開啓slave,使半同步機制生效:

stop slave;start slave;


五、查看semisync狀態

mysql01>show status like '%emi%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
| Rpl_semi_sync_slave_status                 | ON    |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)


3、配置MMM

Multi Master Replication Manager for Mysql(MMM)是一套開源的perl腳本,對Mysql Master-Master複製環境(在任什麼時候刻只有一個節點可寫)進行監控、故障恢復以及管理。同時能根據複製的延時狀況管理讀負載均衡,經過遷移read虛擬IP地址。同時也能用於數據備份,以及節點之間重同步。

主要由三個腳本組成:

一、mmm_mod:監控daemon程序,進行監控工做,並決定讀、寫角色的遷移;最好運行在專用的監控服務器上,能夠管理多套Master-Slave集羣。

二、mmm_agentd:客戶端daemon程序,運行在全部mysql實例服務器,與監控節點進行簡單的遠程通訊。

三、mmm_control:用於管理mmm_mond進程的命令行腳本。


前提需求:

一、支持環境:

兩個節點的Master-Master環境,MMM須要5個IP地址(每一個節點一個固定IP地址,一個write IP地址,兩個read IP地址,write和read IP依據節點的可用性進行自動的遷移),正常狀況下,active master有一個write IP和一個read IP地址,standby master有一個read IP地址,若是當前active master故障,write和read IP地址將遷移到standby master;

兩個節點的Master-Master,以及一個或多個slave的環境,同時也是大多數企業使用的方案(能夠更好的擴展讀,同時有冗餘的Slave可用於備份等工做,防止阻塞正常的事務)。

二、n+1個主機:n個運行mysql實例的服務器,一個機器用於運行MMM監控daemon程序;

三、2*(n+1) IP地址:每一個主機一個固定IP地址,同時每臺mysql實例一個read IP地址以及一個write IP地址;

四、monitor數據庫用戶:須要REPLICATION CLIENT權限,用於MMM監控(mmm_mond);

五、agent數據庫用戶:須要SUPER、REPLICATION CLIENT、PROCESS權限,用於MMM 客戶端(mmm_agentd),能夠只針對本機IP進行受權;

六、relication數據庫用戶:須要REPLICATION SLAVE權限,用於mysql複製;

七、tools數據庫用戶:須要SUPER、REPLICATION CLIENT、RELOAD權限,用於MMM tools(如mmm_backup、mmm_clone、mmm_restore)


一、在mysql實例服務器安裝依賴包和mmm

安裝依賴包:

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-ARP


若是在標準軟件倉庫和EPEL軟件倉庫沒有,須要單獨下載,能夠去如下網址下載:

http://rpm.pbone.net

http://search.cpan.org/

http://www.rpmfind.net/


安裝mmm:

tar xvf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install


二、在mysql實例服務器配置mmm agent

mmm_agentd使用mmm_agent.conf配置文件:

# cat /etc/mysql-mmm/mmm_agent.conf   
include mmm_common.conf #包含這個公用配置文件
#Description: name of this host,能夠不是主機名,每臺mysql實例的host不一樣(如mysql01設置爲db1,mysql02設置爲db2,mysql03設置爲db3)
this db1
#Description: Enable debug mode,設置1,打印日誌到前臺,按ctrl+c將結束進程
debug 0
#Description: Maximum number of retries when killing threads to prevent further
#writes during the removal of the active_master_role.
max_kill_retries 10


公用配置文件:mmm_common.conf,每一個實例同樣,並要拷貝到監控服務器供mmm_mond使用,進行網卡接口的定義,每一個主機的描述,複製和mmm agent的用戶名和密碼配置,以及讀寫規則等

# cat /etc/mysql-mmm/mmm_common.conf 
#Description: name of the role for which identifies the active master,定義活動master爲可寫
active_master_role                              writer

<host default> #默認段
        #Description: network interface on which the IPs of the roles should be configured,用於綁定ip的網絡接口
        cluster_interface                       eth0

        pid_path                                /var/run/mmm_agentd.pid
        bin_path                                /usr/lib/mysql-mmm/
        #Description: Port on which mmm agentd listens
        agent_port                              9989
        #Description: Port on which mysqld is listening
        mysql_port                              3306

        #Description: mysql user used for replication
        replication_user                        repl
        #Description: mysql password used for replication
        replication_password                    repl
        #Description: mysql user for MMM Agent
        agent_user                              mmm_agent
        #Description: mysql password for MMM Agent
        agent_password                          mmm_agent
</host>

<host db1> #命名段,指定每一個mysql實例主機
        #Description: IP of host
        ip                                      10.0.60.100
        #Description: Mode of host. Either master or slave.
        mode                                    master
        #Description: Name of peer host (if mode is master)
        peer                                    db2
</host>

<host db2>
        ip                                      10.0.60.101
        mode                                    master
        peer                                    db1
</host>

<host db3>
        ip                                      10.0.60.102
        mode                                    slave
</host>


<role writer> #定義write角色
        #Description: Hosts which may take over the role
        hosts                                   db1, db2
        #Description: One or multiple IPs associated with the role,指定浮動write IP地址
        ips                                     10.0.60.160
        #Description: Mode of role. Either balanced or exclusive
        mode                                    exclusive
        #Description: The preferred host for this role. Only allowed for exclusive roles.
        #prefer                                 -
</role>

<role reader>  #定義read角色
        hosts                                   db1, db2, db3
        ips                                     10.0.60.161,10.0.60.162,10.0.60.163 #浮動read IP地址
        mode                                    balanced
</role>


三、啓動mmm agent服務

/etc/init.d/mysql-mmm-agent start
chkconfig --level 2345 mysql-mmm-agent on


四、在監控服務器(mysql04)安裝依賴包和mmm

安裝依賴包:

yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-Ping


安裝mmm:

tar xvf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install


五、配置mmm 監控配置文件

mmm_mond和mmm_control使用mmm_mon.conf或mmm_mon_CLUSTER.conf配置文件

mmm_mon.conf配置文件參考:

# cat /etc/mysql-mmm/mmm_mon.conf 
include mmm_common.conf

#The monitor section is required by mmm_mond and mmm_control
<monitor>
        #Description: IP on which mmm_mond listens
        ip                                              127.0.0.1
        #Description: Port on which mmm mond listens
        port                                    9988
        #Description: Location of pid-file
        pid_path                                /var/run/mmm_mond.pid
        #Description: Path to directory containing MMM binaries
        bin_path                                /usr/lib/mysql-mmm/
        #Description: Location of of status file
        status_path                             /var/lib/misc/mmm_mond.status
        #Description: Break between network checks
        ping_interval                           1
        #Description: IPs used for network checks,指定全部mysql服務器IP,write和read IP地址,用於進行ping檢查
        ping_ips                                10.0.60.100, 10.0.60.101, 10.0.60.102, 10.0.60.160, 10.0.60.161, 10.0.60.162, 10.0.60.163
        #Description: Duration in seconds for flap detection. See flap_count
        flap_duration                           3600
        #Description: Maximum number of downtimes within flap_duration seconds after
        #which a host is considered to be flapping.
        flap_count                              3
        #Description: How many seconds to wait before switching node status from
        #AWAITING_RECOVERY to ONLINE. 0 = disabled.
        auto_set_online                         0
        #Description: Binary used to kill hosts if roles couldn’t be removed because the agent
        #was not reachable. You have to provide a custom binary for this which
        #takes the hostname as first argument and the state of check ping (1 -ok; 0 - not ok) as second argument.
        kill_host_bin                           /usr/lib/mysql-mmm/monitor/kill_host
        #Description: Startup carefully i.e. switch into passive mode when writer role is
        #configured on multiple hosts
        careful_startup                         0
        #Description: Default mode of monitor.
        mode                                    active
        #Description: How many seconds to wait for other master to become ONLINE before
        #switching from mode WAIT to mode ACTIVE. 0 = infinite.
        wait_for_other_master                   120
</monitor>

<host default>
        #Description: mysql user for MMM Monitor
        monitor_user                    mmm_agent
        #Description: mysql password for MMM Monitor
        monitor_password                mmm_agent
</host>

<check  mysql> #check段,mmm執行ping、mysql、rep_threads、rep_backlog四種檢查,能夠分別進行檢查間隔等參數配置。
        #Description: Perform check every 5 seconds
        check_period                    5
        #Description: Check is considered as failed if it doesn’t succeed for at least
        #trap period seconds.
        trap_period                     10
        #Description: Check times out after timeout seconds
        timeout                         2
        #Description: Restart checker process after restart after checks
        restart_after                   10000
        #Description: Maximum backlog for check rep_backlog.
        max_backlog                     60
</check>

#設置爲1,開啓調試模式,打印日誌到前臺,ctrl+c將結束進程,對於調試有幫助
debug 0


六、開啓mmm monitor監控

/etc/init.d/mysql-mmm-monitor start
chkconfig --level 2345 mysql-mmm-monitor on


七、使用mmm_control查看狀態

# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163), writer(10.0.60.160)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)


注:

當節點第一次開啓,狀態爲等待恢復。

設置節點online:

# mmm_control set_online db1


MMM如何工做:

當故障發生時,mmm迅速的遷移故障節點的IP地址從一個節點到另外一個節點,並使用Net::ARP Perl模塊更新ARP表

處理過程:

在故障active master節點:

一、mysql 設置爲read_only(set global read_only=1),防止寫事務;

二、中斷活動鏈接;

三、移除寫ip;


在新master節點:

一、運行在passive master的mmm進程被通知即將成爲active write;

二、slave將嘗試從master的二進制日誌抓取任何剩餘事務;

三、關閉read_only(set global read_only=0);

四、綁定write ip,併發生arp通告;


4、測試

一、測試mysql01 mysql實例故障

手動關閉mysql01服務器上的mysql實例,指望master將遷移到mysql02

中止mysql01的mysqld進程:也可使用"killall -15 mysqld"結束mysqld進程

mysql01>\! sh stop.sh


查看mmm_mond的日誌:總共通過10s時間完成遷移

# tail -f /var/log/mysql-mmm/mmm_mond.log 
2014/05/27 14:19:13  WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:13  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:22 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:23 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2014/05/27 14:19:23  INFO Removing all roles from host 'db1':
2014/05/27 14:19:23  INFO     Removed role 'reader(10.0.60.163)' from host 'db1'
2014/05/27 14:19:23  INFO     Removed role 'writer(10.0.60.160)' from host 'db1'
2014/05/27 14:19:23  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2' #能夠看到寫IP已經遷移到mysql02
2014/05/27 14:19:23  INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'


使用mmm_control命令查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/HARD_OFFLINE. Roles: #mysql01狀態已經變爲HARD_OFFLINE,意外着ping錯誤或mysql故障
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)


檢查mysql02的read_only變量是否改變:

mysql02>show global variables like 'read_only'; #默認在passive master時,read_only爲ON
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

mysql02>


檢查mysql03是否已經將mysql02做爲主:

mysql03>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.60.101 #已經從Mysql02同步
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql02-bin.000014
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql03-relay-bin.000002
                Relay_Log_Pos: 285
        Relay_Master_Log_File: mysql02-bin.000014
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


當再次啓動mysql01的mysql實例,db1的狀態將由HARD_OFFLINE改變爲AWAITING_RECOVERY:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)


須要手動設置爲online,mmm纔會分配read ip給mysql01,並與mysql02同步:

[root@mysql04 ~]# mmm_control set_online db1
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!


mysql01>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.60.101 #mysql01已經從Mysql02同步
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql02-bin.000015
          Read_Master_Log_Pos: 120
               Relay_Log_File: mysql01-relay.000027
                Relay_Log_Pos: 285
        Relay_Master_Log_File: mysql02-bin.000015
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


查看mysql02的semisync狀態:

mysql02>show status like 'Rpl_semi%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 1053  |
| Rpl_semi_sync_master_net_wait_time         | 2106  |
| Rpl_semi_sync_master_net_waits             | 2     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 1015  |
| Rpl_semi_sync_master_tx_wait_time          | 1015  |
| Rpl_semi_sync_master_tx_waits              | 1     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 1     |
| Rpl_semi_sync_slave_status                 | ON    |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)


二、模擬mysql02(Active Master服務器) kernel panic,指望進行遷移

執行上面的測試後,當前active master爲mysql02,使用下面命令模擬kernel panic:

mysql02>\! echo "c" > /proc/sysrq-trigger 


查看mmm_mond日誌:總共通過了20s的時間完成遷移

# tail -f /var/log/mysql-mmm/mmm_mond.log
2014/05/27 14:44:42  WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)
2014/05/27 14:44:42  WARN Check '
rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)
2014/05/27 14:44:46 FATAL Can't reach agent on host 'db2'
2014/05/27 14:44:49 ERROR Check '
ping' on 'db2' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.101 #ping檢查錯誤
2014/05/27 14:44:55 ERROR Check '
mysql' on 'db2' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4) #mysql檢查錯誤,不能鏈接
2014/05/27 14:44:59  INFO Check 'ping' on 'db2' is ok!
2014/05/27 14:45:02 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) #改變mysql02的狀態爲HARD_OFFLINE
2014/05/27 14:45:02  INFO Removing all roles from host 'db2':  #移除mysql02的角色
2014/05/27 14:45:02  INFO     Removed role 'reader(10.0.60.161)' from host 'db2'
2014/05/27 14:45:02  INFO     Removed role 'writer(10.0.60.160)' from host 'db2'
2014/05/27 14:45:02 FATAL Agent on host 'db2' is reachable again
2014/05/27 14:45:02  INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db1' #分配角色到其餘機器,write IP分配到mysql01,永遠不會分配到mysql03
2014/05/27 14:45:02  INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'


使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
  db2(10.0.60.101) master/HARD_OFFLINE. Roles: 
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)


三、模擬active master服務器網絡不通,指望進行遷移,可是網絡恢復後,將不會重啓slave;

當前active master爲mysql01,在mysql01上禁用網卡:

mysql01>\! cat down_net.sh
ifdown eth0
sleep 600
ifup eth0
mysql01>\! sh down_net.sh


查看mmm_mond日誌:總共通過了9s完成遷移

2014/05/27 15:02:35  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
2014/05/27 15:02:35  WARN Check '
rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
2014/05/27 15:02:41 FATAL Can't reach agent on host 'db1'
2014/05/27 15:02:41 ERROR Check '
ping' on 'db1' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.100  #ping檢查錯誤
2014/05/27 15:02:44 FATAL State of host '
db1' changed from ONLINE to HARD_OFFLINE (ping: not OK, mysql: OK) #改變狀態
2014/05/27 15:02:44  INFO Removing all roles from host '
db1': #移除角色
2014/05/27 15:02:44  INFO     Removed role '
reader(10.0.60.163)' from host 'db1'  
2014/05/27 15:02:44  INFO     Removed role '
writer(10.0.60.160)' from host 'db1'
2014/05/27 15:02:44 ERROR Can'
t send offline status notification to 'db1' - killing it!
2014/05/27 15:02:44 FATAL Could not kill host 'db1' - there may be some duplicate ips now! (There's no binary configured for killing hosts.)
2014/05/27 15:02:44  INFO Orphaned role '
writer(10.0.60.160)' has been assigned to 'db2'
2014/05/27 15:02:44  INFO Orphaned role '
reader(10.0.60.163)' has been assigned to 'db3'
2014/05/27 15:02:48 ERROR Check '
mysql' on 'db1' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)


使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
# Warning: agent on host db1 is not reachable
  db1(10.0.60.100) master/HARD_OFFLINE. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)


當網絡恢復後,mmm會修改mysql01的slave配置,修改主爲mysql02,可是沒有重啓slave,形成不能進行數據同步,須要手工從新開啓slave。

使用mmm_control檢查狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/AWAITING_RECOVERY. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)

[root@mysql04 ~]# mmm_control set_online db1  #網絡恢復後,手動設置爲online
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)


檢查mysql01的slave狀態:看上去正常的

mysql01>show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.60.101
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql02-bin.000015
          Read_Master_Log_Pos: 328
               Relay_Log_File: mysql01-relay.000027
                Relay_Log_Pos: 493
        Relay_Master_Log_File: mysql02-bin.000015
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


可是在mysql02(當前active master)插入數據,mysql01不能從mysql02同步:

mysql02> insert into t1 values(2);
Query OK, 1 row affected (0.02 sec)
mysql02>select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)


mysql03已經同步了數據:

mysql03>select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)


而mysql01沒有同步數據:

mysql01>select * from t1;
+----+
| id |
+----+
|  1 |
+----+
1 row in set (0.00 sec)


解決方法:先中止slave,而後啓動slave;

mysql01>stop slave;start slave; 
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql01>select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
2 rows in set (0.00 sec)


截圖:


四、模擬slave線程故障,不論是io或sql線程故障,指望進行遷移,恢復時若是在flap_duration時間內超過了flap_count次數的故障,將不會自動恢復,狀態由REPLICATION_FAIL改成 AWAITING_RECOVERY (because it's flapping)

當前active master爲mysql02。

中止active master(mysql02)的slave,不會形成遷移:

mmm_mond的日誌:已經檢測到db2(mysql02)複製線程錯誤

2014/05/27 15:39:02 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken


使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.161)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.163)


若是當其餘slave(mysql0一、mysql03)的線程(不論是io仍是sql線程)故障將會發生遷移:

手工中止io線程:

mysql01>stop slave io_thread;
Query OK, 0 rows affected (0.01 sec)


查看mmm_mond日誌:

2014/05/27 15:43:28 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken
2014/05/27 15:43:31 FATAL State of host 'db1' changed from ONLINE to REPLICATION_FAIL
2014/05/27 15:43:31  INFO Removing all roles from host 'db1':
2014/05/27 15:43:31  INFO     Removed role 'reader(10.0.60.161)' from host 'db1'  #移除角色
2014/05/27 15:43:31  INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'


使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/REPLICATION_FAIL. Roles: 
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161), reader(10.0.60.163)


當從新開啓io線程後,mmm將自動恢復db1,並從新遷移read IP到db1(mysql01)上,若是故障超過:

從新開啓線程:

mysql01>start slave io_thread;  
Query OK, 0 rows affected (0.00 sec)


查看mmm_mond日誌:

2014/05/27 15:45:23  INFO Check 'rep_threads' on 'db1' is ok!
2014/05/27 15:45:25 FATAL State of host 'db1' changed from REPLICATION_FAIL to ONLINE
2014/05/27 15:45:25  INFO Moving role 'reader(10.0.60.163)' from host 'db3' to host 'db1'


使用mmm_control查看狀態:

[root@mysql04 ~]# mmm_control show
  db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
  db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
  db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161)


五、複製延時

延時檢查有max_backlog控制,默認爲60;

複製延時或錯誤,若是故障時間少於60s,狀態爲ONLINE,單會遷移,故障恢復後,mmm自動恢復read IP。若是rep_backlog和rel_threads同時錯誤,狀態將爲REPLICATION_FAIL。


六、mmm agent或monitor故障

不會遷移角色,若是此時有master或slave故障,也將不能遷移角色


參考:

MMM官網:http://mysql-mmm.org/

MMM博客:http://blog.mysql-mmm.org/





來自爲知筆記(Wiz)

相關文章
相關標籤/搜索