Architecture
Two masters (active/standby) plus one or more slaves (a slave-less setup with only the master pair is also possible):
1) The monitor host runs the MMM daemon, which watches every MySQL server and performs failover;
2) Master1 and Master2 replicate from each other; only one of them accepts writes at a time (it can also serve reads), while the other stands by and can share the read load. Read/write splitting must be implemented by the application;
3) Slaves replicate from the current active master. When the active master fails, MMM promotes the passive master and repoints the slaves to the new master;
4) Applications access the cluster through the floating write and read IP addresses.
Environment:
| Hostname | Server IP | Write IP | Read IP | Notes |
| --- | --- | --- | --- | --- |
| mysql01 | 10.0.60.100 | 10.0.60.160 | 10.0.60.161 | Default active master; runs mmm agent |
| mysql02 | 10.0.60.101 | | 10.0.60.162 | Default passive master; runs mmm agent |
| mysql03 | 10.0.60.102 | | 10.0.60.163 | Slave, maintained by MMM; runs mmm agent |
| mysql04 | 10.0.60.103 | | | Monitor host; runs the MMM daemon |
Software versions:
MySQL: 5.6.17-log MySQL Community Server (GPL)
MMM: mysql-mmm-2.2.1
OS: CentOS release 6.4 (Final), kernel 2.6.32-358.el6.x86_64
1. Configuring replication
This walkthrough starts from scratch. If a single-master-with-slaves environment already exists, the procedure differs: you can use the xtrabackup tool to back up and restore the data and build the active/standby master pair from that.
Prerequisites:
1) Every MySQL instance starts with read_only=1;
2) Both masters must have log_bin enabled;
3) Every instance needs a distinct server_id and distinct binary-log and relay-log file names;
Reference configuration:
mysql01-specific settings:
mysql01>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql01-bin
server_id = 1
read_only
mysql01>
mysql02-specific settings:
mysql02>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql02-bin
server_id = 2
read_only
mysql02>
mysql03-specific settings:
mysql03>\! grep -E "log_bin|server_id|read_only" my.cnf
log_bin = mysql_bin
server_id = 3
read_only
Setting up replication:
1) Make mysql01 and mysql02 masters of each other:
Create the same replication account on both mysql01 and mysql02:
grant replication slave on *.* to 'repl'@'10.0.60.%' identified by 'repl';
Check the master status:
show master status;
Run a CHANGE MASTER TO statement on each node:
mysql01> change master to master_host = '10.0.60.101',
master_user='repl',
master_password='repl',
master_log_file='mysql02-bin.000001',
master_log_pos=545;
mysql02> change master to master_host = '10.0.60.100',
master_user='repl',
master_password='repl',
master_log_file='mysql01-bin.000001',
master_log_pos=545;
Start the slave on both nodes:
start slave;
Confirm that the slave threads are running:
mysql02>show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
2) Make mysql03 a slave of mysql01
mysql03> change master to master_host = '10.0.60.100',
master_user='repl',
master_password='repl',
master_log_file='mysql01-bin.000001',
master_log_pos=545;
Once the slave status checks out, move on to the next part.
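The "is replication healthy" check above can be scripted. A minimal sketch (the `check_slave` helper is hypothetical, not part of MySQL or MMM): it reads the output of `mysql -e 'show slave status\G'` on stdin and succeeds only when both replication threads report Yes.

```shell
#!/bin/sh
# check_slave: read `show slave status\G` output on stdin and exit 0
# only when both Slave_IO_Running and Slave_SQL_Running are "Yes".
check_slave() {
    running=$(grep -cE 'Slave_(IO|SQL)_Running: Yes')
    [ "$running" -eq 2 ]
}

# Example: mysql -e 'show slave status\G' | check_slave || echo "replication broken"
```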
2. Configuring semi-synchronous replication
With semi-synchronous replication, the master waits until at least one slave has received the binary-log events for a transaction, which improves consistency and shrinks the data-loss window when the master crashes.
The semisync mechanism was contributed by Google and has been built into MySQL since 5.5.
Prerequisites:
1) Both masters must install and enable both the semisync master and slave plugins, because MMM does not configure or manage semisync itself;
2) Slaves must install and enable the semisync slave plugin;
Steps:
1) Install the semisync master and slave plugins on mysql01 and mysql02:
mysql01>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.01 sec)
mysql01>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.00 sec)
mysql02>INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.05 sec)
mysql02>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)
2) Enable the semisync master and slave roles on mysql01 and mysql02:
mysql01>SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
mysql01>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
mysql02>SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
mysql02>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
Also persist the settings in the configuration file so semisync is enabled automatically when the instance starts:
mysql02>\! cat my.cnf|grep semi
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 1
3) Install and enable the semisync slave plugin on mysql03:
mysql03>INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)
mysql03>SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
Also persist the setting in the configuration file so semisync is enabled automatically when the instance starts:
mysql03>\! cat my.cnf|grep semi
rpl_semi_sync_slave_enabled = 1
mysql03>
4) Stop and restart the slave threads on every instance so semisync takes effect:
stop slave;start slave;
5) Check the semisync status:
mysql01>show status like '%emi%';
+--------------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients | 2 |
| Rpl_semi_sync_master_net_avg_wait_time | 0 |
| Rpl_semi_sync_master_net_wait_time | 0 |
| Rpl_semi_sync_master_net_waits | 0 |
| Rpl_semi_sync_master_no_times | 0 |
| Rpl_semi_sync_master_no_tx | 0 |
| Rpl_semi_sync_master_status | ON |
| Rpl_semi_sync_master_timefunc_failures | 0 |
| Rpl_semi_sync_master_tx_avg_wait_time | 0 |
| Rpl_semi_sync_master_tx_wait_time | 0 |
| Rpl_semi_sync_master_tx_waits | 0 |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0 |
| Rpl_semi_sync_master_wait_sessions | 0 |
| Rpl_semi_sync_master_yes_tx | 0 |
| Rpl_semi_sync_slave_status | ON |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)
3. Configuring MMM
Multi-Master Replication Manager for MySQL (MMM) is a set of open-source Perl scripts that monitor, fail over, and manage a MySQL master-master replication setup (with only one node writable at any time). It can also balance the read load according to replication lag by moving the read virtual IPs, and ships tools for backups and for resynchronizing nodes.
It consists of three main scripts:
1) mmm_mond: the monitoring daemon, which runs the health checks and decides where the read and write roles live; it is best run on a dedicated monitor host and can manage several master-slave clusters.
2) mmm_agentd: the agent daemon, which runs on every MySQL server and answers simple remote requests from the monitor.
3) mmm_control: a command-line script for managing the mmm_mond process.
Prerequisites:
1) Supported topologies:
Two-node master-master: MMM needs 5 IP addresses (one fixed IP per node, one write IP, and two read IPs; the write and read IPs float between the nodes based on availability). Normally the active master holds the write IP and one read IP and the standby master holds the other read IP; if the active master fails, both its IPs move to the standby master.
Two-node master-master plus one or more slaves: the layout most installations use (it scales reads better, and a spare slave can serve backups and similar jobs without blocking normal transactions).
2) n+1 hosts: n servers running MySQL instances plus one machine for the MMM monitoring daemon;
3) 2*(n+1) IP addresses: one fixed IP per host, plus one read IP and one write IP per MySQL instance;
4) A monitor database user with the REPLICATION CLIENT privilege, used by the MMM monitor (mmm_mond);
5) An agent database user with the SUPER, REPLICATION CLIENT, and PROCESS privileges, used by the MMM agent (mmm_agentd); it can be restricted to the host's own IP;
6) A replication database user with the REPLICATION SLAVE privilege, used for MySQL replication;
7) A tools database user with the SUPER, REPLICATION CLIENT, and RELOAD privileges, used by the MMM tools (mmm_backup, mmm_clone, mmm_restore).
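The four accounts above can be created with GRANT statements like the following sketch (MySQL 5.6 syntax; the mmm_monitor/mmm_tools names and the passwords are placeholders — note that the configuration later in this article reuses a single mmm_agent account for both the monitor and the agent):

```sql
-- Placeholder account names/passwords; privilege sets taken from the list above.
GRANT REPLICATION CLIENT ON *.* TO 'mmm_monitor'@'10.0.60.%' IDENTIFIED BY 'monitor_pass';
GRANT SUPER, REPLICATION CLIENT, PROCESS ON *.* TO 'mmm_agent'@'10.0.60.%' IDENTIFIED BY 'mmm_agent';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.60.%' IDENTIFIED BY 'repl';
GRANT SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'mmm_tools'@'10.0.60.%' IDENTIFIED BY 'tools_pass';
```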
1) Install the dependencies and MMM on the MySQL servers
Install the dependencies:
yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-ARP
Any of these packages missing from the standard and EPEL repositories must be downloaded separately:
Install MMM:
tar xvf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install
2) Configure the MMM agent on the MySQL servers
mmm_agentd reads the mmm_agent.conf configuration file:
# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf    # pull in the shared configuration file
#Description: name of this host; it need not be the hostname, but must be unique
#per instance (here mysql01 is db1, mysql02 is db2, mysql03 is db3)
this db1
#Description: enable debug mode; set to 1 to log to the foreground (Ctrl+C ends the process)
debug 0
#Description: Maximum number of retries when killing threads to prevent further
#writes during the removal of the active_master_role.
max_kill_retries 10
The shared file, mmm_common.conf, is identical on every instance and must also be copied to the monitor server for mmm_mond. It defines the network interface, a description of each host, the replication and mmm agent credentials, and the read/write roles:
# cat /etc/mysql-mmm/mmm_common.conf
#Description: name of the role which identifies the active master (the active master is writable)
active_master_role writer
<host default>    # defaults section
#Description: network interface on which the IPs of the roles should be configured
cluster_interface eth0
pid_path /var/run/mmm_agentd.pid
bin_path /usr/lib/mysql-mmm/
#Description: Port on which mmm agentd listens
agent_port 9989
#Description: Port on which mysqld is listening
mysql_port 3306
#Description: mysql user used for replication
replication_user repl
#Description: mysql password used for replication
replication_password repl
#Description: mysql user for MMM Agent
agent_user mmm_agent
#Description: mysql password for MMM Agent
agent_password mmm_agent
</host>
<host db1>    # one named section per MySQL instance host
#Description: IP of host
ip 10.0.60.100
#Description: Mode of host. Either master or slave.
mode master
#Description: Name of peer host (if mode is master)
peer db2
</host>
<host db2>
ip 10.0.60.101
mode master
peer db1
</host>
<host db3>
ip 10.0.60.102
mode slave
</host>
<role writer>    # define the write role
#Description: Hosts which may take over the role
hosts db1, db2
#Description: One or multiple IPs associated with the role (the floating write IP)
ips 10.0.60.160
#Description: Mode of role. Either balanced or exclusive
mode exclusive
#Description: The preferred host for this role. Only allowed for exclusive roles.
#prefer -
</role>
<role reader>    # define the read role
hosts db1, db2, db3
ips 10.0.60.161, 10.0.60.162, 10.0.60.163    # floating read IPs
mode balanced
</role>
3) Start the MMM agent service
/etc/init.d/mysql-mmm-agent start
chkconfig --level 2345 mysql-mmm-agent on
4) Install the dependencies and MMM on the monitor server (mysql04)
Install the dependencies:
yum -y install perl iproute perl-Algorithm-Diff perl-DBI perl-Class-Singleton perl-DBD-MySQL perl-Log-Log4perl perl-Log-Dispatch perl-Proc-Daemon perl-MailTools perl-Time-HiRes perl-Mail-Sendmail perl-Mail-Sender perl-Email-Date-Format perl-MIME-Lite perl-Net-Ping
Install MMM:
tar xvf mysql-mmm-2.2.1.tar.gz
cd mysql-mmm-2.2.1
make install
5) Configure the MMM monitor
mmm_mond and mmm_control read the mmm_mon.conf (or mmm_mon_CLUSTER.conf) configuration file.
A reference mmm_mon.conf:
# cat /etc/mysql-mmm/mmm_mon.conf
include mmm_common.conf
#The monitor section is required by mmm_mond and mmm_control
<monitor>
#Description: IP on which mmm_mond listens
ip 127.0.0.1
#Description: Port on which mmm mond listens
port 9988
#Description: Location of pid-file
pid_path /var/run/mmm_mond.pid
#Description: Path to directory containing MMM binaries
bin_path /usr/lib/mysql-mmm/
#Description: Location of of status file
status_path /var/lib/misc/mmm_mond.status
#Description: Break between network checks
ping_interval 1
#Description: IPs used for network checks; list every MySQL server IP plus the write and read IPs so all of them are ping-checked
ping_ips 10.0.60.100, 10.0.60.101, 10.0.60.102, 10.0.60.160, 10.0.60.161, 10.0.60.162, 10.0.60.163
#Description: Duration in seconds for flap detection. See flap_count
flap_duration 3600
#Description: Maximum number of downtimes within flap_duration seconds after
#which a host is considered to be flapping.
flap_count 3
#Description: How many seconds to wait before switching node status from
#AWAITING_RECOVERY to ONLINE. 0 = disabled.
auto_set_online 0
#Description: Binary used to kill hosts if roles couldn’t be removed because the agent
#was not reachable. You have to provide a custom binary for this which
#takes the hostname as first argument and the state of check ping (1 -ok; 0 - not ok) as second argument.
kill_host_bin /usr/lib/mysql-mmm/monitor/kill_host
#Description: Startup carefully i.e. switch into passive mode when writer role is
#configured on multiple hosts
careful_startup 0
#Description: Default mode of monitor.
mode active
#Description: How many seconds to wait for other master to become ONLINE before
#switching from mode WAIT to mode ACTIVE. 0 = infinite.
wait_for_other_master 120
</monitor>
<host default>
#Description: mysql user for MMM Monitor
monitor_user mmm_agent
#Description: mysql password for MMM Monitor
monitor_password mmm_agent
</host>
<check mysql>    # check section; MMM runs four checks (ping, mysql, rep_threads, rep_backlog), each with its own interval settings
#Description: Perform check every 5 seconds
check_period 5
#Description: Check is considered as failed if it doesn’t succeed for at least
#trap period seconds.
trap_period 10
#Description: Check times out after timeout seconds
timeout 2
#Description: Restart checker process after restart after checks
restart_after 10000
#Description: Maximum backlog for check rep_backlog.
max_backlog 60
</check>
#Set to 1 to enable debug mode: logs go to the foreground and Ctrl+C ends the process; useful for troubleshooting
debug 0
6) Start the MMM monitor
/etc/init.d/mysql-mmm-monitor start
chkconfig --level 2345 mysql-mmm-monitor on
7) Check the status with mmm_control
# mmm_control show
db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163), writer(10.0.60.160)
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)
Note:
When a node starts for the first time, its state is AWAITING_RECOVERY.
Bring it online with:
# mmm_control set_online db1
How MMM works:
When a failure occurs, MMM quickly moves the failed node's floating IPs to another node and updates the ARP tables using the Net::ARP Perl module.
Failover sequence:
On the failed active master:
1) The instance is set read-only (set global read_only=1) to block write transactions;
2) Active connections are terminated;
3) The write IP is removed;
On the new active master:
1) The mmm agent on the passive master is notified that it is about to become the active writer;
2) The slave threads try to fetch any remaining transactions from the old master's binary log;
3) read_only is switched off (set global read_only=0);
4) The write IP is bound and an ARP announcement is sent.
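Step 4 above boils down to two commands on the new master. A hedged sketch (interface and IP values from this article's topology; the `takeover_write_ip` helper is illustrative, and the arping flags follow the iputils version on CentOS 6 and may differ elsewhere), with a DRY_RUN switch so the commands can be previewed without root:

```shell
#!/bin/sh
# Sketch of the write-IP takeover performed on the new active master.
# takeover_write_ip IFACE IP — set DRY_RUN=1 to print the commands instead of running them.
takeover_write_ip() {
    iface=$1; ip=$2
    run=${DRY_RUN:+echo}
    $run ip addr add "$ip/32" dev "$iface" &&      # bind the floating write IP
    $run arping -q -U -c 3 -I "$iface" "$ip"       # unsolicited ARP so peers update their caches
}

# Preview the commands for this article's topology:
# DRY_RUN=1 takeover_write_ip eth0 10.0.60.160
```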
4. Testing
1) MySQL instance failure on mysql01
Shut down the MySQL instance on mysql01; the master role is expected to migrate to mysql02.
Stop mysqld on mysql01 (you could also use "killall -15 mysqld"):
mysql01>\! sh stop.sh
Watch the mmm_mond log: the migration completes in about 10 seconds.
# tail -f /var/log/mysql-mmm/mmm_mond.log
2014/05/27 14:19:13 WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:13 WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:22 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
2014/05/27 14:19:23 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2014/05/27 14:19:23 INFO Removing all roles from host 'db1':
2014/05/27 14:19:23 INFO Removed role 'reader(10.0.60.163)' from host 'db1'
2014/05/27 14:19:23 INFO Removed role 'writer(10.0.60.160)' from host 'db1'
2014/05/27 14:19:23 INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2'    # the write IP has moved to mysql02
2014/05/27 14:19:23 INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/HARD_OFFLINE. Roles:    # mysql01 is now HARD_OFFLINE, meaning its ping or mysql check failed
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)
Verify that read_only changed on mysql02:
mysql02>show global variables like 'read_only';    # on the passive master, read_only defaults to ON
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.00 sec)
mysql02>
Verify that mysql03 now replicates from mysql02:
mysql03>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.60.101    # now replicating from mysql02
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql02-bin.000014
Read_Master_Log_Pos: 120
Relay_Log_File: mysql03-relay-bin.000002
Relay_Log_Pos: 285
Relay_Master_Log_File: mysql02-bin.000014
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
When the MySQL instance on mysql01 is started again, db1 moves from HARD_OFFLINE to AWAITING_RECOVERY:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/AWAITING_RECOVERY. Roles:
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)
The node must be set online manually before MMM assigns a read IP to mysql01 and repoints it at mysql02:
[root@mysql04 ~]# mmm_control set_online db1
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!
mysql01>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.60.101    # mysql01 now replicates from mysql02
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql02-bin.000015
Read_Master_Log_Pos: 120
Relay_Log_File: mysql01-relay.000027
Relay_Log_Pos: 285
Relay_Master_Log_File: mysql02-bin.000015
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Check the semisync status on mysql02:
mysql02>show status like 'Rpl_semi%';
+--------------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients | 2 |
| Rpl_semi_sync_master_net_avg_wait_time | 1053 |
| Rpl_semi_sync_master_net_wait_time | 2106 |
| Rpl_semi_sync_master_net_waits | 2 |
| Rpl_semi_sync_master_no_times | 0 |
| Rpl_semi_sync_master_no_tx | 0 |
| Rpl_semi_sync_master_status | ON |
| Rpl_semi_sync_master_timefunc_failures | 0 |
| Rpl_semi_sync_master_tx_avg_wait_time | 1015 |
| Rpl_semi_sync_master_tx_wait_time | 1015 |
| Rpl_semi_sync_master_tx_waits | 1 |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0 |
| Rpl_semi_sync_master_wait_sessions | 0 |
| Rpl_semi_sync_master_yes_tx | 1 |
| Rpl_semi_sync_slave_status | ON |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)
2) Simulated kernel panic on mysql02 (the active master); a migration is expected
After the previous test the active master is mysql02. Trigger a kernel panic with:
mysql02>\! echo "c" > /proc/sysrq-trigger
Watch the mmm_mond log: the migration completes in about 20 seconds.
# tail -f /var/log/mysql-mmm/mmm_mond.log
2014/05/27 14:44:42 WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)
2014/05/27 14:44:42 WARN Check 'rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)
2014/05/27 14:44:46 FATAL Can't reach agent on host 'db2'
2014/05/27 14:44:49 ERROR Check 'ping' on 'db2' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.101    # ping check failed
2014/05/27 14:44:55 ERROR Check 'mysql' on 'db2' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.101:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.101' (4)    # mysql check failed: cannot connect
2014/05/27 14:44:59 INFO Check 'ping' on 'db2' is ok!
2014/05/27 14:45:02 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)    # mysql02's state changes to HARD_OFFLINE
2014/05/27 14:45:02 INFO Removing all roles from host 'db2':    # mysql02's roles are removed
2014/05/27 14:45:02 INFO Removed role 'reader(10.0.60.161)' from host 'db2'
2014/05/27 14:45:02 INFO Removed role 'writer(10.0.60.160)' from host 'db2'
2014/05/27 14:45:02 FATAL Agent on host 'db2' is reachable again
2014/05/27 14:45:02 INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db1'    # roles reassigned: the write IP moves to mysql01; it is never assigned to mysql03
2014/05/27 14:45:02 INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
db2(10.0.60.101) master/HARD_OFFLINE. Roles:
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)
3) Simulated network outage on the active master; a migration is expected, but after the network recovers the slave threads are not restarted
The active master is now mysql01. Take its NIC down:
mysql01>\! cat down_net.sh
ifdown eth0
sleep 600
ifup eth0
mysql01>\! sh down_net.sh
Watch the mmm_mond log: the migration completes in about 9 seconds.
2014/05/27 15:02:35 WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
2014/05/27 15:02:35 WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
2014/05/27 15:02:41 FATAL Can't reach agent on host 'db1'
2014/05/27 15:02:41 ERROR Check 'ping' on 'db1' has failed for 11 seconds! Message: ERROR: Could not ping 10.0.60.100    # ping check failed
2014/05/27 15:02:44 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: not OK, mysql: OK)    # state change
2014/05/27 15:02:44 INFO Removing all roles from host 'db1':    # roles removed
2014/05/27 15:02:44 INFO Removed role 'reader(10.0.60.163)' from host 'db1'
2014/05/27 15:02:44 INFO Removed role 'writer(10.0.60.160)' from host 'db1'
2014/05/27 15:02:44 ERROR Can't send offline status notification to 'db1' - killing it!
2014/05/27 15:02:44 FATAL Could not kill host 'db1' - there may be some duplicate ips now! (There's no binary configured for killing hosts.)
2014/05/27 15:02:44 INFO Orphaned role 'writer(10.0.60.160)' has been assigned to 'db2'
2014/05/27 15:02:44 INFO Orphaned role 'reader(10.0.60.163)' has been assigned to 'db3'
2014/05/27 15:02:48 ERROR Check 'mysql' on 'db1' has failed for 14 seconds! Message: ERROR: Connect error (host = 10.0.60.100:3306, user = mmm_agent)! Can't connect to MySQL server on '10.0.60.100' (4)
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
# Warning: agent on host db1 is not reachable
db1(10.0.60.100) master/HARD_OFFLINE. Roles:
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)
After the network comes back, MMM repoints mysql01's slave configuration at mysql02 but does not restart the slave threads, so replication does not resume until the slave is restarted by hand.
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/AWAITING_RECOVERY. Roles:
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162), reader(10.0.60.163)
[root@mysql04 ~]# mmm_control set_online db1    # once the network is back, set the node online manually
OK: State of 'db1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.161), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.162)
Check the slave status on mysql01: it looks healthy
mysql01>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.60.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql02-bin.000015
Read_Master_Log_Pos: 328
Relay_Log_File: mysql01-relay.000027
Relay_Log_Pos: 493
Relay_Master_Log_File: mysql02-bin.000015
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
But after inserting a row on mysql02 (the current active master), mysql01 does not replicate it:
mysql02> insert into t1 values(2);
Query OK, 1 row affected (0.02 sec)
mysql02>select * from t1;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set (0.00 sec)
mysql03 has replicated the row:
mysql03>select * from t1;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set (0.00 sec)
while mysql01 has not:
mysql01>select * from t1;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
Fix: stop and then restart the slave threads;
mysql01>stop slave;start slave;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql01>select * from t1;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set (0.00 sec)
4) Simulated slave-thread failure (IO or SQL thread); a migration is expected. On recovery, if the node has failed more than flap_count times within flap_duration seconds, it is not brought back automatically: its state changes from REPLICATION_FAIL to AWAITING_RECOVERY (because it's flapping)
The active master is currently mysql02.
Stopping the slave threads on the active master (mysql02) does not cause a migration:
mmm_mond log: the broken replication threads on db2 (mysql02) are detected
2014/05/27 15:39:02 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.161)
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.163)
A failed replication thread (IO or SQL) on any other replica (mysql01 or mysql03) does trigger a migration:
Stop the IO thread by hand:
mysql01>stop slave io_thread;
Query OK, 0 rows affected (0.01 sec)
Watch the mmm_mond log:
2014/05/27 15:43:28 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken
2014/05/27 15:43:31 FATAL State of host 'db1' changed from ONLINE to REPLICATION_FAIL
2014/05/27 15:43:31 INFO Removing all roles from host 'db1':
2014/05/27 15:43:31 INFO Removed role 'reader(10.0.60.161)' from host 'db1'    # role removed
2014/05/27 15:43:31 INFO Orphaned role 'reader(10.0.60.161)' has been assigned to 'db3'
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/REPLICATION_FAIL. Roles:
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161), reader(10.0.60.163)
After the IO thread is restarted, MMM recovers db1 automatically and moves a read IP back to db1 (mysql01), unless the failures exceed flap_count within flap_duration as described above:
Restart the thread:
mysql01>start slave io_thread;
Query OK, 0 rows affected (0.00 sec)
Watch the mmm_mond log:
2014/05/27 15:45:23 INFO Check 'rep_threads' on 'db1' is ok!
2014/05/27 15:45:25 FATAL State of host 'db1' changed from REPLICATION_FAIL to ONLINE
2014/05/27 15:45:25 INFO Moving role 'reader(10.0.60.163)' from host 'db3' to host 'db1'
Check the status with mmm_control:
[root@mysql04 ~]# mmm_control show
db1(10.0.60.100) master/ONLINE. Roles: reader(10.0.60.163)
db2(10.0.60.101) master/ONLINE. Roles: reader(10.0.60.162), writer(10.0.60.160)
db3(10.0.60.102) slave/ONLINE. Roles: reader(10.0.60.161)
5) Replication lag
The lag check is governed by max_backlog, which defaults to 60;
If replication lags or breaks for less than 60 seconds, the node stays ONLINE but its roles are migrated, and once it recovers MMM restores the read IP automatically. If both the rep_backlog and rep_threads checks fail at the same time, the state becomes REPLICATION_FAIL.
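The comparison the rep_backlog check makes can be illustrated with a small sketch (the `backlog_ok` helper is hypothetical): it extracts Seconds_Behind_Master from `show slave status\G` output on stdin and compares it against max_backlog (60, matching the mmm_mon.conf above).

```shell
#!/bin/sh
# backlog_ok [MAX]: read `show slave status\G` output on stdin; exit 0
# only when Seconds_Behind_Master is a number no greater than MAX (default 60).
backlog_ok() {
    max=${1:-60}
    lag=$(awk -F': *' '/Seconds_Behind_Master/ {print $2}')
    [ "$lag" != "NULL" ] && [ "$lag" -le "$max" ]   # NULL means replication is not running
}
```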
6) MMM agent or monitor failure
No roles are migrated; if a master or slave also fails during such a window, its roles cannot be migrated either
References:
MMM website: http://mysql-mmm.org/
MMM blog: http://blog.mysql-mmm.org/