MySQL MGR Failover and Node Management


The MySQL Group Replication (MGR) framework gives MySQL automatic primary/secondary failover and fault recovery. This article tests, in single-primary mode, how the DB system recovers after the master crashes.

1. Test Environment

1.1 Environment layout

Role   IP              Port  server-id
DB-1   192.110.103.41  3106  103413106
DB-2   192.110.103.42  3106  103423106
DB-3   192.110.103.43  3106  103433106

Note: port 3306 was already in use on these machines, so port 3106 is used instead.

1.2 Installing MGR

See the MGR installation steps in the previous article, "MySQL 5.7 MGR Installation and Introduction".

2. MGR Technical Overview

MGR manages membership on the basis of group views (Group View, or simply "view"). A view is the membership state of the group during a period of time: if no member joins or leaves during that period, the whole continuous interval is one view; when a member does join or leave, the view changes. MGR uses a View ID to track view changes and to order views in time.
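The current view can also be inspected from performance_schema; a small sketch (the VIEW_ID column lives in replication_group_member_stats):

SELECT VIEW_ID, MEMBER_ID
  FROM performance_schema.replication_group_member_stats;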

2.1 Single-primary / multi-primary modes

MySQL Group Replication can be configured in two working modes, single-primary and multi-primary. Their characteristics are:

  • Single-primary mode: one master node is automatically elected from the MySQL nodes in the replication group; only the master can write, and the other nodes are automatically set to read only. When the master fails, a new master is elected automatically; once elected it is made writable and the other slaves point to the new master.
  • Multi-primary mode: any node in the group can accept writes, so there is no master/slave concept; as long as not too many nodes fail at once, the group remains available.

MySQL Group Replication uses the Paxos distributed algorithm to coordinate the nodes. Because of this, it requires a majority of the group's members to be online in order to reach quorum and make a consistent decision.

A majority means N/2+1, where N is the current number of members in the group. For example, with 5 members in the group, 3 of them are needed to form a majority. The number of failed members the group can tolerate is shown in the table below:

Group size  Majority  Tolerated failures
1           1         0
2           2         0
3           2         1
4           3         1
5           3         2
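A quick way to check whether the group still has quorum is to count the ONLINE members; a minimal sketch:

SELECT COUNT(*) AS online_members
  FROM performance_schema.replication_group_members
 WHERE MEMBER_STATE = 'ONLINE';

If online_members is no longer a majority of the group size, the group is blocked and cannot commit new transactions.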

2.2 Configuration

[mysqld]
datadir=/data
socket=/data/mysql.sock

server-id=100                      # required
gtid_mode=on                       # required
enforce_gtid_consistency=on        # required
log-bin=/data/master-bin           # required
binlog_format=row                  # required
binlog_checksum=none               # required
master_info_repository=TABLE       # required
relay_log_info_repository=TABLE    # required
relay_log=/data/relay-log          # required; the default is used if not set
log_slave_updates=ON               # required
sync-binlog=1                      # recommended
log-error=/data/error.log
pid-file=/data/mysqld.pid

transaction_write_set_extraction=XXHASH64         # required
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # required
loose-group_replication_start_on_boot=off         # OFF is recommended
loose-group_replication_member_weight=40          # optional; supported since MySQL 5.7.20
loose-group_replication_local_address="192.110.103.41:31061"   # required, as is the next line
loose-group_replication_group_seeds="192.110.103.41:31061,192.110.103.42:31061,192.110.103.43:31061"
loose-group_replication_bootstrap_group=OFF
loose-group_replication_single_primary_mode=FALSE            # FALSE = multi-primary
loose-group_replication_enforce_update_everywhere_checks=ON  # ON    = multi-primary

Analysis of the options above:

  • 1) Group replication is based on GTIDs, so gtid_mode and enforce_gtid_consistency must be enabled.
  • 2) Group replication requires the binary log, and it must be in row format, so that the needed information can be collected from the log records and data consistency can be guaranteed; hence log_bin and binlog_format.
  • 3) Because of a design limitation in how MySQL checksums replication events, group replication cannot use these checksums, so binlog_checksum=none.
  • 4) Group replication writes the master and relay log metadata into the mysql.slave_master_info and mysql.slave_relay_log_info tables, hence the two *_repository=TABLE settings.
  • 5) Every node in the group keeps a complete copy of the data (a share-nothing model), so log_slave_updates must be enabled on all nodes; a new node can then pick any member as donor for asynchronous recovery.
  • 6) sync_binlog=1 makes every transaction commit flush the binlog to disk, so no log is lost on a crash.
  • 7) The loose-group_replication_* lines configure the group replication plugin. The loose- prefix means MySQL still starts normally even if the group replication plugin is not loaded; the prefix is optional.
  • 8) transaction_write_set_extraction=XXHASH64 means write sets are hashed with the XXHASH64 algorithm. A write set is a unique identification of the rows modified by a transaction and is used later to detect whether concurrent transactions modify the same row. It is generated from primary keys, so every table used with group replication must have a primary key.
  • 9) group_replication_group_name is the name of the replication group. It must be a valid UUID value; simply using all 'a's as above works. On Linux you can generate one with the uuidgen tool (see the example below).
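A minimal way to generate such a UUID from within MySQL itself (the Linux uuidgen tool works just as well):

SELECT UUID();   -- returns a valid UUID string to paste into group_replication_group_name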

3. Group Replication Administration

Useful statements for operating group replication:

SELECT * FROM performance_schema.replication_group_members;
# find the current master
SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member';
SHOW STATUS LIKE 'group_replication_primary_member';

start group_replication;
stop group_replication;
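Another useful view, added here for illustration, is replication_group_member_stats, which exposes queue lengths and conflict counters:

SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_CONFLICTS_DETECTED
  FROM performance_schema.replication_group_member_stats;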

To stop group replication on a particular member, execute the stop group_replication statement on that node. Be careful: before running it, make sure the node is no longer serving MySQL traffic, otherwise new data may still be written to it (for example while the primary is being stopped) or stale data may be read from it. The safest way to restart the whole group is therefore: first stop the MySQL instances of all non-primary nodes (not just their group replication plugin), then stop the primary's MySQL instance; restart the primary first, bootstrap the group on it and start its group replication; finally let the slave nodes rejoin the group one by one.
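A minimal sketch of that bootstrap step on the primary (the bootstrap flag must be switched back OFF right away, otherwise a later restart could create a second, independent group):

-- on the primary, once it is the only member being started
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- then on each of the other nodes
START GROUP_REPLICATION;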

In group replication there are two ways a member can leave the group: voluntarily and involuntarily.

Voluntary leave:

The node executes the stop group_replication; statement.

  • 1) Executing this statement means the node leaves the group voluntarily. It triggers an automatic view reconfiguration, and the view change is replicated to all members; the node actually leaves only after a majority of members have agreed on the new view.
  • 2) A voluntary leave does not reduce the quorum, so no matter how many nodes leave voluntarily, the group never blocks because it "cannot reach a majority".
  • 3) For example, in a 5-node group, after node A leaves voluntarily the group size becomes 4; the group then behaves as if node A had never been a member.

Involuntary leave:

Any departure other than the voluntary case above counts as involuntary, for example a node crash or a network outage.

1) When a node leaves involuntarily, the failure detection mechanism notices the problem and reports it to the group. This triggers an automatic view reconfiguration, which again requires a majority of members to agree on the new view.

2) With an involuntary leave the group size does not change: a 5-node group, for example, is still counted as a 5-node group after a member drops out; the departed members are simply no longer marked ONLINE.

3) An involuntary leave does consume quorum. When too many nodes leave involuntarily, the remaining members can no longer form a majority and the group blocks.

4) For example, in a 5-node group, after node A leaves involuntarily the group size is still 5, but node A is marked unreachable (or another non-ONLINE state) in the new view. If 2 more nodes then leave involuntarily, only 2 ONLINE members remain, a majority can no longer be reached, and the group blocks.
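When the group is blocked like this, the documented escape hatch is the group_replication_force_members variable, which forces a new membership made up of the members that are still reachable. A minimal sketch, assuming 192.110.103.42 is the lost member and the addresses follow the seed list used in this article:

-- run on one surviving, reachable member
SET GLOBAL group_replication_force_members = '192.110.103.41:31061,192.110.103.43:31061';
-- clear it once the group is unblocked, so it is not applied again by accident
SET GLOBAL group_replication_force_members = '';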

4. Failover Tests

4.1 Current state

The group is currently running in single-primary mode:

select * from performance_schema.global_variables  where VARIABLE_NAME 
 in ('group_replication_single_primary_mode','group_replication_enforce_update_everywhere_checks');
+----------------------------------------------------+----------------+
| VARIABLE_NAME                                      | VARIABLE_VALUE |
+----------------------------------------------------+----------------+
| group_replication_enforce_update_everywhere_checks | OFF            |
| group_replication_single_primary_mode              | ON             |
+----------------------------------------------------+----------------+
2 rows in set (0.00 sec)
 select * from performance_schema.global_variables  where VARIABLE_NAME like '%read_only';
+-----------------------+----------------+
| VARIABLE_NAME         | VARIABLE_VALUE |
+-----------------------+----------------+
| innodb_read_only      | OFF            |
| read_only             | OFF            |
| super_read_only       | OFF            |
+-----------------------+----------------+
 > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+

Current master: only single-primary group replication needs to look up the primary node; the multi-primary model has no master/slave concept, so there is nothing to look up.

SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 |
+----------------------------------+--------------------------------------+
# find the master
 SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_primary_member | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)

4.2 Primary failure

1) Graceful shutdown of the master

Shut down the master with the kill command (default TERM signal, not -9).

#kill mysqld on the master, 192.110.103.41
$ps aux|grep 3106
mysql    122456  0.0  0.0 106252  1444 ?        S    Oct23   0:00 /bin/sh /usr/local/mysql-5.7.23/bin/mysqld_safe --defaults-file=/data1/mysql_3106/etc/my.cnf
mysql    123471  0.1  0.8 11575792 1069584 ?    Sl   Oct23   1:45 /usr/local/mysql-5.7.23/bin/mysqld --defaults-file=/data1/mysql_3106/etc/my.cnf --basedir=/usr/local/mysql-5.7.23 --datadir=/data1/mysql_3106/data --plugin-dir=/usr/local/mysql-5.7.23/lib/plugin --log-error=/data1/mysql_3106/logs/mysqld.err --open-files-limit=8192 --pid-file=/data1/mysql_3106/tmp/mysql.pid --socket=/data1/mysql_3106/tmp/mysql.sock --port=3106
$kill 122456 123471 ; tail -f /data1/mysql_3106/logs/mysqld.err 

Logs:

Log of the old master:

$tail -f /data1/mysql_3106/logs/mysqld.err 
2019-10-24T03:10:32.746843Z 0 [Warning] /usr/local/mysql-5.7.23/bin/mysqld: Forcing close of thread 31  user: 'root'

2019-10-24T03:10:32.746873Z 0 [Note] Plugin group_replication reported: 'Plugin 'group_replication' is stopping.'
2019-10-24T03:10:32.746901Z 0 [Note] Plugin group_replication reported: 'Going to wait for view modification'
2019-10-24T03:10:35.797258Z 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2019-10-24T03:10:40.799923Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2019-10-24T03:10:40.799954Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2019-10-24T03:10:40.800110Z 7 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2019-10-24T03:10:40.800431Z 7 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 65
2019-10-24T03:10:40.800652Z 4 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2019-10-24T03:10:40.800787Z 0 [Note] Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2019-10-24T03:10:40.800799Z 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-10-24T03:10:40.801278Z 0 [Note] Binlog end
2019-10-24T03:10:40.802272Z 0 [Note] Shutting down plugin 'group_replication'
2019-10-24T03:10:40.802322Z 0 [Note] Plugin group_replication reported: 'All Group Replication server observers have been successfully unregistered'
...
2019-10-24T03:10:42.804477Z 0 [Note] Shutting down plugin 'binlog'
2019-10-24T03:10:42.805238Z 0 [Note] /usr/local/mysql-5.7.23/bin/mysqld: Shutdown complete

2019-10-24T03:10:42.814933Z mysqld_safe mysqld from pid file /data1/mysql_3106/tmp/mysql.pid ended

Log of the new master:

$tail -n 30 /data1/mysql_3106/logs/mysqld.err
2019-10-23T11:11:00.671705Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2019-10-23T11:11:00.671736Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 31061'
2019-10-23T11:11:05.400823Z 2 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address 192.110.103.41:3106.'
2019-10-23T11:11:05.401138Z 20 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2019-10-23T11:11:05.401143Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.41:3106, 192.110.103.42:3106 on view 15718289704352993:2.'
2019-10-23T11:11:05.402757Z 20 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.110.103.41', master_port= 3106, master_log_file='', master_log_pos= 4, master_bind=''.
2019-10-23T11:11:05.404717Z 20 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 509810ee-f3d7-11e9-a7d5-a0369fac2de4 at 192.110.103.41 port: 3106.'
2019-10-23T11:11:05.404998Z 22 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2019-10-23T11:11:05.406423Z 22 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@192.110.103.41:3106',replication started in log 'FIRST' at position 4
2019-10-23T11:11:05.442349Z 23 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './mysql-relay-bin-group_replication_recovery.000001' position: 4
2019-10-23T11:11:05.461483Z 20 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2019-10-23T11:11:05.461910Z 23 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'mysql-bin.000002' at position 934
2019-10-23T11:11:05.462119Z 22 [Note] Slave I/O thread killed while reading event for channel 'group_replication_recovery'
2019-10-23T11:11:05.462143Z 22 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'mysql-bin.000002', position 934
2019-10-23T11:11:05.523357Z 20 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='192.110.103.41', master_port= 3106, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-10-23T11:11:05.526137Z 0 [Note] Plugin group_replication reported: 'This server was declared online within the replication group'
2019-10-23T11:15:33.426684Z 0 [Note] Plugin group_replication reported: 'Members joined the group: 192.110.103.43:3106'
2019-10-23T11:15:33.426832Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.41:3106, 192.110.103.42:3106, 192.110.103.43:3106 on view 15718289704352993:3.'
2019-10-23T11:15:34.094942Z 0 [Note] Plugin group_replication reported: 'The member with address 192.110.103.43:3106 was declared online within the replication group'
2019-10-24T03:10:32.839967Z 0 [Warning] Plugin group_replication reported: 'Members removed from the group: 192.110.103.41:3106'
2019-10-24T03:10:32.839985Z 0 [Note] Plugin group_replication reported: 'Primary server with address 192.110.103.41:3106 left the group. Electing new Primary.'
2019-10-24T03:10:32.840052Z 0 [Note] Plugin group_replication reported: 'A new primary with address 192.110.103.42:3106 was elected, enabling conflict detection until the new primary applies all relay logs.'
2019-10-24T03:10:32.840086Z 41 [Note] Plugin group_replication reported: 'This server is working as primary member.'
2019-10-24T03:10:32.840107Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.42:3106, 192.110.103.43:3106 on view 15718289704352993:4.'
2019-10-24T03:12:01.677869Z 4 [Note] Plugin group_replication reported: 'Primary had applied all relay logs, disabled conflict detection'

Automatic failover:

The queries below show that the DB has completed the automatic failover successfully.

root@192.110.103.42 : (none) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
2 rows in set (0.00 sec)

root@192.110.103.42 : (none) > SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_primary_member | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)

The old master recovers and rejoins the group

After the old master comes back up, run START GROUP_REPLICATION;:

> select @@group_replication_bootstrap_group;
+-------------------------------------+
| @@group_replication_bootstrap_group |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
> START GROUP_REPLICATION;
> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+

2) Abnormal shutdown of the master

Kill the master node (now 192.110.103.42) with kill -9. The logs show that a new primary is elected automatically and the primary becomes 192.110.103.41.

#kill -9 the master node; run on 192.110.103.42
$kill -9 mysql_pid

#log on 192.110.103.41
tail -f /data1/mysql_3106/logs/mysqld.err     
2019-10-24T06:17:31.473849Z 0 [Warning] Plugin group_replication reported: 'Member with address 192.110.103.42:3106 has become unreachable.'
2019-10-24T06:17:32.479299Z 0 [Warning] Plugin group_replication reported: 'Members removed from the group: 192.110.103.42:3106'
2019-10-24T06:17:32.479323Z 0 [Note] Plugin group_replication reported: 'Primary server with address 192.110.103.42:3106 left the group. Electing new Primary.'
2019-10-24T06:17:32.479395Z 0 [Note] Plugin group_replication reported: 'A new primary with address 192.110.103.41:3106 was elected, enabling conflict detection until the new primary applies all relay logs.'
2019-10-24T06:17:32.479439Z 37 [Note] Plugin group_replication reported: 'This server is working as primary member.'
2019-10-24T06:17:32.479465Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.41:3106, 192.110.103.43:3106 on view 15718289704352993:6.'
root@192.110.103.41 : (none) >  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
2 rows in set (0.00 sec)

root@192.110.103.41 : (none) > SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_primary_member | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)


5. MGR Node Management (adding/removing nodes)

5.1 Removing a node from the group

Removing a node from the group is simple: just execute the stop group_replication statement on that node.

stop group_replication;
select * from performance_schema.replication_group_members;

5.2 Adding a node

1) Rejoining an existing MGR node after a restart. If a former member of the MGR was stopped and its service has since recovered, a plain start group_replication; is usually enough, as follows:

set global group_replication_enforce_update_everywhere_checks=OFF;
set global group_replication_single_primary_mode=ON;
START GROUP_REPLICATION;
select * from performance_schema.replication_group_members;

2) Adding a brand-new node. If a new node is being added, or a recovered node was down for so long that the binary logs it needs have already been purged on the other members, it cannot rejoin with a plain START GROUP_REPLICATION;. You have to seed it manually from a backup of the MGR cluster. MGR has no SST/IST concept (as in Galera); synchronizing and catching up is done entirely through GTIDs and the binlog.

Example: 192.110.103.42:3106 was down for too long and the binlogs it needs have been purged from the group, so its state stays RECOVERING or ERROR and it cannot join the MGR. Explanation: a node that was down asks the surviving members whether they can supply the missing binlog. If they can, the logs are transferred normally and the node catches up; if the logs the node needs no longer exist, the node cannot rejoin the group. You can check this up front, as sketched below.
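A rough sketch of that check (compare the joiner's gtid_executed with a surviving member's gtid_purged):

-- on the failed/recovering node
SELECT @@GLOBAL.gtid_executed;
-- on a surviving member: if gtid_purged is not contained in the joiner's
-- gtid_executed, the joiner can no longer catch up via the binlog
SELECT @@GLOBAL.gtid_executed, @@GLOBAL.gtid_purged;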

root@192.110.103.42 : (none) > show variables like 'group_replication_group_seeds';
+-------------------------------+-------------------------------------------------------------+
| Variable_name                 | Value                                                       |
+-------------------------------+-------------------------------------------------------------+
| group_replication_group_seeds | 192.110.103.41:31061,192.110.103.42:31061,192.110.103.43:31061 |
+-------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

root@192.110.103.42 : (none) > start group_replication;
Query OK, 0 rows affected (3.35 sec)

root@192.110.103.42 : (none) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | RECOVERING   |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
3 rows in set (0.00 sec)

root@192.110.103.42 : (none) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ERROR        |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)

Error log:

2019-10-24T08:15:11.015032Z 198 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2019-10-24T08:15:11.015060Z 198 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2019-10-24T08:15:11.015076Z 198 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2019-10-24T08:15:11.015131Z 196 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2019-10-24T08:15:11.015164Z 199 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2019-10-24T08:15:11.015530Z 199 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2019-10-24T08:15:11.017239Z 196 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='192.110.103.41', master_port= 3106, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-10-24T08:15:11.019165Z 196 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'

In this case, rebuild the replica from a fresh backup and proceed as follows: 1) Take a backup

/usr/local/mysql/bin/mysqldump -h 127.0.0.1 -P3106 --all-databases --default-character-set=utf8 -R -q --triggers --master-data=2 --single-transaction > mysql3106_online.sql ;

2) Stop group replication, disable read-only, and clear the local GTID state.

show variables like 'group_replication_group_seeds';
STOP GROUP_REPLICATION;
set global super_read_only=0;
show master logs;
reset master;
show master logs;
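A quick sanity check (a sketch; exact output depends on the server) that the local GTID history is really empty before the dump is imported:

SHOW MASTER STATUS;                                   -- Executed_Gtid_Set should now be empty
SELECT @@GLOBAL.gtid_executed, @@GLOBAL.gtid_purged;  -- both should be empty after RESET MASTER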

3) Import the backup:

mysql -h 127.0.0.1 -P3106  --default-character-set=utf8 < mysql3106_online.sql

4) Re-enable MGR:

set global group_replication_enforce_update_everywhere_checks=OFF;
set global group_replication_single_primary_mode=ON;
START GROUP_REPLICATION;

5) Result:

Because the replica was freshly rebuilt from the backup, it can catch up through normal replication, so START GROUP_REPLICATION; now succeeds and the node joins the MGR.

>START GROUP_REPLICATION;
> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
