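This article briefly explains how GTID-based master-slave replication in MySQL 5.6 works and how to set it up, then walks through a few experiments that show how to handle related failures.
http://cenalulu.github.io/
http://cenalulu.github.io/mysql/mysql-5-6-gtid-basic/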
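A GTID (Global Transaction ID) is an identifier assigned to a committed transaction, and it is globally unique.
A GTID is made up of UUID + TID. The UUID uniquely identifies a MySQL instance. The TID reflects the number of transactions committed on that instance and increases monotonically with each commit. Here is what a concrete GTID looks like: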
3E11FA47-71CA-11E1-9E33-C80AA9429562:23
For a more detailed introduction, see the official documentation.
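As a rough illustration of the UUID + TID structure, the following Python sketch splits a GTID string into its two parts. The Gtid type and the parse_gtid helper are names made up for this example; they are not part of MySQL or of any client library.

```python
from typing import NamedTuple
from uuid import UUID


class Gtid(NamedTuple):
    server_uuid: UUID  # identifies the MySQL instance that committed the transaction
    tid: int           # transaction number on that instance


def parse_gtid(text: str) -> Gtid:
    """Split 'UUID:TID' into its two parts and validate each of them."""
    uuid_part, sep, tid_part = text.strip().rpartition(":")
    if not sep:
        raise ValueError(f"not a GTID: {text!r}")
    tid = int(tid_part)
    if tid < 1:
        raise ValueError("transaction numbers start at 1")
    return Gtid(UUID(uuid_part), tid)


gtid = parse_gtid("3E11FA47-71CA-11E1-9E33-C80AA9429562:23")
print(gtid.server_uuid, gtid.tid)
```

The UUID class normalizes the identifier to lower case, which makes it easier to compare GTIDs copied from different outputs.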
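So what is GTID for? It serves two main purposes, and the second one, simpler replication failover, deserves a closer look. Consider how a replication failover worked before GTID arrived in MySQL 5.6. Suppose Server A is the master, and Server B and Server C both replicate from it.
Now Server A goes down and the workload has to be switched to Server B. At the same time, Server C's replication source has to be changed to Server B. The command for changing the source is simple: CHANGE MASTER TO MASTER_HOST='xxx', MASTER_LOG_FILE='xxx', MASTER_LOG_POS=nnnn. The hard part is that the same transaction sits in a different binlog file and at a different position on every machine, so working out which master_log_file and master_log_pos on Server B correspond to the point where Server C stopped replicating is difficult. This is one important reason why master-slave replication clusters need extra management tools such as MMM and MHA.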
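With GTID in 5.6, this problem becomes very simple. Because a transaction has the same GTID on every node, the GTID at which Server C stopped uniquely identifies the corresponding transaction on Server B. Better still, thanks to MASTER_AUTO_POSITION we do not even need to know the actual GTID values: CHANGE MASTER TO MASTER_HOST='xxx', MASTER_AUTO_POSITION=1 completes the failover directly. Easy, isn't it?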
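To make the idea concrete, here is a small Python sketch of the set arithmetic behind GTID-based positioning: given the GTID sets executed on the new master and on a slave, it lists the transactions the slave still has to fetch. The helper names and the sample sets are assumptions for illustration only; the real negotiation happens inside the server when MASTER_AUTO_POSITION=1 is used.

```python
from __future__ import annotations


def parse_gtid_set(text: str) -> dict[str, set[int]]:
    """Parse e.g. 'uuid:1-3:7,uuid2:5' into {uuid: {1, 2, 3, 7}, uuid2: {5}}."""
    result: dict[str, set[int]] = {}
    for entry in filter(None, (part.strip() for part in text.split(","))):
        uuid, *intervals = entry.split(":")
        tids = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            tids.update(range(int(first), int(last or first) + 1))
    return result


def missing_on_slave(master_executed: str, slave_executed: str) -> dict[str, set[int]]:
    """Transactions the new master has executed that the slave has not."""
    master = parse_gtid_set(master_executed)
    slave = parse_gtid_set(slave_executed)
    missing = {}
    for uuid, tids in master.items():
        diff = tids - slave.get(uuid, set())
        if diff:
            missing[uuid] = diff
    return missing


# Hypothetical state at failover time: Server B (new master) has applied
# transactions 1-120 from the failed Server A, Server C only 1-100.
SERVER_A = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
todo = missing_on_slave(f"{SERVER_A}:1-120", f"{SERVER_A}:1-100")
print({uuid: (min(t), max(t)) for uuid, t in todo.items()})
```

No binlog file name or position appears anywhere in this calculation, which is exactly why the failover no longer depends on per-server coordinates.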
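The setup is based on the mysql_sandbox scripts. First, a position-based replication environment with one master and three slaves was created. Then the configuration was changed to convert the whole topology to GTID-based replication.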
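According to the GTID setup recommendations in the official MySQL documentation, the configuration of the master and the slaves has to be changed in one go and the services restarted. That is clearly unacceptable when upgrading a production environment. Facebook, Booking.com and Percona have each patched MySQL to make this upgrade more graceful; the details will be covered in a later post. Here we simply follow the official documentation and perform an experimental upgrade.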
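The upgrade consists of the following main steps:
- Edit my.cnf on the master and restart the service
- Edit my.cnf on the slaves and restart the service
- On the slaves, run change master to with master_auto_position=1 to enable GTID-based replication
Since this is a test environment, read_only and service restarts do no harm. Following the official GTID setup recommendations is enough to complete the upgrade, so the detailed procedure is not repeated here. Some errors that are easy to run into during the upgrade are listed below.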
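The three settings gtid_mode=ON, log_slave_updates and enforce_gtid_consistency must all be configured in my.cnf together (along with log-bin, as the first message below indicates). Otherwise errors like the following appear in mysql.err: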
2015-02-26 17:11:08 32147 [ERROR] --gtid-mode=ON or UPGRADE_STEP_1 or UPGRADE_STEP_2 requires --log-bin and --log-slave-updates
2015-02-26 17:13:53 32570 [ERROR] --gtid-mode=ON or UPGRADE_STEP_1 requires --enforce-gtid-consistency
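A quick pre-flight check can catch these errors before the restart. The Python sketch below reads the text of a my.cnf and reports which of the options named in the messages above are absent from the [mysqld] section. It is a simplification under stated assumptions: the file contains no !include directives, and the check only looks at whether an option is present, not whether it is switched off. The function name missing_gtid_options is made up for this example.

```python
import configparser

# Options the error messages above ask for, besides gtid_mode itself.
REQUIRED = ("log_bin", "log_slave_updates", "enforce_gtid_consistency")


def missing_gtid_options(cnf_text: str) -> list:
    """Return the GTID prerequisites that are absent from the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options such as log_slave_updates
        strict=False,          # tolerate repeated options
        interpolation=None,    # treat '%' literally
    )
    parser.read_string(cnf_text)
    options = {}
    if parser.has_section("mysqld"):
        # MySQL accepts both log-bin and log_bin; normalize to underscores.
        options = {k.replace("-", "_"): v for k, v in parser.items("mysqld")}
    problems = [name for name in REQUIRED if name not in options]
    if (options.get("gtid_mode") or "").upper() != "ON":
        problems.insert(0, "gtid_mode=ON")
    return problems


sample = """\
[mysqld]
log-bin = mysql-bin
gtid_mode = ON
"""
print(missing_gtid_options(sample))
# prints ['log_slave_updates', 'enforce_gtid_consistency']
```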
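After running change master to as described in the documentation, you will notice two warnings. They are security notes and do not affect replication (interested readers can look up the detailed explanation of these warnings). Their content is as follows: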
slave1 [localhost] {msandbox} ((none)) > stop slave;
Query OK, 0 rows affected (0.03 sec)

slave1 [localhost] {msandbox} ((none)) > change master to master_host='127.0.0.1',master_port =21288,master_user='rsandbox',master_password='rsandbox',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)

slave1 [localhost] {msandbox} ((none)) > show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1759 | Sending passwords in plain text without SSL/TLS is extremely insecure. |
| Note  | 1760 | Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
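The output of show global variables like '%gtid%' shows that one of the GTID-related variables is gtid_purged. Its name and the official documentation both indicate that it records the gtid_set of transactions that have been executed on this server but whose binary logs have since been removed by purge binary logs to.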
This section tests what happens when the master purges GTID events that a slave has not yet fetched.
Run the following statements on the master:
master [localhost] {msandbox} (test) > show global variables like '%gtid%';
+---------------------------------+----------------------------------------+
| Variable_name                   | Value                                  |
+---------------------------------+----------------------------------------+
| binlog_gtid_simple_recovery     | OFF                                    |
| enforce_gtid_consistency        | ON                                     |
| gtid_executed                   | 24024e52-bd95-11e4-9c6d-926853670d0b:1 |
| gtid_mode                       | ON                                     |
| gtid_owned                      |                                        |
| gtid_purged                     |                                        |
| simplified_binlog_gtid_recovery | OFF                                    |
+---------------------------------+----------------------------------------+
7 rows in set (0.01 sec)

master [localhost] {msandbox} (test) > flush logs;create table gtid_test2 (ID int) engine=innodb;
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.02 sec)

master [localhost] {msandbox} (test) > flush logs;create table gtid_test3 (ID int) engine=innodb;
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.04 sec)

master [localhost] {msandbox} (test) > show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000005 |      359 |              |                  | 24024e52-bd95-11e4-9c6d-926853670d0b:1-3 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

master [localhost] {msandbox} (test) > purge binary logs to 'mysql-bin.000004';
Query OK, 0 rows affected (0.03 sec)

master [localhost] {msandbox} (test) > show global variables like '%gtid%';
+---------------------------------+------------------------------------------+
| Variable_name                   | Value                                    |
+---------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery     | OFF                                      |
| enforce_gtid_consistency        | ON                                       |
| gtid_executed                   | 24024e52-bd95-11e4-9c6d-926853670d0b:1-3 |
| gtid_mode                       | ON                                       |
| gtid_owned                      |                                          |
| gtid_purged                     | 24024e52-bd95-11e4-9c6d-926853670d0b:1   |
| simplified_binlog_gtid_recovery | OFF                                      |
+---------------------------------+------------------------------------------+
7 rows in set (0.00 sec)
Now set up replication again on slave2. Run the following commands on slave2:
slave2 [localhost] {msandbox} ((none)) > change master to master_host='127.0.0.1',master_port =21288,master_user='rsandbox',master_password='rsandbox',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)

slave2 [localhost] {msandbox} ((none)) > start slave;
Query OK, 0 rows affected (0.01 sec)

slave2 [localhost] {msandbox} ((none)) > show slave status\G
*************************** 1. row ***************************
......
Slave_IO_Running: No
Slave_SQL_Running: Yes
......
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 151
......
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
Last_SQL_Errno: 0
Last_SQL_Error:
......
Auto_Position: 1
1 row in set (0.00 sec)
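The condition behind error 1236 can be modeled as a set comparison: replication can only start if every GTID the master has purged is already present on the slave. The Python sketch below is a simplified model of that rule, not the server's actual code. The function name is made up, and the empty executed set for slave2 is an assumption about the state of the experiment.

```python
def purged_but_needed(master_purged: dict, slave_executed: dict) -> dict:
    """GTIDs the master has purged that the slave has not executed yet.

    Both arguments map a server UUID to a set of transaction numbers.
    """
    needed = {}
    for uuid, tids in master_purged.items():
        lost = tids - slave_executed.get(uuid, set())
        if lost:
            needed[uuid] = lost
    return needed


MASTER = "24024e52-bd95-11e4-9c6d-926853670d0b"
master_purged = {MASTER: {1}}  # gtid_purged on the master after the purge
slave2_executed = {}           # assumption: slave2 has executed nothing yet

lost = purged_but_needed(master_purged, slave2_executed)
if lost:
    print("master cannot serve this slave, expect error 1236:", lost)
else:
    print("auto-positioning can proceed")
```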
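In production you occasionally run into the following situation: after a slave is restored from a backup (or loaded with load data infile), the DBA can guarantee that its data matches the master; or, even if it does not match, the differences will not break replication later (for example, the master only ever runs inserts, never updates). Under these conditions we still want the slave to replicate from the master, which means skipping the part the master has already purged. How is that done in practice?
We continue with the situation from experiment 1: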
First confirm what has been purged on the master. The output below shows that the master no longer has the logs for the transaction 24024e52-bd95-11e4-9c6d-926853670d0b:1.
master [localhost] {msandbox} (test) > show global variables like '%gtid%';
+---------------------------------+------------------------------------------+
| Variable_name                   | Value                                    |
+---------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery     | OFF                                      |
| enforce_gtid_consistency        | ON                                       |
| gtid_executed                   | 24024e52-bd95-11e4-9c6d-926853670d0b:1-3 |
| gtid_mode                       | ON                                       |
| gtid_owned                      |                                          |
| gtid_purged                     | 24024e52-bd95-11e4-9c6d-926853670d0b:1   |
| simplified_binlog_gtid_recovery | OFF                                      |
+---------------------------------+------------------------------------------+
7 rows in set (0.00 sec)
On the slave, skip the purged part with set global gtid_purged='xxxx'. In MySQL 5.6 this statement only succeeds while the slave's gtid_executed is empty.
slave2 [localhost] {msandbox} ((none)) > stop slave;
Query OK, 0 rows affected (0.04 sec)

slave2 [localhost] {msandbox} ((none)) > set global gtid_purged = '24024e52-bd95-11e4-9c6d-926853670d0b:1';
Query OK, 0 rows affected (0.05 sec)

slave2 [localhost] {msandbox} ((none)) > start slave;
Query OK, 0 rows affected (0.01 sec)

slave2 [localhost] {msandbox} ((none)) > show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
......
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 359
Relay_Log_File: mysql_sandbox21290-relay-bin.000004
Relay_Log_Pos: 569
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Exec_Master_Log_Pos: 359
Relay_Log_Space: 873
......
Master_Server_Id: 1
Master_UUID: 24024e52-bd95-11e4-9c6d-926853670d0b
Master_Info_File: /data/mysql/rsandbox_mysql-5_6_23/node2/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
......
Retrieved_Gtid_Set: 24024e52-bd95-11e4-9c6d-926853670d0b:2-3
Executed_Gtid_Set: 24024e52-bd95-11e4-9c6d-926853670d0b:1-3
Auto_Position: 1
1 row in set (0.00 sec)
The slave now replicates normally and has caught up on the binlog events in the range 24024e52-bd95-11e4-9c6d-926853670d0b:2-3.
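The 2-3 range in Retrieved_Gtid_Set follows directly from the sets involved. As a sketch under the assumption that marking transaction 1 as purged makes the slave treat it as already executed, the Python snippet below subtracts the slave's set from the master's and prints the remainder in GTID interval notation. The to_intervals helper is a name invented for this example.

```python
def to_intervals(tids) -> str:
    """Compress transaction numbers into GTID interval notation, e.g. {2, 3, 5} -> '2-3:5'."""
    parts = []
    start = prev = None
    for tid in sorted(tids):
        if start is None:
            start = prev = tid
        elif tid == prev + 1:
            prev = tid  # still inside the current run
        else:
            parts.append(f"{start}-{prev}" if prev > start else str(start))
            start = prev = tid
    if start is not None:
        parts.append(f"{start}-{prev}" if prev > start else str(start))
    return ":".join(parts)


MASTER = "24024e52-bd95-11e4-9c6d-926853670d0b"
master_executed = {1, 2, 3}  # gtid_executed on the master: 1-3
slave_after_skip = {1}       # slave2 after set global gtid_purged = '...:1'

to_fetch = master_executed - slave_after_skip
print(f"{MASTER}:{to_intervals(to_fetch)}")
# prints 24024e52-bd95-11e4-9c6d-926853670d0b:2-3
```

Keep in mind that the skipped transaction is never applied on the slave, so this only makes sense when the data is already known to be consistent, as described in the scenario above.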