Reference, Master Ye's article: FAQ Series | How to guarantee data consistency in master-slave replication
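In MySQL, committing a transaction involves writing undo, writing redo, writing the binlog, writing the data files, and so on. A crash can happen at any of these steps, and that can leave the master and slave with inconsistent data. To avoid this, we need to adjust the relevant options on both the master and the slave so that replicated data is not lost even if a crash occurs.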
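Configuration changes on the MASTER
Make sure every committed transaction is flushed to disk in real time, and in particular that the binlog for each transaction is flushed to disk promptly.
innodb_flush_log_at_trx_commit = 1 --> redo log: 1 = flush to disk at every commit; 2 = write to the OS cache (data may be lost if the operating system crashes); 0 = keep in the redo log buffer (data may be lost if mysqld crashes)
sync_binlog = 1 --> binlog: 1 = flush to disk at every commit; 0 = write to the OS cache only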
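Configuration changes on the SLAVE
Make sure the replication metadata tables on the slave also use the InnoDB engine, so that they are protected by InnoDB's transactional safety, and enable automatic relay-log recovery: after a crash, the slave re-fetches events from the master starting at the executed binlog position recorded in relay_log_info and applies them again, which avoids losing part of the data.
master_info_repository = "TABLE"
relay_log_info_repository = "TABLE"
relay_log_recovery = 1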
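The master and slave settings above can be checked mechanically. The following is a minimal sketch, assuming the options live in the [mysqld] group of a my.cnf-style file; the names MASTER_SETTINGS, SLAVE_SETTINGS and check_mysqld_settings are made up for this illustration, and an option that is simply absent is reported as a deviation even though the server default may already match.

```python
import configparser

# Values recommended in the text; the dict names are illustrative only.
MASTER_SETTINGS = {
    "innodb_flush_log_at_trx_commit": "1",
    "sync_binlog": "1",
}
SLAVE_SETTINGS = {
    "master_info_repository": "TABLE",
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": "1",
}


def check_mysqld_settings(cnf_text, expected):
    """Return {option: (expected, actual)} for each option that deviates.

    Simplification: only the [mysqld] group is inspected, and directives
    such as !includedir are not followed.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    # my.cnf accepts "-" and "_" interchangeably in option names.
    parser.optionxform = lambda key: key.lower().replace("-", "_")
    parser.read_string(cnf_text)

    problems = {}
    for option, want in expected.items():
        got = parser.get("mysqld", option, fallback=None)
        if got is not None:
            got = got.strip().strip("\"'")
        if got is None or got.upper() != want.upper():
            problems[option] = (want, got)
    return problems


sample = """
[mysqld]
innodb-flush-log-at-trx-commit = 1
sync_binlog = 0
relay_log_info_repository = "TABLE"
"""
print(check_mysqld_settings(sample, MASTER_SETTINGS))
print(check_mysqld_settings(sample, SLAVE_SETTINGS))
```

An empty dict means every listed option is set explicitly to the recommended value.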
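With this configuration, master and slave data should normally be consistent. A healthy replication status alone does not prove it, though: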
• binlog_format='STATEMENT'
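As long as the table structure matches for the replicated statement, the replication status is not affected by whether the master and slave data are actually consistent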
• binlog_format='ROW'
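1. With a primary key or unique index, the slave only matches on the primary key/unique index when applying the relay-log; it does not check whether the other columns equal the original values on the master
2. Data that was updated or deleted on the slave but is never accessed on the master: replication never notices the difference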
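The two ROW-format cases can be illustrated with a toy model. This is only a sketch of the behaviour described in the list, not of MySQL's actual apply code: tables are plain dicts keyed by primary key, and apply_update_event and diverged_keys are hypothetical helpers.

```python
def apply_update_event(table, pk, before_image, after_image):
    """Apply a row UPDATE the way the list describes: locate by key only."""
    if pk not in table:
        raise KeyError(f"row {pk} not found")
    # before_image is deliberately NOT compared with the current row,
    # so an already diverged row is overwritten without any error.
    table[pk] = dict(after_image)


def diverged_keys(master, slave):
    """Primary keys whose rows differ between the two copies."""
    keys = set(master) | set(slave)
    return sorted(k for k in keys if master.get(k) != slave.get(k))


master = {1: {"name": "alice", "score": 10}, 2: {"name": "bob", "score": 20}}
slave = {
    1: {"name": "alice", "score": 99},  # case 1: non-key column differs
    2: {"name": "bob", "score": 0},     # case 2: changed on the slave only
}

before = {"name": "alice", "score": 10}
after = {"name": "alice", "score": 11}
apply_update_event(master, 1, before, after)
apply_update_event(slave, 1, before, after)  # succeeds despite score=99

print(diverged_keys(master, slave))
```

Neither apply call fails, yet row 2 still differs afterwards; only an explicit comparison of the two copies reveals it.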
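To actually guarantee consistency, you need to check and re-sync the data periodically with the Percona Toolkit (pt) tools, for example pt-table-checksum and pt-table-sync.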
Reference article: The risks of relay_log_purge=0 in MySQL
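Sometimes we want MySQL to keep its relay-logs for a while longer, for example to fill in missing data after a high-availability failover. To do that we set relay_log_purge=0, which stops SQL_Thread from automatically deleting a relay-log once it has finished executing it.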
Pitfalls of relay_log_recovery=1 && relay_log_purge=0
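• When MySQL crashes or is stopped, SQL_Thread may not have executed all of the relay-logs, so the unexecuted part of the last relay-log is fetched again into a new file. In other words, that part of the data appears twice
• If SQL_Thread follows very closely, it may read and execute events that IO_Thread has written to the relay-log but not yet synced to disk. In that case, part of the data is missing from both the new file and the old file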
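This has no effect on replication itself, but anything that reads the relay-log to obtain data must take it into account, otherwise the result is inconsistent data.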
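The duplication case can be made concrete with a small simulation. This is a toy model under stated assumptions, not MySQL internals: each event is a (master_end_pos, payload) tuple with made-up positions, a relay-log file is a list of events, and recover_relay_logs, read_all_events and dedupe_by_master_pos are hypothetical helpers.

```python
def recover_relay_logs(relay_files, exec_master_pos, master_events):
    """Model relay_log_recovery=1 with relay_log_purge=0.

    Old relay-log files are kept, and every event after the executed
    master position is fetched again into a new file.
    """
    refetched = [e for e in master_events if e[0] > exec_master_pos]
    return relay_files + [refetched]


def read_all_events(relay_files):
    """What a tool sees if it naively reads every relay-log file in order."""
    return [event for relay_file in relay_files for event in relay_file]


def dedupe_by_master_pos(events):
    """Keep the first occurrence of each master position."""
    seen = set()
    unique = []
    for event in events:
        if event[0] not in seen:
            seen.add(event[0])
            unique.append(event)
    return unique


master_events = [(300, "row_1"), (600, "row_2"), (900, "row_new")]
# IO_Thread fetched all three events, SQL_Thread executed up to position 600.
old_files = [list(master_events)]
files = recover_relay_logs(old_files, exec_master_pos=600, master_events=master_events)

payloads = [payload for _, payload in read_all_events(files)]
print(payloads.count("row_new"))
print(len(dedupe_by_master_pos(read_all_events(files))))
```

Replication is unaffected because SQL_Thread resumes from the new file, but a reader that concatenates the files sees row_new twice unless it filters by master position.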
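In a traditional replication environment, MHA uses the relay-log of the Latest Slave to fill in the data that the other slaves are missing relative to the Latest Slave. In a GTID environment, the data is filled in from the binlog via change master to, and the relay-log is no longer needed.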
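To make the simulation easier, this article uses a manual failover to see what happens when MHA runs into the pitfalls described above. It uses the basic environment from "MHA - manual failover procedure (traditional replication & GTID replication)".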
Manually pause SQL_Thread and then shut down the MySQL instance, to simulate SQL_Thread not having executed all of the relay-log
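relay_log_recovery=1 && relay_log_purge=0
# Test data, abbreviated:
Node1 writes the first record -> Node3 stops io_thread
Node1 writes the second record ->
1. Node2 (slave): stop slave sql_thread;
2. Node1 (master): write a new row, row_new
3. Node2 (slave): shutdown;
4. Node2 (slave): start mysql, start slave;
Node2 stops io_thread
Node1 writes the third record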
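The slave's SQL_Thread is paused, the master writes new data, and IO_Thread fetches that data and writes it to the relay-log. The slave's mysql instance is then restarted, and IO_Thread re-fetches from the master starting at the executed binlog position recorded in relay_log_info and applies the events again. As a result, parsing the relay-logs shows that row_new was fetched twice.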
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
...
Fri Apr 13 18:09:37 2018 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Apr 13 18:09:37 2018 - [info]
Fri Apr 13 18:09:37 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Apr 13 18:09:37 2018 - [info] Starting recovery on 192.168.85.134(192.168.85.134:3307)..
Fri Apr 13 18:09:37 2018 - [info] Generating diffs succeeded.
Fri Apr 13 18:09:37 2018 - [info] Waiting until all relay logs are applied.
Fri Apr 13 18:09:37 2018 - [info] done.
Fri Apr 13 18:09:37 2018 - [debug] Stopping SQL thread on 192.168.85.134(192.168.85.134:3307)..
Fri Apr 13 18:09:37 2018 - [debug] done.
Fri Apr 13 18:09:37 2018 - [info] Getting slave status..
Fri Apr 13 18:09:37 2018 - [info] This slave(192.168.85.134)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000005:484). No need to recover from Exec_Master_Log_Pos.
Fri Apr 13 18:09:37 2018 - [debug] Current max_allowed_packet is 4194304.
Fri Apr 13 18:09:37 2018 - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Fri Apr 13 18:09:37 2018 - [info] Connecting to the target slave host 192.168.85.134, running recover script..
Fri Apr 13 18:09:37 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134 --slave_port=3307 --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.21-log --timestamp=20180413180912 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --debug --slave_pass=xxx
Fri Apr 13 18:09:45 2018 - [info] Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180413180912.binlog ..
Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180413180912.binlog.. ok.
Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog..
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type=15 server_id=1323307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type=35 server_id=1323307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
Got previous gtids log event: 154.
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type=34 server_id=1323307 length=65 nextmpos=1209 prevrelay=154 cur(post)relay=219
dumped up to pos 154. ok.
/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog has effective binlog events from pos 154.
Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog position 154 to tail(507).. ok.
Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307.20180413180912.binlog .
MySQL client version is 5.7.21. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog on 192.168.85.134:3307. This may take long time...
FATAL: applying log files failed with rc 1:0!
Error logs from ZST3:/var/log/masterha/app1/relay_log_apply_for_192.168.85.134_3307_20180413180912_err.log (the last 200 lines)..
mysql: [Warning] Using a password on the command line interface can be insecure.
...
ERROR 1062 (23000) at line 92: Duplicate entry '3' for key 'PRIMARY'
--------------
BINLOG '
NoDQWhMrMRQAPwAAAAMEAAAAAG4AAAAAAAEACXJlcGxjcmFzaAAHcHlfdXNlcgAEAw8SDwVgAAAe
AA7JJu9M
NoDQWh4rMRQAVgAAAFkEAAAAAG4AAAAAAAEAAgAE//ADAAAAIGM3MzExZWQ0LTNmMDEtMTFlOC05
ODg4LTAwMGMyOWMxmZ+bIJ4HMTMyMzMwN3PJaGg=
'
--------------
Bye
at /usr/bin/apply_diff_relay_logs line 515
eval {...} called at /usr/bin/apply_diff_relay_logs line 475
main::main() called at /usr/bin/apply_diff_relay_logs line 120
Fri Apr 13 18:09:45 2018 - [debug] Setting max_allowed_packet back to 4194304 succeeded.
Fri Apr 13 18:09:45 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1398] Applying diffs failed with return code 22:0.
Fri Apr 13 18:09:45 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1561] Recovering master server failed.
Fri Apr 13 18:09:45 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
Fri Apr 13 18:09:45 2018 - [debug] Disconnected from 192.168.85.133(192.168.85.133:3307)
Fri Apr 13 18:09:45 2018 - [debug] Disconnected from 192.168.85.134(192.168.85.134:3307)
Fri Apr 13 18:09:45 2018 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307)
Master 192.168.85.132(192.168.85.132:3307) is down!
Check MHA Manager logs at ZST3 for details.
Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)
The latest slave 192.168.85.133(192.168.85.133:3307) has all relay logs for recovery.
Selected 192.168.85.134(192.168.85.134:3307) as a new master.
Recovering master server failed.
Got Error so could not continue failover from here.
[root@ZST3 app1]#
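The MHA failover fails! The reason is that the data Node3 obtains from the Latest Slave contains duplicate records, so applying the differential logs raises an error. The duplicated data is also visible in relay_from_read_to_latest_**.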
Simulating a SQL_Thread that follows very closely is hard to do, but we can simulate a slave with missing relay-log in a roundabout way
relay_log_recovery=1 && relay_log_purge=1
# Test data, abbreviated:
Node1 writes the first record -> Node3 stops io_thread
Node1 writes the second record -> Node2 runs flush relay logs; twice -> Node2 stops io_thread
Node1 writes the third record
The goal is to get the relay-log that holds the second record purged, so that the Latest Slave no longer has enough relay-log to recover the other slaves.
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.85.134 --new_master_port=3307 --ignore_last_failover
...
Fri Apr 13 15:26:39 2018 - [info] * Phase 3.3: Determining New Master Phase..
Fri Apr 13 15:26:39 2018 - [info]
Fri Apr 13 15:26:39 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Apr 13 15:26:39 2018 - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..
Fri Apr 13 15:26:39 2018 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=1303 --target_mlf=mysql-bin.000001 --target_rmlp=643 --server_id=1333307 --workdir=/var/log/masterha/app1 --timestamp=20180413152622 --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info --relay_dir=/data/mysql/mysql3307/data/ --debug :
Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
Relay log found at /data/mysql/mysql3307/data, up to relay-bin.000004
Fast relay log position search failed. Reading relay logs to find..
Reading relay-bin.000004
parse_init_headers: file=relay-bin.000004 event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
Binlog Checksum enabled
parse_init_headers: file=relay-bin.000004 event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
Got previous gtids log event: 154.
parse_init_headers: file=relay-bin.000004 event_type=15 server_id=1323307 length=119 nextmpos=0 prevrelay=154 cur(post)relay=273
Master Version is 5.7.21-log
Binlog Checksum enabled
parse_init_headers: file=relay-bin.000004 event_type=34 server_id=1323307 length=65 nextmpos=1038 prevrelay=273 cur(post)relay=338
get_starting_mlp: file=relay-bin.000004 event_type=2 server_id=1323307 length=85 next=1123
relay-bin.000004 contains master mysql-bin.000001 from position 1123
Reading relay-bin.000003
parse_init_headers: file=relay-bin.000003 event_type=15 server_id=1333307 length=119 nextmpos=123 prevrelay=4 cur(post)relay=123
Binlog Checksum enabled
parse_init_headers: file=relay-bin.000003 event_type=35 server_id=1333307 length=31 nextmpos=154 prevrelay=123 cur(post)relay=154
Got previous gtids log event: 154.
parse_init_headers: file=relay-bin.000003 event_type=15 server_id=1323307 length=119 nextmpos=0 prevrelay=154 cur(post)relay=273
parse_init_headers: file=relay-bin.000003 event_type=4 server_id=1333307 length=47 nextmpos=320 prevrelay=273 cur(post)relay=320
Reading relay-bin.000002
No such file or directory:/data/mysql/mysql3307/data/relay-bin.000002 at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line 102
Fri Apr 13 15:26:40 2018 - [warning] 192.168.85.133 does not have all relay logs. Maybe some logs were purged.
Fri Apr 13 15:26:40 2018 - [warning] None of latest servers have enough relay logs from oldest position. We can not recover oldest slaves.
Fri Apr 13 15:26:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery.
Fri Apr 13 15:26:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53
Fri Apr 13 15:26:40 2018 - [debug] Disconnected from 192.168.85.133(192.168.85.133:3307)
Fri Apr 13 15:26:40 2018 - [debug] Disconnected from 192.168.85.134(192.168.85.134:3307)
Fri Apr 13 15:26:40 2018 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.85.132(192.168.85.132:3307)
Master 192.168.85.132(192.168.85.132:3307) is down!
Check MHA Manager logs at ZST3 for details.
Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:3307)
None of the latest slaves has enough relay logs for recovery.
Got Error so could not continue failover from here.
[root@ZST3 app1]#
The MHA failover fails! The reason is that the Latest Slave does not hold enough relay-log to recover the other slaves.
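The search visible in the log above (newest relay-log first, walking backwards until the target position is covered or a file is missing) can be sketched roughly as follows. This is a simplified reading of the log output, not MHA's actual implementation; has_relay_logs_from and its input format are invented for illustration, and it assumes all events come from a single master binlog file.

```python
def has_relay_logs_from(relay_files, newest_index, target_master_pos):
    """Check whether the relay-logs reach back to target_master_pos.

    relay_files maps a relay-log index to the first master position found
    in that file, or None if the file holds no master events. An index
    that is absent from the mapping stands for a purged file.
    """
    index = newest_index
    while index >= 1:
        if index not in relay_files:
            return False  # file was purged: the chain is broken
        start = relay_files[index]
        if start is not None and start <= target_master_pos:
            return True
        index -= 1
    return False


# Values taken from the log: relay-bin.000004 starts at master position 1123,
# relay-bin.000003 has only header events, relay-bin.000002 is gone,
# and the oldest slave needs data from position 643.
print(has_relay_logs_from({4: 1123, 3: None}, newest_index=4, target_master_pos=643))
```

With the purged file present and starting at or before position 643, the same call would succeed, which is why the failover depends on relay-logs not being purged too early.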
So whenever MHA needs the relay-log to recover data, duplicated or missing relay-log content makes it report an error immediately and the failover fails.
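An automatic failover first finds all [server] entries configured with candidate_master=1, then picks the ones with the most recent logs among them. If several have equally recent logs, the new master is chosen by the order of the [server] entries.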
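The selection rule just described can be written down as a short sketch. It follows only the three steps in the sentence above; the Server class, choose_new_master and the sample positions are hypothetical, and any further checks the real MHA performs are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    host: str
    candidate_master: bool
    # (master binlog file, position) this slave has received so far
    log_pos: Tuple[str, int]


def choose_new_master(servers: List[Server]) -> Optional[Server]:
    """Pick a new master from servers listed in [server] config order."""
    candidates = [s for s in servers if s.candidate_master]
    if not candidates:
        return None  # this case is not covered by the description above
    latest = max(s.log_pos for s in candidates)
    for server in candidates:  # config order breaks ties
        if server.log_pos == latest:
            return server
    return None


servers = [
    Server("192.168.85.133", True, ("mysql-bin.000001", 1303)),
    Server("192.168.85.134", True, ("mysql-bin.000001", 1303)),
    Server("192.168.85.135", False, ("mysql-bin.000001", 1303)),
]
print(choose_new_master(servers).host)
```

Comparing the (file, position) tuples directly relies on binlog file names being zero-padded to the same length, which is the usual naming scheme.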
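In a traditional replication environment, if a "problem slave" is chosen as the Latest Slave, the failover fails whether it is manual or automatic. So use GTID wherever possible.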
Addendum, 15:07 2018/7/26
In a GTID environment, running save_binary_logs --command=save to save the differential data between the Dead Master/Binlog Server and the Latest Slave fails with:
mysqlbinlog:[ERROR] unknown variable 'default-character-set=utf8'
/etc/my.cnf contains
[client]
default-character-set=utf8
Once this line is commented out, the failover works normally (no restart needed). Why?
In a GTID environment, save_binary_logs runs a command like this:
Executing command: mysqlbinlog --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009
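Like mysqladmin, mysqlbinlog looks for the option files /etc/my.cnf /etc/mysql/my.cnf /usr/local/mysql/etc/my.cnf ~/.my.cnf. Each client reads its own group plus the shared [client] group: [mysqladmin] [client] for mysqladmin, [mysqlbinlog] [client] for mysqlbinlog
If the character set setting shown earlier is added to one of those option files, try printing the default arguments of mysqlbinlog: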
[root@ZST1 ~]# mysqlbinlog --print-defaults
mysqlbinlog would have been started with the following arguments:
--character-set-server=utf8
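In other words, mysqlbinlog --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009 is equivalent to mysqlbinlog --character-set-server=utf8 --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009, because options read from the option files are placed ahead of the command-line arguments. mysqlbinlog does not support a variable such as --character-set-server (the same applies to default-character-set in the error above), so it fails. To fix this, comment out the character set setting in the option file, or give mysqlbinlog an alias: alias mysqlbinlog='mysqlbinlog --no-defaults'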
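How a [client] option ends up on the mysqlbinlog command line can be mimicked with a short sketch. It is a simplified model, not the real option-file parser: defaults_for and build_command are hypothetical helpers, only one option file is considered, and [client] is always read before the program's own group.

```python
import configparser


def defaults_for(cnf_text, program):
    """Collect the options a client program would pick up from an option file.

    Simplification: reads the [client] group first, then the group named
    after the program.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    args = []
    for group in ("client", program):
        if parser.has_section(group):
            for key, value in parser.items(group):
                args.append(f"--{key}" if value is None else f"--{key}={value}")
    return args


def build_command(program, user_args, cnf_text, no_defaults=False):
    """Effective argument list; --no-defaults must be the first option."""
    if no_defaults:
        return [program, "--no-defaults"] + list(user_args)
    return [program] + defaults_for(cnf_text, program) + list(user_args)


cnf = """
[client]
default-character-set=utf8

[mysqld]
sync_binlog=1
"""
user_args = ["--start-position=1013", "/data/mysql/mysql3307/logs/mysql-bin.000009"]
print(build_command("mysqlbinlog", user_args, cnf))
print(build_command("mysqlbinlog", user_args, cnf, no_defaults=True))
```

The first command carries the inherited character set option that mysqlbinlog rejects; the second skips option files entirely, which is what the alias achieves.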