An Analysis of the Causes of Master-Slave Replication Lag

Date: 2019-11-06


Preface:

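While maintaining a production master-slave replication setup, I ran into a number of replication lag problems. Being the curious type, I read through a good deal of material, discarded the dross, and organized what remained, together with my own understanding, into this article. To be clear: the content here reflects my own understanding and is not authoritative. It is offered as a reference for fellow practitioners and as a study record for myself. In the spirit of seeking truth from facts, I treat what I find online as half credible and half doubtful; even official documentation sometimes has bugs. Only what you have tested with your own hands can be trusted, and only then do you have the right to speak about it. Everything else is hearsay.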

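The MySQL master-slave replication process:

1) When the binlog changes, the Binlog Dump thread on the master actively sends the latest binlog events to the slave.

2) The I/O thread on the slave passively receives the binlog events sent by the master and records them in the slave's relay log; when no data is arriving, it waits. Meanwhile, the SQL thread replays the relay log.

3) When the slave has received no data from the master for longer than slave_net_timeout (default 3600 seconds in MySQL 5.6; 60 seconds since 5.7.7), it treats the connection as broken and Slave_IO_Running stops showing Yes. From then on it tries to reconnect every MASTER_CONNECT_RETRY seconds (default 60; shown as Connect_Retry: 30 in the example here) until the connection succeeds or the number of attempts exceeds MASTER_RETRY_COUNT (default 86400; shown as Master_Retry_Count: 6666 in the example here).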

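slave_net_timeout can be changed in the configuration file or set online with SET GLOBAL.

MASTER_CONNECT_RETRY and MASTER_RETRY_COUNT, by contrast, are set through CHANGE MASTER TO, normally when the replication relationship is established.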
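To get a feel for what these two settings mean together, the sketch below multiplies the retry interval by the retry count to estimate how long a slave keeps trying before it gives up. It is a rough upper bound only: it assumes every attempt fails instantly, and the helper names are made up for illustration.

```python
def max_retry_window_seconds(connect_retry: int, retry_count: int) -> int:
    """Approximate time the slave keeps retrying before giving up.

    Assumption: each failed attempt takes no time, so only the waits
    between attempts are counted.
    """
    if connect_retry <= 0 or retry_count <= 0:
        raise ValueError("both values must be positive")
    return connect_retry * retry_count


def describe(seconds: int) -> str:
    """Render a duration in seconds as days, hours, minutes and seconds."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


# Defaults mentioned in the text: retry every 60 s, up to 86400 times.
defaults = max_retry_window_seconds(60, 86400)
# Values used in the example: retry every 30 s, up to 6666 times.
tuned = max_retry_window_seconds(30, 6666)

print(describe(defaults))  # 60d 0h 0m 0s
print(describe(tuned))     # 2d 7h 33m 0s
```

With the defaults the slave keeps trying for roughly 60 days; with the example values it gives up after a little over two days.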

Changing slave_net_timeout online:

SHOW VARIABLES LIKE 'slave_net_timeout';

Variable_name      Value

slave_net_timeout  3600

SET GLOBAL slave_net_timeout=1800;

Changing MASTER_CONNECT_RETRY and MASTER_RETRY_COUNT:

mysql> change master to MASTER_CONNECT_RETRY=30,MASTER_RETRY_COUNT=6666;

ERROR 1198 (HY000): This operation cannot be performed with a running slave; run STOP SLAVE first

mysql> stop slave;

Query OK, 0 rows affected (0.01 sec)

mysql> change master to MASTER_CONNECT_RETRY=30,MASTER_RETRY_COUNT=6666;

Query OK, 0 rows affected (0.01 sec)

mysql> start slave;

Query OK, 0 rows affected (0.02 sec)

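How MySQL master-slave replication lag arises:

1. On the master, worker threads write to the binlog concurrently (parallel writes), whereas the master's dump thread and the slave's I/O thread each push and pull the binlog on a single thread. Above all, the SQL thread replays the events in the relay log one at a time on a single thread (version 5.6 supports parallel replication in specific cases when slave_parallel_workers is enabled, and from 5.7 onward full support for parallel replication has greatly reduced lag at the replication layer). So even leaving network latency aside, on mainstream MySQL versions under high concurrency, consumption may well fail to keep up with production, and a slave using asynchronous replication is quite likely to fall behind the master.

2. During replication, lag arises whenever the master or the slave is heavily loaded (especially when the slave is under heavy disk-flush pressure, which depends on the sync_binlog and innodb_flush_log_at_trx_commit settings) or when network transfer is slow (especially for cross-datacenter replication), and it cannot be avoided entirely. If strong consistency is required, semi-synchronous replication (Semi-sync) can be used, but even that plugin cannot guarantee strong consistency at all times, because hitting rpl_semi_sync_master_timeout makes replication fall back to asynchronous mode.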
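The producer/consumer imbalance described above can be illustrated with a toy simulation. This is only a sketch: the rates are made-up numbers, and real replication is not a fixed-rate queue.

```python
from typing import List


def simulate_backlog(produce_rate: int, apply_rate: int, seconds: int) -> List[int]:
    """Return the number of unapplied events on the slave after each second.

    produce_rate: events per second written by the master (hypothetical)
    apply_rate:   events per second the single SQL thread can replay (hypothetical)
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += produce_rate              # events arrive in the relay log
        backlog -= min(backlog, apply_rate)  # SQL thread replays what it can
        history.append(backlog)
    return history


# Master produces 1000 events/s, slave replays 800 events/s: the gap grows by 200/s.
print(simulate_backlog(1000, 800, 5))  # [200, 400, 600, 800, 1000]
# Slave is faster than the master: no backlog builds up.
print(simulate_backlog(500, 800, 5))   # [0, 0, 0, 0, 0]
```

As long as the master produces faster than the slave applies, the backlog grows without bound; lag only shrinks once the write load drops below the slave's replay capacity.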

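Given how replication lag arises, the following situations can cause MySQL master-slave replication lag:

1) High concurrency on the master, generating a large volume of transactions

2) Poor network conditions

3) Slave hardware that is weaker than the master's

4) Replication being asynchronous in the first place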

Explanation of some of the parameters involved in flushing to disk:

sync_binlog:

MySQL official documentation reference: https://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html

Explanation:

Controls how often the MySQL server synchronizes the binary log to disk

sync_binlog=0:

Disables synchronization of the binary log to disk by the MySQL server. Instead, the MySQL server relies on the operating system to flush the binary log to disk from time to time as it does for any other file. This setting provides the best performance, but in the event of a power failure or operating system crash, it is possible that the server has committed transactions that have not been synchronized to the binary log.

sync_binlog=1:

Enables synchronization of the binary log to disk before transactions are committed. This is the safest setting but can have a negative impact on performance due to the increased number of disk writes. In the event of a power failure or operating system crash, transactions that are missing from the binary log are only in a prepared state. This permits the automatic recovery routine to roll back the transactions, which guarantees that no transaction is lost from the binary log.

sync_binlog=N:

where N is a value other than 0 or 1: The binary log is synchronized to disk after N binary log commit groups have been collected. In the event of a power failure or operating system crash, it is possible that the server has committed transactions that have not been flushed to the binary log. This setting can have a negative impact on performance due to the increased number of disk writes. A higher value improves performance, but with an increased risk of data loss.
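The trade-off between the three sync_binlog settings can be sketched as a small counting model. It is a deliberate simplification, not how the server is implemented: it treats sync_binlog=0 as "never synced" (the worst case, since the OS may in fact have flushed some of the data), and the function name is made up for illustration.

```python
def groups_at_risk(committed_groups: int, sync_binlog: int) -> int:
    """Commit groups written since the last binlog sync.

    These are the groups an OS crash or power failure could remove from
    the binary log under this simplified model.
    """
    if committed_groups < 0 or sync_binlog < 0:
        raise ValueError("arguments must be non-negative")
    if sync_binlog == 0:
        # The server never syncs; assume the worst case.
        return committed_groups
    # A sync happens after every sync_binlog commit groups.
    return committed_groups % sync_binlog


print(groups_at_risk(250, 1))    # 0
print(groups_at_risk(250, 100))  # 50
print(groups_at_risk(250, 0))    # 250
```

A larger N means fewer syncs and better throughput, at the price of more commit groups exposed between syncs.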

innodb_flush_log_at_trx_commit:

MySQL official documentation reference: https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

Other references:

https://blog.csdn.net/codepen/article/details/52160715

https://www.cnblogs.com/mayipapa/p/4685313.html

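Explanation:

Specific to the InnoDB engine; it controls how the redo log files (ib_logfile) are flushed.

MySQL log write order:

log buffer => MySQL (write) => log file => OS flush => disk

Values of innodb_flush_log_at_trx_commit (drawn from several blog posts):

0, delayed write:

log buffer => every 1 second => log file => OS flushes immediately => disk

1, immediate write, immediate flush:

log buffer => immediately => log file => OS flushes immediately => disk

With this setting the database places very high demands on I/O. If the underlying hardware delivers poor IOPS, MySQL's concurrency will soon be capped by disk I/O.

2, immediate write, delayed flush:

log buffer => immediately => log file => OS flushes every 1 second => disk

If only the MySQL process dies, the file system is intact and the corresponding transaction data is not lost. Only when the host operating system crashes or power is suddenly lost can about the last second of transactions be lost. The benefit is a lower chance of losing transaction data than with 0, without such high demands on the underlying I/O (writing the log buffer to the file system usually just moves data from the log buffer's memory to the file system's memory cache, which puts no pressure on the underlying I/O).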
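The three modes can be compared with a toy model of the layers a redo record passes through. This is a sketch under simplifying assumptions (one background tick per second, no group commit, no checkpoints); the class and method names are invented for illustration and do not correspond to anything inside InnoDB.

```python
class RedoLogModel:
    """Toy model: log buffer -> OS file cache -> disk."""

    def __init__(self, mode: int):
        if mode not in (0, 1, 2):
            raise ValueError("mode must be 0, 1 or 2")
        self.mode = mode
        self.log_buffer = []  # lives inside the mysqld process
        self.os_cache = []    # written to the log file, not yet flushed
        self.disk = []        # flushed to durable storage

    def _write(self):
        self.os_cache.extend(self.log_buffer)
        self.log_buffer.clear()

    def _flush(self):
        self.disk.extend(self.os_cache)
        self.os_cache.clear()

    def commit(self, trx: str):
        self.log_buffer.append(trx)
        if self.mode in (1, 2):
            self._write()   # modes 1 and 2 write at every commit
        if self.mode == 1:
            self._flush()   # only mode 1 also flushes at every commit

    def tick(self):
        """The once-per-second background write and flush."""
        self._write()
        self._flush()

    def survives_mysqld_crash(self):
        # The OS is still running, so its file cache reaches disk eventually.
        return self.disk + self.os_cache

    def survives_os_crash(self):
        # Only what was already flushed survives a power loss.
        return list(self.disk)


for mode in (0, 1, 2):
    m = RedoLogModel(mode)
    m.commit("t1")
    m.tick()
    m.commit("t2")  # committed after the last tick
    print(mode, m.survives_mysqld_crash(), m.survives_os_crash())
# 0 ['t1'] ['t1']
# 1 ['t1', 't2'] ['t1', 't2']
# 2 ['t1', 't2'] ['t1']
```

The output matches the description above: mode 0 loses the latest transaction in either kind of crash, mode 2 loses it only when the OS or power fails, and mode 1 loses nothing in this model.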

The official documentation, paraphrased (my own understanding):

When innodb_flush_log_at_trx_commit is set to 0, the log buffer is written to the log file once per second and the log file is flushed to disk at the same time; in this mode a transaction commit does not itself trigger any write to disk.

When innodb_flush_log_at_trx_commit is set to 1, at each transaction commit the log buffer is written to the log file and the log file is flushed to disk in the same step.

When innodb_flush_log_at_trx_commit is set to 2, at each transaction commit the log buffer is written to the log file, but the log file is not flushed to disk at that moment; instead, the flush of the log file to disk takes place once per second.

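Because of process scheduling, the once-per-second flush is not 100% guaranteed to happen every second. Setting innodb_flush_log_at_trx_commit to a value other than 1 gives better performance, but with 0 a crash of the MySQL process loses up to the last second of transactions (best performance, but unsafe), while with 2 the last second of transactions is lost only when the operating system crashes or power fails (good performance, and safer than 0). With 1, performance is lowest but the mode is the safest: if the MySQL service or the operating system crashes, at most one statement or one transaction is lost from the binlog (this assumes sync_binlog=1 as well).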

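Note that many operating systems and some disk hardware fool the flush-to-disk operation: they tell MySQL the flush has happened even though it has not. In that case transaction durability is not guaranteed even with a value of 1, and in the worst case a power failure can even corrupt the database. A battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes and makes the operation safer. You can also try the hdparm command to disable the disk's hardware write cache, or use another vendor-specific command.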

Finally, the official MySQL recommendation:

A higher value improves performance, but with an increased risk of data loss

For the greatest possible durability and consistency in a replication setup that uses InnoDB with transactions, use these settings:

sync_binlog=1

innodb_flush_log_at_trx_commit=1

How MySQL master-slave lag is calculated:

Method 1: position (crude; it only shows whether there is lag at all)

mysql> show slave status\G

Read_Master_Log_Pos: 4

Exec_Master_Log_Pos: 0
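The position comparison can be automated with a small parser over the vertical-format status output. This is a minimal sketch: it also compares Master_Log_File with Relay_Master_Log_File, which the excerpt above does not show, on the assumption that positions from different binlog files cannot be subtracted. The file names in the sample are hypothetical, and the result only measures how far the SQL thread trails the I/O thread, not how far the I/O thread trails the master.

```python
def parse_slave_status(text: str) -> dict:
    """Parse 'Key: value' lines from vertical-format SHOW SLAVE STATUS output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def position_gap(fields: dict):
    """Bytes read by the I/O thread but not yet executed by the SQL thread.

    Returns None when the two threads are on different binlog files.
    """
    read_file = fields.get("Master_Log_File")
    exec_file = fields.get("Relay_Master_Log_File")
    if read_file != exec_file:
        return None
    return int(fields["Read_Master_Log_Pos"]) - int(fields["Exec_Master_Log_Pos"])


# Hypothetical sample; only the two position values come from the text above.
sample = """
          Master_Log_File: mysql-bin.000003
      Read_Master_Log_Pos: 4
    Relay_Master_Log_File: mysql-bin.000003
      Exec_Master_Log_Pos: 0
"""

gap = position_gap(parse_slave_status(sample))
print(gap)  # 4
```

A gap of 0 on the same file means the SQL thread has caught up with everything the I/O thread has fetched; any positive value indicates unapplied events.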

Method 2: Seconds_Behind_Master

See the official MySQL documentation: