MySQL binlog group commit and XA (two-phase commit)

1. XA-2PC (two-phase commit)

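XA is a specification for distributed transactions proposed by the X/Open group (the name stands for eXtended Architecture). The XA specification mainly defines the interface between the (global) transaction manager (TM: Transaction Manager) and the (local) resource manager (RM: Resource Manager). To implement distributed transactions, XA splits the commit of a transaction into two phases, which is 2PC (two-phase commit).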

1.1 Prepare phase:

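In the first phase, the transaction manager sends a prepare ("prepare to commit") request to every database server involved. On receiving the request, each database performs the data modifications, writes its logs, and so on. When that processing is done, it only changes the transaction's state to "ready to commit" and returns the result to the transaction manager.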

1.2 Commit phase:

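After receiving the responses, the transaction manager enters the second phase. If any database hit an error during the first phase, or the transaction manager gets no response from some database, the transaction is considered failed and the transactions on all databases are rolled back. A database server that never receives the second-phase request must not commit on its own; depending on the implementation, it rolls the "ready to commit" transaction back or keeps it prepared until the transaction manager resolves it. If every database prepared successfully in the first phase, the transaction manager sends a "confirm commit" request to the database servers, each of which changes the transaction's state from "ready to commit" to "committed" and returns an acknowledgment.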
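
The two phases can be sketched as a small in-memory simulation. This is only an illustration of the protocol flow under simplifying assumptions (no network, no timeouts, no crashes); the `Participant` class, the `State` enum, and `two_phase_commit` are hypothetical names, not part of any XA library.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"        # the "ready to commit" state
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"


class Participant:
    """Hypothetical stand-in for one database server (resource manager)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = State.ACTIVE

    def prepare(self):
        # Phase 1: do the work, log it, and vote yes or no.
        if self.can_prepare:
            self.state = State.PREPARED
        return self.can_prepare

    def commit(self):
        # Phase 2: only a prepared branch may be committed.
        assert self.state is State.PREPARED
        self.state = State.COMMITTED

    def rollback(self):
        self.state = State.ROLLED_BACK


def two_phase_commit(participants):
    """Return True if the global transaction committed, False if rolled back."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):                                # phase 2: commit everywhere
        for p in participants:
            p.commit()
        return True
    for p in participants:                        # phase 2: roll back everywhere
        p.rollback()
    return False
```

With two participants that can both prepare, `two_phase_commit` returns `True` and both end in `State.COMMITTED`; if either is created with `can_prepare=False`, every participant ends in `State.ROLLED_BACK`.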

 

2. The XA implementation in MySQL

Support for XA transactions is available for the InnoDB storage engine. The MySQL XA implementation is based on the X/Open CAE document Distributed Transaction Processing: The XA Specification.

Currently, among the MySQL Connectors, MySQL Connector/J 5.0.0 and higher supports XA directly, by means of a class interface that handles the XA SQL statement interface for you.

XA supports distributed transactions, that is, the ability to permit multiple separate transactional resources to participate in a global transaction. Transactional resources often are RDBMSs but may be other kinds of resources.

A global transaction involves several actions that are transactional in themselves, but that all must either complete successfully as a group, or all be rolled back as a group. In essence, this extends ACID properties 「up a level」 so that multiple ACID transactions can be executed in concert as components of a global operation that also has ACID properties. (However, for a distributed transaction, you must use the SERIALIZABLE isolation level to achieve ACID properties. It is enough to use REPEATABLE READ for a nondistributed transaction, but not for a distributed transaction.)

The key point: when XA in MySQL is used to implement distributed transactions, the SERIALIZABLE isolation level must be used.

The MySQL implementation of XA enables a MySQL server to act as a Resource Manager that handles XA transactions within a global transaction. A client program that connects to the MySQL server acts as the Transaction Manager.

The process for executing a global transaction uses two-phase commit (2PC). This takes place after the actions performed by the branches of the global transaction have been executed.

  1. In the first phase, all branches are prepared. That is, they are told by the TM to get ready to commit. Typically, this means each RM that manages a branch records the actions for the branch in stable storage. The branches indicate whether they are able to do this, and these results are used for the second phase.

  2. In the second phase, the TM tells the RMs whether to commit or roll back. If all branches indicated when they were prepared that they will be able to commit, all branches are told to commit. If any branch indicated when it was prepared that it will not be able to commit, all branches are told to roll back.

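Phase one is the prepare phase: the TM sends a prepare instruction to the RMs, each RM does its work, and then reports success or failure back to the TM.

Phase two is the commit-or-rollback phase: if the TM has received a success message from every RM, it sends the RMs a commit instruction; otherwise it sends a rollback instruction.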
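
From the client (TM) side, each branch is driven through MySQL's XA SQL statements: `XA START`, `XA END`, `XA PREPARE`, then `XA COMMIT` or `XA ROLLBACK`. The sketch below only builds the statement sequence for one branch and does not connect to a server. The helper name `xa_branch_statements`, the xid `gt1`, and the table `t` with column `c` are made up for illustration, and the xid is assumed to be a simple gtrid with no bqual or formatID.

```python
import re


def xa_branch_statements(xid, dml, commit=True):
    """Build the statements a TM client would send to ONE MySQL server (RM).

    xid and dml are illustrative inputs; the xid is restricted to a simple
    alphanumeric gtrid so that it can be quoted safely.
    """
    if not re.fullmatch(r"[A-Za-z0-9_-]+", xid):
        raise ValueError("xid must be a simple identifier")
    quoted = f"'{xid}'"
    statements = [f"XA START {quoted}"]
    statements += list(dml)                      # work done inside the branch
    statements.append(f"XA END {quoted}")
    statements.append(f"XA PREPARE {quoted}")    # phase 1
    # Phase 2: in a real TM this decision is made only after the PREPARE
    # results of every branch on every server have been collected.
    final = "XA COMMIT" if commit else "XA ROLLBACK"
    statements.append(f"{final} {quoted}")
    return statements


for stmt in xa_branch_statements("gt1", ["UPDATE t SET c = 1 WHERE id = 1"]):
    print(stmt)
```

A real transaction manager would run the first four statements on every participating server, and only then issue `XA COMMIT` (or `XA ROLLBACK`) on all of them.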

XA transaction support is limited to the InnoDB storage engine.

For "external XA" a MySQL server acts as a Resource Manager and client programs act as Transaction Managers. For "Internal XA", storage engines within a MySQL server act as RMs, and the server itself acts as a TM. Internal XA support is limited by the capabilities of individual storage engines.  Internal XA is required for handling XA transactions that involve more than one storage engine. The implementation of internal XA requires that a storage engine support two-phase commit at the table handler level, and currently this is true only for InnoDB.

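The XA implementation in MySQL is divided into external XA and internal XA. The former is distributed transactions in the usual sense. The latter is a form of distributed transaction inside a single MySQL server, in which the server layer acts as the TM (transaction coordinator) and the storage engines inside the server act as RMs. Because other engines do not support XA, in practice InnoDB is the only engine that can take part.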

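3. An additional role of internal XA

XA splits the commit of a transaction into two phases, and this design also solves the consistency problem between the binlog and the redo log. This is a further role of MySQL's internal XA.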

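To support replication for non-transactional engines as well, MySQL introduced the binlog at the server layer. It records the modifications made in every engine, so replication works for all of them. According to Ding Qi of Taobao, MySQL abandoned a redo-based replication strategy in 4.x and introduced binlog-based replication instead.

Introducing the binlog, however, creates a consistency problem between the binlog and the redo log: committing a transaction has to write both the redo log and the binlog, so how are the two kept in agreement? Which log decides whether a transaction is committed? How is it determined that a transaction has committed? How does crash recovery proceed?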

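MySQL solves this with two-phase commit (the two-phase commit of internal XA):

Phase one: InnoDB prepare. Acquire prepare_commit_mutex and write/sync the redo log; set the rollback segment to the Prepared state. Nothing is done to the binlog.

Phase two consists of two steps: 1> write/sync the binlog; 2> InnoDB commit (write the COMMIT mark, then release prepare_commit_mutex).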

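Whether the binlog has been written is what marks a transaction as successfully committed; the InnoDB commit mark does not. This follows from how crash recovery works:

1> During crash recovery, scan the last binlog file and extract the xids in it.
2> InnoDB keeps a list of transactions in the Prepared state. Compare the xids of these transactions with the xids recorded in the binlog: if an xid is present in the binlog, commit the transaction; otherwise roll it back.

This keeps the transaction state in InnoDB and in the binlog consistent. If the crash happens while the InnoDB commit mark is being written, recovery writes the commit mark again.

A crash during the prepare phase leads to a rollback, and so does a crash during the write/sync binlog step. This is how transaction commit was implemented before MySQL 5.6.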
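
The recovery rule above comes down to a set-membership test. The sketch below is a simplified model of that decision, not MySQL's recovery code: `recover` is a hypothetical function, and the xids are plain integers standing in for what would be read from the binlog and from InnoDB's list of prepared transactions.

```python
def recover(prepared_xids, binlog_xids):
    """Decide the fate of each InnoDB transaction found in the Prepared state.

    prepared_xids: xids of transactions InnoDB still lists as prepared.
    binlog_xids:   xids extracted from the last binlog file.
    Both inputs are hypothetical plain collections, not MySQL structures.
    """
    in_binlog = set(binlog_xids)
    return {
        xid: "commit" if xid in in_binlog else "rollback"
        for xid in prepared_xids
    }


# xid 102 was prepared but never reached the binlog, so it is rolled back.
decisions = recover(prepared_xids=[101, 102, 103], binlog_xids=[100, 101, 103])
print(decisions)
```

Transaction 100 does not appear in the result because it is no longer in the Prepared state: its InnoDB commit mark was already written before the crash.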

 

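4. binlog group commit

The two-phase commit described above is the implementation in versions before 5.6, and it has a serious flaw. When sync_binlog=1, the write/sync binlog step of phase two clearly becomes the bottleneck, and it runs while holding a global lock (prepare_commit_mutex: prepare and commit share a single lock), so performance drops sharply. The fix is binlog group commit in MySQL 5.6.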

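4.1 binlog group commit in MySQL 5.6:

Binlog group commit is split into three stages:

1> flush stage: write each thread's binlog from its cache to the file;

2> sync stage: fsync the binlog (if required; this is the most important step, because the binlogs of many threads reach disk in one combined write);

3> commit stage: commit each thread's transaction in the storage engine (no redo log write is needed here, since that was done in the prepare phase). Only one thread works in each stage at a time. (The first thread to enter a stage acts as the leader and does that stage's work for the whole group, a typical concurrency optimization.)

The advantage of this design is that the three stages can run concurrently (different groups can occupy different stages at the same moment), which improves throughput. Note that the prepare phase is unchanged: it still does write/sync redo log.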
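
The benefit of the sync stage shows up in a toy model that counts fsync calls. This is a single-threaded sketch under strong simplifying assumptions (no real threads, no stage queues or locks); `BinlogSim` and `group_commit` are hypothetical names, and transactions are plain dictionaries.

```python
class BinlogSim:
    """Toy binlog that counts fsync calls; not MySQL code."""

    def __init__(self):
        self.file = []         # events written to the (simulated) file
        self.synced = 0        # number of events known to be durable
        self.fsync_calls = 0

    def write(self, events):
        self.file.extend(events)

    def fsync(self):
        self.fsync_calls += 1
        self.synced = len(self.file)


def group_commit(binlog, queue):
    """Run one group through the three stages; return the commit order."""
    group = list(queue)            # the leader takes over the whole queue
    queue.clear()
    for txn in group:              # 1> flush stage: caches -> binlog file
        binlog.write(txn["events"])
    binlog.fsync()                 # 2> sync stage: one fsync for the group
    committed = []
    for txn in group:              # 3> commit stage: engine commit, in order
        committed.append(txn["xid"])
    return committed


binlog = BinlogSim()
waiting = [
    {"xid": 1, "events": ["a"]},
    {"xid": 2, "events": ["b", "c"]},
    {"xid": 3, "events": ["d"]},
]
order = group_commit(binlog, waiting)
print(order, binlog.fsync_calls)
```

Three transactions become durable with a single fsync, where the pre-5.6 scheme with sync_binlog=1 would have needed one fsync per transaction.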

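(Aside: the group-commit-based multi-threaded slave (MTS) replication introduced in 5.7 also builds on binlog group commit. At binlog group commit time, every commit group is stamped with a seqno, and the slave can then commit transaction groups in seqno order, just as the master did.)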

 

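4.2 binlog group commit in MySQL 5.7:

Taobao optimized binlog group commit further. The principle is as follows:

From the XA recovery logic we know that it is enough to guarantee that the redo log of InnoDB prepare has completed its write/sync before the binlog is written. The logic of the first stage of group commit was therefore modified slightly, roughly as follows: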

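 Step1. InnoDB prepare; record the current LSN in the thd.
 Step2. Enter the flush stage of group commit; the leader collects the queue and computes the largest LSN in it.
 Step3. Write/fsync the InnoDB redo log up to that LSN. (This step is the group write of the redo log: all redo log at or below the LSN is written to ib_logfile[0|1] in one go.)
 Step4. Write the binlog and carry out the remaining work (sync binlog, InnoDB commit, etc.).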

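In other words, the write/sync of the redo log is deferred from the prepare phase into the flush stage of binlog group commit, before the binlog is written.

Deferring the redo log write explicitly gives the redo log a group write (redo log group write) and reduces contention on the redo log's log_sys->mutex.

The redo log that belongs to a binlog commit group is thus group-written too, so both the binlog and the redo log are optimized.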
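
Steps 2 and 3 can be modeled as taking the maximum prepare LSN of the group and syncing once up to it. The sketch below is a toy model, not InnoDB code: `RedoLogSim`, `flush_stage`, and the LSN values are made up for illustration.

```python
class RedoLogSim:
    """Toy redo log tracking how far it has been flushed; not InnoDB code."""

    def __init__(self):
        self.flushed_lsn = 0
        self.fsync_calls = 0

    def write_and_sync_up_to(self, lsn):
        if lsn > self.flushed_lsn:     # everything <= lsn goes out at once
            self.fsync_calls += 1
            self.flushed_lsn = lsn


def flush_stage(redo, group):
    """Steps 2-3: find the largest prepare LSN in the group and sync once."""
    max_lsn = max(txn["prepare_lsn"] for txn in group)
    redo.write_and_sync_up_to(max_lsn)
    return max_lsn


redo = RedoLogSim()
group = [
    {"xid": 1, "prepare_lsn": 120},
    {"xid": 2, "prepare_lsn": 250},
    {"xid": 3, "prepare_lsn": 180},
]
print(flush_stage(redo, group), redo.fsync_calls)
```

Because the redo log is sequential, syncing up to the largest LSN also makes the prepare records of every other transaction in the group durable, which is why one sync per group is enough.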

Official MySQL adopted Taobao's optimization in the 5.7.6 code. The corresponding release note reads:

When using InnoDB with binary logging enabled, concurrent transactions written in the InnoDB redo log are now grouped together before synchronizing to disk when innodb_flush_log_at_trx_commit is set to 1, which reduces the amount of synchronization operations. This can lead to improved performance.

5. The XA parameter innodb_support_xa

http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_support_xa

Command-Line Format: --innodb_support_xa
System Variable Name: innodb_support_xa
Variable Scope: Global, Session
Dynamic Variable: Yes
Permitted Values Type: boolean
Default: TRUE

Enables InnoDB support for two-phase commit(2PC) in XA transactions, causing an extra disk flush for transaction preparation. This setting is the default. The XA mechanism is used internally and is essential for any server that has its binary log turned on and is accepting changes to its data from more than one thread. If you turn it off, transactions can be written to the binary log in a different order from the one in which the live database is committing them. This can produce different data when the binary log is replayed in disaster recovery or on a replication slave. Do not turn it off on a replication master server unless you have an unusual setup where only one thread is able to change data.

For a server that is accepting data changes from only one thread, it is safe and recommended to turn off this option to improve performance for InnoDB tables. For example, you can turn it off on replication slaves where only the replication SQL thread is changing data.

You can also turn off this option if you do not need it for safe binary logging or replication, and you also do not use an external XA transaction manager.

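The innodb_support_xa parameter defaults to true, which enables XA, even though it causes one extra disk flush (the redo log flush in the prepare phase). On a server that has log-bin enabled and more than one thread modifying data, it must stay enabled: turning it off can make the order in which transactions are written to the binlog differ from the order in which they actually commit, which produces wrong data during crash recovery and on replication slaves.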


Reposted from: http://www.linuxidc.com/Linux/2015-11/124942.htm
