In a distributed system, distributed transactions are a problem that must be solved, and eventual-consistency schemes are currently the most common approach. Since Alibaba open-sourced Fescar early this year (renamed Seata in early April), the project has attracted enormous attention and is approaching 8,000 GitHub stars. Seata aims to solve the distributed-transaction problem of the microservices world with high performance and zero intrusion into business code. It is iterating rapidly, with a production-ready MySQL version as the near-term goal.
This article builds a distributed-system demo on a spring cloud + spring jpa + spring cloud alibaba fescar + mysql + seata stack, then uses seata's debug logs and source code to analyze its workflow and principles from the client side (RM and TM). (Sample project: https://github.com/fescar-group/fescar-samples/tree/master/springcloud-jpa-seata)
To better follow the rest of the article, let's first go over the relevant concepts:

- XID: the unique identifier of a global transaction, composed of ip:port:sequence;
- Transaction Coordinator (TC): maintains the state of global transactions and drives the global commit or rollback;
- Transaction Manager (TM): defines the scope of a global transaction; it begins the global transaction and ultimately issues the decision to commit or roll back globally;
- Resource Manager (RM): manages branch transactions; it registers branches, reports their status, and commits or rolls back the branch (local) transaction on the TC's instruction.
Note: the code in this article is based on fescar-0.4.1. Because the project was renamed to seata only recently, some package, class, and jar names have not yet been switched over, so the text below still says fescar.
Fescar uses an XID to represent one distributed transaction. The XID must be propagated through every system involved in a distributed-transaction request, so that each of them can report its branch transaction's status to fescar-server and receive fescar-server's commit and rollback instructions. Fescar officially supports all versions of the dubbo protocol, and for spring cloud (spring-boot) distributed projects the community provides a corresponding implementation:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-alibaba-fescar</artifactId>
    <version>2.1.0.BUILD-SNAPSHOT</version>
</dependency>
This component implements XID propagation for RestTemplate- and Feign-based communication.
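As an aside, here is a minimal sketch of what caller-side propagation can look like with a Feign RequestInterceptor. The header name TX_XID mirrors fescar's RootContext key, but the class itself is a hypothetical illustration of the idea, not the component's actual source:

import com.alibaba.fescar.core.context.RootContext;
import feign.RequestInterceptor;
import feign.RequestTemplate;

// Illustrative only: propagate the XID of the current global transaction
// to downstream services as an HTTP header on every Feign call.
public class XidPropagatingInterceptor implements RequestInterceptor {

    @Override
    public void apply(RequestTemplate template) {
        String xid = RootContext.getXID();   // non-null only inside a global transaction
        if (xid != null) {
            template.header("TX_XID", xid);  // downstream binds this XID to its own RootContext
        }
    }
}

The downstream service reads the header and binds the XID to its own request context, which is what lets its local commit join the global transaction (more on this below).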
The business logic is the classic place-order / deduct-balance / reduce-stock flow, split by module into three independent services, each connected to its own database:

- a warehouse service (storage-service) with its stock database;
- an order service (order-service) with its order database;
- an account service (account-service) with its account database.
On top of these sits the business system that initiates the distributed transaction: business-service.
The project structure is shown in the figure below. (project structure diagram)
Normal flow: (sequence diagram)
Exception flow: the balance deduction throws an exception. (sequence diagram)
In the normal flow, the data updated in steps 2, 3 and 4 is committed globally; in the exception flow, the error thrown at step 4 triggers a global rollback.
fescar's configuration entry point is registry.conf. Reading ConfigurationFactory shows that the file cannot currently be specified any other way, so the configuration file must be named registry.conf.
private static final String REGISTRY_CONF = "registry.conf";
public static final Configuration FILE_INSTANCE = new FileConfiguration(REGISTRY_CONF);
In registry.conf you can specify which form the concrete configuration takes; the default is the file type, and file.conf contains three parts of configuration:
service
service {
  #vgroup->rgroup
  vgroup_mapping.my_test_tx_group = "default"
  #the address the Client uses to connect to the TC
  default.grouplist = "127.0.0.1:8091"
  #degrade currently not supported
  enableDegrade = false
  #whether to disable seata's distributed transactions
  disableGlobalTransaction = false
}
client
client {
  #upper bound of the buffer for commit notifications the RM receives from the TC
  async.commit.buffer.limit = 10000
  lock {
    retry.internal = 10
    retry.times = 30
  }
}
Besides these configuration files, the only place fescar's AT mode needs a bit of code is designating a proxy for the data source, which currently must be based on DruidDataSource. (Note: the newly released 0.4.2 already supports arbitrary data source types.)
@Bean
@ConfigurationProperties(prefix = "spring.datasource")
public DruidDataSource druidDataSource() {
    DruidDataSource druidDataSource = new DruidDataSource();
    return druidDataSource;
}

@Primary
@Bean("dataSource")
public DataSourceProxy dataSource(DruidDataSource druidDataSource) {
    return new DataSourceProxy(druidDataSource);
}
The point of DataSourceProxy is to introduce ConnectionProxy. One facet of fescar's non-intrusiveness lies in the ConnectionProxy implementation: the point at which a branch transaction joins the global transaction is the commit phase of the local transaction. Designing it this way guarantees that the business data and the undo_log are written within one local transaction.
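As a rough picture of how that interception happens, the data-source proxy hands out wrapped connections, which in turn intercept commit. The sketch below is a heavily condensed paraphrase with simplified types (the real ConnectionProxy implements java.sql.Connection); it is an illustration, not fescar's actual code:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Condensed illustration of the proxy chain; simplified, not the real classes.
class SketchDataSourceProxy {
    private final DataSource target;

    SketchDataSourceProxy(DataSource target) {
        this.target = target;
    }

    // every connection handed out is wrapped, giving fescar a hook on commit
    SketchConnectionProxy getConnection() throws SQLException {
        return new SketchConnectionProxy(target.getConnection());
    }
}

class SketchConnectionProxy {
    private final Connection target;

    SketchConnectionProxy(Connection target) {
        this.target = target;
    }

    void commit() throws SQLException {
        // inside a global transaction fescar would, at this exact point:
        //   1. register the branch with the TC,
        //   2. write the undo_log on this same connection,
        //   3. then commit, so business data and undo_log land atomically
        target.commit();
    }
}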
undo_log is a table that must be created in every business database; fescar relies on it to record each branch transaction's status and the replay data for a phase-two rollback. There is no need to worry about the table's volume becoming a single point of pressure: when the global transaction commits, the corresponding undo_log rows are deleted asynchronously.
CREATE TABLE `undo_log` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `branch_id` bigint(20) NOT NULL,
  `xid` varchar(100) NOT NULL,
  `rollback_info` longblob NOT NULL,
  `log_status` int(11) NOT NULL,
  `log_created` datetime NOT NULL,
  `log_modified` datetime NOT NULL,
  `ext` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ux_undo_log` (`xid`,`branch_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Go to https://github.com/seata/seata/releases and download the fescar-server matching your client version, to avoid protocol mismatches between different versions. Enter the bin directory of the unpacked archive and run:

./fescar-server.sh 8091 ../data

A successful start prints:
2019-04-09 20:27:24.637 INFO [main]c.a.fescar.core.rpc.netty.AbstractRpcRemotingServer.start:152 -Server started ...
fescar's load entry point is GlobalTransactionAutoConfiguration, which is loaded automatically in spring boot projects; you can of course also instantiate GlobalTransactionScanner through other means.
@Configuration
@EnableConfigurationProperties({FescarProperties.class})
public class GlobalTransactionAutoConfiguration {
    private final ApplicationContext applicationContext;
    private final FescarProperties fescarProperties;

    public GlobalTransactionAutoConfiguration(ApplicationContext applicationContext,
                                              FescarProperties fescarProperties) {
        this.applicationContext = applicationContext;
        this.fescarProperties = fescarProperties;
    }

    /**
     * Instantiate GlobalTransactionScanner;
     * the scanner is what kicks off client initialization.
     */
    @Bean
    public GlobalTransactionScanner globalTransactionScanner() {
        String applicationName = this.applicationContext.getEnvironment()
            .getProperty("spring.application.name");
        String txServiceGroup = this.fescarProperties.getTxServiceGroup();
        if (StringUtils.isEmpty(txServiceGroup)) {
            txServiceGroup = applicationName + "-fescar-service-group";
            this.fescarProperties.setTxServiceGroup(txServiceGroup);
        }
        return new GlobalTransactionScanner(applicationName, txServiceGroup);
    }
}
As you can see, it supports one configuration item, FescarProperties, used to set the transaction group name:
spring.cloud.alibaba.fescar.tx-service-group=my_test_tx_group
If no service group is specified, the name defaults to spring.application.name + "-fescar-service-group", which means startup fails if spring.application.name is not set.
@ConfigurationProperties("spring.cloud.alibaba.fescar") public class FescarProperties { private String txServiceGroup; public FescarProperties() { } public String getTxServiceGroup() { return this.txServiceGroup; } public void setTxServiceGroup(String txServiceGroup) { this.txServiceGroup = txServiceGroup; } }
With the applicationId and txServiceGroup in hand, a GlobalTransactionScanner object is created; the method to look at is initClient.
private void initClient() {
    if (StringUtils.isNullOrEmpty(applicationId) || StringUtils.isNullOrEmpty(txServiceGroup)) {
        throw new IllegalArgumentException(
            "applicationId: " + applicationId + ", txServiceGroup: " + txServiceGroup);
    }
    //init TM
    TMClient.init(applicationId, txServiceGroup);
    //init RM
    RMClient.init(applicationId, txServiceGroup);
}
The method initializes both TMClient and RMClient: a single service can play both the TM and the RM role, and which role it takes in a given global transaction depends on where the @GlobalTransactional annotation sits. Creating a client results in a Netty connection to the TC, which is why the startup log shows two Netty channels, marked with transactionRole TMROLE and RMROLE respectively.
2019-04-09 13:42:57.417 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : NettyPool create channel to {"address":"127.0.0.1:8091","message":{"applicationId":"business-service","byteBuffer":{"char":"\u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"transactionServiceGroup":"my_test_tx_group","typeCode":101,"version":"0.4.1"},"transactionRole":"TMROLE"}
2019-04-09 13:42:57.505 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : NettyPool create channel to {"address":"127.0.0.1:8091","message":{"applicationId":"business-service","byteBuffer":{"char":"\u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"transactionServiceGroup":"my_test_tx_group","typeCode":103,"version":"0.4.1"},"transactionRole":"RMROLE"}
2019-04-09 13:42:57.629 DEBUG 93715 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:RegisterTMRequest{applicationId='business-service', transactionServiceGroup='my_test_tx_group'}
2019-04-09 13:42:57.629 DEBUG 93715 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:RegisterRMRequest{resourceIds='null', applicationId='business-service', transactionServiceGroup='my_test_tx_group'}
2019-04-09 13:42:57.699 DEBUG 93715 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:1
2019-04-09 13:42:57.699 DEBUG 93715 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:2
2019-04-09 13:42:57.701 DEBUG 93715 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.RmRpcClient@3b06d101 msgId:1, future :com.alibaba.fescar.core.protocol.MessageFuture@28bb1abd, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 13:42:57.701 DEBUG 93715 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.TmRpcClient@65fc3fb7 msgId:2, future :com.alibaba.fescar.core.protocol.MessageFuture@9a1e3df, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 13:42:57.710 INFO 93715 --- [imeoutChecker_1] c.a.fescar.core.rpc.netty.RmRpcClient : register RM success. server version:0.4.1,channel:[id: 0xe6468995, L:/127.0.0.1:57397 - R:/127.0.0.1:8091]
2019-04-09 13:42:57.710 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : register success, cost 114 ms, version:0.4.1,role:TMROLE,channel:[id: 0xd22fe0c5, L:/127.0.0.1:57398 - R:/127.0.0.1:8091]
2019-04-09 13:42:57.711 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : register success, cost 125 ms, version:0.4.1,role:RMROLE,channel:[id: 0xe6468995, L:/127.0.0.1:57397 - R:/127.0.0.1:8091]
The log shows RmRpcClient and TmRpcClient being instantiated successfully. In this example the TM role belongs to business-service, whose BusinessService.purchase method is annotated with @GlobalTransactional:
@Service
public class BusinessService {

    @Autowired
    private StorageFeignClient storageFeignClient;
    @Autowired
    private OrderFeignClient orderFeignClient;

    @GlobalTransactional
    public void purchase(String userId, String commodityCode, int orderCount) {
        storageFeignClient.deduct(commodityCode, orderCount);
        orderFeignClient.create(userId, commodityCode, orderCount);
    }
}
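For completeness, the Feign client on the caller side might look like the sketch below. This is inferred from the /deduct endpoint shown later in the article rather than copied from the sample project, so treat the exact annotations and service name as assumptions:

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;

// Hypothetical shape of the caller-side Feign client; the XID header is
// added transparently by spring-cloud-alibaba-fescar, so nothing
// transaction-related appears here.
@FeignClient(name = "storage-service")
public interface StorageFeignClient {

    @GetMapping("/deduct")
    Boolean deduct(@RequestParam("commodityCode") String commodityCode,
                   @RequestParam("count") Integer count);
}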
Calling this method creates a global transaction. First look at what @GlobalTransactional does: the call is intercepted and handled in GlobalTransactionalInterceptor.
/**
 * AOP interception of the method call
 */
@Override
public Object invoke(final MethodInvocation methodInvocation) throws Throwable {
    Class<?> targetClass = (methodInvocation.getThis() != null
        ? AopUtils.getTargetClass(methodInvocation.getThis()) : null);
    Method specificMethod = ClassUtils.getMostSpecificMethod(methodInvocation.getMethod(), targetClass);
    final Method method = BridgeMethodResolver.findBridgedMethod(specificMethod);
    //read the method's GlobalTransactional annotation
    final GlobalTransactional globalTransactionalAnnotation =
        getAnnotation(method, GlobalTransactional.class);
    final GlobalLock globalLockAnnotation = getAnnotation(method, GlobalLock.class);
    //if the method carries GlobalTransactional, route it to the corresponding handling
    if (globalTransactionalAnnotation != null) {
        return handleGlobalTransaction(methodInvocation, globalTransactionalAnnotation);
    } else if (globalLockAnnotation != null) {
        return handleGlobalLock(methodInvocation);
    } else {
        return methodInvocation.proceed();
    }
}
handleGlobalTransaction in turn calls TransactionalTemplate's execute. As the class name suggests, this is a standard template method that defines the TM's steps for handling a global transaction; the comments speak for themselves.
public Object execute(TransactionalExecutor business) throws TransactionalExecutor.ExecutionException {
    // 1. get or create a transaction
    GlobalTransaction tx = GlobalTransactionContext.getCurrentOrCreate();
    try {
        // 2. begin transaction
        try {
            triggerBeforeBegin();
            tx.begin(business.timeout(), business.name());
            triggerAfterBegin();
        } catch (TransactionException txe) {
            throw new TransactionalExecutor.ExecutionException(tx, txe,
                TransactionalExecutor.Code.BeginFailure);
        }
        Object rs = null;
        try {
            // Do Your Business
            rs = business.execute();
        } catch (Throwable ex) {
            // 3. any business exception, rollback.
            try {
                triggerBeforeRollback();
                tx.rollback();
                triggerAfterRollback();
                // 3.1 Successfully rolled back
                throw new TransactionalExecutor.ExecutionException(tx,
                    TransactionalExecutor.Code.RollbackDone, ex);
            } catch (TransactionException txe) {
                // 3.2 Failed to rollback
                throw new TransactionalExecutor.ExecutionException(tx, txe,
                    TransactionalExecutor.Code.RollbackFailure, ex);
            }
        }
        // 4. everything is fine, commit.
        try {
            triggerBeforeCommit();
            tx.commit();
            triggerAfterCommit();
        } catch (TransactionException txe) {
            // 4.1 Failed to commit
            throw new TransactionalExecutor.ExecutionException(tx, txe,
                TransactionalExecutor.Code.CommitFailure);
        }
        return rs;
    } finally {
        //5. clear
        triggerAfterCompletion();
        cleanUp();
    }
}
The global transaction is opened through DefaultGlobalTransaction's begin method.
public void begin(int timeout, String name) throws TransactionException {
    if (role != GlobalTransactionRole.Launcher) {
        check();
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Ignore Begin(): just involved in global transaction [" + xid + "]");
        }
        return;
    }
    if (xid != null) {
        throw new IllegalStateException();
    }
    if (RootContext.getXID() != null) {
        throw new IllegalStateException();
    }
    //the actual begin: obtain the XID returned by the TC
    xid = transactionManager.begin(null, null, name, timeout);
    status = GlobalStatus.Begin;
    RootContext.bind(xid);
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Begin a NEW global transaction [" + xid + "]");
    }
}
The check if (role != GlobalTransactionRole.Launcher) at the top of the method plays a key part: it indicates whether the current party is the initiator (Launcher) of the global transaction or a participant (Participant). If a method in a downstream system of the distributed transaction also carries the @GlobalTransactional annotation, its role is Participant; it skips the rest of begin and returns immediately. Launcher versus Participant is decided by whether an XID already exists in the current context: no XID means Launcher, an existing XID means Participant. It follows that only the Launcher creates the global transaction, and each distributed transaction has exactly one Launcher.
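A minimal sketch of that decision, paraphrasing what GlobalTransactionContext.getCurrentOrCreate does when it builds DefaultGlobalTransaction instances (names here are simplified, not the exact fescar source):

import com.alibaba.fescar.core.context.RootContext;

// Paraphrase of the Launcher/Participant decision; illustrative only.
final class RoleDecisionSketch {

    enum Role { LAUNCHER, PARTICIPANT }

    static Role decide() {
        // the only input is whether an XID is already bound to this context
        String xid = RootContext.getXID();
        if (xid == null) {
            // no XID yet: this caller starts the global transaction
            return Role.LAUNCHER;
        }
        // XID propagated from upstream: just join the existing transaction
        return Role.PARTICIPANT;
    }

    private RoleDecisionSketch() {
    }
}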
DefaultTransactionManager is responsible for TM-to-TC communication, sending the begin, commit, and rollback instructions.
@Override
public String begin(String applicationId, String transactionServiceGroup, String name, int timeout)
    throws TransactionException {
    GlobalBeginRequest request = new GlobalBeginRequest();
    request.setTransactionName(name);
    request.setTimeout(timeout);
    GlobalBeginResponse response = (GlobalBeginResponse)syncCall(request);
    return response.getXid();
}
Receiving the XID from fescar-server marks the successful creation of a global transaction, and the log reflects the flow described above.
2019-04-09 13:46:57.417 DEBUG 31326 --- [nio-8084-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting : offer message: timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int)
2019-04-09 13:46:57.417 DEBUG 31326 --- [geSend_TMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : write message:FescarMergeMessage timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int), channel:[id: 0xa148545e, L:/127.0.0.1:56120 - R:/127.0.0.1:8091],active?true,writable?true,isopen?true
2019-04-09 13:46:57.418 DEBUG 31326 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:FescarMergeMessage timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int)
2019-04-09 13:46:57.421 DEBUG 31326 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:MergeResultMessage com.alibaba.fescar.core.protocol.transaction.GlobalBeginResponse@2dc480dc,messageId:1196
2019-04-09 13:46:57.421 DEBUG 31326 --- [nio-8084-exec-1] c.a.fescar.core.context.RootContext : bind 192.168.224.93:8091:2008502699
2019-04-09 13:46:57.421 DEBUG 31326 --- [nio-8084-exec-1] c.a.f.tm.api.DefaultGlobalTransaction : Begin a NEW global transaction [192.168.224.93:8091:2008502699]
With the global transaction created, business.execute() runs next, i.e. the business call storageFeignClient.deduct(commodityCode, orderCount), which takes us into the RM's processing flow. The business logic here calls storage-service's stock-deduction endpoint.
@GetMapping(path = "/deduct") public Boolean deduct(String commodityCode, Integer count){ storageService.deduct(commodityCode,count); return true; } @Transactional public void deduct(String commodityCode, int count){ Storage storage = storageDAO.findByCommodityCode(commodityCode); storage.setCount(storage.getCount()-count); storageDAO.save(storage); }
Neither the storage endpoint nor the service method contains any fescar-related code or annotations, which is exactly fescar's non-intrusiveness. So how does this branch join the global transaction? The answer lies in ConnectionProxy, and this is why DataSourceProxy is mandatory, as noted earlier: only through DataSourceProxy does fescar get its hook at the local transaction commit of the business code, where it registers the branch transaction with the TC and reports the RM's processing result.
Because the business code's own transaction commit is proxied by ConnectionProxy, committing the local transaction actually executes ConnectionProxy's commit method.
public void commit() throws SQLException {
    //if we are inside a global transaction, run the global-transaction commit path;
    //"inside a global transaction" simply means an XID exists in the current context
    if (context.inGlobalTransaction()) {
        processGlobalTransactionCommit();
    } else if (context.isGlobalLockRequire()) {
        processLocalCommitWithGlobalLocks();
    } else {
        targetConnection.commit();
    }
}

private void processGlobalTransactionCommit() throws SQLException {
    try {
        //first register the branch with the TC and obtain the branchId it assigns
        register();
    } catch (TransactionException e) {
        recognizeLockKeyConflictException(e);
    }
    try {
        if (context.hasUndoLog()) {
            //write the undo log
            UndoLogManager.flushUndoLogs(this);
        }
        //commit the local transaction: undo_log and business data are written in the same local transaction
        targetConnection.commit();
    } catch (Throwable ex) {
        //notify the TC that the RM's branch failed
        report(false);
        if (ex instanceof SQLException) {
            throw new SQLException(ex);
        }
    }
    //notify the TC that the RM's branch succeeded
    report(true);
    context.reset();
}

private void register() throws TransactionException {
    //register the branch: build the request and send the register instruction to the TC over netty
    Long branchId = DefaultResourceManager.get().branchRegister(BranchType.AT,
        getDataSourceProxy().getResourceId(), null, context.getXid(), null,
        context.buildLockKeys());
    //keep the returned branchId in the context
    context.setBranchId(branchId);
}
The log confirms this flow.
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor : xid in RootContext null xid in RpcContext 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] c.a.fescar.core.context.RootContext : bind 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor : bind 192.168.0.2:8091:2008546211 to RootContext
2019-04-09 21:57:48.386 INFO 38933 --- [nio-8081-exec-1] o.h.h.i.QueryTranslatorFactoryInitiator : HHH000397: Using ASTQueryTranslatorFactory
Hibernate: select storage0_.id as id1_0_, storage0_.commodity_code as commodit2_0_, storage0_.count as count3_0_ from storage_tbl storage0_ where storage0_.commodity_code=?
Hibernate: update storage_tbl set count=? where id=?
2019-04-09 21:57:48.673 INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient : will connect to 192.168.0.2:8091
2019-04-09 21:57:48.673 INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient : RM will register :jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false
2019-04-09 21:57:48.673 INFO 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.NettyPoolableFactory : NettyPool create channel to {"address":"192.168.0.2:8091","message":{"applicationId":"storage-service","byteBuffer":{"char":"\u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"resourceIds":"jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false","transactionServiceGroup":"hello-service-fescar-service-group","typeCode":103,"version":"0.4.0"},"transactionRole":"RMROLE"}
2019-04-09 21:57:48.677 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:RegisterRMRequest{resourceIds='jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false', applicationId='storage-service', transactionServiceGroup='hello-service-fescar-service-group'}
2019-04-09 21:57:48.680 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:9
2019-04-09 21:57:48.680 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.RmRpcClient@7d61f5d4 msgId:9, future :com.alibaba.fescar.core.protocol.MessageFuture@186cd3e0, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 21:57:48.680 INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient : register RM success. server version:0.4.1,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:48.680 INFO 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.NettyPoolableFactory : register success, cost 3 ms, version:0.4.1,role:RMROLE,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:48.680 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting : offer message: transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1
2019-04-09 21:57:48.681 DEBUG 38933 --- [geSend_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : write message:FescarMergeMessage transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1, channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091],active?true,writable?true,isopen?true
2019-04-09 21:57:48.681 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:FescarMergeMessage transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1
2019-04-09 21:57:48.687 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:MergeResultMessage BranchRegisterResponse: transactionId=2008546211,branchId=2008546212,result code =Success,getMsg =null,messageId:11
2019-04-09 21:57:48.702 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.rm.datasource.undo.UndoLogManager : Flushing UNDO LOG: {"branchId":2008546212,"sqlUndoLogs":[{"afterImage":{"rows":[{"fields":[{"keyType":"PrimaryKey","name":"id","type":4,"value":1},{"keyType":"NULL","name":"count","type":4,"value":993}]}],"tableName":"storage_tbl"},"beforeImage":{"rows":[{"fields":[{"keyType":"PrimaryKey","name":"id","type":4,"value":1},{"keyType":"NULL","name":"count","type":4,"value":994}]}],"tableName":"storage_tbl"},"sqlType":"UPDATE","tableName":"storage_tbl"}],"xid":"192.168.0.2:8091:2008546211"}
2019-04-09 21:57:48.755 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting : offer message: transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null
2019-04-09 21:57:48.755 DEBUG 38933 --- [geSend_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : write message:FescarMergeMessage transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null, channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091],active?true,writable?true,isopen?true
2019-04-09 21:57:48.756 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:FescarMergeMessage transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null
2019-04-09 21:57:48.758 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:MergeResultMessage com.alibaba.fescar.core.protocol.transaction.BranchReportResponse@582a08cf,messageId:13
2019-04-09 21:57:48.799 DEBUG 38933 --- [nio-8081-exec-1] c.a.fescar.core.context.RootContext : unbind 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.799 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor : unbind 192.168.0.2:8091:2008546211 from RootContext
Steps 1 and 9 of this flow, the bind and the unbind of the XID, are done in FescarHandlerInterceptor. That class is not part of fescar itself but of the spring-cloud-alibaba-fescar component mentioned earlier; it binds the xid to, and unbinds it from, the current request context during feign/rest communication. At this point the RM has finished its PhaseOne work; next, the PhaseTwo logic.
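The receiving side can be pictured as the sketch below: a Spring HandlerInterceptor that binds the incoming XID before the controller runs and unbinds it afterwards. This mirrors what FescarHandlerInterceptor's log output suggests; the header name and the exact bookkeeping are simplified assumptions (the real component only unbinds an XID it bound itself):

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.HandlerInterceptor;
import com.alibaba.fescar.core.context.RootContext;

// Illustrative stand-in for FescarHandlerInterceptor: bind the propagated
// XID so this service's local commit joins the global transaction.
public class XidBindingInterceptor implements HandlerInterceptor {

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) {
        String xid = request.getHeader("TX_XID"); // header set by the caller side
        if (xid != null && RootContext.getXID() == null) {
            RootContext.bind(xid);
        }
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
                                Object handler, Exception ex) {
        if (RootContext.getXID() != null) {
            RootContext.unbind(); // leave no XID behind for unrelated requests
        }
    }
}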
After every branch transaction completes, the TC aggregates the results reported by the RMs and sends each RM the commit or rollback instruction.
2019-04-09 21:57:49.813 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null,messageId:1
2019-04-09 21:57:49.813 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.RmRpcClient@7d61f5d4 msgId:1, body:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null
2019-04-09 21:57:49.814 INFO 38933 --- [atch_RMROLE_1_8] c.a.f.core.rpc.netty.RmMessageListener : onMessage:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null
2019-04-09 21:57:49.816 INFO 38933 --- [atch_RMROLE_1_8] com.alibaba.fescar.rm.AbstractRMHandler : Branch committing: 192.168.0.2:8091:2008546211 2008546212 jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false null
2019-04-09 21:57:49.816 INFO 38933 --- [atch_RMROLE_1_8] com.alibaba.fescar.rm.AbstractRMHandler : Branch commit result: PhaseTwo_Committed
2019-04-09 21:57:49.817 INFO 38933 --- [atch_RMROLE_1_8] c.a.fescar.core.rpc.netty.RmRpcClient : RmRpcClient sendResponse branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null
2019-04-09 21:57:49.817 DEBUG 38933 --- [atch_RMROLE_1_8] c.a.f.c.rpc.netty.AbstractRpcRemoting : send response:branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:49.817 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null
從日誌中能夠看到
具體看下二階段 commit 的執行過程,在AbstractRMHandler類的 doBranchCommit 方法:
/**
 * take the xid, branchId and other key parameters from the notification,
 * then call the RM's branchCommit
 */
protected void doBranchCommit(BranchCommitRequest request, BranchCommitResponse response)
    throws TransactionException {
    String xid = request.getXid();
    long branchId = request.getBranchId();
    String resourceId = request.getResourceId();
    String applicationData = request.getApplicationData();
    LOGGER.info("Branch committing: " + xid + " " + branchId + " " + resourceId + " " + applicationData);
    BranchStatus status = getResourceManager().branchCommit(request.getBranchType(), xid, branchId,
        resourceId, applicationData);
    response.setBranchStatus(status);
    LOGGER.info("Branch commit result: " + status);
}
The branchCommit request is ultimately routed to AsyncWorker's branchCommit method. AsyncWorker's approach is a key piece of the fescar architecture: since most transactions commit normally, the distributed transaction effectively finishes at PhaseOne, releasing locks as early as possible; after the commit instruction arrives in PhaseTwo, the remaining work can be done asynchronously, keeping PhaseTwo's time cost outside the distributed transaction.
private static final List<Phase2Context> ASYNC_COMMIT_BUFFER = Collections.synchronizedList(
    new ArrayList<Phase2Context>());

/**
 * add the XID that needs committing to the list
 */
@Override
public BranchStatus branchCommit(BranchType branchType, String xid, long branchId, String resourceId,
    String applicationData) throws TransactionException {
    if (ASYNC_COMMIT_BUFFER.size() < ASYNC_COMMIT_BUFFER_LIMIT) {
        ASYNC_COMMIT_BUFFER.add(new Phase2Context(branchType, xid, branchId, resourceId, applicationData));
    } else {
        LOGGER.warn("Async commit buffer is FULL. Rejected branch ["
            + branchId + "/" + xid + "] will be handled by housekeeping later.");
    }
    return BranchStatus.PhaseTwo_Committed;
}

/**
 * consume the XIDs in the list with a scheduled task
 */
public synchronized void init() {
    LOGGER.info("Async Commit Buffer Limit: " + ASYNC_COMMIT_BUFFER_LIMIT);
    timerExecutor = new ScheduledThreadPoolExecutor(1, new NamedThreadFactory("AsyncWorker", 1, true));
    timerExecutor.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            try {
                doBranchCommits();
            } catch (Throwable e) {
                LOGGER.info("Failed at async committing ... " + e.getMessage());
            }
        }
    }, 10, 1000 * 1, TimeUnit.MILLISECONDS);
}

private void doBranchCommits() {
    if (ASYNC_COMMIT_BUFFER.size() == 0) {
        return;
    }
    Map<String, List<Phase2Context>> mappedContexts = new HashMap<>();
    Iterator<Phase2Context> iterator = ASYNC_COMMIT_BUFFER.iterator();
    //each scheduled run drains all pending entries from ASYNC_COMMIT_BUFFER,
    //grouping the data to commit by resourceId (a database connection url,
    //as seen in the earlier logs); this covers applications with multiple data sources
    while (iterator.hasNext()) {
        Phase2Context commitContext = iterator.next();
        List<Phase2Context> contextsGroupedByResourceId = mappedContexts.get(commitContext.resourceId);
        if (contextsGroupedByResourceId == null) {
            contextsGroupedByResourceId = new ArrayList<>();
            mappedContexts.put(commitContext.resourceId, contextsGroupedByResourceId);
        }
        contextsGroupedByResourceId.add(commitContext);
        iterator.remove();
    }
    for (Map.Entry<String, List<Phase2Context>> entry : mappedContexts.entrySet()) {
        Connection conn = null;
        try {
            try {
                //look up the data source and connection by resourceId
                DataSourceProxy dataSourceProxy = DataSourceManager.get().get(entry.getKey());
                conn = dataSourceProxy.getPlainConnection();
            } catch (SQLException sqle) {
                LOGGER.warn("Failed to get connection for async committing on " + entry.getKey(), sqle);
                continue;
            }
            List<Phase2Context> contextsGroupedByResourceId = entry.getValue();
            for (Phase2Context commitContext : contextsGroupedByResourceId) {
                try {
                    //handle the undo log, i.e. delete the records for this xid and branchId
                    UndoLogManager.deleteUndoLog(commitContext.xid, commitContext.branchId, conn);
                } catch (Exception ex) {
                    LOGGER.warn("Failed to delete undo log ["
                        + commitContext.branchId + "/" + commitContext.xid + "]", ex);
                }
            }
        } finally {
            if (conn != null) {
                try {
                    conn.close();
                } catch (SQLException closeEx) {
                    LOGGER.warn("Failed to close JDBC resource while deleting undo_log ", closeEx);
                }
            }
        }
    }
}
So for the commit action, all the RM has to do is delete the undo_log rows matching the xid and branchId. (The buffer cap seen above corresponds to the async.commit.buffer.limit = 10000 setting from file.conf.)
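In plain JDBC terms, that deletion amounts to something like the sketch below. It is a simplification of UndoLogManager.deleteUndoLog; the SQL is inferred from the undo_log schema above rather than copied from the fescar source:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class UndoLogCleanupSketch {

    // illustrative equivalent of the phase-two commit work: drop the
    // rollback data once the global transaction is known to have committed
    static void deleteUndoLog(String xid, long branchId, Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM undo_log WHERE branch_id = ? AND xid = ?")) {
            ps.setLong(1, branchId);
            ps.setString(2, xid);
            ps.executeUpdate();
        }
    }

    private UndoLogCleanupSketch() {
    }
}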
A rollback is triggered in two situations:

- a branch transaction fails and report(false) notifies the TC;
- the method annotated with @GlobalTransactional catches an exception. In TransactionalTemplate's execute template shown earlier, the call to business.execute() is wrapped in a catch; once an exception is caught, rollback is invoked and the TM notifies the TC that the transaction with this XID must roll back.

public void rollback() throws TransactionException {
    //only the Launcher may initiate this rollback
    if (role == GlobalTransactionRole.Participant) {
        // Participant has no responsibility of committing
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Ignore Rollback(): just involved in global transaction [" + xid + "]");
        }
        return;
    }
    if (xid == null) {
        throw new IllegalStateException();
    }
    status = transactionManager.rollback(xid);
    if (RootContext.getXID() != null) {
        if (xid.equals(RootContext.getXID())) {
            RootContext.unbind();
        }
    }
}
After aggregating the results, the TC sends the rollback instruction to the participants, and the RM receives this rollback notification in AbstractRMHandler's doBranchRollback method.
protected void doBranchRollback(BranchRollbackRequest request, BranchRollbackResponse response)
    throws TransactionException {
    String xid = request.getXid();
    long branchId = request.getBranchId();
    String resourceId = request.getResourceId();
    String applicationData = request.getApplicationData();
    LOGGER.info("Branch rolling back: " + xid + " " + branchId + " " + resourceId);
    BranchStatus status = getResourceManager().branchRollback(request.getBranchType(), xid, branchId,
        resourceId, applicationData);
    response.setBranchStatus(status);
    LOGGER.info("Branch rollback result: " + status);
}
The rollback request is then passed on to DataSourceManager's branchRollback method.
public BranchStatus branchRollback(BranchType branchType, String xid, long branchId, String resourceId,
    String applicationData) throws TransactionException {
    //look up the data source by resourceId
    DataSourceProxy dataSourceProxy = get(resourceId);
    if (dataSourceProxy == null) {
        throw new ShouldNeverHappenException();
    }
    try {
        UndoLogManager.undo(dataSourceProxy, xid, branchId);
    } catch (TransactionException te) {
        if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
            return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
        } else {
            return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
        }
    }
    return BranchStatus.PhaseTwo_Rollbacked;
}
Execution finally lands in UndoLogManager's undo method. It is pure jdbc and rather long, so it is not pasted here; you can read it on github. In outline, the undo flow is:

1. Look up the undo_log written in PhaseOne by xid and branchId.
2. If found, generate the replay SQL from the data recorded in the undo_log and execute it, restoring the data modified in PhaseOne.
3. After step 2, delete that undo_log row.
4. If step 1 finds no undo_log, insert an undo_log row with status GlobalFinished.
The likely reason for a missing row is that the PhaseOne local transaction failed and the undo_log was never written. Because xid and branchId form a unique index, the insert in step 4 prevents a recovered PhaseOne transaction from writing its undo_log successfully afterwards: PhaseOne then fails, the business data is never committed, and the net effect is that the data ends up rolled back. A simplified sketch of the replay in step 2 follows.
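Taking the beforeImage/afterImage from the earlier "Flushing UNDO LOG" log entry as a worked example (before count=994, after count=993 on storage_tbl row id=1), the replay conceptually becomes a compensating UPDATE. This sketch illustrates the idea with hypothetical helper names; it is not UndoLogManager's actual code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class UndoReplaySketch {

    // Restore the beforeImage value, keyed by the primary key captured
    // in the image: UPDATE storage_tbl SET count = 994 WHERE id = 1.
    static void replayUpdate(Connection conn, long id, int beforeCount) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE storage_tbl SET count = ? WHERE id = ?")) {
            ps.setInt(1, beforeCount);  // 994, the pre-PhaseOne value
            ps.setLong(2, id);          // id = 1 in the logged image
            ps.executeUpdate();
        }
        // the real UndoLogManager also validates the current row against the
        // afterImage before replaying, then deletes the undo_log row
    }

    private UndoReplaySketch() {
    }
}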
This article has combined a distributed business scenario to analyze the main client-side processing flow of fescar and walk through the key source code of the TM and RM roles; hopefully it helps you understand how fescar works.

As fescar iterates rapidly and its Roadmap continues to take shape, in time fescar stands a good chance of becoming the benchmark open-source solution for distributed transactions.
This article is original content from the Yunqi community and may not be reproduced without permission.