MysqlIO.readFully問題排查分析過程

問題描述

支付清結算系統"外部對帳"部分任務在早上6:00~7:00時間段內對帳速度緩慢,且常常形成對帳統計結果不許確問題,致使結轉記帳不許確。java

初步分析及處理過程

因爲外部對帳是通過性能優化的,且在壓測環境表現良好,因此最開始初步認爲是因爲數據庫性能瓶頸致使,初步處理方式爲將線上6:00~7:00以前的慢SQL進行了一番排查,並對涉及外部系統的慢SQL進行了優化,這個步驟理論上是能夠解決此類問題,但RDS(阿里雲Mysql可能存在慢SQL統計不許確問題)。mysql

特色時間段內線程棧分析

特定時間段線程棧分析,因爲SQL優化部分並無解決此類問題,爲了查看在早上6:00~7:00這個時間段內程序JVM內部到底發生了什麼,因此採起了定時抓取線程棧日誌的手段(具體方式爲在Linux服務器添加cron任務,並累計輸出日誌的方式,腳本如👇)。git

#clearjstack.shgithub

#!/bin/sh
DATE='/bin/date'
timestamp() {
$DATE +'%Y-%m-%d %H:%M:%S'
}
ts=`timestamp`
echo "start log $ts" >> /tmp/clearjstack.log
/usr/java/jdk1.7.0_67/bin/jstack -l `/usr/java/jdk1.7.0_67/bin/jps -lvm | grep tomcat-clear | awk '{print $1}'` >> /tmp/clearjstack.log
ts=`timestamp`
echo "end log $ts" >> /tmp/clearjstack.log
複製代碼

Linux下添加定時任務的方式:spring

//添加計劃任務
[root@123.57.28.241/10.172.233.8 tmp]cd /var/spool/cron/
[root@123.57.28.241/10.172.233.8 cron]pwd
/var/spool/cron/具體用戶名(如爲root用戶)
[root@123.57.28.241/10.172.233.8 cron]vim root
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
複製代碼
//查看計劃任務
[root@123.57.28.241/10.172.233.8 cron]crontab -l
0 0 * * * /usr/local/script/Cutlog.sh >/dev/null 2>&1
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
複製代碼

對線程棧日誌分析(1)

#clearjstack.log

start log 2016-11-24 06:20:01
2016-11-24 06:20:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):

"Thread-70" prio=10 tid=0x00007f21f4022800 nid=0x12da runnable [0x00007f216cace000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <0x0000000786facbc8> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:493)
at sun.reflect.GeneratedMethodAccessor169.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:55)
at com.sun.proxy.$Proxy84.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:54)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:259)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:132)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:115)
at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:46)
at com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor.intercept(OffsetLimitInterceptor.java:66)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:57)
at com.sun.proxy.$Proxy82.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:104)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:62)
at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:358)
at com.sun.proxy.$Proxy29.selectOne(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:163)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:63)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
at com.sun.proxy.$Proxy87.countOrgSettleFlow(Unknown Source)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl.isExistOrgSettleFlow(OrgSettleFlowServiceImpl.java:45)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$FastClassBySpringCGLIB$$c921c9ca.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:700)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$6b2e2fc1.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078c795ed8> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x000000078deebbf0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Thread-69" prio=10 tid=0x00007f21f401c000 nid=0x12d9 waiting for monitor entry [0x00007f216ccd1000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x000000078defb298> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Thread-68" prio=10 tid=0x00007f21f401b000 nid=0x12d8 waiting for monitor entry [0x00007f216c7cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x000000078deec880> (a java.util.concurrent.ThreadPoolExecutor$Worker)
複製代碼

從上述日誌分析線程Thread-6八、Thread-69都爲阻塞狀態,且阻塞狀態sql

smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
複製代碼

所對應的代碼主要體如今對帳結果寫入部分,另外從日誌數據庫

- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
複製代碼

因此此時很容易以爲問題出如今鏈接池與MySQL數據庫自己的問題上。按照這個思路(JVM日誌很豐富,去僞存真,每每須要嘗試與不斷試錯),開始了對鏈接池自己及MySQL自己問題的排查(Google),如下爲幾種Google結果連接及緣由概述。apache

1)、網絡緣由&MySQL設置&換鏈接池vim

在以上連接所述的緣由中,重點是說了MySQL及網絡方面的緣由。針對這種敘述,與運維大哥溝通後調取了阿里雲RDS MySQL實例的參數設置,以下:tomcat

show global variables like '%timeout%'
複製代碼

最後一個參數wait_timeout代表Mysql服務器與客戶端鏈接空閒自動斷開的超時時間爲24小時,也就是說只有鏈接在與MySQL服務器鏈接後24小時內無任何操做,MySQL服務器纔會主動斷開與客戶端(這裏爲應用鏈接池中與MySQL數據庫創建的鏈接),然而對於應用來講,這種可能性基本不存在,何況大多數鏈接池配置都有對鏈接是否可用以及自動探測鏈接的配置。

查看應用鏈接池配置:

<property name="timeBetweenEvictionRunsMillis" value="3000" />
<property name="minEvictableIdleTimeMillis" value="300000" />
<property name="validationQuery" value="SELECT 'x'" />
<property name="testWhileIdle" value="true" />
<property name="testOnBorrow" value="true" />
<property name="testOnReturn" value="true" />
<property name="poolPreparedStatements" value="false" />
<property name="maxPoolPreparedStatementPerConnectionSize" value="-1" />
複製代碼

按照上述配置基本上不會出現鏈接長時間與MySQL服務器無任何交互的可能性,除非鏈接池自己的處理存在問題(druid鏈接池爲阿里高性能數據庫鏈接池),可是出於懷疑的態度,按照連接中的說法druid在處理長時間鏈接等待方面可能不如c3p0好,也不排除druid自己存在問題。但出於謹慎角度,並無馬上採用對鏈接池進行替換的措施,而是將焦點轉移到應用服務器與MySQL服務器網絡鏈接上(也確實有人是由於網絡防火牆設置致使出現此類問題的)。

2)、網絡鏈接參數調整

因爲阿里雲底層網絡結構並不對用戶透明,且遇到問題的應用存在批量大數據插入的使用場景,確實是存在TCP緩衝區不夠的可能性,經過對MySQL官方文檔相關章節的閱讀,MySQL鏈接串參數中確實提供了tcpRcvBuf這樣的參數,其對該參數的說明是:「connecting using TCP/IP, should the driver set SO_RCV_BUF to the given value? The default value of '0', means use the platform default value for this property)」。 因而以爲這種可能性會相對大些,因此在線上調整了鏈接參數而並無更換鏈接池(如若不行,可再嘗試替換鏈接池),參數調整以下:

jdbc:mysql://xxx.xxx.xx.xx:3306/xx?useUnicode=true&characterEncoding=UTF-8&tcpRcvBuf=1024000&autoReconnect=true&failOverReadOnly=false&connectTimeout=0
複製代碼

鏈接TCP緩衝區的大小被調整爲1M。

對線程棧日誌分析(2)

在4中經過對網絡鏈接參數的調整後,並無起到做用,因而對線程棧日誌再次進行了調取,並進行了地毯式閱讀,終於發現了問題的癥結:

at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$5775a83.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078b4d8e98> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x00000007880399f0> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterPara
複製代碼

經過對比多日問題時段的線程棧日誌,發現都存在兩處代碼都指向同一個方法,即isExistOrgSettleFlow,因而找到代碼所對應的方法,發現是因爲該方法高頻訪問了一個未加相關索引的大數據表,致使方法自己的執行存在阻塞及鎖住的狀況,佔用了大量鏈接資源,從而致使4中所看到的其餘方法拿不到數據庫鏈接出現長時間等待,甚至超時的狀況。加上索引後,隔日觀察問題消失。

總結

  • 對陌生問題的分析,須要經過各種分析手段從而找到問題所在並進行解決(這其中須要運用各種分析工具Linux命令、JDK命令、網絡命令等等);

  • 要心存思辨之心,對傳統的權威的東西在敬畏的同時也要大膽的懷疑並進行分析、嘗試,切不可畏懼不前;

  • 在編碼方面,對代碼類、方法的命名必定要合理、規範,否則往後出現疑難問題在排查時,會給人形成錯覺從而延長解決時間;

相關文章
相關標籤/搜索