本系列是 我TM人傻了 系列第三期[捂臉],往期精彩回顧:java
最近組裏用第三方給的 SDK 搞了點開發,最近線上忽然開始報錯,而且發現一個特別奇怪的問題,組員和我說,代碼運行到一半不走了,跳過了一段(這代碼是剛參加東奧會參加跳遠麼???)。git
代碼以下,邏輯很是簡單:github
try { log.info("initiate client with conf: {}", conf); SDKClient client = new SDKClient(conf); client.init(); log.info("client initiated"); } catch (Exception e) { log.error("initiate client failed", e); } log.info("start to manipulate...");
咱們發現 client 實際上沒有初始化成功,後面的業務處理一直在報錯。查看日誌,發現:spring
initiate client with conf: xxxxx start to manipulate...
這就是組員說的代碼發生了跳躍。由於既沒有打印 client initiated
,也沒有打印 initiate client failed
...就直接 start to manipulate...
了。express
老讀者知道,咱們的線上是 k8s + Docker,而且每一個鏡像中內置了 Arthas,而且 Java 版本是 Java 16,而且啓用了 JFR。日誌中具備鏈路信息,經過 ELK Agent 拉取到統一日誌服務器。apache
這個 SDK 裏面要訪問的遠程地址都有 IP 白名單,咱們爲了安全本地並不能直接使用 SDK 訪問對方的線上環境。在本地測試鏈接的是對方的測試環境,是沒有問題的。因此這裏,咱們仍是得經過 Arthas 進行定位。編程
首先得看看線上運行的源碼是否和本地咱們看到的一致呢?這個能夠經過 jad 命令:api
jad 要看的類全限定名稱
查看後發現,反編譯後的代碼,和咱們的源碼一致誒。緩存
而後咱們看看代碼的實際執行:安全
trace 要看的類全限定名稱 方法
以後從新執行這個方法,查看 trace 發現,初始化的時候確實拋出異常了:
# 省略咱們這裏不關心的 +---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #130 +---[min=599.388978ms,max=630.23967ms,total=1229.628648ms,count=2] com.dasha13.sdk.SDKClient:<init>() #131 +---[min=203.617545ms,max=221.785512ms,total=425.403057ms,count=2] com.dasha13.sdk.SDKClient:init() #132 [throws Exception,2] +---[min=0.034798ms,max=0.084505ms,total=0.119303ms,count=2] org.apache.logging.log4j.Logger:error() #136 +---[min=0.010174ms,max=0.01184ms,total=0.022014ms,count=2] org.apache.logging.log4j.Logger:info() #138
可是,這個異常日誌,爲什麼沒有打印出來呢?咱們繼續查看下這個異常,使用 watch 方法,而且指定查看深度爲 2,這樣指望能打印出堆棧以及 Message:
watch com.dasha13.sdk.SDKClient init {throwExp} -x 2
可是,這裏只打印了一個看似是 Message 的信息:
method=com.dasha13.sdk.SDKClient init location=AtExceptionExit ts=2021-08-10 02:58:15; [cost=131.20209ms] result=ERROR DATA!!! object class: class java.util.ArrayList, exception class: class com.google.common.util.concurrent.UncheckedExecutionException, exception message: java.lang.IllegalArgumentException
這很奇怪,正常來講,指定深度爲 2,若是有異常拋出,那麼這個輸出信息,會包含異常的 Message 以及堆棧信息的。這是怎麼回事呢?咱們來分別獲取堆棧以及信息試試:
首先獲取堆棧:
watch com.dasha13.sdk.SDKClient init {throwExp.getStackTrace()} -x 2
從新執行出問題的方法,堆棧正常輸出,沒啥問題,不過看堆棧應該問題和 Google 的依賴翻轉 Bean 管理框架(相似於 Spring) Guice 載入某個 Bean 出異常有關:
ts=2021-08-10 03:03:37; [cost=146.644563ms] result=@ArrayList[ @StackTraceElement[][ @StackTraceElement[com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1025)], @StackTraceElement[com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1051)], @StackTraceElement[com.dasha13.sdk.SDKClient.init(SDKClient.java:482)], # 省略以後的
再來看異常信息:
watch com.dasha13.sdk.SDKClient init {throwExp.getMessage()} -x 2
從新執行出問題的方法,這時候發現 watch 失敗:
watch failed, condition is: null, express is: {throwExp.getMessage()}, com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException, visit /app/arthas/arthas.log for more details.
咱們按照提示,查看 arthas 日誌,發現的異常堆棧:
2021-08-10 03:07:11 [XNIO-2 task-3] ERROR c.t.a.c.command.express.OgnlExpress -Error during evaluating the expression: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) at com.google.common.cache.LocalCache.get(LocalCache.java:3937) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830) at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66) at com.google.inject.internal.Errors.formatSource(Errors.java:806) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839) at com.google.inject.internal.Errors.formatSource(Errors.java:800) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.format(Errors.java:584) at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60) cause by: MethodNotFoundException: Method not found: class com.google.common.xxxxxxxxx
咱們發現,竟然是 ProvisionException 的 getMessage()
發生了異常,也就是異常的 getMessage()
發生了異常.查看異常的 Cause 咱們也定位出來,是 Guava 版本與 guice 版本不兼容致使,其根本緣由是三方接口超時,致使初始化異常,有異常拋出被封裝成 ProvisionException
, ProvisionException
異常的 getMessage
依賴 Guava Cache 緩存一些異常信息,可是咱們項目中 Guava 版本與 guice 版本不兼容,致使某些方法不存在,因此 ProvisionException
異常的 getMessage
也會有異常。以前運行沒問題是由於三方沒有尚未過初始化的時候接口超時拋異常。。。
咱們使用的 log4j2 異步日誌配置,而且將異常做爲最後一個參數傳入日誌方法中,正常狀況下,會輸出這個異常的 Message 以及異常堆棧.但從上面的分析咱們知道,獲取 Message 的時候,拋出了異常。Log4j 的設計是使用了日誌事件的生產消費這種架構。這裏是消費者獲取異常的 Message 以及異常堆棧,而且在獲取 Message 的時候,發現有異常。對於 Log4j2 異步日誌,發現有異常的時候,原有日誌事件會被直接拋棄,並將異常輸出到 StatusLogger 中(底層其實就是標準異常輸出)中,這裏對應 log4j 的源碼:
private void tryCallAppender(final LogEvent event) { try { //調用 appender 輸出日誌 appender.append(event); } catch (final RuntimeException error) { //處理 RuntimeException handleAppenderError(event, error); } catch (final Exception error) { //處理其餘 Exception handleAppenderError(event, new AppenderLoggingException(error)); } } private void handleAppenderError(final LogEvent event, final RuntimeException ex) { appender.getHandler().error(createErrorMsg("An exception occurred processing Appender "), event, ex); if (!appender.ignoreExceptions()) { throw ex; } }
ErrorHandler 通常都是默認實現,即 DefaultErrorHandler;DefaultErrorHandler 是輸出到一個 StatusLogger:
private static final Logger LOGGER = StatusLogger.getLogger(); public void error(final String msg, final LogEvent event, final Throwable t) { final long current = System.nanoTime(); if (current - lastException > EXCEPTION_INTERVAL || exceptionCount++ < MAX_EXCEPTIONS) { LOGGER.error(msg, t); } lastException = current; if (!appender.ignoreExceptions() && t != null && !(t instanceof AppenderLoggingException)) { throw new AppenderLoggingException(msg, t); } }
StatusLogger 其實就是標準異常輸出 System.err
:
this.logger = new SimpleLogger("StatusLogger", Level.ERROR, false, true, showDateTime, false, dateFormat, messageFactory, PROPS, //標準異常輸出 System.err);
咱們部署架構中,將標準異常輸出放到了一個很偏僻的位置,基本沒有人看,因此沒注意到。。。查看標準異常輸出,會發現的確有異常:
2021-08-10 03:30:29,810 Log4j2-TF-10-AsyncLoggerConfig-3 ERROR An exception occurred processing Appender file com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) at com.google.common.cache.LocalCache.get(LocalCache.java:3937) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830) at com.google.inject.internal.util.StackTraceElements.forMember(StackTraceElements.java:66) at com.google.inject.internal.Errors.formatSource(Errors.java:806) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.formatInjectionPoint(Errors.java:839) at com.google.inject.internal.Errors.formatSource(Errors.java:800) at com.google.inject.internal.Errors.formatSource(Errors.java:785) at com.google.inject.internal.Errors.format(Errors.java:584) at com.google.inject.ProvisionException.getMessage(ProvisionException.java:60) at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:105) at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:93) at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629) at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) at org.springframework.boot.logging.log4j2.ExtendedWhitespaceThrowablePatternConverter.format(ExtendedWhitespaceThrowablePatternConverter.java:50) at org.apache.logging.log4j.core.pattern.PatternFormatter.format(PatternFormatter.java:38) at org.apache.logging.log4j.core.layout.PatternLayout$PatternSerializer.toSerializable(PatternLayout.java:345) at org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:244) at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:229) at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:197) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181) at org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:312) at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156) at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129) at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84) at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:543) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.callAppenders(AsyncLoggerConfig.java:127) at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:502) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:485) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:121) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncLoggerConfigsOnCurrentThread(AsyncLoggerConfig.java:169) at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:111) at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor$Log4jEventWrapperHandler.onEvent(AsyncLoggerConfigDisruptor.java:97) at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.IllegalArgumentException at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source) at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source) at com.google.inject.internal.asm.$ClassReader.<init>(Unknown Source) at com.google.inject.internal.util.LineNumbers.<init>(LineNumbers.java:65) at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:46) at com.google.inject.internal.util.StackTraceElements$1.load(StackTraceElements.java:43) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) ... 41 more
而且,在這以後,會根據 Appender 的 ignoreExceptions 配置(默認都是 true),決定調用日誌方法的地方是否會拋出異常,但這個是針對同步日誌的,異步日誌即將異常拋到 Disruptor 的異常處理器,Log4j2 Disruptor 的異常處理也是將異常輸出到 System.err
也就是標準異常輸出。默認狀況下是不拋出的,畢竟對於同步日誌沒人但願由於日誌有異常就讓業務不能正常進行,異步日誌因爲前面的處理已經輸出到標準異常輸出這裏就不必畫蛇添足了。
微信搜索「個人編程喵」關注公衆號,每日一刷,輕鬆提高技術,斬獲各類offer: