記JVM堆外內存泄漏Bug查找

內存泄漏Bug現場

一個作BI數據展現的服務在一個晚上重啓了5次,因爲是經過k8s容器編排,服務掛了之後會自動重啓,因此服務還能繼續提供服務。java

第一時間先上日誌系統查看錯誤日誌,發現以下報錯:react

java.lang.OutOfMemoryError    ERROR    java.lang.OutOfMemoryError: unable to create new native thread 
at java.lang.Thread.start0(Native Method) 
at java.lang.Thread.start(Thread.java:717) 
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.start(CloseableHttpAsyncClientBase.java:83) 
at org.elasticsearch.client.RestClientBuilder.build(RestClientBuilder.java:190) 
at com.xiaohongshu.fls.jbds.handler.JbdsImpl.get_realtime_trend_data_from_es(JbdsImpl.java:637) 
at com.xiaohongshu.fls.jbds.handler.JbdsImpl.get_seller_trend(JbdsImpl.java:316) 
at com.xiaohongshu.fls.jbds.handler.JbdsImpl$$FastClassBySpringCGLIB$$bd2466f7.invoke(<generated>) 
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) 
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:738) 
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) 
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85) 
at com.xiaohongshu.fls.jbds.aspect.RpcMethodExceptionAspect.requestControllerLog(RpcMethodExceptionAspect.java:33) 
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629) 
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618) 
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70) 
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) 
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) 
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) 
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:673) 
at com.xiaohongshu.fls.jbds.handler.JbdsImpl$$EnhancerBySpringCGLIB$$d374b954.get_seller_trend(<generated>) 
at sun.reflect.GeneratedMethodAccessor81.invoke(Unknown Source) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at com.xiaohongshu.infra.rpc.core.ThriftServerBaseProxy.call(ThriftServerBaseProxy.java:119) 
at com.xiaohongshu.infra.rpc.core.ThriftServerCGlibProxy.intercept(ThriftServerCGlibProxy.java:27) 
at com.xiaohongshu.fls.jbds.handler.JbdsImpl$$EnhancerByCGLIB$$19473b8f.get_seller_trend(<generated>) 
at com.xiaohongshu.fls.rpc.jbds.JBusinessDataService$Processor$get_seller_trend.getResult(JBusinessDataService.java:1450) 
at com.xiaohongshu.fls.rpc.jbds.JBusinessDataService$Processor$get_seller_trend.getResult(JBusinessDataService.java:1435) 
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) 
at org.apache.thrift.server.Invocation.run(Invocation.java:18) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)

發現是OOM的錯誤,而且有unable to create new native thread的錯誤信息,筆者的第一直覺是建立了大量線程從而致使堆外內存空間不足。spring

隨即去CAT監控系統上去查看線程的活躍狀況,以下圖:apache

alt text

發現服務的活躍線程數達到了1W左右,這顯然有問題。服務自己是一個Thrift Server,開起的worker線程數爲200,再加上一些其餘IO線程,總的線程數不會超過300(服務內本身沒有顯示建立線程或者使用線程池)。segmentfault

查找線索

筆者登陸到線上的Docker實例上,經過jmap -histo:live pid命令,查看JVM中存活的對象,輸出以下內容:安全

num     #instances         #bytes  class name
----------------------------------------------
   1:         19303      656995672  [B
   2:        116670       13942144  [C
   3:         15645        5298648  [I
   4:         10098        3796848  java.lang.Thread
   5:         75817        3639216  java.util.HashMap
   6:        113889        2733336  java.lang.String
   7:         27521        2421848  java.lang.reflect.Method
   8:         68662        2197184  java.util.concurrent.ConcurrentHashMap$Node
   9:         10041        1767032  [J
  10:           372        1525568  [Ljava.nio.ByteBuffer;
  11:         36807        1472280  java.util.LinkedHashMap$Entry
  12:         16504        1352488  [Ljava.util.HashMap$Node;
  13:         40687        1301984  java.lang.ThreadLocal$ThreadLocalMap$Entry
  14:         11495        1277344  java.lang.Class
  15:         29903        1196120  java.util.WeakHashMap$Entry
  16:         70474        1127584  java.lang.Object
  17:         15346         918592  [Ljava.lang.Object;
  18:         51556         824896  java.util.HashSet
  19:         25528         816896  java.util.HashMap$Node
  20:          9989         812176  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
  21:          9661         792032  [Ljava.util.WeakHashMap$Entry;
  22:          9504         760320  org.apache.http.impl.nio.reactor.BaseIOReactor
  23:         30551         733224  java.util.concurrent.ConcurrentLinkedQueue$Node
  24:          9914         713808  sun.nio.ch.EPollArrayWrapper
  25:          9914         713808  sun.nio.ch.EPollSelectorImpl
  26:         29682         712368  java.util.concurrent.ConcurrentLinkedQueue
  27:          3525         694672  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  28:         15664         501248  java.lang.ref.WeakReference
  29:          8364         468384  java.util.LinkedHashMap
  30:          9656         463488  java.util.WeakHashMap
  31:         20064         436536  [Ljava.lang.Class;
  32:          9267         370680  java.security.AccessControlContext
  33:          4856         349632  java.lang.reflect.Field
  34:         10278         328896  java.lang.ref.ReferenceQueue
  35:          9914         317248  sun.nio.ch.AllocatedNativeObject
  36:          4955         317120  java.util.concurrent.ConcurrentHashMap
  37:          7597         303880  java.lang.ref.SoftReference
  38:          9245         293568  [Ljava.security.ProtectionDomain;
  39:         11484         275616  java.util.ArrayList
  40:          3330         239760  org.springframework.core.annotation.AnnotationAttributes
  41:          9989         239736  java.lang.ThreadLocal$ThreadLocalMap
  42:          9951         238824  java.util.Collections$SynchronizedSet
  43:          9922         238128  java.util.BitSet
  44:          9504         228096  org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker
  45:          5460         207944  [Ljava.lang.String;
  46:          2459         196720  java.lang.reflect.Constructor
  47:         12103         193648  java.util.HashMap$KeySet
  48:          8003         192072  java.beans.MethodRef
  49:          3657         175536  org.aspectj.weaver.reflect.ShadowMatchImpl
  50:         10339         165424  java.util.concurrent.atomic.AtomicBoolean
  51:         10281         164496  java.lang.ref.ReferenceQueue$Lock
  52:         10203         163248  java.util.Collections$UnmodifiableSet
  53:          2913         163128  java.beans.MethodDescriptor
  54:          2547         161536  [Ljava.lang.reflect.Method;
  55:          9914         158624  java.nio.channels.spi.AbstractSelector$1
  56:          9913         158608  sun.nio.ch.Util$3
  57:          5481         131544  org.springframework.core.MethodClassKey
  58:          3707         118624  java.util.LinkedList
  59:          3637         116384  org.aspectj.weaver.patterns.ExposedState
  60:          2047         114632  java.lang.Class$ReflectionData
  61:          1155         110880  org.springframework.beans.GenericTypeAwarePropertyDescriptor
  62:           399         109984  [Ljava.lang.Thread;
  63:          4387         105288  sun.reflect.generics.tree.SimpleClassTypeSignature
  64:          1388          99936  java.beans.PropertyDescriptor
  65:          4087          90560  [Ljava.lang.reflect.Type;
  66:          5361          85776  org.springframework.core.annotation.AnnotationUtils$DefaultValueHolder
  67:          1296          82944  io.netty.buffer.PoolSubpage
  68:          4387          82520  [Lsun.reflect.generics.tree.TypeArgument;
  69:           674          75488  org.springframework.boot.loader.jar.JarEntry
  70:          1848          73920  java.util.TreeMap$Entry
  71:          3061          73464  sun.reflect.annotation.AnnotationInvocationHandler
  72:          2737          65688  java.util.LinkedList$Node
  73:          3989          63824  sun.reflect.generics.tree.ClassTypeSignature
  74:           858          61776  org.apache.ibatis.mapping.ResultMapping
  75:           832          53248  org.springframework.core.MethodParameter
  76:            78          51168  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
  77:          2120          50880  java.util.Collections$UnmodifiableRandomAccessList
  78:          3165          50640  java.util.LinkedHashMap$LinkedKeySet
  79:          2419          49728  [Lsun.reflect.generics.tree.FieldTypeSignature;

alt text

標紅的這個類org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker很是值得懷疑,有9504個實例,從類名看是一個實現了Reactor模式的http客戶端,瞭解Reactor模式的同窗都知道,其包含了一個accept線程處理鏈接事件,NIO bounding的線程處理readwrite事件,和Mworker線程處理業務邏輯。多線程

進一步查找

接下來須要查找哪裏使用了這個http客戶端,在build.gradle的文件裏看到依賴了EShttp clientapp

group: 'org.elasticsearch.client', name: 'elasticsearch-rest-client', version: '6.3.2'

經過全局搜索,發現了以下的代碼:dom

private String get_realtime_data_from_es (String seller_id) throws IOException{
    Date current_date = new Date( );
    SimpleDateFormat sdf = new SimpleDateFormat ("yyyyMMdd");
    String date=sdf.format(current_date);
    RestClient restClient = RestClient.builder(
            new HttpHost("xxxxxxxxx", 9200, "http"),
            new HttpHost("xxxxxxxxx", 9201, "http")).build();
    org.elasticsearch.client.Response response = restClient.performRequest("GET", "/seller/"+date+"/"+seller_id);
    restClient.close();
    return EntityUtils.toString(response.getEntity());
}

筆者發現這裏調用ESRestClient的很是可疑,竟然是每次調用建立一個對象,而不是使用單例模式。elasticsearch

使用過RestTemplate或者HttpClient的同窗都知道,Http客戶端通常都是經過單例的方式注入到Spring容器中,客戶端都是線程安全的,多線程狀況下不會出現串包的狀況。(具體線程安全的實現機制,可參考筆者以前的博客:《Http請求鏈接池-HttpClient的AbstractConnPool源碼分析》,地址:https://segmentfault.com/a/11...

源碼探究

直接進到RestClientBuilder類的build()方法中一探究竟,代碼以下:

public RestClient build() {
    if (failureListener == null) {
        failureListener = new RestClient.FailureListener();
    }
    CloseableHttpAsyncClient httpClient = AccessController.doPrivileged(new PrivilegedAction<CloseableHttpAsyncClient>() {
        @Override
        public CloseableHttpAsyncClient run() {
            return createHttpClient();
        }
    });
    RestClient restClient = new RestClient(httpClient, maxRetryTimeout, defaultHeaders, hosts, pathPrefix, failureListener);
    httpClient.start();
    return restClient;
}

build方法是實際建立RestClient的地方,設置了超時時間、host等參數。再點進start方法,他是抽象類CloseableHttpAsyncClient的一個抽象方法,具體實如今實現類CloseableHttpAsyncClientBase中,以下:

@Override
public void start() {
    if (this.status.compareAndSet(Status.INACTIVE, Status.ACTIVE)) {
        if (this.reactorThread != null) {
            this.reactorThread.start();
        }
    }
}

其調用了成員變量reactorThreadstart方法,而成員變量reactorThread 是一個Thread類,它在構造方法中初始化,以下:

public CloseableHttpAsyncClientBase(
        final NHttpClientConnectionManager connmgr,
        final ThreadFactory threadFactory,
        final NHttpClientEventHandler handler) {
    super();
    this.connmgr = connmgr;
    if (threadFactory != null && handler != null) {
        this.reactorThread = threadFactory.newThread(new Runnable() {

            @Override
            public void run() {
                try {
                    final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
                    connmgr.execute(ioEventDispatch);
                } catch (final Exception ex) {
                    log.error("I/O reactor terminated abnormally", ex);
                } finally {
                    status.set(Status.STOPPED);
                }
            }

        });
    } else {
        this.reactorThread = null;
    }
    this.status = new AtomicReference<Status>(Status.INACTIVE);
}

可見,每建立一個CloseableHttpAsyncClient對象,就會建立一個reactorThread線程,而connmgr.execute(ioEventDispatch)是一個永久for循環執行的方法,因此線程的run方法不會主動退出,即,reactorThread線程不會銷燬。

問題解決

找到bug所在後,結局方案很簡單,只需將ESRestClient經過單例的方式注入到服務中,便可解決堆外內存泄漏的問題。

總結

查找bug時,須要從多個方面去定位,經過儘量多的現場信息去定量分析,監控、線上機器、源碼的查看都須要,深刻下去,會有很多收穫。

原文連接

https://segmentfault.com/a/11...

相關文章
相關標籤/搜索