線上服務收到報警,報警內容:虛擬機swap區佔用比例超過80%,如圖:css
本文着重描述排查問題的過程,在這個過程當中不斷的猜想–>驗證–>推翻–>再猜想–>再驗證–>再推翻,這個過程更有意思;java
潛意識中,當進程須要申請內存而機器內存不夠時,須要將一部分不經常使用的進程、數據換出到swap,從而騰出一部分物理內存空間;git
看下關於Linux swap space的介紹:github
Linux divides its physical RAM (random access memory) into chucks of memory called pages. Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined sizes of the physical memory and the swap space is the amount of virtual memory available.web
Swapping is necessary for two important reasons. First, when the system requires more memory than is physically available, the kernel swaps out less used pages and gives memory to the current application (process) that needs the memory immediately. Second, a significant number of the pages used by an application during its startup phase may only be used for initialization and then never used again. The system can swap out those pages and free the memory for other applications or even for the disk cache.緩存
若是說swap被佔用了,說明物理內存不夠,那麼須要先找出哪些進程佔用了swap,再去看看這些進程是什麼緣由致使佔用swap;服務器
線上虛擬機配置內存爲8G,swap爲2G,能夠經過/proc/$(pid)/smaps找到哪些進程佔用了swap;app
經過top命令咱們也能夠看到是java進程佔用了1.8G的swap區:less
線上jdk1.8,應用啓動參數配置:-Xmx4g -Xms4g -Xmn2g,metaspace用的默認;dom
經過線上發佈記錄看到,在11-01號下午發佈,11-02號swap開始被佔用:
因此第一個懷疑的點即是上線發佈致使,這次發佈包含兩塊內容:升級jdk1.7至jdk1.八、分單邏輯優化;首先懷疑的是邏輯優化、致使每分鐘java堆內存致使大量對象被建立,可是這些對象的生命週期很短、應該會被快速回收,若是是大量對象建立致使佔用內存過多,那麼gc次數應該比發佈前要頻繁,因此再來看gc監控:
發現gc次數頻率與以前相比並沒有明顯變化,而且年輕代和年老代內存都在合理範圍內;
因此真的是由於分單邏輯優化致使的嗎?
Java堆分配了4G、經過jstat看到metaspace用了200M、線程棧大概佔用300M(300左右個線程、每一個線程1M內存),總共是4.5G左右,而top命令顯示java佔用的物理內存爲6.5G(RES列顯示的數據),另外2G是哪裏佔用了?堆外內存?
因此咱們的思路轉向到了堆外內存,看看從這個思路上是否能夠找出一些線索;
這裏經過google-perftools這個工具來查看java進程內存分配狀況,其實接下來的排查思路相似這篇文章:JAVA使用堆外內存致使swap飆高
首先安裝google-perftools,見這裏,java應用啓動時加上google-perftools的配置:
export LD_PRELOAD=/home/user/gperftools/lib/libtcmalloc.so
export HEAPPROFILE=/home/user/heap-gperftools/gzip
# 此配置表示進程每分配X字節內存輸出一個文件,默認是1G輸出一個文件
export HEAP_PROFILE_ALLOCATION_INTERVAL=4073741824
# 此配置表示進程每使用X字節內存輸出一個文件,默認是100M輸出一個文件,對於我這個應用來講,剛啓動生成50多個文件,不便於分析問題,因此我設置的4G左右
export HEAP_PROFILE_INUSE_INTERVAL=4048576000
環境變量配置好重啓java應用,此時在目錄下就能夠看到生成的gzip.0001.heap類的文件,來分析這些文件:
/home/sankuai/gperftools/bin/pprof –text /usr/local/java8/bin/java gzip.0001.heap
對輸出的結果進行排序顯示前20行:
2060.7 79.1% 79.1% 2060.7 79.1% updatewindow
0.0 0.0% 100.0% 2060.7 79.1% Java_java_util_zip_Inflater_inflateBytes
0.0 0.0% 100.0% 2060.7 79.1% inflate
0.0 0.0% 100.0% 2057.3 79.0% 0x00007f764b47ee05
0.0 0.0% 100.0% 2045.5 78.6% 0x00007f764b6c861f
5.5 0.2% 99.9% 455.3 17.5% Java_java_util_zip_Inflater_init
0.0 0.0% 100.0% 452.7 17.4% 0x00007f764b4f8eb0
449.8 17.3% 96.4% 449.8 17.3% inflateInit2_
0.0 0.0% 100.0% 427.8 16.4% 0x00007f764bb4d16b
45.8 1.8% 98.2% 45.8 1.8% init
0.0 0.0% 100.0% 45.8 1.8% _init@4210
0.0 0.0% 100.0% 45.8 1.8% __do_global_ctors_aux@12b20
0.0 0.0% 100.0% 44.8 1.7% JavaCalls::call_helper
0.0 0.0% 100.0% 43.9 1.7% start_thread
0.0 0.0% 100.0% 42.8 1.6% 0x00007f764b32e4e6
0.0 0.0% 100.0% 41.9 1.6% JavaMain
8.3 1.5% 99.6% 38.3 1.5% os::malloc
0.0 0.0% 100.0% 35.5 1.4% 0x00007f764b33598c
0.0 0.0% 100.0% 28.3 1.1% 0x00007f764b33570f
0.0 0.0% 100.0% 26.3 1.0% jni_invoke_static
能夠看出Java_java_util_zip_Inflater_inflateBytes方法分配了比較多的內存,再分析其它heap文件,也都是這個方法,因此從這個方法入手繼續追蹤,須要找到是業務哪行代碼調用了Inflater類,這裏咱們使用btrace這個工具,btace腳本以下;
import static com.sun.btrace.BTraceUtils.*;
import com.sun.btrace.annotations.*;
import java.nio.ByteBuffer;
import java.lang.Thread;
@BTrace
public class BtracerInflater{
@OnMethod(
clazz="java.util.zip.Inflater",
//method="inflateBytes"
method="inflate"
)
public static void traceCacheBlock(){
println("Who call java.util.zip.Inflater's methods :");
jstack();
}
}
經過打印inflate方法的調用堆棧,找到兩處調用:一處是公司的監控組件、一處是jetty啓動時掃描jar包會用;
先分析公司的監控組件,監控組件會每隔1分鐘去服務器拉取監控配置信息(好比路由IP地址、客戶端鏈接超時時間、長SQL閾值配置等),而服務器返回的配置信息是xml格式,在解析xml時會用到inflate方法,因而咱們經過禁用監控組件、重啓應用、發現java應用所佔的物理內容並無減小,因此排除是監控組件的問題;
再來看btrace打印的jetty調用棧:
Who call java.util.zip.Inflater's methods :
java.util.zip.Inflater.inflate(Inflater.java)
java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
java.io.FilterInputStream.read(FilterInputStream.java:133)
java.io.FilterInputStream.read(FilterInputStream.java:133)
org.objectweb.asm.ClassReader.a(Unknown Source)
org.objectweb.asm.ClassReader.<init>(Unknown Source)
org.eclipse.jetty.annotations.AnnotationParser.scanClass(AnnotationParser.java:888)
org.eclipse.jetty.annotations.AnnotationParser$2.processEntry(AnnotationParser.java:847)
org.eclipse.jetty.webapp.JarScanner.matched(JarScanner.java:161)
org.eclipse.jetty.util.PatternMatcher.matchPatterns(PatternMatcher.java:100)
org.eclipse.jetty.util.PatternMatcher.match(PatternMatcher.java:82)
org.eclipse.jetty.webapp.JarScanner.scan(JarScanner.java:84)
org.eclipse.jetty.annotations.AnnotationParser.parse(AnnotationParser.java:859)
org.eclipse.jetty.annotations.AnnotationParser.parse(AnnotationParser.java:874)
com.xxx.boot.RJRAnnotationConfiguration.parseWebInfClasses(RJRAnnotationConfiguration.java:96)
org.eclipse.jetty.annotations.AnnotationConfiguration.configure(AnnotationConfiguration.java:113)
org.eclipse.jetty.webapp.WebAppContext.configure(WebAppContext.java:468)
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1247)
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:710)
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:494)
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
org.eclipse.jetty.server.handler.HandlerCollection.doStart(HandlerCollection.java:229)
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:95)
org.eclipse.jetty.server.Server.doStart(Server.java:280)
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:497)
com.xxx.Bootstrap.main(Bootstrap.java:101)
jetty的調用場景是:爲了支持Servlet規範中的註解方式(使得再也不須要在web.xml文件中進行Servlet的部署描述,簡化開發流程),jetty在啓動時會掃描class、lib包,將使用註解方式聲明的Servlet、Listener註冊到jetty容器,在掃描jar包的時候調用了inflate,分配了大量的內存,此時經過關鍵詞搜索到Memory leak while scanning annotations,這篇文章給出了兩種解決方法,一種是評論處所說,禁用緩存(說是jdk1.8的bug):
Here’s a link to the java bugs database issue: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8156014
I’d like to be able to comment on it, but I can’t seem to find a link to allow me to do that.
The comments I’d like to add are:
the problem is still reproduceable as of jdk8u112
the problem seems to be fixed in jdk9: I tested jdk9ea+149 and couldn’t reproduce
I’ve tried some workarounds for jdk8: it seems the ServiceLoader impl uses the jarurlconnection caching, so it may be of some benefit to try to disable url caching (although not sure of the effects on performance).
Try creating an xml file (eg called 「caches.xml」) with the following contents:
<Configure id="Server" class="org.eclipse.jetty.server.Server">
<Set class="org.eclipse.jetty.util.resource.Resource" name="defaultUseCaches">false</Set>
<New class="java.io.File">
<Arg>
<SystemProperty name="java.io.tmpdir"/>
</Arg>
<Call name="toURI">
<Call id="url" name="toURL">
<Call name="openConnection">
<Set name="defaultUseCaches">false</Set>
</Call>
</Call>
</Call>
</New>
</Configure>
嘗試以後,java進程佔用的物理內存由6.5G降到了5.9G,有必定的效果,可是咱們估算的java進程應該佔4.5G左右,還有1.4G內存被誰佔用了呢?
再嘗試另一種方法:
We were investigating a Jetty application that used much more memory than it was supposed to do (near a Gb more than the max heap size plus the max metaspace size). We found out that the extra memory was Native memory, being allocated using malloc via some library (we don’t have native code in our app). Then we used jemalloc ( http://www.canonware.com/jemalloc/) to find out which class was doing this allocation and we found out that it was java.util.zip.Inflater. Via jstack we were able to find calls to the library and the main caller was Jetty, while in the process of looking for annotations. We disable annotations for the application and the memory usage went back to normal.
既然jetty調用是由於掃描jar包中的註解,咱們應用中沒有經過註解聲明的Servlet、Listener,是否是能夠不須要掃描這一步,故而在jetty.xml中將註解掃描的方式去掉:
<!-- <Item>org.eclipse.jetty.annotations.AnnotationConfiguration</Item> -->
<!-- Item>com.xxx.boot.RJRAnnotationConfiguration</Item> -->
再重啓應用,發現內存使用由6.5G降到了3.9G,而且應用運行穩定後再也不佔用swap,如圖:
看下對比:
機器 | 改動 | 內存使用 | 系統穩定運行後 |
---|---|---|---|
機器1 | 什麼都不變 | java佔用物理內存6.5G | |
機器2 | 禁用jetty resource緩存 | 從6.5G降到5.9G | |
機器3 | 禁用註解掃描 | 從6.5G降到3.9G |
總結下排查問題的大概思路:
出處: