Three Quick Moves for Tracking Down java.lang.OutOfMemoryError in Production

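An OOM (OutOfMemoryError) ultimately comes down to one of three causes:

  1. The resources available are insufficient to begin with
  2. Too much memory is requested
  3. Resources are exhausted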

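Applied to a Java service, these three causes can be read as:

  • The memory allocated to the JVM may simply be too small, while normal business load legitimately needs a lot of memory
  • Some object is allocated over and over but never released, so memory leaks steadily until it runs out
  • Some resource is requested over and over until system resources are exhausted, for example by constantly creating threads or opening network connections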

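Following that line of reasoning, the three quick moves for locating an OOM problem are:

  1. Check whether the allocated memory is simply too small
  2. Find the objects that consume the most memory
  3. Check whether resources are exhausted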
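
Before running the three checks, the message carried by the OutOfMemoryError itself often hints at which one to try first. The sketch below is only an illustration: the class name OomTriage and the enum values are made up for this example, and the matched strings assume the wording used by HotSpot JVMs, which can differ between versions and vendors.

```java
import java.util.Locale;

/** Hypothetical helper: maps an OutOfMemoryError message to the check it most likely calls for. */
final class OomTriage {

    enum Check { HEAP_SIZE_OR_LEAK, CLASS_METADATA, NATIVE_RESOURCES, UNKNOWN }

    static Check classify(String message) {
        if (message == null) {
            return Check.UNKNOWN;
        }
        String m = message.toLowerCase(Locale.ROOT);
        // Thread creation failed: usually an OS resource limit (check 3).
        if (m.contains("native thread")) {
            return Check.NATIVE_RESOURCES;
        }
        // Class metadata area is full (PermGen before Java 8, Metaspace after).
        if (m.contains("permgen space") || m.contains("metaspace")) {
            return Check.CLASS_METADATA;
        }
        // Heap is full: either it is too small (check 1) or something leaks (check 2).
        if (m.contains("java heap space") || m.contains("gc overhead limit exceeded")) {
            return Check.HEAP_SIZE_OR_LEAK;
        }
        return Check.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("Java heap space"));
        System.out.println(classify("PermGen space"));
        System.out.println(classify("unable to create new native thread"));
    }
}
```

A heap-related message does not distinguish an undersized heap from a leak, so checks 1 and 2 are still both needed in that case.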

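Take a production Tomcat as the example. It runs five Java projects built on the SSM stack (Spring, Spring MVC, MyBatis), takes about 60 seconds to start, and occasionally throws an OOM after running for a while. Let's check each possibility in turn: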

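(1) Check whether the allocated memory is simply too small

On the server (8 cores, 16 GB), run top to watch how memory changes while Java starts up, and note the Java process ID along the way: 10397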


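Then run jmap -heap 10397 (the sample output below was captured from a process with ID 1246) and look at how much of the heap, the young generation, and the old generation is in use. Eden, the old generation, and the permanent generation are each only a little over half full relative to their current capacity, and the total in use is far below the 4014 MB MaxHeapSize, so we can rule out an undersized memory allocation.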

wen@S189919:/opt/tomcat8$ jmap -heap 1246
Attaching to process ID 1246, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.65-b04

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 4208984064 (4014.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 1172307968 (1118.0MB)
   used     = 679248008 (647.781379699707MB)
   free     = 493059960 (470.21862030029297MB)
   57.94108941857845% used
From Space:
   capacity = 85983232 (82.0MB)
   used     = 0 (0.0MB)
   free     = 85983232 (82.0MB)
   0.0% used
To Space:
   capacity = 115343360 (110.0MB)
   used     = 0 (0.0MB)
   free     = 115343360 (110.0MB)
   0.0% used
PS Old Generation
   capacity = 259522560 (247.5MB)
   used     = 147065016 (140.25212860107422MB)
   free     = 112457544 (107.24787139892578MB)
   56.667526707504734% used
PS Perm Generation
   capacity = 63963136 (61.0MB)
   used     = 32219528 (30.72693634033203MB)
   free     = 31743608 (30.27306365966797MB)
   50.37202678742956% used

16612 interned Strings occupying 2080416 bytes.
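
The "% used" figures in the output above are simply used divided by capacity, and similar numbers can be read from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch rather than a replacement for jmap: the class name HeapReport is made up for this example, and the pool names it prints depend on the JVM version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that reports memory pool usage for the current JVM. */
final class HeapReport {

    /** Percentage of capacity in use, the same ratio jmap prints as "% used". */
    static double usedPercent(long used, long capacity) {
        return capacity <= 0 ? 0.0 : 100.0 * used / capacity;
    }

    public static void main(String[] args) {
        // Old generation figures copied from the jmap output above.
        System.out.printf("Old gen (from jmap): %.2f%% used%n",
                usedPercent(147065016L, 259522560L));

        // Live figures for this JVM, one line per memory pool.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            System.out.printf("%-28s used=%d committed=%d max=%d (%.1f%% of committed)%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax(),
                    usedPercent(usage.getUsed(), usage.getCommitted()));
        }
    }
}
```

Comparing used against max rather than committed gives the more meaningful answer to "is the heap too small", because committed capacity grows on demand up to the maximum.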

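(2) Find the objects that consume the most memory

jmap -histo:live 1246 | more

The command prints a table of the live objects, sorted by the amount of memory they occupy, with these columns:

  • Number of instances (#instances)
  • Memory occupied, in bytes (#bytes)
  • Class name (class name)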

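Looking at the result, the largest entry is [B, which is the JVM's internal name for byte[] arrays, and even that takes only about 72 MB. For a heap of this size that is negligible.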

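If some class of objects turns out to occupy a great deal of memory (several GB, say), the most likely explanation is that too many instances are being created and never released. For example:

  • A resource is acquired but close() or dispose() is never called to release it
  • A consumer is slow (or has stopped consuming) while the producer keeps putting tasks on a queue, so tasks pile up in the queue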
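
The second case above, a queue that grows without limit, can be sketched as follows. This is only an illustration under assumptions: the class name BoundedQueueDemo and the task strings are made up, and a real service would decide for itself what to do with rejected tasks (block, drop, or report back to the caller).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical demo: a bounded queue caps memory use when the consumer stalls. */
final class BoundedQueueDemo {

    /** Offers `attempts` tasks to a queue that nobody consumes and returns how many were rejected. */
    static int rejectedCount(int capacity, int attempts) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false when the queue is full instead of letting it grow.
            if (!queue.offer("task-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // With room for 100 tasks and no consumer, 900 of 1000 offers are turned away.
        System.out.println("rejected: " + rejectedCount(100, 1000));
    }
}
```

An unbounded queue such as a default LinkedBlockingQueue would accept all of the tasks and keep every one of them reachable, which is exactly the pattern that shows up as a huge entry in the histogram. For the first case, a try-with-resources statement closes anything that implements AutoCloseable even when an exception is thrown.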

wen@S189919:/opt/tomcat8$ jmap -histo:live 1246 | more

 num     #instances         #bytes  class name
----------------------------------------------
   1:         79073       72095344  [B
   2:        103049       13630576  [C
   3:         57516        8155328  <constMethodKlass>
   4:         57516        7373456  <methodKlass>
   5:          5413        6128216  <constantPoolKlass>
   6:          5413        3861128  <instanceKlassKlass>
   7:          4455        3264960  <constantPoolCacheKlass>
   8:        101128        2427072  java.lang.String
   9:         46704        1868160  java.lang.ref.Finalizer
  10:          5314        1486584  [Ljava.util.HashMap$Entry;
  11:         22264        1419160  [Ljava.lang.Object;
  12:         17286        1382880  java.lang.reflect.Method
  13:         20810        1165360  java.util.zip.ZipFile$ZipFileInputStream
  14:         20389        1141784  java.util.zip.ZipFile$ZipFileInflaterInputStream
  15:         34592        1106944  java.util.HashMap$Entry
  16:          1963        1075048  <methodDataKlass>
  17:          1762         943992  [I
  18:         22136         708352  java.util.concurrent.ConcurrentHashMap$HashEntry
  19:          5866         704008  java.lang.Class
  20:         14549         581960  java.util.LinkedHashMap$Entry
  21:         21158         507792  java.util.ArrayList
  22:          7742         453448  [S
  23:          8839         450464  [[I
  24:          7362         412272  java.util.LinkedHashMap
  25:          3735         328416  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  26:         14544         322536  [Ljava.lang.Class;
  27:          7350         294000  com.sun.org.apache.xerces.internal.dom.DeferredTextImpl
  28:          2973         273488  [Ljava.util.WeakHashMap$Entry;
  29:          6660         266400  com.sun.org.apache.xerces.internal.dom.DeferredAttrImpl
  30:          5394         258912  java.util.HashMap
  31:          6441         257640  javax.servlet.jsp.tagext.TagAttributeInfo
  32:           436         237184  <objArrayKlassKlass>
  33:         14200         227200  java.lang.Object
  34:          2783         222640  sun.net.www.protocol.jar.URLJarFile
  35:          3914         219184  com.sun.org.apache.xerces.internal.dom.DeferredElementImpl
  36:          6016         192512  java.util.concurrent.locks.ReentrantLock$NonfairSync
  37:          4328         173120  java.lang.ref.SoftReference
  38:          2970         166320  java.util.WeakHashMap
  39:          3200         153184  [Ljava.lang.String;
  40:          3735         149400  java.util.concurrent.ConcurrentHashMap$Segment

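(3) Check whether resources are exhausted

Using the sshd process as an example, the open file handles and the threads of a process can be listed from:

/proc/${PID}/fd

/proc/${PID}/task

The resulting counts are 8 for handles and 4 for threads. Both figures include the total, ./ and ../ lines that ll prints, so the real numbers are 5 and 1. Values this small make resource exhaustion an even less likely cause of the OOM. To check Tomcat itself, substitute the Java process ID.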

root@S189919:/home/wen# ps -aux | grep sshd
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
root       749  0.0  0.0  50036  2928 ?        Ss   19:01   0:00 /usr/sbin/sshd -D
root      1321  0.0  0.0  73440  3608 ?        Ss   19:15   0:00 sshd: wen [priv]
wen       1464  0.0  0.0  73440  1528 ?        S    19:15   0:00 sshd: wen@pts/0
root      1585  0.0  0.0   9388   940 pts/0    S+   19:20   0:00 grep --color=auto sshd
root@S189919:/home/wen# ll /proc/749/fd
total 0
dr-x------ 2 root root  0 Sep  4 19:01 ./
dr-xr-xr-x 8 root root  0 Sep  4 19:01 ../
lrwx------ 1 root root 64 Sep  4 19:01 0 -> /dev/null
lrwx------ 1 root root 64 Sep  4 19:01 1 -> /dev/null
lrwx------ 1 root root 64 Sep  4 19:01 2 -> /dev/null
lr-x------ 1 root root 64 Sep  4 19:01 3 -> socket:[7330]
lrwx------ 1 root root 64 Sep  4 19:21 4 -> socket:[7332]
root@S189919:/home/wen# ll /proc/749/task
total 0
dr-xr-xr-x 3 root root 0 Sep  4 19:21 ./
dr-xr-xr-x 8 root root 0 Sep  4 19:01 ../
dr-xr-xr-x 6 root root 0 Sep  4 19:21 749/
root@S189919:/home/wen# ll /proc/749/fd | wc -l
8
root@S189919:/home/wen# ll /proc/749/task | wc -l
4
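
The same two counts can be taken programmatically, which avoids the extra lines that ll adds to the total. The sketch below assumes a Linux system with /proc mounted; the class name ProcCount is made up for this example, and reading another user's process usually requires root, as in the session above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper that counts open file descriptors and threads via /proc. */
final class ProcCount {

    /** Number of entries in a directory; "." and ".." are not included. */
    static long countEntries(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a PID as the first argument, or inspect the current JVM via /proc/self.
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println("open file descriptors: "
                + countEntries(Paths.get("/proc", pid, "fd")));
        System.out.println("threads: "
                + countEntries(Paths.get("/proc", pid, "task")));
    }
}
```

When the program inspects itself, the descriptor count includes the handle it holds open on the fd directory while listing it, so the figure can be one higher than the steady-state value.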

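(4) Merge duplicate JAR files

Finally, after thinking it over, the most likely culprit seemed to be that the projects load too many third-party JARs at startup. So I merged the JARs of the five SSM projects, letting identical ones overwrite each other, and put them in Tomcat's shared lib directory by editing ${TOMCAT_HOME}/conf/catalina.properties:

shared.loader=${catalina.base}/shared/lib,${catalina.base}/shared/lib/*.jar

Alternatively, all of the common JARs can be placed under ${TOMCAT_HOME}/lib. After starting Tomcat again, loading took 37 seconds. Hopefully this resolves the OOM problem and the boss never has to bring it up again.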
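
Finding which JARs are actually duplicated across the projects is the tedious part of this step. The following is a minimal sketch of that comparison, assuming the JAR file names of each webapp's WEB-INF/lib have already been collected into a map. The class name SharedJars and the application and JAR names in main are hypothetical, and matching by file name alone treats different versions of the same library as different JARs.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper that lists JAR file names shared by more than one webapp. */
final class SharedJars {

    /** Returns the JAR file names that appear in at least two webapps, sorted by name. */
    static SortedSet<String> duplicates(Map<String, ? extends Collection<String>> jarsByApp) {
        Map<String, Integer> appsPerJar = new HashMap<>();
        for (Collection<String> jars : jarsByApp.values()) {
            // The HashSet makes sure a JAR is counted once per webapp.
            for (String jar : new HashSet<>(jars)) {
                appsPerJar.merge(jar, 1, Integer::sum);
            }
        }
        SortedSet<String> shared = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : appsPerJar.entrySet()) {
            if (entry.getValue() > 1) {
                shared.add(entry.getKey());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Made-up webapp and JAR names, for illustration only.
        Map<String, Collection<String>> jarsByApp = new HashMap<>();
        jarsByApp.put("app-a", Arrays.asList("lib-x-1.0.jar", "lib-y-2.0.jar"));
        jarsByApp.put("app-b", Arrays.asList("lib-x-1.0.jar", "lib-z-3.0.jar"));
        System.out.println("candidates for shared/lib: " + duplicates(jarsByApp));
    }
}
```

Classes loaded from the shared loader exist once for all webapps instead of once per webapp, which is why moving common JARs there can reduce both startup time and class metadata usage.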
