Notes on fixing a bug that kept shutting Tomcat down

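Customers recently kept reporting that they could not open a Java EE web application that had been running for four years. When I logged in to the server, Tomcat had shut itself down. This happened roughly once every three to four days.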

At first the operations staff assumed some other service was killing Tomcat and didn't take it seriously. Their fix was simply to restart Tomcat.

Eventually it blew up: the customer filed a complaint, and the operations staff lost a month's performance bonus.

After a few detours, the bug landed on my desk. It was my job now, so I got to work. There is no bug that can't be fixed.

The runtime environment:
Tomcat 6.0
32-bit JDK 7.0
Windows Server 2003 32-bit, 32 GB RAM

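Time to check the logs. Whenever Tomcat crashes like this, the JVM leaves a log file whose name starts with "hs_err" in Tomcat's bin directory. I opened the most recent one, and the first thing I saw was this: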

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32756 bytes for ChunkPool::allocate
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (allocation.cpp:211), pid=7864, tid=6556
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) Server VM (24.79-b02 mixed mode windows-x86 )
# Failed to write core dump.

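Roughly, this says memory ran out and 32756 bytes could not be allocated. It also lists several possible solutions:
1. Reduce the memory load on the system;
2. Increase physical memory or swap space;
3. Use a 64-bit JDK on a 64-bit operating system;
4. Decrease the Java heap size;
5. Decrease the number of Java threads;
6. Decrease the Java thread stack size.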

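In other words, the JVM could not allocate 32756 bytes of native memory (the log says malloc failed), that is, memory outside the Java heap.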

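From the moment I took on the task, I had assumed the JVM was misconfigured and running short of memory, and that tuning the young and old generation settings would be enough.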

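Further down in the log I found the line "GC Heap History (10 events):". It records how the heap changed over the JVM's most recent garbage collections: 10 events, a before and an after snapshot for each of the last five collections.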

GC Heap History (10 events):
Event: 572312.299 GC heap before
{Heap before GC invocations=5046 (full 357):
 PSYoungGen      total 201472K, used 200685K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 100% used [0x573c0000,0x63540000,0x63540000)
  from space 3328K, 76% used [0x63540000,0x637bb528,0x63880000)
  to   space 3328K, 0% used [0x63880000,0x63880000,0x63bc0000)
 ParOldGen       total 843776K, used 422602K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d872b18,0x573c0000)
 PSPermGen       total 262144K, used 51848K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62138,0x13bc0000)
Event: 572312.305 GC heap after
Heap after GC invocations=5046 (full 357):
 PSYoungGen      total 201472K, used 1103K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 0% used [0x573c0000,0x573c0000,0x63540000)
  from space 3328K, 33% used [0x63880000,0x63993c90,0x63bc0000)
  to   space 3328K, 0% used [0x63540000,0x63540000,0x63880000)
 ParOldGen       total 843776K, used 423618K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d970b18,0x573c0000)
 PSPermGen       total 262144K, used 51848K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62138,0x13bc0000)
}
Event: 572351.132 GC heap before
{Heap before GC invocations=5047 (full 357):
 PSYoungGen      total 201472K, used 199247K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 100% used [0x573c0000,0x63540000,0x63540000)
  from space 3328K, 33% used [0x63880000,0x63993c90,0x63bc0000)
  to   space 3328K, 0% used [0x63540000,0x63540000,0x63880000)
 ParOldGen       total 843776K, used 423618K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d970b18,0x573c0000)
 PSPermGen       total 262144K, used 51848K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62138,0x13bc0000)
Event: 572351.137 GC heap after
Heap after GC invocations=5047 (full 357):
 PSYoungGen      total 201472K, used 1615K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 0% used [0x573c0000,0x573c0000,0x63540000)
  from space 3328K, 48% used [0x63540000,0x636d3ec8,0x63880000)
  to   space 3328K, 0% used [0x63880000,0x63880000,0x63bc0000)
 ParOldGen       total 843776K, used 423674K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d97eb18,0x573c0000)
 PSPermGen       total 262144K, used 51848K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62138,0x13bc0000)
}
Event: 572398.649 GC heap before
{Heap before GC invocations=5048 (full 357):
 PSYoungGen      total 201472K, used 199759K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 100% used [0x573c0000,0x63540000,0x63540000)
  from space 3328K, 48% used [0x63540000,0x636d3ec8,0x63880000)
  to   space 3328K, 0% used [0x63880000,0x63880000,0x63bc0000)
 ParOldGen       total 843776K, used 423674K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d97eb18,0x573c0000)
 PSPermGen       total 262144K, used 51848K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62138,0x13bc0000)
Event: 572398.655 GC heap after
Heap after GC invocations=5048 (full 357):
 PSYoungGen      total 201472K, used 1998K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 0% used [0x573c0000,0x573c0000,0x63540000)
  from space 3328K, 60% used [0x63880000,0x63a73830,0x63bc0000)
  to   space 3328K, 0% used [0x63540000,0x63540000,0x63880000)
 ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d985cc0,0x573c0000)
 PSPermGen       total 262144K, used 51848K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62138,0x13bc0000)
}
Event: 576881.689 GC heap before
{Heap before GC invocations=5049 (full 357):
 PSYoungGen      total 201472K, used 200142K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 100% used [0x573c0000,0x63540000,0x63540000)
  from space 3328K, 60% used [0x63880000,0x63a73830,0x63bc0000)
  to   space 3328K, 0% used [0x63540000,0x63540000,0x63880000)
 ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d985cc0,0x573c0000)
 PSPermGen       total 262144K, used 51850K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62850,0x13bc0000)
Event: 576881.696 GC heap after
Heap after GC invocations=5049 (full 357):
 PSYoungGen      total 201472K, used 3155K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 0% used [0x573c0000,0x573c0000,0x63540000)
  from space 3328K, 94% used [0x63540000,0x63854cb0,0x63880000)
  to   space 3328K, 0% used [0x63880000,0x63880000,0x63bc0000)
 ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d985cc0,0x573c0000)
 PSPermGen       total 262144K, used 51850K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e62850,0x13bc0000)
}
Event: 580535.452 GC heap before
{Heap before GC invocations=5050 (full 357):
 PSYoungGen      total 201472K, used 201299K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 198144K, 100% used [0x573c0000,0x63540000,0x63540000)
  from space 3328K, 94% used [0x63540000,0x63854cb0,0x63880000)
  to   space 3328K, 0% used [0x63880000,0x63880000,0x63bc0000)
 ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d985cc0,0x573c0000)
 PSPermGen       total 262144K, used 51856K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e64228,0x13bc0000)
Event: 580535.459 GC heap after
Heap after GC invocations=5050 (full 357):
 PSYoungGen      total 200960K, used 1858K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 197632K, 0% used [0x573c0000,0x573c0000,0x634c0000)
  from space 3328K, 55% used [0x63880000,0x63a50be0,0x63bc0000)
  to   space 3584K, 0% used [0x634c0000,0x634c0000,0x63840000)
 ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d985cc0,0x573c0000)
 PSPermGen       total 262144K, used 51856K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e64228,0x13bc0000)
}

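Nothing in this history suggests the crash was caused by the old generation, the permanent generation or the young generation running out of space. Several collections were triggered by eden reaching 100%, but they were minor GCs (the full GC count stays at 357 throughout), and after each one eden dropped back to a normal level.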
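Whether a collection was full or minor can be read from the headers themselves: each "Heap before GC invocations=N (full M)" line carries a running total of all collections (N) and of full collections (M), so if M does not change from one event to the next, none of those collections was a full GC. A small sketch that checks this for the five headers above; the class and method names are made up for illustration, and because it only counts increases between headers, it cannot tell whether the very first event in the excerpt was a full GC.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: count full GCs in a "GC Heap History" excerpt by watching the
// "(full N)" counter in consecutive "Heap before GC invocations=M (full N)" lines.
// Only increases between headers are counted, so a full GC at the first
// event of the excerpt is not detected.
class FullGcCounter {
    private static final Pattern HEADER =
        Pattern.compile("Heap before GC invocations=(\\d+) \\(full (\\d+)\\)");

    static long countFullGcs(List<String> lines) {
        long fullGcs = 0;
        long previousFull = -1;
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (!m.find()) {
                continue; // not a "before" header (e.g. a "Heap after GC" line)
            }
            long full = Long.parseLong(m.group(2));
            if (previousFull >= 0) {
                fullGcs += full - previousFull;
            }
            previousFull = full;
        }
        return fullGcs;
    }

    public static void main(String[] args) {
        // Headers copied from the crash log above
        List<String> headers = Arrays.asList(
            "{Heap before GC invocations=5046 (full 357):",
            "{Heap before GC invocations=5047 (full 357):",
            "{Heap before GC invocations=5048 (full 357):",
            "{Heap before GC invocations=5049 (full 357):",
            "{Heap before GC invocations=5050 (full 357):");
        System.out.println("full GCs after the first event: " + countFullGcs(headers));
    }
}
```

For the headers in this log the count is 0, which is why the collections above are minor GCs rather than full ones.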

The log also records heap usage at the moment Tomcat crashed:

Heap
 PSYoungGen      total 200960K, used 95671K [0x573c0000, 0x63bc0000, 0x63bc0000)
  eden space 197632K, 47% used [0x573c0000,0x5cf5d230,0x634c0000)
  from space 3328K, 55% used [0x63880000,0x63a50be0,0x63bc0000)
  to   space 3584K, 0% used [0x634c0000,0x634c0000,0x63840000)
 ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)
  object space 843776K, 50% used [0x23bc0000,0x3d985cc0,0x573c0000)
 PSPermGen       total 262144K, used 51856K [0x03bc0000, 0x13bc0000, 0x23bc0000)
  object space 262144K, 19% used [0x03bc0000,0x06e64228,0x13bc0000)
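Each "total ...K, used ...K" line can be turned into a usage percentage. A minimal sketch that parses the three space summaries of the crash-time snapshot; the class name and regular expression are my own, and the pattern assumes exactly the line layout shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract "total"/"used" (KB) from heap-summary lines in an hs_err
// file and print how full each space is.
class HeapUsage {
    // Matches lines such as " ParOldGen       total 843776K, used 423703K [...]"
    private static final Pattern LINE =
        Pattern.compile("^\\s*(\\w+)\\s+total (\\d+)K, used (\\d+)K");

    /** Returns {totalKb, usedKb}, or null if the line is not a space summary. */
    static long[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(2)), Long.parseLong(m.group(3)) };
    }

    /** Used percentage of the space, or -1 if the line could not be parsed. */
    static double usedPercent(String line) {
        long[] v = parse(line);
        return v == null ? -1 : v[1] * 100.0 / v[0];
    }

    public static void main(String[] args) {
        // Lines copied from the "Heap" section written at crash time
        String[] lines = {
            " PSYoungGen      total 200960K, used 95671K [0x573c0000, 0x63bc0000, 0x63bc0000)",
            " ParOldGen       total 843776K, used 423703K [0x23bc0000, 0x573c0000, 0x573c0000)",
            " PSPermGen       total 262144K, used 51856K [0x03bc0000, 0x13bc0000, 0x23bc0000)"
        };
        for (String line : lines) {
            String name = line.trim().split("\\s+")[0];
            System.out.printf("%-10s %5.1f%% used%n", name, usedPercent(line));
        }
    }
}
```

For this snapshot it works out to roughly 48% for the young generation, 50% for the old generation and 20% for PermGen, none of them close to a limit.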

Everything looked perfectly normal, and that was exactly what made it so strange.

I went through the logs from earlier crashes; they were all much the same.

I reread the log several times, this time focusing on the solutions it suggests:

#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)

I ruled out the following:

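  • Reduce memory load on the system. The server has plenty of memory: 20 GB of its 32 GB is unused, so there is no need to reduce the load.
  • Increase physical memory or swap space. For the same reason (20 GB of 32 GB free), there is no need to add physical memory.
  • Use 64 bit Java on a 64 bit OS. The operating system is 32-bit, so a 64-bit JDK cannot be used.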

That left three:

  • Decrease Java heap size (-Xmx/-Xms). A heap that is set too large cuts into the memory left over for everything else.
  • Decrease number of Java threads
  • Decrease Java thread stack sizes (-Xss)

Reducing the number of Java threads would mean changing code, which wasn't practical either.

That left just these two:

  • Decrease Java heap size (-Xmx/-Xms)
  • Decrease Java thread stack sizes (-Xss)

This is where I started, and a way out was finally in sight.

First, the Decrease Java thread stack sizes (-Xss) option.

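A running Java thread needs memory too. The -Xss parameter sets the stack size of each thread, that is, the memory allocated for every thread the JVM starts. The default was 256K in JDK 1.4 and is 1M in JDK 1.5 and later.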
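-Xss sets the default stack size for every thread the JVM starts. For comparison, Java code can also request a stack size for a single thread through the Thread(ThreadGroup, Runnable, String, long stackSize) constructor; its Javadoc treats that value as a hint that some platforms ignore. A minimal sketch, with a made-up class name and a placeholder recursive task, requesting 256 KB to mirror -Xss256k:

```java
// Sketch: request a specific stack size for one thread instead of relying on -Xss.
class StackSizeDemo {
    /** Runs a recursive task on a new thread that asks for the given stack size. */
    static int runWithStack(long stackBytes) throws InterruptedException {
        final int[] result = new int[1];
        Runnable task = new Runnable() {
            @Override
            public void run() {
                result[0] = depth(0, 1000);
            }
        };
        // A null thread group means the new thread joins the current thread's group
        Thread worker = new Thread(null, task, "small-stack-worker", stackBytes);
        worker.start();
        worker.join(); // join() also makes result[0] visible to this thread
        return result[0];
    }

    /** Recurses until max so the worker actually uses some of its stack. */
    static int depth(int current, int max) {
        return current >= max ? current : depth(current + 1, max);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("recursion depth reached: " + runWithStack(256L * 1024));
    }
}
```

Whether set globally with -Xss or per thread, every thread's stack comes out of the same process address space.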

Tomcat's JVM options were set as follows:

JAVA_OPTS=%JAVA_OPTS% -server -Xms1024m -Xmx1024m -Xmn200M -XX:PermSize=256M -XX:MaxPermSize=512m -XX:SurvivorRatio=1 -Xss256k

So -Xss already limits each Java thread's stack to 256K.
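To confirm that a running JVM actually received flags like these, its startup arguments can be read through the standard java.lang.management API. A sketch with a hypothetical class name; it reports on whichever JVM runs it, so inspecting Tomcat would mean running the same calls inside Tomcat (for example from a diagnostic page, which is an assumption and not part of the original setup). Runtime.maxMemory() may report somewhat less than -Xmx, since collectors can exclude part of the young generation from it.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: list the flags this JVM started with and report its effective max heap,
// to check that settings such as -Xss256k and -Xmx actually took effect.
class ShowJvmArgs {
    /** Returns the first argument starting with the given prefix, or null if none. */
    static String findFlag(List<String> args, String prefix) {
        for (String a : args) {
            if (a.startsWith(prefix)) {
                return a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xss flag: " + findFlag(jvmArgs, "-Xss"));
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Runtime.maxMemory(): " + maxHeapMb + " MB");
    }
}
```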

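In Java, creating a thread makes the virtual machine create a Thread object in JVM memory and also an operating-system thread. The OS thread's memory does not come from JVM memory but from what is left of the process: MaxProcessMemory - JVMMemory - ReservedOsMemory.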

When a new thread has to be created and the remaining memory cannot accommodate another Java thread, the JVM fails with an Out of Memory Error.
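That relationship gives a rough ceiling on the number of threads: (MaxProcessMemory - JVMMemory - ReservedOsMemory) / stack size. A sketch of the arithmetic with a hypothetical class name, assuming JVMMemory is approximated by -Xmx plus -XX:MaxPermSize and leaving ReservedOsMemory as an input you have to estimate. With ReservedOsMemory set to 0 the result is only an upper bound, because DLLs, the JIT code cache and other native allocations share the same address space.

```java
// Sketch of the rough estimate:
//   threads ~ (MaxProcessMemory - JVMMemory - ReservedOsMemory) / stackSize
// All inputs are in KB. JVMMemory is assumed to be -Xmx plus -XX:MaxPermSize;
// ReservedOsMemory is an assumption you must supply (0 gives an upper bound).
class ThreadBudget {
    static long maxThreads(long maxProcessKb, long jvmKb, long reservedOsKb, long stackKb) {
        long remaining = maxProcessKb - jvmKb - reservedOsKb;
        return remaining <= 0 ? 0 : remaining / stackKb;
    }

    public static void main(String[] args) {
        long mb = 1024;               // KB per MB
        long process = 2048 * mb;     // 2 GB user address space, 32-bit Windows
        long jvm = (1024 + 512) * mb; // -Xmx1024m plus -XX:MaxPermSize=512m
        long stack = 256;             // -Xss256k
        System.out.println("upper bound: " + maxThreads(process, jvm, 0, stack) + " threads");
    }
}
```

For the original options this gives 2048 threads; with -Xmx768m the same formula gives 3072, which is the lever a smaller heap pulls.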

Since -Xss was already set to 256K, I decided against this option as well.

That left only Decrease Java heap size (-Xmx/-Xms): shrinking the heap leaves enough memory for the Java thread stacks.

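By default, 32-bit Windows gives each process 2 GB of address space. Subtract the maximum heap size and the maximum PermGen size (MaxPermSize), and what remains is what the Java thread stacks have to fit into, alongside the JVM's other native allocations.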
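The crash log itself lets you check this arithmetic. In the heap summary, each bracket appears to be printed as [start, end of committed part, end of reserved range), so subtracting addresses gives the space reserved for PermGen and for the heap. A sketch with a hypothetical class name; the bracket layout is my reading of the log format, not something the log states.

```java
// Sketch: measure reserved address space from the hex ranges in the hs_err heap
// summary, assuming each space is printed as
// [start, end of committed part, end of reserved range).
class ReservedSpace {
    static final long MB = 1024L * 1024L;

    /** Size in MB between two addresses written like "0x23bc0000". */
    static long sizeMb(String startHex, String endHex) {
        long start = Long.parseLong(startHex.substring(2), 16);
        long end = Long.parseLong(endHex.substring(2), 16);
        return (end - start) / MB;
    }

    public static void main(String[] args) {
        // PSPermGen [0x03bc0000, 0x13bc0000, 0x23bc0000)
        long permCommitted = sizeMb("0x03bc0000", "0x13bc0000"); // matches PermSize=256M
        long permReserved = sizeMb("0x03bc0000", "0x23bc0000");  // matches MaxPermSize=512m
        // ParOldGen starts at 0x23bc0000 and PSYoungGen ends at 0x63bc0000
        long heapReserved = sizeMb("0x23bc0000", "0x63bc0000");  // matches -Xmx1024m
        long userSpace = 2048; // MB of user address space for a 32-bit Windows process
        System.out.println("PermGen committed/reserved: " + permCommitted + "/" + permReserved + " MB");
        System.out.println("Heap reserved: " + heapReserved + " MB");
        System.out.println("Left for everything else: " + (userSpace - permReserved - heapReserved) + " MB");
    }
}
```

PermGen and heap together occupy one contiguous 1.5 GB block from 0x03bc0000 to 0x63bc0000, leaving about 512 MB for thread stacks, DLLs and every other native allocation.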

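Going through the code and the earlier error logs, I found that the Out of Memory Error usually appeared at around 350 threads.
When the error occurred, less than 40% of the heap was in use. So I decided to shrink the Java heap from 1G to 768M.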
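A quick check that 768M leaves enough heap: if the live data stays the same, usage scales by the ratio of the old heap to the new one. A sketch with hypothetical names, treating "less than 40%" as exactly 40% to get a worst case.

```java
// Sketch of the arithmetic behind shrinking the heap from 1024 MB to 768 MB.
class HeapHeadroom {
    /** Usage fraction on the new heap if the same amount of data were kept. */
    static double projectedUsage(double observedFraction, long oldHeapMb, long newHeapMb) {
        return observedFraction * oldHeapMb / newHeapMb;
    }

    public static void main(String[] args) {
        // "Less than 40%" of a 1024 MB heap, taken as exactly 40% for a worst case
        double projected = projectedUsage(0.40, 1024, 768);
        System.out.printf("projected usage on 768 MB: %.1f%%%n", projected * 100);
        // Address space handed back for thread stacks and other native memory
        System.out.println("address space freed: " + (1024 - 768) + " MB");
    }
}
```

About 410 MB of data on a 768 MB heap is roughly 53%, still well within bounds, while 256 MB of address space moves to the native side.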

The revised JVM options:

JAVA_OPTS=%JAVA_OPTS% -server -Xms768m -Xmx768m -Xmn200M -XX:PermSize=256M -XX:MaxPermSize=512m -XX:SurvivorRatio=1 -Xss256k

The system has now run stably for a month, with every metric in the normal range. Heap usage has peaked at only 70%.

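Summary:
1. Fixing this bug deepened my understanding of the JVM, especially thread stacks, the heap, the permanent generation and the young generation.
2. Read the log files carefully and rule out candidate solutions one at a time, then weigh what remains against the system's runtime environment to find a sensible fix.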
