我 以爲在一般狀況下分析Java虛擬機死鎖比分析內存泄漏要容易的多。由於死鎖發生時,JVM一般處於掛起狀態(hang住了),thread dump能夠給出靜態穩定的信息,查找死鎖只須要查找有問題的線程。而內存泄漏的問題卻很難界定,一個運行的JVM裏有無數對象存在,只有寫程序的人才知 道哪些對象是垃圾,而哪些不是,並且對象的引用關係很是複雜,很可貴到一份清晰的對象引用圖。
Java虛擬機死鎖發生時,從操做系統上觀 察,虛擬機的CPU佔用率爲零,很快會從top或prstat的輸出中消失。這時你就能夠收集thread dump了,Unix/Linux 下是kill -3 <JVM pid>,在Windows下能夠在JVM的console窗口上敲Ctrl-Break。根據不一樣的設置,thread dump會輸出到當前控制檯上或應用服務器的日誌裏。
拿到java thread dump後,你要作的就是查找"waiting for monitor entry"的thread,若是大量thread都在等待給同一個地址上鎖(由於對於Java,一個對象只有一把鎖),這說明極可能死鎖發生了。好比:java
"service-j2ee" prio=5 tid=0x024f1c28 nid=0x125 waiting for monitor entry
[62a3e000..62a3f690]
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
com.sun.enterprise.resource.IASNonSharedResourcePool.internalGetResource(IASNonS
haredResourcePool.java:625)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: - waiting to
lock <0x965d8110> (a com.sun.enterprise.resource.IASNonSharedResourcePool)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
com.sun.enterprise.resource.IASNonSharedResourcePool.getResource(IASNonSharedRes
ourcePool.java:520)
................
爲了肯定問題,經常須要在隔兩分鐘後再次收集一次thread dump,若是獲得的輸出相同,仍然是大量thread都在等待給同一個地址上鎖,那麼確定是死鎖了。
如何找到當前持有鎖的線程是解決問題的關鍵。方法是搜索thread dump,查找"locked <0x965d8110>", 找到持有鎖的線程。服務器
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: "Thread-20" daemon prio=5 tid=0x01394f18
nid=0x109 runnable [6716f000..6716fc28]
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
java.net.SocketInputStream.socketRead0(Native Method)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
java.net.SocketInputStream.read(SocketInputStream.java:129)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at oracle.net.ns.Packet.receive(Unknown
Source)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.net.ns.DataPacket.receive(Unknown Source)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.net.ns.NetInputStream.read(Unknown Source)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.net.ns.NetInputStream.read(Unknown Source)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.net.ns.NetInputStream.read(Unknown Source)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:929)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.jdbc.ttc7.Ocommoncall.receive(Ocommoncall.java:106)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.jdbc.ttc7.TTC7Protocol.logoff(TTC7Protocol.java:396)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: - locked <0x954f47a0> (a
oracle.jdbc.ttc7.TTC7Protocol)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
oracle.jdbc.driver.OracleConnection.close(OracleConnection.java:1518)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: - locked <0x954f4520> (a
oracle.jdbc.driver.OracleConnection)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
com.sun.enterprise.resource.JdbcUrlAllocator.destroyResource(JdbcUrlAllocator.java:122)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
com.sun.enterprise.resource.IASNonSharedResourcePool.destroyResource(IASNonSharedResourcePool.java:8
72)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
com.sun.enterprise.resource.IASNonSharedResourcePool.resizePool(IASNonSharedResourcePool.java:1086)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: - locked <0x965d8110> (a
com.sun.enterprise.resource.IASNonSharedResourcePool)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
com.sun.enterprise.resource.IASNonSharedResourcePool$Resizer.run(IASNonSharedResourcePool.java:1178)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
java.util.TimerThread.mainLoop(Timer.java:432)
[27/Jun/2006:10:03:08] WARNING (26140): CORE3283: stderr: at
java.util.TimerThread.run(Timer.java:382)
在這個例子裏,持有鎖的線程在等待Oracle返回結果,卻始終等不到響應,所以發生了死鎖。
若是持有鎖的線程還在等待給另外一個對象上鎖,那麼仍是按上面的辦法順藤摸瓜,直到找到死鎖的根源爲止。
另外,在thread dump裏還會常常看到這樣的線程,它們是等待一個條件而主動放棄鎖的線程。例如:
"Thread-1" daemon prio=5 tid=0x014e97a8 nid=0x80 in Object.wait() [68c6f000..68c6fc28]
at java.lang.Object.wait(Native Method)
- waiting on <0x95b07178> (a java.util.LinkedList)
at com.iplanet.ias.util.collection.BlockingQueue.remove(BlockingQueue.java:258)
- locked <0x95b07178> (a java.util.LinkedList)
at com.iplanet.ias.util.threadpool.FastThreadPool$ThreadPoolThread.run(FastThreadPool.java:241)
at java.lang.Thread.run(Thread.java:534)
有時也會須要分析這類線程,尤爲是線程等待的條件。
其實,Java thread dump並不僅用於分析死鎖,其它Java應用運行時古怪的行爲均可以用thread dump來分析。
最 後,在Java SE 5裏,增長了jstack的工具,也能夠獲取thread dump。在Java SE 6裏, 經過jconsole的圖形化工具也能夠方便地查找涉及object monitors 和java.util.concurrent.locks死鎖。oracle