jstack: Analyzing High CPU Usage on a Cat Message Consumer Machine

1. Find the process consuming the most CPU

Running the top command shows that the java process (pid 8424) is consuming 192% CPU, as in the screenshot below.

[Figure: top output showing the java process at 192% CPU]

Clearly the java process is the one consuming excessive resources.
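As a side note (not part of the original steps), the same check can be done non-interactively, which is handy when collecting evidence from a monitoring script; a minimal sketch:

# One-shot, non-interactive snapshot instead of interactive top
top -b -n 1 | head -15

# Or: list processes sorted by CPU usage, highest first
ps -eo pid,user,%cpu,%mem,cmd --sort=-%cpu | head -5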

2. Dump the current JVM thread stacks

jstack 8424 > /tmp/jstack.txt

The full stack dump is in Attachment 1.
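One dump is only a single snapshot. A practice worth adding here (my suggestion, not in the original article): take several dumps a few seconds apart, so persistent hot threads can be told apart from transient ones:

# Take 3 dumps, 5 seconds apart, for comparison (pid 8424 from above)
for i in 1 2 3; do
    jstack 8424 > /tmp/jstack.$i.txt
    sleep 5
done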

3. Find the threads consuming the most CPU

Use the following command:

top -H -p8424

The output:

PID   USER PR NI VIRT  RES  SHR  S %CPU %MEM TIME+    COMMAND
8320  nwd  20 0  6275m 4.8g 7044 R 44.6 62.7 3:57.00  java
8365  nwd  20 0  6275m 4.8g 7044 R 32.4 62.7 4:06.09  java
19016 nwd  20 0  6275m 4.8g 7044 R 22.5 62.7 0:24.47  java
8289  nwd  20 0  6275m 4.8g 7044 S 14.1 62.7 16:27.13 java
8356  nwd  20 0  6275m 4.8g 7044 S 13.5 62.7 6:40.18  java
8333  nwd  20 0  6275m 4.8g 7044 R 10.3 62.7 5:21.39  java

From this output we can read off the PIDs of the threads with the highest CPU usage; the next step is to look up those threads in the stack dump by PID. Note that top reports thread PIDs in decimal, while the nid in the jstack output is hexadecimal, so a base conversion is needed. You can use an online converter such as http://tool.oschina.net/hexconvert/, or the following command:

[ylp@ylp-web-01 ~]$ printf "%x\n" 14065
36f1
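The conversion and the lookup can also be combined into one step; a sketch, assuming the dump was saved to /tmp/jstack.txt as in step 2 (thread pid 8320 from the table above):

# Convert the decimal thread pid to the hex nid used by jstack,
# then show that thread's stack from the saved dump
nid=$(printf '%x' 8320)
grep -A 20 "nid=0x$nid" /tmp/jstack.txt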

4. Analyze the top CPU-consuming threads

Below we analyze the three threads with the highest CPU usage.

  1. The stack of thread pid=8320 (nid=0x2080):
"epollEventLoopGroup-3-5" prio=10 tid=0x00007fed00272000 nid=0x2080 runnable [0x00007fece29e6000]
java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.unpark(Native Method)
at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:152)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1263)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460)
at java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:174)
at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:429)
at com.dianping.cat.message.io.DefaultMessageQueue.offer(DefaultMessageQueue.java:22)
at com.dianping.cat.analysis.PeriodTask.enqueue(PeriodTask.java:44)
at com.dianping.cat.analysis.Period.distribute(Period.java:85)
at com.dianping.cat.analysis.RealtimeConsumer.consume(RealtimeConsumer.java:53)
at com.dianping.cat.analysis.DefaultMessageHandler.handle(DefaultMessageHandler.java:28)
at com.dianping.cat.analysis.TcpSocketReceiver$MessageDecoder.decode(TcpSocketReceiver.java:147)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:247)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:147)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None

This thread is the message receiver: it takes messages sent from clients and hands them to the consumer.

  2. The stack of thread pid=8365 (nid=0x20ad):
"cat-Message-Gzip-19" daemon prio=10 tid=0x00007fed1c203800 nid=0x20ad runnable [0x00007fece0eb9000]
java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.unpark(Native Method)
at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:152)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1263)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460)
at java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:174)
at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:429)
at com.dianping.cat.consumer.dump.LocalMessageBucketManager$MessageGzip.gzipMessage(LocalMessageBucketManager.java:303)
at com.dianping.cat.consumer.dump.LocalMessageBucketManager$MessageGzip.run(LocalMessageBucketManager.java:332)
at java.lang.Thread.run(Thread.java:745)
at org.unidal.helper.Threads$RunnableThread.run(Threads.java:294)
Locked ownable synchronizers:
- None

This thread compresses and decompresses message analysis results.

  3. The stack of thread pid=19016 (nid=0x4a48):
"Cat-RealtimeConsumer-ProblemAnalyzer-10-1" daemon prio=10 tid=0x00007fed181ed800 nid=0x4a48 runnable [0x00007fecd9c1c000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000711fd73b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at com.dianping.cat.message.io.DefaultMessageQueue.poll(DefaultMessageQueue.java:49)
at com.dianping.cat.analysis.AbstractMessageAnalyzer.analyze(AbstractMessageAnalyzer.java:40)
at com.dianping.cat.analysis.PeriodTask.run(PeriodTask.java:80)
at java.lang.Thread.run(Thread.java:745)
at org.unidal.helper.Threads$RunnableThread.run(Threads.java:294)
Locked ownable synchronizers:
- None

This thread checks whether the current message is an error message; if it is, it records the error count and the error stack information.

The analysis shows that the top CPU-consuming threads are all doing normal business work, so the suspicion is that a heavy message load is driving CPU usage up. Cat has a built-in state report, so we checked the current message-processing volume.

[Figure: Cat state report showing message-processing volume]
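As a quick cross-check (my addition, not in the original analysis), the dump itself can be used to gauge how busy the JVM is, by tallying thread states:

# Count how many threads are in each state in the saved dump
awk '/java.lang.Thread.State:/ {state[$2]++}
     END {for (s in state) print state[s], s}' /tmp/jstack.txt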

5. Summary

For this kind of problem, first capture stack dumps at the right moment, then focus the analysis on the threads with the highest resource consumption.

Attachment: Attachment 1

6. Appendix: script for capturing top thread info and the jstack dump

#!/bin/bash
# Capture the busiest threads of a Java process (via top) plus a full
# jstack dump, then annotate each hot thread's stack with its CPU usage.

cd `dirname $0`
APP_HOME=`pwd`
typeset top=${1:-10}      # number of threads to report (default: 10)
#typeset pid=${2:-$(pgrep -u $USER java)}
# "[j]ava" keeps the grep process itself out of the match
typeset pid=`ps -ef | grep "[j]ava" | grep "${APP_HOME}" | awk '{print $2}'`
typeset tmp_file=./java${pid}.trace

# One-shot snapshot of the busiest threads in the target process
top -H -p $pid -b -d 1 -n 1 | head -20 > $tmp_file

# Append the full thread dump to the same trace file
$JAVA_HOME/bin/jstack -l $pid >> $tmp_file

# List all threads sorted by CPU, keep the top N that belong to our pid,
# and emit "tid<TAB>%cpu" for each
ps H -eo user,pid,ppid,tid,time,%cpu --sort=%cpu --no-headers\
        | tail -$top\
        | awk -v "pid=$pid" '$2==pid{print $4"\t"$6}'\
        | while read line;
do
        # Convert the decimal tid to the hex nid used in the jstack output
        typeset nid=$(echo "$line"|awk '{printf("0x%x",$1)}')
        typeset cpu=$(echo "$line"|awk '{print $2}')
        # Print that thread's stack block, tagging its first line with cpu=N%
        awk -v "cpu=$cpu" '/nid='"$nid"'/,/^$/{print $0"\t"(isF++?"":"cpu="cpu"%");}' $tmp_file
done
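A usage sketch, assuming the script is saved as jstack_top.sh (the name is mine) next to the application, with JAVA_HOME set: running ./jstack_top.sh 15 reports the 15 busiest threads (default 10) and writes the annotated trace to ./java<pid>.trace.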