Let's start with a few basic operations.
Check memory usage (note: 1. real available memory = free + cached; 2. if swap usage is high, application performance will be severely affected):
[@yd-80-133 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         96636      96400        235          0        522      75056
-/+ buffers/cache:      20821      75814
Swap:         8189         49       8139
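With the older free output format shown above, the "real available memory = free + cached" rule can be computed directly (a sketch; newer procps versions print an "available" column instead, so the field positions are an assumption):
free -m | awk '/^Mem:/ {print "usable MB:", $4 + $7}'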
Check disk usage (if the disk your application is deployed on reaches 100% usage, the application becomes unavailable):
[@yd-81-74 ~]# df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/sda1                    3.9G  977M  2.8G  26% /
/dev/sda6                    1.4T  194G  1.1T  16% /opt
/dev/sda3                    3.9G  2.4G  1.3G  66% /var
/dev/sda5                    4.9G  3.0G  1.7G  64% /usr
tmpfs                         12G   38M   12G   1% /dev/shm
10.13.81.44:/data/scribelog   21T  7.3T   13T  37% /opt/scribelog
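A quick way to spot filesystems that are close to full (a sketch; the 90% threshold is arbitrary):
df -hP | awk 'NR > 1 && $5 + 0 >= 90 {print}'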
Get a system overview with top (it shows a lot of information: Shift+P sorts by CPU descending, Shift+M sorts by memory descending, pressing 1 shows how busy each CPU is):
top - 16:38:58 up 1019 days,  1:53, 28 users,  load average: 0.77, 0.53, 0.56
Tasks: 325 total,   1 running, 324 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.3%sy,  0.0%ni, 98.6%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24659996k total, 22502624k used,  2157372k free,   118628k buffers
Swap:  4192956k total,    13344k used,  4179612k free,   324068k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
 4033 root      18   0 1996m 975m  13m S 159.0  4.1   6:40.52 java
12336 root      18   0 2020m 1.2g  10m S   9.8  5.3   2860:34 java
 3484 root      34  19     0    0    0 S   2.0  0.0  16159:29 kipmi0
 7350 root      15   0 12868 1192  740 R   2.0  0.0   0:00.01 top
29636 smc       21   0 1092m 579m  14m S   2.0  2.4   1:20.44 java
30469 smc       21   0 1075m 708m  14m S   2.0  2.9   5:34.31 java
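If you just want the top consumers without an interactive screen, ps can produce the same ordering (a sketch; --sort is a procps ps option):
ps aux --sort=-%cpu | head -10     # biggest CPU consumers
ps aux --sort=-%mem | head -10     # biggest memory consumers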
Check how many threads a process has:
[@yd-81-211 ~]$ ps -eLf | grep 24941 | wc -l
583
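An alternative that avoids counting ps output is to read the thread count straight from procfs (a sketch; 24941 is the pid from the example above):
grep Threads /proc/24941/status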
Check which process is using a port (the PID column shows that process):
[@yd-81-74 ~]# lsof -i tcp:8080
COMMAND   PID USER   FD   TYPE   DEVICE SIZE NODE NAME
java    30469  smc 239u  IPv4 64471973      TCP 10.13.81.74:webcache (LISTEN)
This command can also show which process has a file open:
[@yd-81-130 nginx]# lsof | grep /data/log/scribelog
bash   4171  smc  cwd  DIR  0,23  16384  17797 /data/log/scribelog/user (10.13.81.44:/data/scribelog/)
cat    4172  smc  cwd  DIR  0,23  16384  17797 /data/log/scribelog/user (10.13.81.44:/data/scribelog/)
Check whether a port is being listened on (the last column shows which process is listening on it):
[@yd-81-74 ~]# netstat -nalp | grep 8080
tcp    0    0 10.13.81.74:8080    0.0.0.0:*    LISTEN    30469/java
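On machines where iproute2 is installed, ss answers the same question and is usually faster than netstat (a sketch):
ss -lntp | grep 8080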
Check whether two files are identical:
md5sum targetfile.txt > targetfile.md5
Put targetfile.md5 and targetfile.txt in the same directory and verify:
md5sum -c targetfile.md5
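If both files are on the same machine you can skip the checksum and compare them directly (a sketch; otherfile.txt is just a placeholder for the second file):
cmp -s targetfile.txt otherfile.txt && echo identical || echo different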
Below are situations we have run into. If anything is wrong or needs to be added, please edit it directly.
Memory problems:
♦ java.lang.OutOfMemoryError: PermGen space
》Resin hot deployment reloading jar packages.
》The permanent generation is set too small, e.g. -XX:PermSize=32m -XX:MaxPermSize=64m.
》Common in test environments; it basically never happens in production.
♦ java.lang.OutOfMemoryError: Java heap space
》Usually a large amount of data was loaded from the database or cache, or users uploaded a large number of files.
》Generally, since GC is triggered first, this exception is hard to hit in production as long as the code has no memory leaks.
》Fix: restart, and fix the underlying code issues.
♦ java.lang.OutOfMemoryError: GC overhead limit exceeded
》This is thrown when the parallel collector is doing GC and the JVM's GC overhead limit check (-XX:+UseGCOverheadLimit, on by default) kicks in.
》So far we have only seen this in Hive jobs; online applications generally run CMS and do not hit it.
》Fix: increase the heap size, or disable the check with -XX:-UseGCOverheadLimit.
♦ Often a memory problem does not show up as an exception at all: the system becomes unusable before it ever reaches that point. In that case you need to check memory usage proactively:
》Check GC status to see whether the JVM is busy doing GC (below is one example; I will paste a more typical one the next time we hit such a case):
[@tc-152-92 ~]$ jstat -gcutil 16590 3000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 85.96 90.60 54.20 3336 0.781 38188 13952.038 13952.819
0.00 0.00 91.42 90.60 54.20 3336 0.781 38189 13952.565 13953.346
0.00 0.00 97.43 90.60 54.20 3336 0.781 38190 13952.960 13953.741
》Check how much memory Java objects occupy and whether the objects in the heap look reasonable (this is only an example; I will paste one from a real abnormal-memory case next time).
[@zjm-110-88 ~]$ jmap -histo 2234 | head -10
 num     #instances         #bytes  class name
----------------------------------------------
1: 3373503 2209452824 [C
2: 3334031 133361240 java.lang.String
3: 260 101301344 [Lcom.caucho.util.LruCache$CacheItem;
4: 326846 63127704 [Ljava.lang.Object;
5: 151274 50828064 com.wap.sohu.mobilepaper.model.NewsContent
6: 19812 45474976 [I
7: 110209 40197776 [B
8: 145988 30902344 [Ljava.util.HashMap$Entry;
9: 1846859 29549744 java.lang.Object
10: 270121 19448712 com.wap.sohu.mobilepaper.model.xml.Image
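When the histogram alone is not enough, a full heap dump can be taken for offline analysis (a sketch, assuming jmap from the same JDK as the target process; 2234 is the pid from the example above and the file path is arbitrary):
jmap -dump:live,format=b,file=/tmp/heap.2234.bin 2234
The dump can then be loaded into a tool such as Eclipse MAT or jhat. Taking a dump (especially with :live) pauses the target JVM, so use it with care on production instances.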
♦ Commonly used JVM parameters:
-XX:MaxPermSize=512m -XX:PermSize=512m -Xss128k
-Xmx4096m -Xms4096m -Xmn1024m
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85 -XX:+PrintGCDetails
-XX:MaxTenuringThreshold=30
CPU problems:
♦ Too many running threads: our applications have quite a few threads that execute tasks asynchronously, and at some point in time or on some event a large number of them may start working at once, putting pressure on CPU resources.
♦ The program has become slow, e.g. heavy computation or frequent looping and traversal.
♦ Heavy I/O, e.g. very frequent logging or frequent network access (MySQL, memcache).
♦ Too much synchronization, e.g. synchronized blocks.
♦ Normally we identify problems by looking at the JVM's thread dump, mainly at the java.lang.Thread.State value; both BLOCKED and RUNNABLE deserve close attention. BLOCKED definitely involves a lock, e.g. frequent I/O blocking on a resource, or explicit locking in our own code. RUNNABLE is in theory normal, but quite often it means some logic is too slow (network I/O or computation) or is called so frequently that a piece of code spends a long time executing, which also needs optimizing. (A one-liner that summarizes thread states follows the dump below.)
[@yd-80-133 ~]$ jstack 1344
2013-06-08 16:15:42
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.8-b03 mixed mode):

"pool-40-thread-5" prio=10 tid=0x000000005cea6800 nid=0x639c runnable [0x00000000493c5000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
    - locked <0x000000074c115a08> (a com.mysql.jdbc.util.ReadAheadInputStream)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
    - locked <0x0000000770fb3380> (a com.mysql.jdbc.JDBC4Connection)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
    - locked <0x0000000770fb3380> (a com.mysql.jdbc.JDBC4Connection)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2450)
    - locked <0x0000000770fb3380> (a com.mysql.jdbc.JDBC4Connection)
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2006)
    - locked <0x0000000770fb3380> (a com.mysql.jdbc.JDBC4Connection)
    at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1467)
    - locked <0x0000000770fb3380> (a com.mysql.jdbc.JDBC4Connection)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
    at org.springframework.jdbc.core.JdbcTemplate$4.doInPreparedStatement(JdbcTemplate.java:873)
    at org.springframework.jdbc.core.JdbcTemplate$4.doInPreparedStatement(JdbcTemplate.java:1)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:586)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:614)
    at org.springframework.jdbc.core.JdbcTemplate.batchUpdate(JdbcTemplate.java:858)
    at com.wap.sohu.mobilepaper.dao.statistic.StatisticDao$BatchUpdateTask2.run(StatisticDao.java:285)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
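A quick way to get an overview of thread states before digging into individual stacks (a sketch, assuming the JDK's jstack is on PATH; 1344 is the pid from the example above):
jstack 1344 | grep "java.lang.Thread.State" | sort | uniq -c | sort -rn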
♦ Here is one way to find the threads that are consuming CPU (a combined command sketch follows step 5):
1. Find the id of the java process; in top you can see the process consuming the most CPU.
Here is the process info:
smc 18950 1 90 Dec19 ? 21:22:46 java -server -Xmx768m -Xms768m -Xss128k -Xmn300m -XX:MaxPermSize=128m -XX:PermSize=128m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85 -XX:+PrintGCDetails -XX:+UseMembar -Dhost_home=/opt/smc -Dserver_log_home=/opt/smc/log/server -Xloggc:/opt/smc/log/server/check_instance_gc.log -Dserver_ip=10.11.152.92 -Dserver_resources=/opt/smc/apps/server/server_apps/check_instance/resources/ -Dserver_name=check_instance com.wap.sohu.server.SmcApiServer 8010
2. Check which threads were consuming the most CPU at that moment (-p specifies the java process id, -H shows all of its threads):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
18999 smc       16   0 1676m 1.0g  14m S 32.3  6.5 268:50.16 java
18997 smc       16   0 1676m 1.0g  14m S 30.4  6.5 267:47.97 java
18998 smc       17   0 1676m 1.0g  14m S 30.4  6.5 268:15.28 java
19000 smc       16   0 1676m 1.0g  14m S 30.4  6.5 268:06.23 java
19001 smc       15   0 1676m 1.0g  14m S  5.7  6.5  81:34.02 java
3. Save a stack dump taken at that moment.
4. Convert the id of the most CPU-hungry thread above to hexadecimal:
Python 2.4.3 (#1, Jun 11 2009, 14:09:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> hex(18999)
'0x4a37'
>>>
5. Search the stack dump for that hexadecimal thread id (it appears as the nid value):
"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x000000004183b800 nid=0x4a39 runnable
"Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x0000000041834000 nid=0x4a35 runnable
"Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x0000000041836000 nid=0x4a36 runnable
"Gang worker#2 (Parallel CMS Threads)" prio=10 tid=0x0000000041837800 nid=0x4a37 runnable
"Gang worker#3 (Parallel CMS Threads)" prio=10 tid=0x0000000041839800 nid=0x4a38 runnable
Here it turned out the GC threads were the ones burning CPU. From the process info at the top you can see this java process was started with only about 1G of heap, and the memory usage shown in top was already around 1G, which matches.
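Putting the steps together, a minimal sketch using the pid and thread id values from the example above (the dump file name is just illustrative):
top -H -p 18950                      # step 2: per-thread CPU view of the java process
jstack 18950 > /tmp/jstack.18950     # step 3: save the stack dump
printf '0x%x\n' 18999                # step 4: thread id to hex (prints 0x4a37)
grep -i 0x4a37 /tmp/jstack.18950     # step 5: find that thread in the dump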
Common exceptions in code:
♦ java.lang.OutOfMemoryError: unable to create new native thread
The program has created too many threads. One cause is that the user the program runs as (smc) has a limit on the maximum number of threads it may run (check with ulimit -a); the other is simply that the code starts a lot of threads.
When this exception appears, logging in to the server as that user produces "Resource temporarily unavailable" errors. (If the process was started as root, the only way out is to reboot the machine.) A quick way to check the limits is sketched below.
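A sketch of how to see how close you are to the limit (smc is the example user from above):
ulimit -u                              # max processes/threads allowed for the current user
ps -eLf | awk '$1 == "smc"' | wc -l    # approximate number of threads currently owned by smc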
♦ Broken Pipe // TODO...
♦ Too many open files // TODO...
♦ c3p0 connection pool error: Attempted to use a closed or broken resource pool
<property name="breakAfterAcquireFailure" value="true"></property>
Change it to:
<property name="breakAfterAcquireFailure" value="false"></property>
If this parameter is true, a single failure to acquire a database connection declares the whole data source broken and closes it permanently, making the service unavailable.
If it is false, a failed acquisition throws an exception but the data source stays valid, and everything works again once a connection is acquired successfully.
♦ Class initialization failure
java.lang.NoClassDefFoundError: Could not initialize class com.wap.sohu.mobilepaper.util.ClientUserCenterHttpUtils
> Your classpath may be wrong, or there is a class-loading order problem (jar conflict); a scan for conflicting jars is sketched below.
> The class file does not exist.
> The class threw an uncaught exception during initialization, e.g. in a static block or a static field initializer.
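To check the jar-conflict case, you can scan the application's jar directory for copies of the class (a sketch, assuming unzip is installed; lib/ and the class name are just the example values from above):
for j in lib/*.jar; do unzip -l "$j" | grep -q ClientUserCenterHttpUtils.class && echo "$j"; done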
Online debugging
Installation: copy 10.10.76.79:/root/btrace-bin.tar.gz to the target machine, create a btrace directory, and unpack into it: tar -zxvf btrace-bin.tar.gz -C btrace/
Fix permissions: cd bin/; chmod 744 *
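To attach a trace script to a running JVM, the launcher is invoked roughly like this (a sketch; 18950 is the example java pid used earlier in these notes, and Response.java is the script below):
btrace/bin/btrace 18950 Response.java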
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;
import java.util.*;

@BTrace  // marker annotation required on every BTrace script class
public class Response {
    // The handler's parameter and return types below are assumptions about
    // LocalCache.getBulk's signature; adjust them to match the real method.
    @OnMethod(clazz = "com.sohu.smc.reply.core.LocalCache", method = "getBulk", location = @Location(Kind.RETURN))
    public static void onGetBulk(String[] keys, @Return Map result) {
        println("=======================================");
        printArray(keys);
        println(strcat("Params keys length:", str(keys.length)));
        println(strcat("Result length:", str(size(result))));
    }

    // @OnMethod(clazz = "com.sohu.smc.reply.core.CommentCursorList", method = "getIdList", location = @Location(Kind.RETURN))
    // println(strcat("ID LIST:", str(size(result))));
    // }
}
starting pool-7-thread-1
starting pool-17-thread-1
starting pool-6-thread-2
starting pool-7-thread-2
starting pool-7-thread-3
starting pool-7-thread-4
starting pool-6-thread-3
starting pool-6-thread-4
starting pool-6-thread-5
starting pool-6-thread-6
starting netty-io.airlift.http.client-cli-io-boss-0
starting netty-io.airlift.http.client-cli-io-worker-0
starting netty-io.airlift.http.client-cli-io-worker-1
starting netty-io.airlift.http.client-cli-io-worker-2
starting netty-io.airlift.http.client-cli-io-worker-3
starting netty-io.airlift.http.client-cli-io-worker-4