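1. Possible causes of a database hang
Oracle deadlocks, very high system load (for example CPU usage), or heavy lock waits such as a large number of DX enqueues can all cause the system to hang.
Generally, when we say the system is hung, we mean that the application does not respond and an ordinary sqlplus session can barely do anything.
2. How do we run a hang analysis? What levels does it have? How do we choose a level?
hanganalyze has the following levels: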
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
How do we choose a level?
Generally, levels above 3 are not recommended, because they can produce very large trace files and may also affect system I/O.
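One way to judge the cost of a higher level before running it: a level 3 trace contains an "Extra information that will be dumped at higher levels" section that gives the number of node dumps each higher level would add. The sketch below sums those counts. The sample lines come from a level 3 trace on 10.2.0.1; the helper names are hypothetical, and treating the levels as cumulative is my assumption, based on the "Level N + ..." wording in the list above.

```python
import re

# Sample "Extra information" lines from a level 3 hanganalyze trace.
EXTRA_INFO = """\
[level 4] : 1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 5 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] : 1 node dumps -- [NLEAF]
[level 10] : 13 node dumps -- [IGN]
"""

LEVEL_LINE = re.compile(r"\[level\s+(\d+)\]\s*:\s*(\d+)\s+node dumps")


def dumps_per_level(text):
    """Map each level to the number of extra node dumps it would add."""
    return {int(m.group(1)): int(m.group(2)) for m in LEVEL_LINE.finditer(text)}


def cumulative_dumps(text, level):
    """Total node dumps for a run at `level` (assumption: levels are cumulative)."""
    return sum(n for lvl, n in dumps_per_level(text).items() if lvl <= level)


if __name__ == "__main__":
    for level in (3, 4, 5, 10):
        print(level, cumulative_dumps(EXTRA_INFO, level))
```

On a busy system the level 10 count is the number to watch, since it covers every process.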
Starting with Oracle 9i, hanganalyze also supports RAC.
There are two ways to run it:
1) ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level ';
2) Use the oradebug command
ORADEBUG setmypid
ORADEBUG setinst all
ORADEBUG -g def hanganalyze --- usage for RAC
oradebug setmypid
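oradebug hanganalyze 3 --- non-RAC environment
When doing a hang analysis, Oracle generally recommends taking a systemstate dump at the same time:
oradebug dump systemstate 2 --- level 2 is enough; it includes information on all sessions.
sqlplus -prelim / as sysdba --- this login method is available in 10g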
oradebug setospid
oradebug unlimit
oradebug dump systemstate 10
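Since these commands are usually typed under pressure, it can help to generate the sequence in advance. The following is a minimal sketch that only builds the command text for the non-RAC case from the commands shown above; it does not connect to Oracle. The function name and its defaults are hypothetical.

```python
def build_oradebug_script(hang_level=3, systemstate_level=2):
    """Return the oradebug commands for a hanganalyze plus systemstate dump.

    Only composes text; running it inside a SYSDBA sqlplus session is left
    to the operator.
    """
    if not 1 <= hang_level <= 10:
        raise ValueError("hanganalyze level must be between 1 and 10")
    commands = [
        "oradebug setmypid",                              # attach to our own process
        "oradebug unlimit",                               # remove the trace file size limit
        f"oradebug hanganalyze {hang_level}",             # wait chain analysis
        f"oradebug dump systemstate {systemstate_level}",  # companion systemstate dump
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(build_oradebug_script(), end="")
```

The output can be saved to a file and pasted into the session once the connection is established.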
Addendum: sometimes we also need to trace a specific process. On AIX we can use the dbx command,
as in the following example:
dbx -a PID (where PID = any oracle shadow process) --- find it with ps -ef|grep xxx
dbx() print ksudss(10)
dbx() detach
3. How do we read the hang analysis trace file and extract useful information?
*** ACTION NAME:() 2010-03-12 00:04:01.497
*** MODULE NAME:(sqlplus@S7_C_YZ_YZSJK (TNS V1-V3)) 2010-03-12 00:04:01.497 --- module name, same as the MODULE column of v$session
*** SERVICE NAME:(SYS$USERS) 2010-03-12 00:04:01.497
*** SESSION ID:(5184.45287) 2010-03-12 00:04:01.497 ---- sid (5184) serial# (45287)
*** 2010-03-12 00:04:01.497
==============
HANG ANALYSIS:
==============
Found 54 objects waiting for
<0/5210/10419/0x99d0a88/11215038/No Wait> ------ this shows that session 5210 is blocking 54 objects
Open chains found:
Chain 1 : : --- from here on, the sessions below are all blocked by session 5210 above; usually each one blocks the next
<0/5210/10419/0x99d0a88/11215038/No Wait>
-- <0/3994/15494/0xd9ac1b0/6574102/enq: TM - contention>
-- <0/4962/58962/0xca03618/5710044/enq: DX - contention>
Other chains found: --- the sessions below are also blocked by the ones above, but indirectly rather than directly (unlike the open chains)
Chain 2 : :
<0/4001/31548/0xf9f3ab0/4980956/enq: DX - contention>
Chain 3 : :
<0/4014/30717/0xaa27b48/7446746/gc buffer busy>
Chain 4 : :
<0/4039/42115/0xd9f5710/5595180/PX Deq: Table Q Normal>
Cycle 1 : : --- a cycle usually means a deadlock and is very likely the root cause of the hang
<980/3887/0xe4214964/24065/latch free>
-- <2518/352/0xe4216560/24574/latch free>
-- <55/10/0xe41236a8/13751/latch free>
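The chain entries above can also be pulled apart programmatically. Below is a minimal sketch that parses the six-field entries of the 9i/10g chain listing, using the column order cnode/sid/sess_srno/proc_ptr/ospid/wait_event. The names `ChainEntry` and `parse_chain_entry` are hypothetical, and entries in another layout (such as the five-field cycle lines) are deliberately skipped rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class ChainEntry(NamedTuple):
    cnode: int
    sid: int
    sess_srno: int
    proc_ptr: str
    ospid: Optional[int]
    wait_event: str
    is_waiter: bool  # True when the line starts with "--" (blocked by the entry above)


ENTRY = re.compile(r"^(?P<dash>--\s*)?<(?P<body>[^>]*)>")

CHAIN_1 = """\
<0/5210/10419/0x99d0a88/11215038/No Wait>
-- <0/3994/15494/0xd9ac1b0/6574102/enq: TM - contention>
-- <0/4962/58962/0xca03618/5710044/enq: DX - contention>
"""


def parse_chain_entry(line):
    """Parse one chain entry; return None if the line is not a six-field entry."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    parts = match.group("body").split("/", 5)
    if len(parts) != 6:
        return None
    cnode, sid, srno, proc_ptr, ospid, event = parts
    return ChainEntry(
        cnode=int(cnode),
        sid=int(sid),
        sess_srno=int(srno),
        proc_ptr=proc_ptr,
        ospid=int(ospid) if ospid else None,
        wait_event=event,
        is_waiter=match.group("dash") is not None,
    )


if __name__ == "__main__":
    for line in CHAIN_1.splitlines():
        entry = parse_chain_entry(line)
        role = "waiter" if entry.is_waiter else "head of chain"
        print(entry.sid, role, entry.wait_event)
```

The head of each open chain (the entry without a leading `--`) is the session to investigate first.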
4. How does hang analysis differ between versions? How do the traces compare?
The hanganalyze trace formats for Oracle 8 through 10g are as follows:
Oracle 8.x : [nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor
Oracle9i: [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
Oracle10g:[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
nodenum --> a sequence number that hanganalyze generates for each session
sid --> Session ID
sess_srno --> Serial#
ospid --> OS Process Id (v$process spid)
state --> State of the node
adjlist --> adjacent node (Usually represents a blocker node) -- usually the blocker
predecessor --> predecessor node (Usually represents a waiter node) -- usually the waiter
cnode --> node number; only present from 9i onward
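The state field can take the following values:
IN_HANG --> A very dangerous state: it usually means the node is stuck in an infinite loop or is hung. Typically the node's adjacent node (its adjlist) is in the same state.
As in the following example:
[16]/0/17/154/0x24617be0/26800/IN_HANG/29/32/[185]/19 --- from IN_HANG we can see that 185 is the adjacent node of 16, and 185 is blocked by 16
[185]/1/16/4966/0x24617270//IN_HANG/30/31/[16]/16 --- here 185 in turn blocks 16 (16 is the waiter)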
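LEAF --> Usually the prime blocker candidate. How do we confirm it? Generally, use the trailing predecessor field to judge whether the session is a blocker or a waiter.
As in the following example:
[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19 -- node 19 is the waiter here, so we conclude that sid 17 is blocking sid 20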
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186
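LEAF_NW --> Similar to LEAF, but the session may be consuming CPU.
NLEAF --> A session in this state is usually considered a "stuck" session: another session is holding a resource that it needs, and its adjlist points to the blocker.
IGN --> A session in this state is usually idle, unless it has an adjlist, in which case it is waiting on another session.
IGN_DMP --> Similar to IGN.
As in the following example: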
[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186
[189]/1/20/36/0x24619830//IGN/95/96/[19]/none
[176]/1/7/1/0x24611d80//IGN/75/76//none
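---- From the above, 189 is waiting on 19, 19 is waiting on 16, and 176 is an idle session.
SINGLE_NODE and SINGLE_NODE_NW can be treated like LEAF and LEAF_NW, except that no other session depends on them.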
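The adjlist field is what makes cycles detectable. Here is a minimal sketch that parses the 9i/10g node lines from the examples above into a wait-for graph (nodenum to adjlist) and walks it looking for a cycle. The function names are hypothetical, and following only the first adjacent node is a simplification I am assuming for brevity.

```python
import re

NODE_FIELDS = 11  # [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor

IN_HANG_EXAMPLE = [
    "[16]/0/17/154/0x24617be0/26800/IN_HANG/29/32/[185]/19",
    "[185]/1/16/4966/0x24617270//IN_HANG/30/31/[16]/16",
]
LEAF_EXAMPLE = [
    "[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19",
    "[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186",
]


def parse_node(line):
    """Split one node line into the fields needed for graph building."""
    parts = line.strip().split("/")
    if len(parts) != NODE_FIELDS:
        raise ValueError(f"expected {NODE_FIELDS} fields, got {len(parts)}")
    return {
        "nodenum": int(parts[0].strip("[]")),
        "sid": int(parts[2]),
        "state": parts[6],
        "adjlist": [int(n) for n in re.findall(r"\[(\d+)\]", parts[9])],
    }


def build_graph(lines):
    """Map each nodenum to the list of nodes it is waiting on."""
    graph = {}
    for line in lines:
        node = parse_node(line)
        graph[node["nodenum"]] = node["adjlist"]
    return graph


def find_cycle(graph, start):
    """Follow the first adjlist entry from `start`; return the cycle or None."""
    path, seen = [], {}
    node = start
    while node in graph and graph[node]:
        if node in seen:
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
        node = graph[node][0]
    return None  # the walk ended at a node that waits on nobody


if __name__ == "__main__":
    print(find_cycle(build_graph(IN_HANG_EXAMPLE), 16))
    print(find_cycle(build_graph(LEAF_EXAMPLE), 19))
```

In the IN_HANG example the two nodes wait on each other, while the LEAF example ends at node 16, which waits on nobody.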
In this section I open two sessions as the scott user to simulate a lock wait (one session runs an update, the other a delete).
SQL> oradebug help
HELP [command] Describe one or all commands
SETMYPID Debug current process
SETOSPID Set OS pid of process to debug
SETORAPID ['force'] Set Oracle pid of process to debug
SHORT_STACK Dump abridged OS stack
DUMP <dump_name>[addr] Invoke named dump
DUMPSGA [bytes] Dump fixed SGA
DUMPLIST Print a list of available dumps
EVENT Set trace event in process
SESSION_EVENT Set trace event in session
DUMPVAR <p|s|uga>[level] Print/dump a fixed PGA/SGA/UGA variable
DUMPTYPE
SQL> oradebug hanganalyze 3;
Hang Analysis in /oracle/admin/orcl/udump/orcl_ora_2622.trc
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
-bash-3.2$ more /oracle/admin/orcl/udump/orcl_ora_2622.trc
/oracle/admin/orcl/udump/orcl_ora_2622.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /oracle/product/10.2.0/db_1
System name: Linux
Node name: truerhel5
Release: 2.6.18-164.el5
Version: #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine: x86_64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 2622, image:oracle@truerhel5(TNS V1-V3)
*** SERVICE NAME:(SYS$USERS) 2010-08-07 21:11:10.818
*** SESSION ID:(145.36) 2010-08-07 21:11:10.818
*** 2010-08-07 21:11:10.818
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : : -- columns: cnode, sid, sess_srno, proc_ptr, ospid, wait_event
<0/148/27/0x70e5e4a8/2543/SQL*Net message from client> -- session 148 (the lock holder)
-- <0/146/84/0x70e5f478/2607/enq: TX - row lock contention> -- session 146 (the lock waiter); wait event: row lock contention
Other chains found:
Chain 2 : :
<0/144/108/0x70e5ccf0/2614/jobq slave wait>
Chain 3 : :
<0/145/36/0x70e5fc60/2622/No Wait>
Chain 4 : :
<0/150/2/0x70e623e8/2338/Streams AQ: waiting for time man>
Chain 5 : :
<0/151/1/0x70e5ec90/2319/Streams AQ: qmn coordinator idle>
Chain 6 : :
<0/158/7/0x70e61c00/2336/Streams AQ: qmn slave idle wait>
Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 5 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] : 1 node dumps -- [NLEAF]
[level 10] : 13 node dumps -- [IGN]
State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[143]/0/144/108/0x70f5dcf8/2614/SINGLE_NODE/1/2//none
[144]/0/145/36/0x70f5f130/2622/SINGLE_NODE_NW/3/4//none
[145]/0/146/84/0x70f60568/2607/NLEAF/5/8/[147]/none
[147]/0/148/27/0x70f62dd8/2543/LEAF/6/7//145
[149]/0/150/2/0x70f65648/2338/SINGLE_NODE/9/10//none
[150]/0/151/1/0x70f66a80/2319/SINGLE_NODE/11/12//none
[154]/0/155/1/0x70f6bb60/2315/IGN/13/14//none
[155]/0/156/1/0x70f6cf98/2313/IGN/15/16//none
[157]/0/158/7/0x70f6f808/2336/SINGLE_NODE/17/18//none
[159]/0/160/1/0x70f72078/2305/IGN/19/20//none
[160]/0/161/1/0x70f734b0/2303/IGN/21/22//none
[161]/0/162/1/0x70f748e8/2301/IGN/23/24//none
[162]/0/163/1/0x70f75d20/2299/IGN/25/26//none
[163]/0/164/1/0x70f77158/2297/IGN/27/28//none
[164]/0/165/1/0x70f78590/2295/IGN/29/30//none
[165]/0/166/1/0x70f799c8/2293/IGN/31/32//none
[166]/0/167/1/0x70f7ae00/2291/IGN/33/34//none
[167]/0/168/1/0x70f7c238/2289/IGN/35/36//none
[168]/0/169/1/0x70f7d670/2287/IGN/37/38//none
[169]/0/170/1/0x70f7eaa8/2285/IGN/39/40//none
====================
END OF HANG ANALYSIS
====================
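The "State of nodes" section can be reduced to a short summary of who waits on whom. The sketch below parses a few of the node lines from this trace, resolves each adjlist entry to the blocker's sid, and counts nodes by state. The function names are hypothetical, and the sample is a subset of the lines above, not the full listing.

```python
import re
from collections import Counter

STATE_OF_NODES = """\
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[143]/0/144/108/0x70f5dcf8/2614/SINGLE_NODE/1/2//none
[144]/0/145/36/0x70f5f130/2622/SINGLE_NODE_NW/3/4//none
[145]/0/146/84/0x70f60568/2607/NLEAF/5/8/[147]/none
[147]/0/148/27/0x70f62dd8/2543/LEAF/6/7//145
[154]/0/155/1/0x70f6bb60/2315/IGN/13/14//none
"""

NODENUM = re.compile(r"^\[(\d+)\]$")


def parse_nodes(text):
    """Return {nodenum: fields} for every well-formed node line in `text`."""
    nodes = {}
    for line in text.splitlines():
        parts = line.strip().split("/")
        match = NODENUM.match(parts[0])
        if match is None or len(parts) != 11:
            continue  # header or unrelated line
        nodes[int(match.group(1))] = {
            "sid": int(parts[2]),
            "sess_srno": int(parts[3]),
            "state": parts[6],
            "adjlist": [int(n) for n in re.findall(r"\[(\d+)\]", parts[9])],
            "predecessor": None if parts[10] == "none" else int(parts[10]),
        }
    return nodes


def waits_on(nodes):
    """List (waiter_sid, blocker_sid) pairs by resolving adjlist node numbers."""
    return [
        (node["sid"], nodes[blocker]["sid"])
        for node in nodes.values()
        for blocker in node["adjlist"]
        if blocker in nodes
    ]


def blocker_sids(nodes):
    """Sids of LEAF/LEAF_NW nodes that have a predecessor, i.e. someone waits on them."""
    return [
        node["sid"]
        for node in nodes.values()
        if node["state"] in ("LEAF", "LEAF_NW") and node["predecessor"] is not None
    ]


if __name__ == "__main__":
    parsed = parse_nodes(STATE_OF_NODES)
    print(waits_on(parsed))
    print(blocker_sids(parsed))
    print(Counter(node["state"] for node in parsed.values()))
```

For this trace the result agrees with Chain 1: session 146 waits on session 148.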
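The fields mean roughly the following:
cnode -- node number; it is meaningful under RAC and is 0 on a single instance
sid -- the session's SID
sess_srno -- the session's serial#
proc_ptr -- the address that the process pointer refers to
ospid -- the OS process ID
wait_event -- the session's wait event
Excerpt reposted from Master Bai: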
Hanganalyze has been available since Oracle 8i R2 (8.1.6), and its usage is very simple:
ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level ';
or
ORADEBUG hanganalyze
For example:
sql>oradebug setmypid;
sql>oradebug hanganalyze 3;
As for the levels:
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
-bash-3.2$ sqlplus -prelim '/as sysdba' -- use the prelim option to get into a hung database (when a normal sqlplus login fails)
SQL*Plus: Release 10.2.0.1.0 - Production on Sat Aug 7 21:17:42 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> show parameter sga
ORA-01012: not logged on
SQL> conn /as sysdba
Prelim connection established
SQL>
http://blog.itpub.net/16978544/viewspace-701657/