Oracle Hang Analysis -- Reposted

1. Possible causes of a database hang

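An Oracle deadlock, or a very high system load (for example high CPU usage, or heavy waits on other locks), can cause the system to hang; a large number of DX locks is one example.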

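Generally speaking, when we say the system is hung, we mean the application stops responding and even an ordinary sqlplus session is almost unusable.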

2. How do you perform hang analysis? What levels does hanganalyze have, and how do you choose one?

hanganalyze has the following levels:

10     Dump all processes (IGN state)
5      Level 4 + Dump all processes involved in wait chains (NLEAF state)
4      Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3      Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2    Only HANGANALYZE output, no process dump at all
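The levels are cumulative: each row adds to the one below it. A minimal Python sketch that transcribes the table above into a lookup shows which node states get a process dump at a given level. `LEVEL_STATES` and `states_dumped` are hypothetical names, not an Oracle API, and levels that the table does not list are rejected rather than guessed.

```python
# Node states whose processes are dumped, keyed by the lowest level that adds them.
# Transcribed from the hanganalyze level table; hypothetical helper, not an Oracle API.
LEVEL_STATES = {
    3: {"IN_HANG"},
    4: {"LEAF", "LEAF_NW", "IGN_DMP"},
    5: {"NLEAF"},
    10: {"IGN"},
}
LISTED_LEVELS = {1, 2, 3, 4, 5, 10}


def states_dumped(level: int) -> set:
    """Return the node states whose processes are dumped at `level`.

    Levels are cumulative ("Level N + ..."), so every bucket at or below
    `level` is included. Levels 1-2 produce only the HANGANALYZE summary.
    """
    if level not in LISTED_LEVELS:
        raise ValueError(f"level {level} is not listed in the table")
    dumped = set()
    for threshold, states in LEVEL_STATES.items():
        if level >= threshold:
            dumped |= states
    return dumped


print(sorted(states_dumped(3)))  # ['IN_HANG']
print(sorted(states_dumped(5)))  # ['IGN_DMP', 'IN_HANG', 'LEAF', 'LEAF_NW', 'NLEAF']
```

At level 3 only IN_HANG processes are dumped, while level 10 dumps every process, which is why higher levels produce much larger traces.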

How to choose a level?

In general, levels above 3 are not recommended, because they can produce a very large trace file and may put some I/O load on the system.

Since Oracle 9i, hanganalyze also supports RAC.

There are two ways to run it:

1) ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level ';
   (the level number goes after the keyword level, e.g. level 3)


2) Using the oradebug command

   ORADEBUG setmypid
   ORADEBUG setinst all
   ORADEBUG -g def hanganalyze        ---usage for RAC

   oradebug setmypid
   oradebug hanganalyze 3       ---non-RAC environment


When performing hang analysis, Oracle recommends also taking a systemstate dump at the same time.

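A systemstate dump at level 2 (oradebug dump systemstate 2) is enough; it already includes information on all sessions. For example:
      sqlplus -prelim / as sysdba       ---this login method is available from 10g
      oradebug setospid 
      oradebug unlimit
      oradebug dump systemstate 10      ---this example uses level 10, which gives a much more detailed (and larger) dump than level 2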
Addendum: sometimes we also need to trace an individual process. On AIX we can use the dbx command, as in the following example:

dbx -a PID (where PID = any oracle shadow process)       ---find the PID with ps -ef|grep xxx
dbx() print ksudss(10)
dbx() detach

3. How do you read the hanganalyze trace file and extract useful information?

*** ACTION NAME:() 2010-03-12 00:04:01.497
*** MODULE NAME:(sqlplus@S7_C_YZ_YZSJK (TNS V1-V3)) 2010-03-12 00:04:01.497    ---module name, same as v$session.module
*** SERVICE NAME:(SYS$USERS) 2010-03-12 00:04:01.497
*** SESSION ID:(5184.45287) 2010-03-12 00:04:01.497         ----sid (5184), serial# (45287)
*** 2010-03-12 00:04:01.497
==============
HANG ANALYSIS:
==============
Found 54 objects waiting for 
    <0/5210/10419/0x99d0a88/11215038/No Wait>                          ------this shows session 5210 is blocking 54 objects
Open chains found:
Chain 1 : :        ---the sessions listed from here on are blocked starting from 5210; each session waits on the one listed before it
    <0/5210/10419/0x99d0a88/11215038/No Wait>
-- <0/3994/15494/0xd9ac1b0/6574102/enq: TM - contention>
-- <0/4962/58962/0xca03618/5710044/enq: DX - contention>
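Other chains found:                                                ---sessions in these chains are not directly blocked by the open chain above; they may be related indirectly or not at all (in the 10g example later in this article they are just idle sessions)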
Chain 2 : :
    <0/4001/31548/0xf9f3ab0/4980956/enq: DX - contention>
Chain 3 : :
    <0/4014/30717/0xaa27b48/7446746/gc buffer busy>
Chain 4 : :
    <0/4039/42115/0xd9f5710/5595180/PX Deq: Table Q Normal>


Cycle 1 : :        ---a cycle usually indicates a deadlock and is quite likely the root cause of the hang
    <980/3887/0xe4214964/24065/latch free>
-- <2518/352/0xe4216560/24574/latch free>
-- <55/10/0xe41236a8/13751/latch free>
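A cycle is a closed loop in the wait-for graph: each session waits on the next, and the last waits on the first, so none of them can proceed. As a minimal sketch (not how hanganalyze itself is implemented), such a loop can be found with an ordinary three-colour depth-first search, assuming the waits have already been collected into a dict mapping each node number to the nodes it waits on. `find_cycle` and the sample node numbers are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(waits_on: Dict[int, List[int]]) -> Optional[List[int]]:
    """Return one cycle of the wait-for graph as a list of node numbers, or None.

    waits_on[n] lists the nodes that n is waiting on. Three-colour DFS:
    reaching a node that is still on the current path (GREY) means a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[int, int] = {}
    path: List[int] = []

    def visit(n: int) -> Optional[List[int]]:
        colour[n] = GREY
        path.append(n)
        for m in waits_on.get(n, []):
            state = colour.get(m, WHITE)
            if state == GREY:
                return path[path.index(m):]  # m ... n, and n waits on m again
            if state == WHITE:
                found = visit(m)
                if found:
                    return found
        path.pop()
        colour[n] = BLACK
        return None

    for n in waits_on:
        if colour.get(n, WHITE) == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# Hypothetical three-node loop, shaped like Cycle 1 above.
print(find_cycle({1: [2], 2: [3], 3: [1]}))  # [1, 2, 3]
# A simple chain with no loop.
print(find_cycle({3: [2], 2: [1], 1: []}))   # None
```

The recursion is fine for the small graphs seen in a hang trace; a very long chain would need an iterative version to stay under Python's recursion limit.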

4. How does hanganalyze differ between versions, and how do the traces differ?

The hanganalyze trace formats for Oracle 8 through 10g are as follows:

Oracle 8.x : [nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor
Oracle9i: [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
Oracle10g:[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
Nodenum     --> sequence number that hanganalyze assigns to each session
sid         --> Session ID
sess_srno   --> Serial#
ospid       --> OS Process Id (v$process spid)
state       --> State of the node
adjlist     --> adjacent node   (usually represents a blocker node)
predecessor --> predecessor node (usually represents a waiter node)
cnode       --> node (instance) number, present only from 9i onward
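Because the 9i/10g node line is just slash-separated fields, it can be split mechanically. Below is a minimal Python parser under two assumptions: the line has exactly the 11 fields listed above, and `adjlist` holds node numbers inside brackets (the examples in this article only ever show one). `start` and `finish` are read but not used, since their meaning is not described here. `Node` and `parse_node` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    nodenum: int
    cnode: int
    sid: int
    serial: int                 # sess_srno, i.e. v$session serial#
    session: str                # session address, kept as hex text
    ospid: Optional[int]        # empty in some lines (e.g. the cnode 1 examples)
    state: str
    adjlist: List[int]          # usually the node(s) this one waits on
    predecessor: Optional[int]  # usually a node waiting on this one; None for "none"


def parse_node(line: str) -> Node:
    """Parse [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor."""
    fields = line.strip().split("/")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}: {line!r}")
    nodenum, cnode, sid, serial, session, ospid, state, _start, _finish, adj, pred = fields
    return Node(
        nodenum=int(nodenum.strip("[]")),
        cnode=int(cnode),
        sid=int(sid),
        serial=int(serial),
        session=session,
        ospid=int(ospid) if ospid else None,
        state=state,
        adjlist=[int(n) for n in re.findall(r"\d+", adj)],
        predecessor=None if pred == "none" else int(pred),
    )


node = parse_node("[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186")
print(node.sid, node.state, node.adjlist, node.predecessor)  # 20 NLEAF [16] 186
```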

The state field can take the following values:

IN_HANG      --> the most dangerous state: the node is usually involved in a deadlock or is hung. When this happens, the node's adjacent node (its adjlist) is usually in the same state.

            For example:
            [16]/0/17/154/0x24617be0/26800/IN_HANG/29/32/[185]/19      ---node 16's adjlist is [185], so node 16 is waiting on node 185
            [185]/1/16/4966/0x24617270//IN_HANG/30/31/[16]/16          ---node 185's adjlist is [16], so node 185 is waiting on node 16: the two block each other

       
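LEAF         --> usually the key sessions considered to be blockers. How do you confirm this? Generally the predecessor field tells you whether the session is a blocker or a waiter: a LEAF node with a predecessor has another session waiting on it.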


             For example:
             [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
             [16]/0/17/154/0x24617be0/26800/LEAF/29/30//19         --the predecessor is node 19 (sid 20), a waiter, so we conclude that sid 17 blocks sid 20
             [19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186      


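LEAF_NW     --> similar to LEAF, but the session may be consuming CPU.
NLEAF       --> sessions in this state are usually considered "stuck": another session is holding a resource that this session needs, and the adjlist shows which node that is.
IGN         --> sessions in this state are usually idle, unless an adjlist is present; if it is, the session is waiting on another session.
IGN_DMP     --> similar to IGN.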

For example:

[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186
[189]/1/20/36/0x24619830//IGN/95/96/[19]/none
[176]/1/7/1/0x24611d80//IGN/75/76//none

----From the above, 189 is waiting on 19, 19 is waiting on 16, and 176 is an idle session.

SINGLE_NODE and SINGLE_NODE_NW can be treated like LEAF and LEAF_NW, except that nothing depends on them.
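Following adjlist links, as in the example above (189 waits on 19, which waits on 16), leads from any waiter to the session at the end of the chain. A minimal sketch, assuming the adjlists have been collected into a dict from node number to the nodes it waits on, and following only the first entry when there are several; `wait_path` is a hypothetical helper:

```python
from typing import Dict, List


def wait_path(start: int, waits_on: Dict[int, List[int]]) -> List[int]:
    """Follow adjlist links from `start` until reaching a node that waits on nobody.

    Returns the node numbers visited, ending with the final blocker.
    Only the first adjlist entry is followed; a loop raises ValueError.
    """
    path = [start]
    node = start
    while waits_on.get(node):
        node = waits_on[node][0]
        if node in path:
            raise ValueError(f"cycle detected: {path + [node]}")
        path.append(node)
    return path


# adjlists from the four example lines (an empty list stands for "//")
waits_on = {16: [], 19: [16], 189: [19], 176: []}
print(wait_path(189, waits_on))  # [189, 19, 16]
print(wait_path(176, waits_on))  # [176]
```

Here the path ends at node 16, which is sid 17, the LEAF session identified as the blocker in the LEAF example.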


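In this section I use the scott user to open two sessions and simulate a lock conflict between them (one runs an UPDATE, the other a DELETE). Strictly speaking this is a blocking lock rather than a deadlock: the trace below shows an open chain, not a cycle.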

SQL> oradebug help
HELP           [command]                 Describe one or all commands
SETMYPID                                 Debug current process
SETOSPID                          Set OS pid of process to debug
SETORAPID      ['force']        Set Oracle pid of process to debug
SHORT_STACK                              Dump abridged OS stack
DUMP           <dump_name>[addr]  Invoke named dump
DUMPSGA        [bytes]                   Dump fixed SGA
DUMPLIST                                 Print a list of available dumps
EVENT                              Set trace event in process
SESSION_EVENT                      Set trace event in session
DUMPVAR        <p|s|uga>[level]  Print/dump a fixed PGA/SGA/UGA variable
DUMPTYPE                                 Print/dump an address with type info
SETVAR         <p|s|uga>  Modify a fixed PGA/SGA/UGA variable
PEEK           [level]      Print/Dump memory
POKE                 Modify memory
WAKEUP                           Wake up Oracle process
SUSPEND                                  Suspend execution
RESUME                                   Resume execution
FLUSH                                    Flush pending writes to trace file
CLOSE_TRACE                              Close trace file
TRACEFILE_NAME                           Get name of trace file
LKDEBUG                                  Invoke global enqueue service debugger
NSDBX                                    Invoke CGS name-service debugger
-G                Parallel oradebug command prefix
-R                Parallel oradebug prefix (return output
SETINST        <instance# .. | all>      Set instance list in double quotes
SGATOFILE               Dump SGA to file; dirname in double quotes
DMPCOWSGA      Dump & map SGA as COW; dirname in double quotes
MAPCOWSGA               Map SGA as COW; dirname in double quotes
HANGANALYZE    [level] [syslevel]        Analyze system hang
FFBEGIN                                  Flash Freeze the Instance
FFDEREGISTER                             FF deregister instance from cluster
FFTERMINST                               Call exit and terminate instance
FFRESUMEINST                             Resume the flash frozen instance
FFSTATUS                                 Flash freeze status of instance
SKDSTTPCS                Helps translate PCs to names
WATCH          <self|exist|all|target>  Watch a region of memory
DELETE         <local|global|target>watchpoint     Delete a watchpoint
SHOW           <local|global|target>watchpoints        Show  watchpoints
CORE                                     Dump core without crashing process
IPC                                      Dump ipc information
UNLIMIT                                  Unlimit the size of the trace file
PROCSTAT                                 Dump process statistics
CALL           [arg1] ... [argn]  Invoke function with arguments

SQL> oradebug hanganalyze 3;
Hang Analysis in /oracle/admin/orcl/udump/orcl_ora_2622.trc
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
-bash-3.2$ more /oracle/admin/orcl/udump/orcl_ora_2622.trc
/oracle/admin/orcl/udump/orcl_ora_2622.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /oracle/product/10.2.0/db_1
System name:    Linux
Node name:      truerhel5
Release:        2.6.18-164.el5
Version:        #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:        x86_64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 2622, image:oracle@truerhel5(TNS V1-V3)

*** SERVICE NAME:(SYS$USERS) 2010-08-07 21:11:10.818
*** SESSION ID:(145.36) 2010-08-07 21:11:10.818
*** 2010-08-07 21:11:10.818
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : : --fields in each entry: cnode sid sess_srno proc_ptr ospid wait_event
    <0/148/27/0x70e5e4a8/2543/SQL*Net message from client>   --session 148 (holds the lock)
 -- <0/146/84/0x70e5f478/2607/enq: TX - row lock contention> --session 146 (waiting for the lock); the wait event is row lock contention
Other chains found:
Chain 2 : :
    <0/144/108/0x70e5ccf0/2614/jobq slave wait>
Chain 3 : :
    <0/145/36/0x70e5fc60/2622/No Wait>
Chain 4 : :
    <0/150/2/0x70e623e8/2338/Streams AQ: waiting for time man>
Chain 5 : :
    <0/151/1/0x70e5ec90/2319/Streams AQ: qmn coordinator idle>
Chain 6 : :
    <0/158/7/0x70e61c00/2336/Streams AQ: qmn slave idle wait>
Extra information that will be dumped at higher levels:
[level  4] :   1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level  5] :   5 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level  6] :   1 node dumps -- [NLEAF]
[level 10] :  13 node dumps -- [IGN]
 
State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[143]/0/144/108/0x70f5dcf8/2614/SINGLE_NODE/1/2//none
[144]/0/145/36/0x70f5f130/2622/SINGLE_NODE_NW/3/4//none
[145]/0/146/84/0x70f60568/2607/NLEAF/5/8/[147]/none
[147]/0/148/27/0x70f62dd8/2543/LEAF/6/7//145
[149]/0/150/2/0x70f65648/2338/SINGLE_NODE/9/10//none
[150]/0/151/1/0x70f66a80/2319/SINGLE_NODE/11/12//none
[154]/0/155/1/0x70f6bb60/2315/IGN/13/14//none
[155]/0/156/1/0x70f6cf98/2313/IGN/15/16//none
[157]/0/158/7/0x70f6f808/2336/SINGLE_NODE/17/18//none
[159]/0/160/1/0x70f72078/2305/IGN/19/20//none
[160]/0/161/1/0x70f734b0/2303/IGN/21/22//none
[161]/0/162/1/0x70f748e8/2301/IGN/23/24//none
[162]/0/163/1/0x70f75d20/2299/IGN/25/26//none
[163]/0/164/1/0x70f77158/2297/IGN/27/28//none
[164]/0/165/1/0x70f78590/2295/IGN/29/30//none
[165]/0/166/1/0x70f799c8/2293/IGN/31/32//none
[166]/0/167/1/0x70f7ae00/2291/IGN/33/34//none
[167]/0/168/1/0x70f7c238/2289/IGN/35/36//none
[168]/0/169/1/0x70f7d670/2287/IGN/37/38//none
[169]/0/170/1/0x70f7eaa8/2285/IGN/39/40//none
====================
END OF HANG ANALYSIS
====================
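The state column can be turned into a quick summary of who is blocking whom. Below is a minimal Python helper, assuming the rules from the state descriptions earlier in this article: a LEAF or LEAF_NW node with a predecessor is a blocker, an NLEAF node (or an IGN node with an adjlist) is a waiter, and an IGN node with no adjlist is idle. `classify` and the role labels are illustrative only, and it expects node lines, not the header line.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    sid: int
    ospid: str   # kept as text because it can be empty
    state: str
    role: str


def classify(line: str) -> Row:
    """Label one 'State of nodes' line using the state rules described earlier.

    Field positions follow the header
    [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor.
    """
    f = line.strip().split("/")
    sid, ospid, state = int(f[2]), f[5], f[6]
    has_adj = re.search(r"\d", f[9]) is not None  # something inside [adjlist]
    has_pred = f[10] != "none"                    # some node waits on this one
    if state in ("LEAF", "LEAF_NW") and has_pred:
        role = "blocker"
    elif state == "NLEAF" or (state.startswith("IGN") and has_adj):
        role = "waiter"
    elif state == "IN_HANG":
        role = "possible deadlock"
    elif state.startswith("IGN"):
        role = "idle"
    else:  # SINGLE_NODE, SINGLE_NODE_NW, or a LEAF nobody waits on
        role = "no dependents"
    return Row(sid, ospid, state, role)


trace = """\
[143]/0/144/108/0x70f5dcf8/2614/SINGLE_NODE/1/2//none
[145]/0/146/84/0x70f60568/2607/NLEAF/5/8/[147]/none
[147]/0/148/27/0x70f62dd8/2543/LEAF/6/7//145
[154]/0/155/1/0x70f6bb60/2315/IGN/13/14//none
"""
for row in map(classify, trace.splitlines()):
    if row.role in ("blocker", "waiter"):
        print(row)
# Row(sid=146, ospid='2607', state='NLEAF', role='waiter')
# Row(sid=148, ospid='2543', state='LEAF', role='blocker')
```

The blocker's ospid (2543 here) is the value you would pass to `oradebug setospid` or look up in v$process.spid.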


The fields in each chain entry mean roughly the following:

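cnode -- node number; meaningful in RAC, and 0 on a single-instance database

sid -- the session's sid

sess_srno -- the session's serial#

proc_ptr -- address of the session's Oracle process

ospid -- OS process ID

wait_event -- the session's wait event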
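Since chain entries follow this field order, they can also be split mechanically. A minimal Python sketch, assuming each entry looks like `<cnode/sid/sess_srno/proc_ptr/ospid/wait_event>` and that, as the annotations on Chain 1 above suggest, every entry waits on the one listed before it. `parse_chain` and `who_blocks_whom` are hypothetical helpers. The Cycle entries in section 3 (for example `<980/3887/0xe4214964/24065/latch free>`) have only five fields and are deliberately not matched.

```python
import re
from typing import List, NamedTuple, Tuple

# One chain entry, e.g. <0/148/27/0x70e5e4a8/2543/SQL*Net message from client>
ENTRY = re.compile(r"<(\d+)/(\d+)/(\d+)/(0x[0-9a-fA-F]+)/(\d*)/([^>]*)>")


class Entry(NamedTuple):
    cnode: int
    sid: int
    serial: int
    proc_ptr: str
    ospid: str
    wait_event: str


def parse_chain(text: str) -> List[Entry]:
    """Extract chain entries in the order they appear (blocker first)."""
    return [
        Entry(int(c), int(s), int(sn), p, o, e)
        for c, s, sn, p, o, e in ENTRY.findall(text)
    ]


def who_blocks_whom(chain: List[Entry]) -> List[Tuple[int, int]]:
    """Pair each entry with the one after it: (blocker_sid, waiter_sid)."""
    return [(a.sid, b.sid) for a, b in zip(chain, chain[1:])]


chain1 = """\
    <0/148/27/0x70e5e4a8/2543/SQL*Net message from client>
 -- <0/146/84/0x70e5f478/2607/enq: TX - row lock contention>
"""
entries = parse_chain(chain1)
print(who_blocks_whom(entries))  # [(148, 146)]
print(entries[1].wait_event)     # enq: TX - row lock contention
```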

Partial excerpt reposted from 白大師 (Master Bai):
Hanganalyze has been available since Oracle 8i Release 2 (8.1.6), and it is very simple to use:

ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level ';

or

ORADEBUG hanganalyze

For example:

sql>oradebug setmypid;

sql>oradebug hanganalyze 3;

-bash-3.2$ sqlplus -prelim '/as sysdba' --the prelim option lets you connect to a database that is already hung (where a normal sqlplus login cannot get in)

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Aug 7 21:17:42 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

SQL> show parameter sga
ORA-01012: not logged on

SQL> conn /as sysdba
Prelim connection established
SQL>

http://blog.itpub.net/16978544/viewspace-701657/