【轉載】row cache lock

轉自:http://blog.itpub.net/26736162/viewspace-2139754/
 
定位的辦法:
--查詢row cache lock等待
select event,p1  from v$session where  event= 'row cache lock' and status='ACTIVE';
 
--查詢rowcache 名稱
select * from v$rowcache where cache# =p1;

名稱 P1 P2 P3 緣由 處理
row cache lock 字典的Cache號,其值爲V$ROWCACHE.CACHE# Mode:Mode Held request:Mode Requested row cache lock等待事件是一個共享池相關的等待事件,是因爲對字典緩衝的訪問形成的。每個行緩衝隊列鎖都對應一個特定的數據字典對象,這被叫作隊列鎖類型,並能夠在V$ROWCACHE視圖中找到。在AWR中須要查看Dictionary Cache Stats部分用以肯定問題。常見的緣由有以下幾點:
① 序列沒有設置CACHE屬性,致使序列爭用
② 表空間不足引發 表空間的擴展速度跟不上表空間的使用速度會發生該等待事件
③ Shared Pool不足,須要增長共享池
④ 用戶密碼錯誤或給出了空密碼而且頻繁登陸
根據P1值查詢V$ROWCACHE視圖(SELECT A.PARAMETER FROM V$ROWCACHE  A WHERE CACHE# =P1;),進而肯定隊列類型。若爲DC_SEQUENCES則增大序列的CACHE值;若爲DC_USERS則多是當用戶密碼錯誤或密碼爲空時用戶頻繁登陸致使,進而查詢審計來肯定(SELECT * FROM DBA_AUDIT_TRAIL WHERE RETURNCODE IN (1017, 1005);。1017表明密碼錯誤,1005表明空密碼);其它類型極可能是Shared Pool太小致使。

運行DDL必須請求一個行緩衝鎖(row cache lock)來鎖住數據字典(Data Dictionary)信息。共享池(Shared Pool)包含來自數據字典的的行緩衝區,能夠減小磁盤的I/O訪問,並容許對行進行加鎖。數據字典行鎖被叫作行緩衝隊列鎖(Row Cache Enqueue Locks)。這個隊列鎖結構從共享池中按需求分配,當這些請求在等待並超時就會看到行緩衝隊列鎖。row cache lock等待事件是一個共享池相關的等待事件,是因爲對字典緩衝的訪問形成的。每個行緩衝隊列鎖都對應一個特定的數據字典對象,這被叫作隊列鎖類型,並能夠在V$ROWCACHE視圖中找到。node


SELECT * FROM V$EVENT_NAME D WHERE D.NAME = 'row cache lock';面試

SELECT *sql

  FROM DBA_HIST_ROWCACHE_SUMMARY D數據庫

 WHERE D.SNAP_ID BETWEEN 1 AND 3014api

   AND D.GETS > 0;緩存

 

SELECT * FROM V$ROWCACHE WHERE CACHE# IN ('7', '10');微信

 

SELECT A.PARAMETER FROM V$ROWCACHE  A WHERE CACHE# =P1;網絡

 

SELECT * FROM DBA_AUDIT_TRAIL d WHERE /*RETURNCODE IN (1017) AND*/ d.action_name='LOGON' AND D.returncode>0 AND D.returncode<>1017;session



[root@rhel6lhr ~]# oerr ora 1005併發

01005, 00000, "null password given; logon denied"

// *Cause:

// *Action:

[root@rhel6lhr ~]# oerr ora 1017

01017, 00000, "invalid username/password; logon denied"

// *Cause:

// *Action:

[root@rhel6lhr ~]#

在AWR中須要查看Dictionary Cache Stats部分用以肯定問題。

 




常見隊列鎖類型
行緩衝隊列鎖等待的調優基於每個隊列鎖類型的行爲,其中常見的有:
① DC_SEQUENCES:在使用序列的時候將發生該行緩衝隊列鎖。調優方式是檢查序列是否指定了緩衝選項並肯定這個緩衝值能夠承受預期的併發insert操做。Check for appropriate caching of sequences for the application requirements.
② DC_USED_EXTENTS和DC_FREE_EXTENTS:該行緩衝隊列鎖可能在空間管理碰到表空間分裂或者沒有足夠區大小時發生。調優方法是檢查表空間是否分裂了、區大小是否過小或者表空間是人工管理。
③ DC_TABLESPACES:該行緩衝隊列鎖會在分配新區是發生。若是區大小設置得太小,程序將常常申請新區,這將致使衝突。調優方法是快速地增長區的數量。Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.
④ DC_OBJECTS:該行緩衝隊列鎖會在重編譯對象的時候發生。當對象編譯時將申請一個排他鎖阻塞其餘行爲。經過檢查非法對象和依賴關係來調優。
⑤ DC_SEGMENTS:該行緩衝隊列鎖會在段分配的時候發生,觀察持有這個隊列鎖的會話在作什麼。This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose. 
⑥ DC_USERS:Deadlock and resulting 「WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!」 can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.dc_users是和用戶用錯誤密碼登錄有關,In 11g there is an intentional delay between allowing failed logon attempts to retry. For some specific application types this can cause a problem as the row cache entry is locked for the duration of the delay . This can lead to excessive row cache lock waits for DC_USERS for specific users / schemas.

After 3 successive failures a sleep delay is introduced starting at 3 seconds and extending to 10 seconds max. During each delay the user X row cache lock is held in exclusive mode preventing any concurrent logon attempt as user X (and preventing any other operation which would need the row cache lock for user X).
⑦ DB_ROLLBACK_SEGMENTS:This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.
⑧ DC_AWR_CONTROL:This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these. 
DC_SEQUENCES
For DC_SEQUENCES, consider caching sequences using the cache option.


DC_OBJECTS
Look for any object compilation activity which might require an exclusive lock, blocking other activities


DC_SEGMENTS
Contention here is most likely to be due to segment allocation.  Investigate what segments are being created at the time.


DC_USERS
This may occur if a session issues a GRANT to a user and that user is in the process of logging on to the database. Investigate why grants are being made while the users are active.


DC_TABLESPACES
The most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.
 





Resolving Issues Where 'Row Cache Lock' Waits are Occurring (文檔 ID 1476670.1)

In this Document

  Purpose
  Troubleshooting Steps
  Problem Confirmation:
  Row Cache Lock
  Reducing Waits
  Known Issues
  References

 

APPLIES TO:

Oracle Database - Standard Edition - Version 10.2.0.1 and later
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Oracle Database - Personal Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.

PURPOSE

Disclaimer: This note is written in the context that the Performance Guided Resolution tool was used and recommended this article. It may not make as much sense if read standalone or by other audiences.

 

TROUBLESHOOTING STEPS

 Brief Definition:

The shared pool contains a cache of rows from the data dictionary that helps reduce physical I/O on the data dictionary tables. The row cache lock is used primarily to serialize changes to the data dictionary and is waited on when a lock on a data dictionary cache is required. Waits on this event usually indicate some form of DDL occuring, or possibly recursive operations such as storage management and incrementing sequence numbers.

Problem Confirmation:

  • Significant wait for latch: row cache objects
  • Slow overall performance with row cache lock
  • High CPU usage

Row Cache Lock

When DDLs execute, they must acquire locks on the row cache in order to access and change the Data Dictionary information. Once the locks are taken then they can be allowed to modify individual rows in the data dictionary. 

Reducing Waits

1. The data dictionary resides in the shared pool. If the shared pool is not sized correctly then the data dictionary might not be fully cached. This should be handled automatically with the automatic shared memory tuning feature. The following documents provide more details: 

Document 257643.1 Oracle Database Automated SGA Memory Tuning
Document 270935.1 Shared pool sizing

 2. Find which cache is being waited for:

SQL> select p1text,p1,p2text,p2,p3text,p3 from v$session where event='row cache lock';

P1TEXT P1 P2TEXT P2 P3TEXT P3

cache id 8 mode 0 request 3

SQL> select parameter,count,gets,getmisses,modifications from v$rowcache where cache#=8;

PARAMETER COUNT GETS GETMISSES MODIFICATIONS

DC_SEQUENCES    869    76843 508432   4500

In this example the cache is the "DC_SEQUENCES" cache.

3. Take cache dependent actions:

DC_SEQUENCES
For DC_SEQUENCES, consider caching sequences using the cache option.

DC_OBJECTS
Look for any object compilation activity which might require an exclusive lock, blocking other activities

DC_SEGMENTS
Contention here is most likely to be due to segment allocation.  Investigate what segments are being created at the time.

DC_USERS
This may occur if a session issues a GRANT to a user and that user is in the process of logging on to the database. Investigate why grants are being made while the users are active.

DC_TABLESPACES
The most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.

4. For further information on row cache issues, review the following:

Document 278316.1 Troubleshooting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"

Measuring Success

Once you have applied the changes to resolve the issues you have found, compare the latest AWR to the AWR that led you here via Guided Resolution (that AWR becomes your baseline). Look at the percentage decrease total wait time for this event. If there are still issues, re- evaluate those and address them according to the specific symptom.

 

Known Issues

Document 1417373.1 Row Cache Latch Contention for DC_USERS While Using VPD
Document 2372926.8 "row cache objects" latch contention on DC_USERS row cache

Document 1466896.1 High Waits on 'Row Cache Lock' with Pipelined Functions After Upgrade to 11.2.0.3
Document 13502860.8 Bug 13502860 - "row cache lock" contention on SYS_PLSQL_xx objects using PIPELINED functions

 

 

 

REFERENCES


NOTE:34609.1 - WAITEVENT: "row cache lock" Reference Note



WAITEVENT: "row cache lock" Reference Note (文檔 ID 34609.1)

"row cache lock" Reference Note

This is a reference note for the wait event  "row cache lock" which includes the following subsections: See  Note:61998.1 for an introduction to Wait Events.

Definition:

  • Versions: 7.3 - 12.1
    Documentation: 12.1 11.2 11.1 10.2 10.1 

  • This event is used to wait for a lock on a data dictionary cache specified by "cache id" (P1).

    If running Real Application Clusters (RAC) then LCK0 is signalled to get the row cache lock for the foreground waiting on this event. The LCK0 process will get the lock asynchronously. In exclusive mode the foreground process will try to get the lock.

Individual Waits:

  Parameters:

  • P1 = cache - ID of the dictionary cache
  • P2 = mode - Mode held
  • P3 = request - Mode requested
  • cache - ID of the dictionary cache
    Row cache lock we are waiting for. Note that the actual CACHE# values differ between Oracle versions. The cache can be found using this select - "PARAMETER" is the cache name:
    SELECT cache#, type, parameter 
      FROM v$rowcache     
     WHERE cache# = &P1
    ;

    In a RAC environment the row cache locks use global enqueues of type "Q[A-Z]" with the lock id being the hashed object name.

     

  • mode - Mode held
    The mode the lock is currently held in:
      KQRMNULL 0   null mode - not locked
      KQRMS    3   share mode
      KQRMX    5   exclusive mode KQRMFAIL 10  fail to acquire instance lock
  • request - Mode requested
    The mode the lock is requested in:
      KQRMNULL 0   null mode - not locked
      KQRMS    3   share mode
      KQRMX    5   exclusive mode KQRMFAIL 10  fail to acquire instance lock

  Wait Time:

In exclusive mode any process other than PMON will timeout after 8 hours (10000 waits of 3 seconds)

In RAC the foreground will wait 60 seconds for the LCK0 to get the lock, the foreground will wait in infinite loop until the lock has been granted (LCK0 will notify foreground).

In either case PMON will wait for only 5 seconds.

If a session times out when waiting for a row cache lock then it will report this to the alert log and tracefile with a message like:

WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK

 

  Finding Blockers:

Holders and requesters can be seen in view X$KQRFP for parent objects, and X$KQRFS for subordinates. 
eg: The following select will show all holders of parent row cache objects so can be used to help find the blocking session.
SELECT * FROM x$kqrfp WHERE kqrfpmod!=0;
(KQRFPSES is the address of the holding session V$SESSION.SADDR)

 

Systemwide Waits:

It is important to determine which cache is being waited for. The V$ROWCACHE view gives an overview of which caches are being used most, but the waits may not necessarily be on the most used cache. The V$ACTIVE_SESSION_HISTORY view can be used to get an idea of which cache ids (P1) are involved in waits.
  • If the issue is general across various caches (different cache ids) then the shared pool may need increasing in size to allow more dictionary information to be cached
  • If the issue is focused on a specific cache id then options typically depend on the cache involved - see the Troubleshootingsection below.

Reducing Waits / Wait times:

Options to reduce waits depends on the specific cache that has the contention. See the documents in the Troubleshooting section below for advice on various caches.

Troubleshooting

See the following documents for help troubleshooting issues relating to "row cache lock" waits
Document:1476670.1 Resolving Issues Where 'Row Cache Lock' Waits are Occurring
Document:278316.1 Troubleshooting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"

Known Issues / Bugs:

You can restrict the list below to issues likely to affect one of the following versions by clicking the relevant button: 
                


 
NB Prob Bug Fixed Description
  - 17828499 12.1.0.1.4, 12.1.0.2, 12.2.0.1 Opening PDB hangs waiting for row cache lock on open UPGRADE
  - 16921340 12.1.0.1.1, 12.1.0.2, 12.2.0.1 Non-CDB to PDB plugin hangs
  II 13884774 12.1.0.2, 12.2.0.1 Deadlock from concurrent select/ALTER SUMMARY and ALTER TABLE if query_rewrite_enabled set to true/force - superseded
  II 16994952 12.1.0.1 Unable to unschedule propagation due to CJQ self-deadlock / hang
  II 14117976 11.2.0.3.BP14, 11.2.0.4, 12.1.0.1 Database hangs when executing multiple MV DDL's in parallel
  III 13496395 11.2.0.4, 12.1.0.1 Hang / deadlock involving ACCOUNT_STATUS object for concurrent LOGON and ALTER USER
  III 7715339 11.2.0.1 Logon failures causes "row cache lock" waits - Allow disable of logon delay
  III 21153142 12.2.0.1 Row cache lock self deadlock accessing seed PDB
  II 21091431 12.1.0.2.160419, 12.1.0.2.DBBP13, 12.2.0.1 row cache lock during trigger creation using editions
  II 19907473 12.2.0.1 GEN0 process causing database hang
  III 15850031 11.2.0.4, 12.1.0.2, 12.2.0.1 Rare instance hang: deadlock between 'row cache lock' and 'cursor: pin S wait for X'
  II 13622515 11.2.0.4, 12.1.0.2, 12.2.0.1 library cache <-> row cache deadlock / hang when altering constraint with MV involved - superseded
  II 13869467 12.1.0.2 Many waits for 'row cache lock' in RAC while creating many tables with constraints
  II 13979519 11.2.0.3.BP11, 11.2.0.4, 12.1.0.1 Locks on dc_users kept too long
  II 13916228 11.2.0.4, 12.1.0.1 Enabling/disabling constraints makes DML timeout in RAC - superseded
  II 13502860 11.2.0.4, 12.1.0.1 "row cache lock" contention on SYS_PLSQL_xx objects using PIPELINED functions
  - 13387978 11.2.0.4, 12.1.0.1 Sessions running TRUNCATE causing a deadlock, even if the constraints are disabled
  - 12953743 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1 Parallel CTAS of table with a Securefile is slower than parallel IAS / PIDL
  I 12889054 11.2.0.2.BP16, 11.2.0.3.BP05, 11.2.0.4, 12.1.0.1 AWR snapshot hangs on dc_objects row_cache_lock held by ctas job
D - 12792862 11.2.0.4, 12.1.0.1 Performance of INSERT using binary xml is very slow due to "row cache lock" in RAC environment - superseded
  II 12351027 11.2.0.4, 12.1.0.1 Redefinition causing deadlock between "row cache lock" and "library cache lock"
  II 11693365 11.2.0.3, 12.1.0.1 Concurrent Drop table and Select on Reference constraint table hangs (deadlock)
  I 10382754 11.2.0.2.BP17, 11.2.0.3, 12.1.0.1 Poor performance/ rowcache contention in 11g with partitioning due to invalidation of objects
  IIII 10204505 11.2.0.3, 12.1.0.1 SGA autotune can cause row cache misses, library cache reloads and parsing
  II 10126219 11.2.0.1.BP08, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Undetected deadlock 'library cache lock'/'row cache lock' with concurrent DDLs on partition tables.
  II 9952554 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3, 12.1.0.1 Undetected deadlock 'library cache lock'/'row cache lock' with a session modifying a constraint
  II 9866045 11.2.0.3, 12.1.0.1 Long wait on 'wait for master scn' in LCK causing long row cache lock waits
  III 9776608 11.2.0.2, 12.1.0.1 Hang from concurrent login to same account with a wrong password
  II 9278979 11.2.0.2, 12.1.0.1 Instance hang / ORA-4021 with OPTIMIZER_USE_PENDING_STATISTICS = true
  I 8268775 11.1.0.7.4, 11.2.0.1.2, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.1 High US enqueue contention during a login storm or session failover
  - 8364676 11.1.0.7.2, 11.2.0.1 row cache lock waits from background space preallocation
  III 7529174 10.2.0.5, 11.1.0.7.3, 11.2.0.1 Deadlock / hang between SMON and foreground process
  - 7416901 11.1.0.7.1, 11.2.0.1 Deadlock between QC and PQ slaves when CELL_PARTITION_LARGE_EXTENTS = ALWAYS
  II 7313166 10.2.0.5, 11.2.0.1 Startup hang with self deadlock on dc_rollback_segments
  II 6870994 10.2.0.5, 11.1.0.7.3, 11.2.0.1 High US enqueue / rowcache lock while trying to online a NEW undo segment
  III 6027068 10.2.0.5, 11.2.0.1 Contention on ORA_TQ_BASE$ sequence
  - 5756769 10.2.0.4.1, 10.2.0.5, 11.1.0.7, 11.2.0.1 Deadlock between Create MVIEW and DML
  III 6143420 10.2.0.5, 11.1.0.6 Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X"
  III 6004916 10.2.0.5, 11.1.0.6 Hang involving row cache enqueues in RAC (ORA-4021)
  II 5883112 10.2.0.4, 11.1.0.6 False deadlock in RAC
  II 5138741 10.2.0.4, 11.1.0.6 High waits on 'row cache lock' when using materialized views on RAC
  II 4604972 11.1.0.6 Deadlock on dc_users by Concurrent Grant/Revoke
  - 4579381 10.1.0.5, 10.2.0.2, 11.1.0.6 Deadlock on DC_USERS in RAC (ORA-4020)
  - 4446011 9.2.0.8, 10.1.0.5, 10.2.0.2, 11.1.0.6 Hang with row cache lock deadlock from concurrent ALTER USER / TRUNCATE
  II 4390868 10.1.0.5, 10.2.0.3, 11.1.0.6 Contention on DC_SEGMENTS due to small cache size on SYS.AUDSES$
  - 4313246 9.2.0.8, 10.1.0.5, 10.2.0.2, 11.1.0.6 PLSQL execution can hold dc_users row cache lock leading to hang / deadlocks
  I 4153150 9.2.0.8, 10.2.0.2, 11.1.0.6 Deadlock on dc_rollback_segments from concurrent parallel load and undo segment creation
  I 6051177 10.2.0.4.1, 10.2.0.5 Hang / deadlock between coalesce and DBMS_STATS.gather_table_stats
  - 5983020 10.2.0.4 MMON deadlock with user session executing ALTER USER
  - 4275733 9.2.0.8, 10.1.0.5, 10.2.0.1 Deadlock between library cache lock and row cache lock from concurrent rename partition
  I 5641198 10.2.0.1 Some waits may be longer than needed ("row cache lock") in RAC
  - 4137000 10.1.0.5, 10.2.0.1 Concurrent SPLIT PARTITION can deadlock / hang
  - 3627263 9.2.0.6, 10.1.0.4, 10.2.0.1 Deadlock / hang during RAC instance startup
  II 3424721 9.2.0.6, 10.1.0.3, 10.2.0.1 Hang/deadlock from ALTER INDEX REBUILD on partition with concurrent SQL
  - 2615271 9.2.0.6, 10.1.0.2 Deadlock from concurrent GRANT and logon
  • '*' indicates that an alert exists for that issue.
  • '+' indicates a particularly notable issue / bug.
  • See Note:1944526.1 for details of other symbols used

Related:

Document:1628089.1 AWR Report Interpretation Checklist for Diagnosing Database Performance Issues
Document:1359094.1 How to Use AWR Reports to Diagnose Database Performance Issues
Document:1274511.1 General SQL_TRACE / 10046 trace Gathering Examples


Troubleshooting: "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! " (文檔 ID 278316.1)

故障排除:"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! " (文檔 ID 2016422.1)

 

文檔內容

  用途
  排錯步驟
  什麼是 row cache enqueue 鎖(Row Cache Enqueue Lock)?
  "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 警告信息是什麼意思?
  "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 可能的緣由
  SGA收縮(shrink)/調整大小的操做(resize)
  row cache enqueue 類型
  DC_TABLESPACES
  DC_SEQUENCES
  DC_USERS
  DC_OBJECT_IDS
  DC_SEGMENTS
  DC_ROLLBACK_SEGMENTS
  DC_TABLE_SCNS
  DC_AWR_CONTROL
  我能夠收集哪些信息,以肯定緣由?
  Systemstate dump
  AWR,ADDM 和 ASH 報告
  如何分析收集到的診斷信息?
  Systemstate dump
  示例1:
  示例2:
  AWR 報告
  10g 之前的版本可能存在的問題
  其餘問題疑難解答
  參考

 

適用於:

Oracle Database - Personal Edition - 版本 8.0.6.0 和更高版本
Oracle Database - Enterprise Edition - 版本 8.0.6.0 和更高版本
Oracle Database - Standard Edition - 版本 8.0.6.0 和更高版本
本文檔所含信息適用於全部平臺

用途

本文檔的目的是幫助排查緣由"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "

排錯步驟

什麼是 row cache enqueue 鎖(Row Cache Enqueue Lock)?

行緩存(Row Cache)或數據字典緩存(Data Dictionary Cache)是保存數據字典信息的共享池的內存區域。row cache 保存數據時並非以數據塊的形式,而是以行的形式。row cache enqueue 鎖是在數據字典行的鎖。此 enqueue 是關於特定數據字典對象的。這就是所謂的 enqueue 類型,能夠在視圖 V$rowcache 中找到。

對於每一個版本 row cache 類型的列表,請參閱:

Document 468334.1  How to Match a Row Cache Object Child Latch to its Row Cache

"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 警告信息是什麼意思?

當咱們試圖得到 row cache 鎖,這種等待事件將被使用。
當 row cache 衝突發生時,若是不能在一個預約的時間週期內獲得 enqueue,將在 USER_DUMP_DEST 或 background_dump_dest 目錄下生成一個跟蹤文件,這取決因而用戶仍是後臺進程建立的跟蹤文件。alert.log 一般會相應的更新警告消息和跟蹤文件的位置。
數據庫檢測到核心資源被持有過久並通知管理員,從而讓這種狀況能夠獲得解決。這也可能伴隨着數據庫掛起或變慢。
alert.log 的消息和生成的跟蹤文件趨向於包含消息:

> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<

若是不能當即獲取 rowcache entry 鎖,那麼進入一個循環,先釋放 row cache 對象閂鎖,等待(等待上述等待事件),從新得到閂鎖,而後再次嘗試獲取 rowcache 鎖。在單實例模式,會重複 1000次直到進程報錯「WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK」。在 RAC 環境會一直重複,直到不能得到實例鎖或者被中斷。

Systemstate dump 能夠提供一些有用的信息診斷爭用的緣由。

請注意:The "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 當達到閾值會引起這個消息,所以,若是未達到閾值它不會被引起。這意味着,不太嚴重的問題,即便具備相同的緣由,也能夠不輸出該消息。

 

"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 可能的緣由

SGA收縮(shrink)/調整大小的操做(resize)

若是 SGA 動態地改變大小,須要持有各類 latches 來避免其它進程同時操做,直到操做完成。若是調整大小須要一段時間,或者是常常發生,你會看到"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"的發生。定位這種狀況的方法是,有不少'SGA: allocation forcing component growth'等待事件,或 AWR 的 TOP 列表有相似等待,以及阻塞等待"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 的會話在等待'SGA: allocation forcing component growth'(或相似)。有一些可用的代碼修復,請參閱:

Document 7189722.8 Bug 7189722 - Frequent grow/shrink SGA resize operations 
Document 9267837.8 Bug 9267837 - Auto-SGA policy may see larger resizes than needed

row cache enqueue 類型

對於每個 enqueue 類型,都有對應的一些操做會須要獲取這類 enqueue。隊列的類型,可能給出因爲操做可能致使的問題的指示。一些常見的緣由以下:

DC_TABLESPACES

最可能的緣由是新 extent 的分配。若是 extent 大小設置太小,那麼應用程序可能會不斷地要求新的 extent,這可能致使爭用。你有很小的 extent 尺寸,正在迅速增加的對象嗎?(經過查找具備大量 extents 的對象能夠定位它們)。檢查 insert/update 活動的 trace,查找那些就有不少 extents 的對象。

DC_SEQUENCES

檢查應用程序用到的 sequence 的 cache 的大小:

Document 853652.1 RAC and Sequences
Document 395314.1 RAC Hangs due to small cache size on SYS.AUDSES$ - fixed in 10.2.0.3
Document 6027068.8 Bug 6027068 - Contention on ORA_TQ_BASE sequence -fixed in 10.2.0.5 and 11.2.0.1
DC_USERS

一個會話正在對一個用戶執行 GRANT,與此同時此用戶正在登陸到數據庫中,此時可能會發生死鎖或致使"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 。

Document 4604972.8 Bug 4604972 - Deadlock on dc_users by Concurrent Grant/Revoke - fixed in 11.1.0.6
Document 6143420.8 Bug 6143420 - Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X"- fixed in 10.2.0.5 and 11.1.0.6  DC_OBJECTS
Document 12772404.8 Bug 12772404 - Significant "row cache objects" latch contention when using VPD
DC_OBJECT_IDS
Document 11693365.8 Bug 11693365 - Concurrent Drop table and Select on Reference constraint table hangs(deadlock) - fixed in 12.1
DC_SEGMENTS

這極可能是 segment 的分配致使的。肯定持有鎖的用戶正在作什麼並使用 errorstacks 進行診斷。

DC_ROLLBACK_SEGMENTS

這多是因爲 rollback 段的分配致使的。正如 dc_segments,肯定誰持有鎖並收集 errorstack 來進行診斷。請記住,在多節點系統(RAC)上,持有者可能在另外一節點上,所以須要全部節點的 systemstate。

DC_TABLE_SCNS
Document 5756769.8 Bug 5756769 - Deadlock between Create MVIEW and DML - fixed in 10.2.0.5 ,11.1.07 and 11.2.0.1
DC_AWR_CONTROL

此 enqueue 關係到 AWR(Automatic Workload Repository)的控制權。任何操縱 AWR 資料庫的操做將持有它. 要分析這個問題,須要查找是那些進程阻塞了它們。

  RAC 相關的 Bugs

Document 6004916.8 Bug 6004916 - Hang involving row cache enqueues in RAC (ORA-4021) - fixed in 102.0.5 and 11.1.0.6
Document 8666117.8 Bug 8666117 - High row cache latch contention in RAC - fixed in 11.2.0.2 and 12.1
Document 9866045.8 Bug 9866045 - Long wait on 'wait for master scn' in LCK causing long row cache lock waits - fixed in 12.1

 

我能夠收集哪些信息,以肯定緣由?

Systemstate dump

當問題發生時,錯誤會記入 alert.log,並自動產生一個 systemstate dump 文件。

Wed Sep 21 13:39:19 2011
> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
System State dumped to trace file /oracle/diag/rdbms/..../.trc

 

AWR,ADDM 和 ASH 報告

收集兩份 AWR 報告,一份有問題時間段的,另外一個是沒有問題時間段的,由於這些能夠幫助咱們理解問題發生時數據庫的情況 AWR,ADDM,ASH 報告,能夠相互取長補短,從而更完整地理解整個問題。

取決於 AWR 快照生成的時間間隔,收集最小時間間隔的報告。默認的快照是一個小時的時間間隔。

SQL>@$ORACLE_HOME/rdbms/admin/awrrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/addmrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/ashrpt.sql


鑑於分析 systemstate 是一件很複雜的事情,您能夠建立一個服務請求,並上傳 alert.log,systemstate dump,以及問題發生前和問題發生時的 AWR 報告請 Oracle 技術支持來分析。

如何分析收集到的診斷信息?

Systemstate dump

一般狀況下, row cache enqueue 是一系列事件的一部分,阻塞了申請 row cache enqueue 的進程的進程極可能被另外一個進程阻塞。Row cache enqueue 常常是問題的表象。

Systemstate dump 能夠幫助查找申請的是哪個 row cache ,並可能有助於發現阻塞進程。

示例1:
Oracle process number: 77
Unix process pid: 10846, image: oracle@cpdb4532

*** 2011-05-13 08:08:58.775
*** SERVICE NAME:(ALFCMR_SERVICE) 2011-05-13 08:08:58.775
*** SESSION ID:(1076.796) 2011-05-13 08:08:58.775
> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<
row cache enqueue: session: 0x1df57ade8, mode: N, request: S


trace 的標題顯示下列內容:

  • 等待 row cache enqueue 鎖的 Oracle 進程號(PID)(在這個案例,進程 77)。
  • 正在申請的 row cache enqueue 的模式(請求:S)。

所以,在上述例子中,進程 77 是在請求以共享模式得到 row cache(請求:S)。

Systemstate 包含數據庫中每個進程的狀態信息,所以能夠在 systemstate 中查找這個進程:

PROCESS 77
----------------------------------------
.
.
----------------------------------------
SO: 0x1cdf11958, type: 50, owner: 0x17d198288, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=0x1df57ade8 object=0x1dc9a5d30, request=S
savepoint=0x87b70d
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
.
.


從上面咱們看到,進程 77 請求共享模式得到 row cache dc_users。
進程 77 處於等待狀態,意味着被其它進程阻塞,咱們如今須要檢查 systemstate 判斷誰持有資源並阻塞了這個進程。

搜索引用的對象,在這個例子中,是 object=0x1dc9a5d30。
這樣作完之後,咱們發現,進程 218 正以獨佔模式持有這個對象:

PROCESS 218:
----------------------------------------
.
.
SO: 0x1cdf118f8, type: 50, owner: 0x1ceb0f178, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=0x1da54cf68 object=0x1dc9a5d30,  request=X
savepoint=0x11e
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)


獨佔模式的請求,將會一直阻塞共享模式的請求,直到該進程獨佔模式的請求被知足並稍後釋放了這個資源。所以,這將阻止其餘共享模式請求。請注意,這是請求獨佔而不是獨佔持有,因此這個請求也必定被阻塞了。查看其餘進程中,咱們看到進程 164 在以共享模式(mode=s)持有這個對象。

PROCESS 164:
----------------------------------------
.
.
O/S info: user: u1m, term: , ospid: 1234, machine: cpc44711
program:
last wait for 'SQL*Net message from client' blocking sess=0x(nil) seq=36289 wait_time=6943 seconds since wait started=2539
driver id=54435000, #bytes=1, =0
.
.
SO: 0x1cdf11418, type: 50, owner: 0x1ccc26120, flag: INIT/-/-/0x00
row cache enqueue: count=2 session=0x1df578318  object=0x1dc9a5d30, mode=S
savepoint=0xb1bd8e
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
hash=fc968070 typ=11 transaction=(nil) flags=00000002
own=0x1dc9a5e00[0x1cdf11448,0x1cdf11448] wat=0x1dc9a5e10[0x1cdf11928,0x17d5192e0] mode=S


所以,進程 164 以共享模式持有 row cache enqueue(mode= S),從而防止了進程 218 以獨佔模式得到該 row cache enqueue。此外,咱們看到,進程 164 在 ON CPU(systemstate 顯示最後一個等待是'SQL*Net message from client',而不是等待'SQL*Net message from client')。爲了進一步診斷,技術支持須要檢查堆棧調用,以肯定爲何這個進程在 ON CPU 並持有該隊列這麼久(從開始已經等待了2539秒)。

示例2:


在這個例子中,進程 18(MMON)等待以共享模式得到類型爲 dc_awr_control 的 row cache。

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /opt/oracle10/product/10.2.0
System name: SunOS
Node name: saecopt51
Release: 5.10
Version: Generic_144488-04
Machine: sun4v
Instance name: PORT_V16
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 6196, image: oracle@sae (MMON)
.
.

PROCESS 18:
----------------------------------------
.
.
last wait for 'ksdxexeotherwait' wait_time=0.000013 sec, seconds since wait started=6
.
.
SO: 39bf1f0e8, type: 50, owner: 3980783a0, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3be37ea80  object=39a79f090, request=S
savepoint=0x41f0ae
row cache parent object:  address=39a79f090 cid=22(dc_awr_control)
hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a
own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X
.
.


對象(object=39a79f090)的 row cache lock 被進程 269 以獨佔模式(mode=x)持有。進程在等待'SGA: allocation forcing component growth'。

PROCESS 269:
----------------------------------------
.
.

waiting for 'SGA: allocation forcing component growth' wait_time=0, seconds since wait started=3

.
.
SO: 39bf1f148, type: 50, owner: 3bc39f560, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3be1b7c98  object=39a79f090, mode=X
savepoint=0x41efe8
row cache parent object:  address=39a79f090 cid=22(dc_awr_control)
hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a
own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X
.
.


所以根本緣由就是 SGA 的大小調整,等待 row cache 是次要結果。
咱們使用該期間的 AWR 報告看相關信息:

AWR 報告

Top 5 Timed Events                                         Avg %Total               
~~~~~~~~~~~~~~~~~~                                        wait   Call               
Event                                 Waits    Time (s)   (ms)   Time Wait Class    
------------------------------ ------------ ----------- ------ ------ ----------    
SGA: allocation forcing compon   42,067,317      38,469      1    7.6      Other    
CPU time                                          2,796           0.6               
db file sequential read             132,906         929      7    0.2   User I/O    
latch free                        4,282,858         704      0    0.1      Other    
log file switch (checkpoint in          904         560    620    0.1 Configurat    
          -------------------------------------------------------------       


咱們能夠清楚地看到,在 Top 5 等待事件中,整個系統中有針對此事件的一個顯著等待;而且'SGA: allocation forcing component growth' 是這一時間點的一個主要問題。"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" 消息的根本緣由就是內存調整活動,TOP 5 的等待事件甚至不顯示等待「row cache"症狀。

注意:
若是調整內存大小沒有那麼嚴重,有些時候沒有"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"消息。 - 主要是由於沒有達到在此以前提到的閾值。可是,您可能會看到其餘等待事件。在如下文獻中概述了一個常見的例子:

Document 742599.1 High 'cursor: pin S wait on X' and/or 'library cache lock' Waits Generated by Frequent Shared Pool/Buffer Cache Resize Activity

對於頻繁的內存調整,有幾個潛在的可用修復,請參閱:

Document 7189722.8 Bug 7189722 - Frequent grow/shrink SGA resize operations
Document 9267837.8 Bug 9267837 - Auto-SGA policy may see larger resizes than needed

10g 之前的版本可能存在的問題

10g 以前的版本,檢測 row cache 級別死鎖的方法有限。爲了儘可能減小發生死鎖的可能性,可能的解決方法:

  • 設置 TIMED_STATISTICS=FALSE
  • 設置 _row_cache_cursors=20 以上(默認值 10)
  • 不要作任何 tracing
Document 30802.1 Init.ora Parameter "ROW_CACHE_CURSORS" Reference Note

其餘問題疑難解答

對於其餘性能問題的故障排除,請參閱:

Document 1377446.1 Troubleshooting Performance Issues

 

 

Bug 12772404 - Significant "row cache objects" latch contention when using VPD (Doc ID 12772404.8)

參考

BUG:11693365 - GETTING ERROR 'WAITED TOOL LONG FOR ROW CACHE ENQUEUE LOCK'


BUG:5756769 - ROW CACHE DEADLOCK "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"
NOTE:11693365.8 - Bug 11693365 - Concurrent Drop table and Select on Reference constraint table hangs (deadlock)
NOTE:1377446.1 - * Troubleshooting Performance Issues
NOTE:30802.1 - Init.ora Parameter "ROW_CACHE_CURSORS" Reference Note
NOTE:395314.1 - RAC Hangs due to small cache size on SYS.AUDSES$
NOTE:4604972.8 - Bug 4604972 - Deadlock on dc_users by Concurrent Grant/Revoke
NOTE:468334.1 - How to Match a Row Cache Object Child Latch to its Row Cache
NOTE:5756769.8 - Bug 5756769 - Deadlock between Create MVIEW and DML
NOTE:6004916.8 - Bug 6004916 - Hang involving row cache enqueues in RAC (ORA-4021)
NOTE:6027068.8 - Bug 6027068 - Contention on ORA_TQ_BASE$ sequence
NOTE:6143420.8 - Bug 6143420 - Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X"
NOTE:742599.1 - High 'Cursor: Pin S Wait On X', 'Library Cache Lock' And "Latch: Shared Pool" Waits due to Shared Pool/Buffer Cache Resize Activity
NOTE:853652.1 - RAC and Sequences
NOTE:8666117.8 - Bug 8666117 - High row cache latch contention in RAC
NOTE:9866045.8 - Bug 9866045 - Long wait on 'wait for master scn' in LCK causing long row cache lock waits



BUG:8666117 - LCK0 PROCESS STUCK AT WAITING FOR "LATCH: ROW CACHE OBJECTS"




一次awr報告分析(密碼錯誤引起sql執行時間過長) 
預備知識:
1)row cache lock事件  
--內存share pool分爲library cache,dictionary cache;而row cache lock對象分佈在dictionary cache中,是因爲對於字典緩衝的訪問形成的;  
--這類屬於latch類的資源競爭,至關耗CPU,若是併發量大的話,容易down機
若是發現這個等待十分高,通常來講可能因爲2種緣由,一是共享池過小了,須要增長共享池,另一種狀況是SQL分析過於頻繁,對於共享池的併發訪問量過大。對於任何一種狀況,絕大多數狀況下加大共享池會有助於下降該等待,不過加大共享池的時候也要注意,並不必定全部的狀況下增長共享池都會有明顯的效果。特別是對於第二種狀況,精確的分析十分重要。另外進一步分析,弄清楚哪些ROW CACHE的等待最爲嚴重,有助於解決問題。
row cache lock事件的調優基於每個隊列鎖類型的行爲,常見的隊列鎖類型有: 
-------DC_SEQUENCES:在使用序列的時候將發生該行緩衝隊列鎖。調優方式是檢查序列是否指定了緩衝選項並肯定這個緩衝值能夠承受預期的併發insert操做。
Check for appropriate caching of sequences for the application requirements.
-------DC_USED_EXTENTS和DC_FREE_EXTENTS:該行緩衝隊列鎖可能在空間管理碰到表空間分裂或者沒有足夠區大小時發生。調優方法是檢查表空間是否分裂了、區大小是否過小或者表空間是人工管理。 
-------DC_TABLESPACES:該行緩衝隊列鎖會在分配新區是發生。若是區大小設置得太小,程序將常常申請新區,這將致使衝突。調優方法是快速地增長區的數量。
Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents. 
-------DC_OBJECTS:該行緩衝隊列鎖會在重編譯對象的時候發生。當對象編譯時將申請一個排他鎖阻塞其餘行爲。經過檢查非法對象和依賴關係來調優。 
-------DC_SEGMENTS:該行緩衝隊列鎖會在段分配的時候發生,觀察持有這個隊列鎖的會話在作什麼。
This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose. 
-------DC_USERS:Deadlock and resulting 「WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!」 can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.
-------DB_ROLLBACK_SEGMENTS:This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.
-------DC_AWR_CONTROL:This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these. 
2)相關視圖字段說明:
ROW CACHE LOCK基礎說明
    P1 – Cache Id
    P2 – Mode Held
    P3 – Mode Requested
mode 和REQUEST的取值:
    KQRMNULL 0 null mode – not locked
    KQRMS 3 share mode
    KQRMX 5 exclusive mode
    KQRMFAIL 10 fail to acquire instance lock
3)SQL查詢
-----查詢row cache lock等待
    select *  from v$session_wait where wait_class = 'row cache lock'; 
    查出p1的值
-----查詢rowcache 名稱
    根據p1的值來查詢
    select * from v$rowcache where cache# = &p1;
其餘:
select event,p1 from v$session a where a.username is not null and a.status='ACTIVE';
4)dba_hist_active_sess_history視圖
dba_hist_active_sess_history視圖記錄了內存中活動會話的歷史信息,而動態性能視圖V$ACTIVE_SESSION_HISTORY則記錄了當前活動會話信息。 經過dba_hist_active_sess_history視圖關聯v$sqlarea和DBA_HIST_SNAPSHOT能夠跟蹤到某個時間段內的sql。固然能跟蹤到的sql多少取決於v$sqlarea,畢竟只有還保留在v$sqlarea中的sql才能跟蹤到。

AWR報告分析
數據庫出現情況是8月30號下午3點到4點,因而拉取了兩份AWR報告,一份是8月30號下午3點到4點,另外一份是31號下午3點到4點,將兩份報告比對着看,不一樣的地方應該就是有問題的地方。
1)發現top 5的前臺事件中,排第一的不是DB CPU。默認正常狀況下,DB CPU會排第一。

2)繼續看Wait Events Statistics ,發現排第一也不是DB CPU,而是connection management call elapsed time,看名字感受和數據庫鏈接有關。

3)繼續比對,發現dc_users的Pct Miss比較大,超過了20%

4)查看可疑的sql語句:

標紅的sql執行速度很慢,有些異常。
SQL> select to_char(sample_time, 'YY-MM-DD HH24:MI:SS') sample_time,
  2         instance_number,
  3         sql_id,
  4         P1,
  5         event,
  6         wait_class
  7    from dba_hist_active_sess_history
  8   where sample_time between
  9         to_date('16-08-29 13:00:00', 'YY-MM-DD HH24:MI:SS') and
10         to_date('16-08-29 13:30:00', 'YY-MM-DD HH24:MI:SS')
11     and sql_id in ('bvtu633rnwrwv',
12                    '4a6uhr508t0p6',
13                    'fpm2zazkfqhy6',
14                    'bhrcaykh5tzsw',
15                    '9fxv1px768bd5')
16   order by 1
17  ;
SAMPLE_TIME       INSTANCE_NUMBER SQL_ID                P1 EVENT                 WAIT_CLASS           
----------------- --------------- ------------- ---------- --------------------- ---------------------
16-08-29 13:00:59               1 4a6uhr508t0p6         10 row cache lock        Concurrency
16-08-29 13:01:09               1 4a6uhr508t0p6         10 row cache lock        Concurrency
16-08-29 13:01:19               1 4a6uhr508t0p6         10 row cache lock        Concurrency
16-08-29 13:01:29               1 4a6uhr508t0p6          7 row cache lock        Concurrency
16-08-29 13:01:39               1 bvtu633rnwrwv         10 row cache lock        Concurrency
16-08-29 13:01:49               1 bvtu633rnwrwv         10 row cache lock        Concurrency
16-08-29 13:01:59               1 bvtu633rnwrwv         10 row cache lock        Concurrency
16-08-29 13:02:10               1 bvtu633rnwrwv          7 row cache lock        Concurrency
16-08-29 13:02:20               1 fpm2zazkfqhy6         10 row cache lock        Concurrency
16-08-29 13:02:30               1 fpm2zazkfqhy6         10 row cache lock        Concurrency
16-08-29 13:02:40               1 fpm2zazkfqhy6         10 row cache lock        Concurrency
16-08-29 13:02:41               2 bvtu633rnwrwv          7 row cache lock        Concurrency
16-08-29 13:02:50               1 fpm2zazkfqhy6          7 row cache lock        Concurrency
16-08-29 13:04:31               1 bhrcaykh5tzsw         10 row cache lock        Concurrency
16-08-29 13:04:41               1 bhrcaykh5tzsw         10 row cache lock        Concurrency
16-08-29 13:04:51               1 bhrcaykh5tzsw          7 row cache lock        Concurrency
16-08-29 13:06:32               1 9fxv1px768bd5         10 row cache lock        Concurrency
16-08-29 13:06:42               1 9fxv1px768bd5         10 row cache lock        Concurrency
18 rows selected
根據上面的sql,發現sql致使的等待事件是row cache lock。再根據P1=7或者10,查出等待事件是發生在具體哪一個類別上
SQL> select cache#,parameter from v$rowcache where cache# in ('7','10');
    CACHE# PARAMETER
---------- --------------------------------
        10 dc_users
         7 dc_users
         7 dc_users
         7 dc_users
根據上面的sql,發現參數類別是dc_users,網上說dc_users是和用戶用錯誤密碼登錄有關:
In 11g there is an intentional delay between allowing failed logon attempts to retry. For some specific application types this can cause a problem as the row cache entry is locked for the duration of the delay . This can lead to excessive row cache lock waits for DC_USERS for specific users / schemas.

After 3 successive failures a sleep delay is introduced starting at 3 seconds and extending to 10 seconds max. During each delay the user X row cache lock is held in exclusive mode preventing any concurrent logon attempt as user X (and preventing any other operation which would need the row cache lock for user X).
驗證一下,看是否在發生row cache lock期間,有用戶登錄失敗。
SQL> select username,
  2         userhost,
  3         to_char(timestamp, 'YY-MM-DD HH24:MI:SS') timestamp,
  4         action_name
  5    from dba_audit_trail
  6   where action_name = 'LOGON'
  7     and priv_used is null
  8     and timestamp between
  9         to_date('16-08-29 13:00:00', 'YY-MM-DD HH24:MI:SS') and
10         to_date('16-08-29 13:30:00', 'YY-MM-DD HH24:MI:SS'); 
USERNAME                       USERHOST              TIMESTAMP         ACTION_NAME
------------------------------ --------------------- ----------------- ----------------------------
MAPP_PLATFORM                  IDC-APP-02            16-08-29 13:03:51 LOGON
MAPP_PLATFORM                  IDC-APP-01            16-08-29 13:03:41 LOGON
MAPP_PLATFORM                  IDC-APP-02            16-08-29 13:03:31 LOGON
MAPP_PLATFORM                  IDC-APP-01            16-08-29 13:03:21 LOGON
MAPP_PLATFORM                  IDC-APP-02            16-08-29 13:03:11 LOGON
MAPP_PLATFORM                  IDC-APP-01            16-08-29 13:03:01 LOGON
MAPP_PLATFORM                  IDC-APP-02            16-08-29 13:02:51 LOGON
MAPP_PLATFORM                  IDC-APP-01            16-08-29 13:02:41 LOGON
MAPP_PLATFORM                  IDC-APP-02            16-08-29 13:02:31 LOGON
。。。。
發現用戶MAPP_PLATFORM從8.14到8.31,一直以錯誤密碼嘗試登錄數據庫,驗證了以錯誤密碼登錄會致使row cache lock同時sql執行慢的現象發生。
 進一步查詢登陸審計信息:
select * from
(
select os_username,userhost,terminal,username,count(*) failures
    from dba_audit_trail
   where returncode = 1017 and timestamp between to_date ('2016-8-29 11:30:00','yyyy-mm-dd hh24:mi:ss') and to_date ('2016-8-29 13:30:00','yyyy-mm-dd hh24:mi:ss')
   group by os_username,userhost,username,terminal
   order by 5 desc );
returncode = 1017 這裏的1017就是oracle內部定義的錯誤返回代碼值。

實際上無論用戶提供的密碼是否正確,Oracle都會爲新的connection分配一個shadow process,該服務進程爲了進一步驗證用戶信息都不得不獲取少許資源;若是以此爲契機,即使在沒法正常登陸的狀況下依然能在短時間內形成實例僵死。
 
 
 

● 本文做者:小麥苗,只專一於數據庫的技術,更注重技術的運用

● 本文在itpub(http://blog.itpub.net/26736162)、博客園(http://www.cnblogs.com/lhrbest)和我的微信公衆號(xiaomaimiaolhr)上有同步更新

● 本文itpub地址:http://blog.itpub.net/26736162/abstract/1/

● 本文博客園地址:http://www.cnblogs.com/lhrbest

● 本文pdf版及小麥苗雲盤地址:http://blog.itpub.net/26736162/viewspace-1624453/

● 數據庫筆試面試題庫及解答:http://blog.itpub.net/26736162/viewspace-2134706/

● QQ羣:230161599     微信羣:私聊

● 聯繫我請加QQ好友(646634621),註明添加原因

● 於 2017-05-09 09:00 ~ 2017-05-30 22:00 在魔都完成

● 文章內容來源於小麥苗的學習筆記,部分整理自網絡,如有侵權或不當之處還請諒解

● 版權全部,歡迎分享本文,轉載請保留出處

相關文章
相關標籤/搜索