高併發壓力下致使數據庫bug

環境信息:  linux 6.1 + oracle11.2.0.3 RAC
 
 
問題現象:
學校晚上6點選課,人數大概有3000,7點時,數據庫報錯以下(數據庫到6點多仍是能夠鏈接的),數據庫hung住了。

Tue Dec 16 18:00:33 2014
Dumping diagnostic data in directory=[cdmp_20141216180033], requested by (instance=2, osid=24917 (M001)), summary=[incident=45211].
Tue Dec 16 18:02:23 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:03:23 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:04:54 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:05:55 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:07:26 2014
PMON failed to acquire latch, see PMON dump
Tue Dec 16 18:08:26 2014
PMON failed to acquire latch, see PMON dump
.........................................
.........................................
Tue Dec 16 19:31:32 2014
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x9379D18, kglic0()+1086] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/trace/ywkdb2_m003_28984.trc  (incident=44523):
ORA-07445: exception encountered: core dump [kglic0()+1086] [SIGSEGV] [ADDR:0x18] [PC:0x9379D18] [Address not mapped to object] []
Incident details in:  /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/incident/incdir_44523/ywkdb2_m003_28984_i44523.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

Tue Dec 16 20:00:23 2014
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x18] [PC:0x9379D18, kglic0()+1086] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/trace/ywkdb2_m000_30227.trc  (incident=44811):
ORA-07445: exception encountered: core dump [kglic0()+1086] [SIGSEGV] [ADDR:0x18] [PC:0x9379D18] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/ywkdb/ywkdb2/incident/incdir_44811/ywkdb2_m000_30227_i44811.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.


最後排查,是數據庫該版本的一個bug。
 
詳見:
 
 
 
排查過程:
 
 根據ORA-07445提報的錯誤,排查oracle官方文檔,發如今官方文檔中標明這是一個oracle bug;文檔中說明,在如下條件下可觸發該bug:

  這幾點,目前數據庫狀態都符合:
   1)。目前數據庫版本是11.2.0.3
   2)。報錯提示是ORA-07445: [kglic0()]
   3)。在trc文件中,有kksIterCursorStat提示
           
    
   4)。_kghdsidx_count值>1
         

   5)。查詢select count(*) from v$sql_bind_capture where name like ':SYS%';總數>0

         

   根據官方文檔的處理意見,修改隱含參數"_kghdsidx_count"=1,重啓數據庫服務能夠解決此問題。
      

 另外根據官方文檔中的延伸說明:Bug 12340939  ORA-7445 [kglic0] can occur capturing cursor stats for V$SQLSTATS也可能會觸發該問題,修改隱含參數
"_cursor_stats_enabled"=FALSE,重啓數據庫服務能夠解決此問題。

 
 
修改後,學校次日選課沒有再報錯了。
 
 
案例總結:
 
1.有問題先查MOS
 
2.具體到這個問題,以前沒有出現這個問題,選課時出來了,說明一點:在大數據高併發狀況下,這個bug顯現了。
 
 
致使問題的緣由:
 
During SQL statistics analysis for query execution, we load dependency information into the Library Cache.  These dependency memory pieces can be overwritten by other sessions without causing an error.  When the original session runs into the wrong dependency information it can lead to the internal error and performance issues during subsequent queries of these Library Cache objects.
 
---簡而言之:Library Cache因爲併發壓力,歷史信息找不到了
 
或者
 

During SQL statistics analysis for query execution, we load dependency information into the Library Cache.  These dependency memory pieces can be overwritten by other sessions without causing an error.  When the original session runs into the wrong dependency information it can lead to the internal error and performance issues during subsequent queries of these Library Cache objects.
 
---簡而言之:表的統計信息有問題
相關文章
相關標籤/搜索