環境:RHEL 6.5 + Oracle 11.2.0.4 RAC + ADGsql
起初發現本身的ADG測試環境再也不同步,進一步分析是DATA磁盤組空間耗盡致使的,但是最近在磁盤組上的數據庫都沒有作過什麼測試,且測試磁盤組一直都留有2G+剩餘空間,那是什麼致使忽然沒空間了呢?
通過查詢dba_data_files發現數據文件的確沒有什麼增加,但查詢dba_temp_files發現臨時文件空間增加嚴重。數據庫
sys@DEMO> set lines 180 sys@DEMO> select file_id, file_name, tablespace_name, bytes/1048576 "MB" from dba_temp_files; FILE_ID FILE_NAME TABLESPACE_NAME MB ---------- ------------------------------------------------------- ------------------------------ ---------- 1 +DATA/demo/tempfile/temp.264.1018829761 TEMP 2701 2 +DATA/demo/tempfile/temp_jingyu.258.1018830415 TEMP_JINGYU 30 sys@DEMO>
或者直接從asmcmd中也能夠清楚看到TEMP文件佔用大量空間:網絡
ASMCMD> pwd +data/demo/tempfile ASMCMD> ls -ls Type Redund Striped Time Sys Block_Size Blocks Bytes Space Name TEMPFILE UNPROT COARSE NOV 16 09:00:00 Y 8192 345729 2832211968 2834300928 TEMP.264.1018829761 TEMPFILE UNPROT COARSE NOV 04 10:00:00 Y 8192 3841 31465472 32505856 TEMP_JINGYU.258.1018830415 ASMCMD> du Used_MB Mirror_used_MB 2734 2734
基本已經確認了就是temp文件佔用了空間,致使DATA磁盤組空間耗盡,那麼是誰使用了臨時表空間呢?根據DG不一樣步的時間點初步定位是在11-16號這天,咱們能夠直接根據DBA_HIST_ACTIVE_SESS_HISTORY中的TEMP_SPACE_ALLOCATED字段進一步定位:session
DBA_HIST_ACTIVE_SESS_HISTORY
TEMP_SPACE_ALLOCATED
Amount of TEMP memory (in bytes) consumed by this session at the time this sample was takenapp
select instance_number, to_char(sample_time,'mm-dd hh24:mi'), sum(TEMP_SPACE_ALLOCATED) from dba_hist_active_sess_history where sample_time > to_date('11-16 00:00','mm-dd hh24:mi') and sample_time < to_date('11-17 00:00','mm-dd hh24:mi') group by instance_number, to_char(sample_time,'mm-dd hh24:mi') order by 1, 2; 省略部分輸出.. 2 11-16 08:43 2 11-16 08:44 2 11-16 08:50 574619648 2 11-16 08:51 1743781888 2 11-16 08:52 3379560448 2 11-16 08:53 5015339008 2 11-16 08:54 6727663616 2 11-16 08:55 8554283008 2 11-16 08:56 1.0105E+10 2 11-16 08:57 1.1897E+10 2 11-16 08:58 1.3597E+10 2 11-16 08:59 1.4906E+10 2 11-16 09:00 2787115008 2 11-16 09:02 2 11-16 09:04
進一步細化時間,秒級別(注意DBA_HIST_ACTIVE_SESS_HISTORY默認採樣數據間隔爲10s)測試
select instance_number, to_char(sample_time,'mm-dd hh24:mi:ss'), sum(TEMP_SPACE_ALLOCATED) from dba_hist_active_sess_history where sample_time > to_date('11-16 08:43','mm-dd hh24:mi') and sample_time < to_date('11-16 09:02','mm-dd hh24:mi') group by instance_number, to_char(sample_time,'mm-dd hh24:mi:ss') order by 1, 2; INSTANCE_NUMBER TO_CHAR(SAMPLE SUM(TEMP_SPACE_ALLOCATED) --------------- -------------- ------------------------- 省略部分輸出.. 2 11-16 08:44:29 2 11-16 08:50:29 35651584 2 11-16 08:50:39 134217728 2 11-16 08:50:49 182452224 2 11-16 08:50:59 222298112 2 11-16 08:51:10 266338304 2 11-16 08:51:20 318767104 2 11-16 08:51:30 359661568 2 11-16 08:51:40 386924544 2 11-16 08:51:50 412090368 2 11-16 08:52:00 445644800 2 11-16 08:52:11 492830720 2 11-16 08:52:21 545259520 2 11-16 08:52:31 591396864 2 11-16 08:52:41 628097024 2 11-16 08:52:51 676331520 2 11-16 08:53:01 723517440 2 11-16 08:53:11 771751936 2 11-16 08:53:21 819986432 2 11-16 08:53:31 866123776 2 11-16 08:53:41 897581056 2 11-16 08:53:51 936378368 2 11-16 08:54:01 991952896 2 11-16 08:54:11 1048576000 2 11-16 08:54:21 1099956224 2 11-16 08:54:31 1145044992 2 11-16 08:54:41 1194328064 2 11-16 08:54:51 1247805440 2 11-16 08:55:01 1301282816 2 11-16 08:55:12 1354760192 2 11-16 08:55:22 1401946112 2 11-16 08:55:32 1449132032 2 11-16 08:55:42 1502609408 2 11-16 08:55:52 1544552448 2 11-16 08:56:02 1572864000 2 11-16 08:56:12 1611661312 2 11-16 08:56:22 1657798656 2 11-16 08:56:32 1704984576 2 11-16 08:56:42 1754267648 2 11-16 08:56:52 1803550720 2 11-16 08:57:02 1853882368 2 11-16 08:57:12 1906311168 2 11-16 08:57:22 1957691392 2 11-16 08:57:32 2012217344 2 11-16 08:57:42 2058354688 2 11-16 08:57:52 2108686336 2 11-16 08:58:02 2153775104 2 11-16 08:58:12 2203058176 2 11-16 08:58:23 2253389824 2 11-16 08:58:33 2298478592 2 11-16 08:58:43 2332033024 2 11-16 08:58:53 2356150272 2 11-16 08:59:03 2383413248 2 11-16 08:59:13 2403336192 2 11-16 08:59:24 2443182080 2 11-16 08:59:34 2487222272 2 11-16 08:59:44 2550136832 2 11-16 08:59:54 2638217216 2 11-16 09:00:04 2786066432 2 11-16 09:00:14 1048576 2 11-16 09:00:24 81 rows selected.
實例2從11-16 08:50:29 開始,到 09:00:04 結束,temp增加到2G+(2786066432),那麼具體是哪些SQL消耗的呢?
上面的查詢能夠直接加入sql_id字段定位,發現都是同一個SQL致使的:this
select instance_number, to_char(sample_time,'mm-dd hh24:mi:ss'), sql_id, sum(TEMP_SPACE_ALLOCATED) from dba_hist_active_sess_history where sample_time > to_date('11-16 08:43','mm-dd hh24:mi') and sample_time < to_date('11-16 09:02','mm-dd hh24:mi') group by instance_number, to_char(sample_time,'mm-dd hh24:mi:ss'), sql_id order by 1, 2; 2 11-16 08:59:24 auyf8px9ywc6j 2443182080 2 11-16 08:59:34 auyf8px9ywc6j 2487222272 2 11-16 08:59:44 auyf8px9ywc6j 2550136832 2 11-16 08:59:54 auyf8px9ywc6j 2638217216 2 11-16 09:00:04 auyf8px9ywc6j 2786066432
SQL_ID auyf8px9ywc6j 對應的文本爲:spa
select sql_text from dba_hist_sqltext where sql_id = 'auyf8px9ywc6j'; SQL_TEXT -------------------------------------------------------------------------------- WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER ,SN.STAR TUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP ,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS MIN_DATE ,MAX(CAST(END_I NTERVAL_TIME AS DATE)) AS MAX_DATE FROM DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST WHERE SN.BEGIN_INTERVAL_TIME > TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRU NC(SYSDATE) AND SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROU P BY SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME ) ,DELTA _DATA AS (SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE WHEN SR.STARTUP _TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE + (TM2.VALUE - TM1.VALU E) ELSE (TM2.VALUE - TM1.VALUE) END AS DELTA_TIME FROM WRH$_SYS_TIME_MODEL TM1 , WRH$_SYS_TIME_MODEL TM2 ,SNAP_RANGES SR WHERE TM1.DBID = SR.DBID AND TM1.INSTANC E_NUMBER = SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID = SR .STAT_ID AND TM2.DBID = SR.DBID AND TM2.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM2.SNAP_ID = SR.MAX_SNAP AND TM2.STAT_ID = SR.STAT_ID ) SELECT STAT_NAME ,ROUN D(SUM(DELTA_TIME/1000000),2) AS SECS FROM DELTA_DATA GROUP BY STAT_NAME
SQL執行計劃爲:日誌
set lines 1000 pages 1000 select * from table(dbms_xplan.display_awr('auyf8px9ywc6j')); Plan hash value: 295135324 ------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 13 (100)| | | | | 1 | HASH GROUP BY | | 1 | 87 | 13 (8)| 00:00:01 | | | | 2 | VIEW | VM_NWVW_1 | 1 | 87 | 13 (8)| 00:00:01 | | | | 3 | FILTER | | | | | | | | | 4 | HASH GROUP BY | | 1 | 247 | 13 (8)| 00:00:01 | | | | 5 | HASH JOIN | | 1 | 247 | 12 (0)| 00:00:01 | | | | 6 | NESTED LOOPS | | 1 | 204 | 7 (0)| 00:00:01 | | | | 7 | NESTED LOOPS | | 1 | 204 | 7 (0)| 00:00:01 | | | | 8 | NESTED LOOPS | | 1 | 127 | 7 (0)| 00:00:01 | | | | 9 | TABLE ACCESS FULL | WRM$_SNAPSHOT | 1 | 50 | 7 (0)| 00:00:01 | | | | 10 | PARTITION RANGE ITERATOR | | 1 | 77 | 0 (0)| | KEY | KEY | | 11 | TABLE ACCESS BY LOCAL INDEX ROWID| WRH$_SYS_TIME_MODEL | 1 | 77 | 0 (0)| | KEY | KEY | | 12 | INDEX RANGE SCAN | WRH$_SYS_TIME_MODEL_PK | 1 | | 0 (0)| | KEY | KEY | | 13 | PARTITION RANGE ITERATOR | | 1 | | 0 (0)| | KEY | KEY | | 14 | INDEX RANGE SCAN | WRH$_SYS_TIME_MODEL_PK | 1 | | 0 (0)| | KEY | KEY | | 15 | TABLE ACCESS BY LOCAL INDEX ROWID | WRH$_SYS_TIME_MODEL | 1 | 77 | 0 (0)| | 1 | 1 | | 16 | TABLE ACCESS FULL | WRH$_STAT_NAME | 2 | 86 | 5 (0)| 00:00:01 | | | ------------------------------------------------------------------------------------------------------------------------------------- 45 rows selected.
若是進一步獲取SQL的awr報告還能夠看到SQL執行的統計信息:code
Stat Name Statement Total Per Execution % Snap Total Elapsed Time (ms) 582,925 582,924.68 1.84 CPU Time (ms) 64,811 64,811.15 5.89 Executions 1 Buffer Gets 2,472,595 2,472,595.00 8.88 Disk Reads 1 1.00 0.00 Parse Calls 1 1.00 0.00 Rows 0 0.00 User I/O Wait Time (ms) 65,348 Cluster Wait Time (ms) 17 Application Wait Time (ms) 0 Concurrency Wait Time (ms) 1 Invalidations 0 Version Count 1 Sharable Mem(KB) 64
查到這裏其實已經確認這個事情並非人爲,由於若是拿這個SQL_ID去網絡搜索,會發現有人提到過這個SQL_ID,說這是由MMON發起的SQL
It looks like something made by a DBA, but it comes from the MMON.
由於只是臨時文件,且目前沒有被持續使用到,又是測試環境,能夠直接按照測試需求resize爲較小值,而後爲避免這樣的事情,再關閉其自動擴展的功能:
sys@DEMO> alter database tempfile 1 resize 500M; sys@DEMO> alter database tempfile 1 autoextend off;
修改完再次查詢:
sys@DEMO> select file_id, file_name, tablespace_name, bytes/1048576 "MB" from dba_temp_files; FILE_ID FILE_NAME TABLESPACE_NAME MB ---------- ------------------------------------------------------- ------------------------------ ---------- 1 +DATA/demo/tempfile/temp.264.1018829761 TEMP 500 2 +DATA/demo/tempfile/temp_jingyu.258.1018830415 TEMP_JINGYU 30
DG備庫啓動應用,觀察確認恢復正常同步:
sys@ORCL> recover managed standby database using current logfile disconnect; Media recovery complete. sys@ORCL> select * from v$dataguard_stats; NAME VALUE UNIT TIME_COMPUTED DATUM_TIME -------------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ transport lag +00 00:00:00 day(2) to second(0) interval 11/18/2019 23:10:55 11/18/2019 23:10:54 apply lag +00 00:00:00 day(2) to second(0) interval 11/18/2019 23:10:55 11/18/2019 23:10:54 apply finish time +00 00:00:00.000 day(2) to second(3) interval 11/18/2019 23:10:55 estimated startup time 24 second 11/18/2019 23:10:55
通常確認時間正確,各指標的value都爲0便可。若不放心能夠再手工去主庫切換幾回日誌,看下備庫的同步表現。