Oracle數據遷移後由列的直方圖統計信息引發的執行計劃異常

(一)問題背景數據庫

在使用impdp進行數據導入的時候,每每在導入表和索引的統計信息的時候,速度很是慢,所以我在使用impdp進行導入時,會使用exclude=table_statistics排除表的統計信息,從而加快導入速度,以後再手動收集統計信息。ide

clipboard

                                              圖.impdp導入數據的時導入統計信息速度很是慢oop

導入語句以下:測試

impdp user/password directory=DUMPDIR dumpfile=TEST01.dmp logfile=TEST01.log remap_schema=TEST_USER:TEST_USER123 exclude=table_statistics

手動收集統計信息語句以下:spa

EXEC dbms_stats.gather_table_stats(ownname => 'LIJIAMAN',tabname => 'TEST01');

最近使用以上方法將數據還原到測試環境後,發現與生產環境執行計劃存在誤差,原本應該走全表掃描的,卻走了索引範圍掃描。通過確認,是因爲列的直方圖統計信息未收集引起的執行計劃誤差。3d


(二)列的直方圖統計信息code

什麼是列的直方圖統計信息呢?在Oracle數據庫中,Oracle默認列上的值是在最小值與最大值之間均分佈的,當在計算cardinatity時,會以均勻分佈的方式計算,可是在實際生活中某些場景下數據並不是均勻分佈。舉個列子,某公司有員工10000人,表A的列COL1記錄員工的績效(分別是:A、B、C、D,A最好,D最差),那麼可能A佔了15%,B佔了60,C佔了20%,D佔了5%。很明顯在該場景下數據並不是均勻分佈,假如以均勻分佈的方式去統計員工的績效,可能會致使執行計劃失準。orm

當列的數據分佈不均勻的時候,就須要統計列上的數據分佈狀況,從而走出正確的執行計劃,列的直方圖統計信息就是記錄列上的數據分佈狀況的。blog


(三)異常模擬索引

STEP1:建立測試表test01

create table test01
(id number,
name varchar2(10)
);
create index idx_test01_id on test01(id);

向test01中插入測試數據

begin
insert into test01 values(1,'a');

for i in 1..10 loop
insert into test01 values(2,'b');
end loop;

for i in 1..100 loop
insert into test01 values(3,'c');
end loop;

for i in 1..1000 loop
insert into test01 values(4,'d');
end loop;

commit;
end;

查看數據分佈狀況:

SQL> SELECT ID,NAME,COUNT(*) FROM test01 GROUP BY ID,NAME ORDER BY COUNT(*);

ID          NAME       COUNT(*)
---------- ---------- ----------
1           a          1       
2           b          10
3           c          100
4           d          1000


STEP2:收集統計信息,由於上面查詢過id列,故在收集統計信息的時候,會收集直方圖的統計信息

EXEC dbms_stats.gather_table_stats(ownname => 'LIJIAMAN',tabname => 'TEST01');

查看是否已經收集了直方圖信息,發現id列上已經收集

SQL> SELECT a.OWNER,a.TABLE_NAME,a.COLUMN_NAME,a.LOW_VALUE,a.HIGH_VALUE,a.NUM_BUCKETS,a.HISTOGRAM
2 FROM dba_tab_columns a
3 WHERE a.OWNER = 'LIJIAMAN' AND a.TABLE_NAME = 'TEST01';

OWNER     TABLE_NAME   COLUMN_NAME   LOW_VALUE     HIGH_VALUE    NUM_BUCKETS   HISTOGRAM
--------- -----------  ------------  ------------  ------------  -----------  ---------------
LIJIAMAN  TEST01       ID            C102          C105          4             FREQUENCY
LIJIAMAN  TEST01       NAME          61            64            1             NONE

查看直方圖,已經將id列的4個值放入了4個bucket中:

SQL> SELECT * FROM dba_tab_histograms a WHERE a.OWNER = 'LIJIAMAN' AND a.TABLE_NAME = 'TEST01';

OWNER        TABLE_NAME    COLUMN_NAME    ENDPOINT_NUMBER    ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
-----------  ------------  -------------  ---------------    -------------- ----------------------
LIJIAMAN     TEST01        ID                           1                 1 
LIJIAMAN     TEST01        ID                          11                 2 
LIJIAMAN     TEST01        ID                         111                 3 
LIJIAMAN     TEST01        ID                        1111                 4 
LIJIAMAN     TEST01        NAME                         0    5.036527952778 
LIJIAMAN     TEST01        NAME                         1    5.192296858534


STEP3:查看id=1和id=4的執行計劃,當id=1時,走索引範圍掃描,當id=4時,走全表掃描

id列存在直方圖統計信息,當id=1時,走索引範圍掃描 id列存在直方圖統計信息,當id=4時,走全表掃描
SELECT * FROM test01 WHERE ID=1

 Plan Hash Value  : 1151852672 

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows | Bytes | Cost | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |    1 |     5 |    2 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID | TEST01        |    1 |     5 |    2 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN           | IDX_TEST01_ID |    1 |       |    1 | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("ID"=1)
SELECT * FROM test01 WHERE ID=4

 Plan Hash Value  : 262542483 

-----------------------------------------------------------------------
 | Id  | Operation           | Name   | Rows | Bytes | Cost | Time     |
-----------------------------------------------------------------------
 |   0 | SELECT STATEMENT    |        | 1000 |  5000 |    3 | 00:00:01 |
 | * 1 |   TABLE ACCESS FULL | TEST01 | 1000 |  5000 |    3 | 00:00:01 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
 ------------------------------------------
 * 1 - filter("ID"=4)


STEP4:接下來模擬數據遷移,排除統計信息

導出表test01

expdp lijiaman/lijiaman directory=DUMPDIR tables=LIJIAMAN.TEST01 dumpfile =test01.dmp

刪除原來的表:

SQL> drop table test01;
Table dropped

再次導入表,排除統計信息:

impdp lijiaman/lijiaman directory=DUMPDIR dumpfile =test01.dmp exclude=table_statistics

查看錶的統計信息,不存在統計信息:

SQL> SELECT   a.OWNER,a.TABLE_NAME,a.COLUMN_NAME,a.LOW_VALUE,a.HIGH_VALUE,a.NUM_BUCKETS,a.HISTOGRAM
   2  FROM     dba_tab_columns a
   3  WHERE    a.OWNER = 'LIJIAMAN' AND a.TABLE_NAME = 'TEST01';

OWNER          TABLE_NAME      COLUMN_NAME     LOW_VALUE    HIGH_VALUE   NUM_BUCKETS HISTOGRAM
 -------------- --------------- --------------- ------------ ------------ ----------- ---------------
 LIJIAMAN       TEST01          ID                                                    NONE
 LIJIAMAN       TEST01          NAME                                                  NONE


STEP5:手動收集統計信息

EXEC dbms_stats.gather_table_stats(ownname => 'LIJIAMAN',tabname => 'TEST01');

發現統計信息已經收集,可是不存在直方圖的統計信息

SQL> SELECT   a.OWNER,a.TABLE_NAME,a.COLUMN_NAME,a.LOW_VALUE,a.HIGH_VALUE,a.NUM_BUCKETS,a.HISTOGRAM
  2  FROM     dba_tab_columns a
  3  WHERE    a.OWNER = 'LIJIAMAN' AND a.TABLE_NAME = 'TEST01';

OWNER     TABLE_NAME  COLUMN_NAME  LOW_VALUE   HIGH_VALUE  NUM_BUCKETS HISTOGRAM
--------- ----------- -----------  ----------- ----------- ----------- ---------------
LIJIAMAN  TEST01      ID           C102        C105                  1 NONE
LIJIAMAN  TEST01      NAME         61          64                    1 NONE


STEP6:再次查看id=1和id=4的執行計劃,當id=1或id=4時,都走索引範圍掃描

id列未收集直方圖統計信息,當id=1時,走索引範圍掃描 id列未收集直方圖統計信息,當id=4時,走索引範圍掃描
SELECT * FROM test01 WHERE ID=1
 Plan Hash Value  : 1151852672 

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows | Bytes | Cost | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |  278 |  1390 |    2 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID | TEST01        |  278 |  1390 |    2 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN           | IDX_TEST01_ID |  278 |       |    1 | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("ID"=1)
SELECT * FROM test01 WHERE ID=4

 Plan Hash Value  : 1151852672 

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows | Bytes | Cost | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |  278 |  1390 |    2 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID | TEST01        |  278 |  1390 |    2 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN           | IDX_TEST01_ID |  278 |       |    1 | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("ID"=4)


STEP7:再次收集統計信息,由於使用過了id列做爲查詢條件,故再次收集統計信息時,會收集id列的直方圖信息:

EXEC dbms_stats.gather_table_stats(ownname => 'LIJIAMAN',tabname => 'TEST01');

能夠看到,此時已經收集了id列的直方圖統計信息:

SQL> SELECT   a.OWNER,a.TABLE_NAME,a.COLUMN_NAME,a.LOW_VALUE,a.HIGH_VALUE,a.NUM_BUCKETS,a.HISTOGRAM
  2  FROM     dba_tab_columns a
  3  WHERE    a.OWNER = 'LIJIAMAN' AND a.TABLE_NAME = 'TEST01';

OWNER                          TABLE_NAME                     COLUMN_NAME                    LOW_VALUE     HIGH_VALUE    NUM_BUCKETS HISTOGRAM
------------------------------ ------------------------------ ------------------------------ ------------- ------------- ----------- ---------------
LIJIAMAN                       TEST01                         ID                             C102          C105                    4 FREQUENCY
LIJIAMAN                       TEST01                         NAME                           61            64                      1 NONE

執行計劃已經按照咱們想要的方式走:

id列從新收集直方圖統計信息,當id=1時,走索引範圍掃描 id列從新收集直方圖統計信息,當id=4時,走全表掃描
SELECT * FROM test01 WHERE ID=1

 Plan Hash Value  : 1151852672 

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows | Bytes | Cost | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |    1 |     5 |    2 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID | TEST01        |    1 |     5 |    2 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN           | IDX_TEST01_ID |    1 |       |    1 | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("ID"=1)
SELECT * FROM test01 WHERE ID=4

 Plan Hash Value  : 262542483 

-----------------------------------------------------------------------
| Id  | Operation           | Name   | Rows | Bytes | Cost | Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        | 1000 |  5000 |    3 | 00:00:01 |
| * 1 |   TABLE ACCESS FULL | TEST01 | 1000 |  5000 |    3 | 00:00:01 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("ID"=4)


(四)總結

在使用expdp/impdp進行導出/導入數據的時,統計信息是很是重要的,對於大部分統計信息,咱們能夠在導入結束以後收集得到。可是對於列的直方圖統計信息,Oracle默認收集的方式是auto,即Oracle會根據用戶對列的使用狀況進行判斷是否收集直方圖統計信息,然而數據剛遷移完成,在表還未使用的狀況下收集統計信息,每每收集不到列的直方圖信息,這就形成了執行計劃異常,這種狀況一般在下一次收集統計信息以後會有所改變。



參考文檔:

DBMS_STATS With METHOD_OPT =>'..SIZE auto' May Not Collect Histograms (Doc ID 557594.1)

相關文章
相關標籤/搜索