♣ Question
In Oracle, what are multi-column statistics (Extended Statistics)?
♣ Answer
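Whether the Oracle optimizer estimates cardinality accurately determines whether it can generate an optimal execution plan. The accuracy of that estimate in turn depends on whether the statistics for each object referenced in the SQL are complete and truly reflect the object's data distribution. The method used to gather statistics therefore matters: tables with heavily skewed data need histograms, and when several columns are also correlated, gathering multi-column statistics (also called extended statistics) is the better choice.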
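In general, the WHERE clause of a SQL statement contains several conditions against a single table; that is, rows are filtered by conditions on multiple columns. By default, Oracle multiplies the selectivities of the individual columns to obtain the selectivity of the whole WHERE clause. This can produce an inaccurate selectivity and lead the optimizer to a wrong decision. To let the optimizer make accurate decisions and generate good execution plans, Oracle Database 11g introduced multi-column statistics. Multi-column statistics comprise column group statistics and expression statistics.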
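As a rough illustration of why multiplying selectivities can mislead, the following Python sketch uses a small hypothetical data set (the `rows` list and its city/country values are assumptions made up for this illustration, not data from the article) in which the second column is fully determined by the first.

```python
# Hypothetical data: 'country' is fully determined by 'city', so the columns are correlated.
rows = [("Paris", "FR")] * 500 + [("Lyon", "FR")] * 250 + [("Rome", "IT")] * 250

def selectivity(predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)

sel_city = selectivity(lambda r: r[0] == "Paris")
sel_country = selectivity(lambda r: r[1] == "FR")

# Independence assumption: multiply the per-column selectivities.
estimated_rows = len(rows) * sel_city * sel_country

# What a query with both conditions really returns.
actual_rows = len(rows) * selectivity(lambda r: r[0] == "Paris" and r[1] == "FR")

print(estimated_rows, actual_rows)  # 375.0 500.0
```

The multiplied estimate undercounts because the second condition filters out nothing that the first condition had not already filtered out.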
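Use the new function CREATE_EXTENDED_STATS in the DBMS_STATS package to create a virtual column, and then gather statistics on the table. The following statement defines two extensions: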
-- One expression extension and one column group extension on TEST.T
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME   => 'TEST',
                                        TABNAME   => 'T',
                                        EXTENSION => '(UPPER(PAD))'),
       DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME   => 'TEST',
                                        TABNAME   => 'T',
                                        EXTENSION => '(VAL2,VAL3)')
  FROM DUAL;
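The SQL above creates two virtual columns on table T owned by user TEST, one based on an expression and one based on multiple columns. The next time statistics are gathered on the table, the extended statistics are gathered automatically. Note that extended statistics cannot be created on tables owned by SYS; attempting to do so raises "ORA-20000: Unable to create extension: not supported for SYS owned table".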
Use the procedure DROP_EXTENDED_STATS in Oracle's DBMS_STATS package to drop extended statistics:
EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST',TABNAME => 'T',EXTENSION => '(UPPER(PAD))');
EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST',TABNAME => 'T',EXTENSION => '(VAL2,VAL3)');
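Extended statistics can also be defined directly through the METHOD_OPT parameter of DBMS_STATS.GATHER_TABLE_STATS. In that case the column group is treated as a single column while statistics are gathered, as shown below: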
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    OWNNAME          => 'SCOTT',
    TABNAME          => 'BOOKS',
    ESTIMATE_PERCENT => 100,
    -- The parenthesized list defines the column group (HOTEL_ID,RATE_CATEGORY)
    METHOD_OPT       => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID,RATE_CATEGORY)',
    CASCADE          => TRUE
  );
END;
/
The view DBA_STAT_EXTENSIONS lists the extended statistics defined in the database:
SQL> SELECT EXTENSION_NAME, EXTENSION
  2    FROM DBA_STAT_EXTENSIONS
  3   WHERE TABLE_NAME='BOOKS';
EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ("HOTEL_ID","RATE_CATEGORY")
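When it is unclear which columns need extended statistics, DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE can determine which column groups a table needs, based on a specific workload. Note that this technique does not apply to expression statistics. The main steps are as follows: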
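-- 1. Start recording column usage (TIME_LIMIT is in seconds)
EXEC DBMS_STATS.SEED_COL_USAGE(NULL,NULL,TIME_LIMIT=>100);
-- 2. Explain (or run) the statements of the workload
EXPLAIN PLAN FOR <SQL statement>;
-- 3. Report the recorded column usage
SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;
-- 4. Create the column groups found in the recorded usage
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;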
The following is a worked example of multi-column statistics.
First, create the test table:
DROP TABLE T_ES_20170601_LHR;
CREATE TABLE T_ES_20170601_LHR (C1 NUMBER,C2 VARCHAR2(2),C3 VARCHAR2(20));
-- 4 x 5000 rows in which C2 is fully determined by C1
DECLARE
BEGIN
  FOR I IN 1 .. 5000 LOOP
    INSERT INTO T_ES_20170601_LHR VALUES (1, 'AA', DBMS_RANDOM.STRING('l', 20));
    INSERT INTO T_ES_20170601_LHR VALUES (2, 'BB', DBMS_RANDOM.STRING('l', 20));
    INSERT INTO T_ES_20170601_LHR VALUES (3, 'CC', DBMS_RANDOM.STRING('l', 20));
    INSERT INTO T_ES_20170601_LHR VALUES (4, 'DD', DBMS_RANDOM.STRING('l', 20));
  END LOOP;
  COMMIT;
END;
/
-- 4 rare rows, one per (C1, C2) pair
INSERT INTO T_ES_20170601_LHR VALUES(11,'A','AAAAAAA');
INSERT INTO T_ES_20170601_LHR VALUES(22,'B','BBBBBBB');
INSERT INTO T_ES_20170601_LHR VALUES(33,'C','CCCCCCC');
INSERT INTO T_ES_20170601_LHR VALUES(44,'D','DDDDDDD');
COMMIT;
The data is distributed as follows:
LHR@orclasm > SELECT COUNT(1) FROM T_ES_20170601_LHR;
  COUNT(1)
----------
     20004
LHR@orclasm > SELECT C1,C2,COUNT(1) FROM T_ES_20170601_LHR GROUP BY C1,C2 ORDER BY C1;
        C1 C2   COUNT(1)
---------- -- ----------
         1 AA       5000
         2 BB       5000
         3 CC       5000
         4 DD       5000
        11 A           1
        22 B           1
        33 C           1
        44 D           1
8 rows selected.
Next, gather statistics on table T_ES_20170601_LHR without histograms (before gathering, confirm that the default ESTIMATE_PERCENT is AUTO_SAMPLE_SIZE):
LHR@orclasm > SELECT DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL) FROM DUAL;
DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL)
-----------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

LHR@orclasm > SET LINESIZE 200
LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR';
OWNER   TABLE_NAME         NUM_DISTINCT SAMPLE_SIZE COLUMN_NAME  HISTOGRAM
------- ------------------ ------------ ----------- ------------ ---------------
LHR     T_ES_20170601_LHR             8       20004 C1           NONE
LHR     T_ES_20170601_LHR             8       20004 C2           NONE
LHR     T_ES_20170601_LHR         20004       20004 C3           NONE
Now run the following two SQL statements and check the estimated row counts:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';
LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
  COUNT(*)
----------
      5000
LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
Explained.
LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3668985715
----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |   313 |  1878 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=1 AND "C2"='AA')
LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';
  COUNT(*)
----------
         1
LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';
Explained.
LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3668985715
----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |   313 |  1878 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=11 AND "C2"='A')
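The results are as follows:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; -- actually returns 5000 rows, estimated 313
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A'; -- actually returns 1 row, estimated 313
For both queries the cardinality is computed as ROUND(NUM_ROWS*(1/NUM_DISTINCT_C1)*(1/NUM_DISTINCT_C2)) = ROUND(20004*(1/8)*(1/8)) = 313, which matches the 313 in the execution plans. Because no histograms were gathered on the columns, the optimizer assumes a uniform distribution, so the estimated row count is far from the actual row count in both cases.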
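The no-histogram formula can be checked with a few lines of Python. This is only a sketch of the arithmetic stated in the text, not of the optimizer's internals; the function name `estimate_without_histogram` is made up for this illustration.

```python
# Figures taken from the statistics shown above (no histograms gathered).
num_rows = 20004
ndv_c1 = 8  # NUM_DISTINCT for C1
ndv_c2 = 8  # NUM_DISTINCT for C2

def estimate_without_histogram(num_rows, *ndvs):
    """Uniform-distribution estimate: each equality predicate contributes 1/NDV."""
    selectivity = 1.0
    for ndv in ndvs:
        selectivity *= 1.0 / ndv
    return round(num_rows * selectivity)

estimate = estimate_without_histogram(num_rows, ndv_c1, ndv_c2)
print(estimate)  # 313
```

The predicate values never enter the calculation, which is why both queries get the same estimate of 313.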
Next, gather histograms on columns C1 and C2 and rerun the two queries:
LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS C1 SIZE SKEWONLY,C2 SIZE SKEWONLY');

PL/SQL procedure successfully completed.

LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,DENSITY,NUM_BUCKETS,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR';

OWNER   TABLE_NAME         NUM_DISTINCT    DENSITY NUM_BUCKETS SAMPLE_SIZE COLUMN_NAME  HISTOGRAM
------- ------------------ ------------ ---------- ----------- ----------- ------------ ---------------
LHR     T_ES_20170601_LHR             8 .000024995           8       20004 C1           FREQUENCY
LHR     T_ES_20170601_LHR             8 .000024995           8       20004 C2           FREQUENCY
LHR     T_ES_20170601_LHR         20004  .00004999           1       20004 C3           NONE
For columns C1 and C2, DENSITY is computed as 1/(NUM_ROWS*2) = 1/(20004*2) = 0.000024995.
Column C3 has no histogram, so its DENSITY is computed as 1/NUM_DISTINCT_C3 = 1/20004 = 0.00004999.
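The DENSITY values in the DBA_TAB_COL_STATISTICS output above can be reproduced with the following Python sketch. It only mirrors the two formulas given in the text and uses the NUM_ROWS and NUM_DISTINCT figures from that output.

```python
num_rows = 20004
ndv_c3 = 20004  # NUM_DISTINCT for C3

# Columns with a frequency histogram (C1, C2): density = 1 / (2 * NUM_ROWS)
density_c1_c2 = 1 / (2 * num_rows)

# Column without a histogram (C3): density = 1 / NUM_DISTINCT
density_c3 = 1 / ndv_c3

print(f"{density_c1_c2:.9f}")  # 0.000024995
print(f"{density_c3:.8f}")     # 0.00004999
```

Both results agree with the DENSITY column: .000024995 for C1 and C2, and .00004999 for C3.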
LHR@orclasm > COL COLUMN_NAME FORMAT A30
LHR@orclasm > COL ENDPOINT_ACTUAL_VALUE FORMAT A50
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > SET PAGESIZE 100
LHR@orclasm > SELECT OWNER,TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ES_20170601_LHR';

OWNER   TABLE_NAME         COLUMN_NAME  ENDPOINT_NUMBER ENDPOINT_VALUE
------- ------------------ ------------ --------------- --------------
LHR     T_ES_20170601_LHR  C1                      5000              1
LHR     T_ES_20170601_LHR  C1                     10000              2
LHR     T_ES_20170601_LHR  C1                     15000              3
LHR     T_ES_20170601_LHR  C1                     20000              4
LHR     T_ES_20170601_LHR  C1                     20001             11
LHR     T_ES_20170601_LHR  C1                     20002             22
LHR     T_ES_20170601_LHR  C1                     20003             33
LHR     T_ES_20170601_LHR  C1                     20004             44
LHR     T_ES_20170601_LHR  C2                         1     3.3750E+35
LHR     T_ES_20170601_LHR  C2                      5001     3.3882E+35
LHR     T_ES_20170601_LHR  C2                      5002     3.4269E+35
LHR     T_ES_20170601_LHR  C2                     10002     3.4403E+35
LHR     T_ES_20170601_LHR  C2                     10003     3.4788E+35
LHR     T_ES_20170601_LHR  C2                     15003     3.4924E+35
LHR     T_ES_20170601_LHR  C2                     15004     3.5308E+35
LHR     T_ES_20170601_LHR  C2                     20004     3.5446E+35
LHR     T_ES_20170601_LHR  C3                         0     3.3882E+35
LHR     T_ES_20170601_LHR  C3                         1     6.3594E+35

18 rows selected.
Run the query with "C1=1 AND C2='AA'" as the predicate and see whether the cardinality is now closer to the actual row count:
LHR@orclasm > explain plan for select count(*) from T_ES_20170601_LHR where c1=1 and c2='AA';
Explained.
LHR@orclasm > select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-------------------------------------------------------------
Plan hash value: 3668985715
----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |  1250 |  7500 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=1 AND "C2"='AA')
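The Rows estimate in the execution plan is computed as ROUND(NUM_ROWS*(5000/20004)*(5000/20004)) = ROUND(20004*0.062475) = 1250. Compared with 313 before the histograms were gathered, this is closer to the actual value of 5000, so the histograms do make the estimate more accurate.
Run the query with "C1=11 AND C2='A'" as the predicate and see whether the cardinality is now closer to the actual row count: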
LHR@orclasm > explain plan for select count(*) from T_ES_20170601_LHR where c1=11 and c2='A';
Explained.
LHR@orclasm > select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------
Plan hash value: 3668985715
----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |     1 |     6 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=11 AND "C2"='A')
The Rows estimate in the execution plan is computed as NUM_ROWS*(1/20004)*(1/20004) = 0.00005, which is reported as 1.
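The two histogram-based estimates can be re-derived from the cumulative ENDPOINT_NUMBER values listed for C1 in DBA_TAB_HISTOGRAMS. The Python sketch below is only an approximation of the arithmetic described in the text: the helper names are invented for illustration, and the floor of one row is an assumption that mirrors what the plan output shows.

```python
num_rows = 20004

# Cumulative ENDPOINT_NUMBER for C1, keyed by ENDPOINT_VALUE (from the listing above).
c1_endpoints = {1: 5000, 2: 10000, 3: 15000, 4: 20000,
                11: 20001, 22: 20002, 33: 20003, 44: 20004}

def bucket_counts(endpoints):
    """Turn cumulative endpoint numbers into per-value row counts."""
    counts, previous = {}, 0
    for value, cumulative in sorted(endpoints.items(), key=lambda kv: kv[1]):
        counts[value] = cumulative - previous
        previous = cumulative
    return counts

def estimate(count_c1, count_c2):
    """Multiply the per-column selectivities; report at least one row (assumed floor)."""
    raw = num_rows * (count_c1 / num_rows) * (count_c2 / num_rows)
    return max(1, round(raw))

c1_counts = bucket_counts(c1_endpoints)

# Per the data distribution, C2='AA' covers 5000 rows and C2='A' covers 1 row.
print(estimate(c1_counts[1], 5000))  # 1250
print(estimate(c1_counts[11], 1))    # 1
```

The histograms fix the per-column frequencies, but the two selectivities are still multiplied as if the columns were independent, which is why the first estimate stops at 1250 instead of 5000.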
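With histograms, the cardinality is closer to the actual value than without them, but a sizeable gap remains, so the next step is to gather multi-column statistics. Multi-column statistics place strongly correlated columns into a column group, and statistics are then gathered on that column group. In this example, columns C1 and C2 of table T_ES_20170601_LHR are correlated: a row with C1=1 always has C2='AA' and never any other C2 value. Because of this clear correlation, C1 and C2 can form a column group that yields more precise statistics. There are two ways to gather statistics on a column group:
1. Adopt the recommendation that the system produces after monitoring the workload, and then gather statistics. This suits a DBA who does not know in advance how the table's data is composed or which columns are correlated. Based on the current workload, Oracle suggests which columns of which tables are related, and the DBA can accept that suggestion and create column groups on those columns.
2. Create the column group manually and then gather statistics. This suits a DBA who already knows which columns in the table are correlated.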
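One simple way to see the correlation between C1 and C2 is to compare the number of distinct (C1, C2) pairs with the product of the per-column distinct counts. The Python sketch below does this for the data distribution shown earlier; it is an illustrative check, not something Oracle runs.

```python
# (C1, C2, row count) taken from the data distribution shown earlier.
distribution = [(1, "AA", 5000), (2, "BB", 5000), (3, "CC", 5000), (4, "DD", 5000),
                (11, "A", 1), (22, "B", 1), (33, "C", 1), (44, "D", 1)]

ndv_c1 = len({c1 for c1, _, _ in distribution})
ndv_c2 = len({c2 for _, c2, _ in distribution})
ndv_pair = len({(c1, c2) for c1, c2, _ in distribution})

# Independent columns could produce up to ndv_c1 * ndv_c2 combinations.
print(ndv_c1, ndv_c2, ndv_pair, ndv_c1 * ndv_c2)  # 8 8 8 64
```

Only 8 of the 64 possible combinations occur, so knowing C1 fully determines C2 in this table.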
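Both methods are briefly described below:
Method 1: use the recommendation produced from the monitored workload to generate the column group
This method offers two options. Oracle can evaluate specific SQL statements to decide whether column groups are worth creating, or it can take already-executed SQL statements from existing workload sources such as the SQL cursor cache or the Automatic Workload Repository and evaluate those. For a given table and workload, DBMS_STATS.SEED_COL_USAGE and REPORT_COL_USAGE determine which column groups are needed. This technique is very useful when it is unclear which columns need extended statistics. Note that it does not apply to expression statistics.
Have Oracle generate a column group recommendation for "SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'":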
LHR@orclasm > EXEC DBMS_STATS.SEED_COL_USAGE(NULL,NULL,TIME_LIMIT=>100);
PL/SQL procedure successfully completed.
LHR@orclasm > EXPLAIN PLAN FOR SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
Explained.
LHR@orclasm > SET LONG 20000
LHR@orclasm > SET PAGESIZE 100
LHR@orclasm > SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;
DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR')
--------------------------------------------------------------------------------
LEGEND:
.......

EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################

COLUMN USAGE REPORT FOR LHR.T_ES_20170601_LHR
.............................................

1. C1                                  : EQ
2. C2                                  : EQ
3. (C1, C2)                            : FILTER
###############################################################################
Following the (C1, C2) : FILTER recommendation above, create the column group:
LHR@orclasm > SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;

DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR')
--------------------------------------------------------------------------------
###############################################################################

EXTENSIONS FOR LHR.T_ES_20170601_LHR
....................................

1. (C1, C2)                            : SYS_STUF3GLKIOP5F4B0BTTCFTMX0W created
###############################################################################
Query DBA_STAT_EXTENSIONS for the column group information:
LHR@orclasm > COL EXTENSION FORMAT A50
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > SELECT * FROM DBA_STAT_EXTENSIONS WHERE TABLE_NAME='T_ES_20170601_LHR';
OWNER  TABLE_NAME          EXTENSION_NAME                 EXTENSION       CREATOR DROPPABLE
------ ------------------- ------------------------------ --------------- ------- -----------
LHR    T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2")     USER    YES
"SYS_STUF3GLKIOP5F4B0BTTCFTMX0W" is the system-generated name of the column group and can be treated as a column of the table. Gather statistics on the "SYS_STUF3GLKIOP5F4B0BTTCFTMX0W" column:
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > COL EXTENSION FORMAT A15
LHR@orclasm > SELECT T1.OWNER,T1.TABLE_NAME,T1.COLUMN_NAME,T2.EXTENSION,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM FROM DBA_TAB_COL_STATISTICS T1,DBA_STAT_EXTENSIONS T2 WHERE T1.OWNER='LHR' AND T1.TABLE_NAME='T_ES_20170601_LHR' AND T1.OWNER=T2.OWNER AND T1.TABLE_NAME=T2.TABLE_NAME AND T1.COLUMN_NAME=T2.EXTENSION_NAME;

no rows selected

LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS SYS_STUF3GLKIOP5F4B0BTTCFTMX0W SIZE SKEWONLY');

PL/SQL procedure successfully completed.

LHR@orclasm > SELECT T1.OWNER,T1.TABLE_NAME,T1.COLUMN_NAME,T2.EXTENSION,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM FROM DBA_TAB_COL_STATISTICS T1,DBA_STAT_EXTENSIONS T2 WHERE T1.OWNER='LHR' AND T1.TABLE_NAME='T_ES_20170601_LHR' AND T1.OWNER=T2.OWNER AND T1.TABLE_NAME=T2.TABLE_NAME AND T1.COLUMN_NAME=T2.EXTENSION_NAME;

OWNER   TABLE_NAME          COLUMN_NAME                    EXTENSION       NUM_DISTINCT SAMPLE_SIZE HISTOGRAM
------- ------------------- ------------------------------ --------------- ------------ ----------- ---------------
LHR     T_ES_20170601_LHR   SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2")                8       20004 FREQUENCY
Statistics have now been gathered for SYS_STUF3GLKIOP5F4B0BTTCFTMX0W. These are the multi-column statistics, also called column group statistics.
Method 2: create the column group manually, then gather statistics with DBMS_STATS.GATHER_TABLE_STATS
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname=>'LHR',tabname=>'T_ES_20170601_LHR',extension=>'(c1,c2)')
FROM DUAL;
DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',EXTENSION=>'(C1,C2)')
-------------------------------------------------------------------
SYS_STU3RTXGYOX7NS$MIUDXQDMQ0C
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS SYS_STU3RTXGYOX7NS$MIUDXQDMQ0C SIZE SKEWONLY');
Alternatively, do it in one step: gather statistics directly on the (C1, C2) column group, which also creates the column group.
EXEC DBMS_STATS.gather_table_stats('LHR','T_ES_20170601_LHR',method_opt=>'for columns (c1,c2) size skewonly');
First, look at the data distribution in DBA_TAB_HISTOGRAMS for the SYS_STUF3GLKIOP5F4B0BTTCFTMX0W column, which represents the column combination (C1, C2):
LHR@orclasm > COL COLUMN_NAME FORMAT A30
LHR@orclasm > COL ENDPOINT_ACTUAL_VALUE FORMAT A50
LHR@orclasm > SET LINESIZE 170
LHR@orclasm > SET PAGESIZE 100
LHR@orclasm > SELECT OWNER,TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ES_20170601_LHR' AND COLUMN_NAME='SYS_STUF3GLKIOP5F4B0BTTCFTMX0W';

OWNER   TABLE_NAME         COLUMN_NAME                    ENDPOINT_NUMBER ENDPOINT_VALUE
------- ------------------ ------------------------------ --------------- --------------
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W               1      716089956
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W            5001     2693090364
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W            5002     3718690277
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           10002     3926166024
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           10003     5232674306
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           15003     5561960012
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           20003     5832235708
LHR     T_ES_20170601_LHR  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W           20004     6322890850

8 rows selected.
With the column group on (C1, C2) in place, the predicted cardinality for "SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';" becomes:
Cardinality = NUM_ROWS*5000/20004 = 20004*5000/20004 = 5000
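The same figure can be read off the column group histogram. The Python sketch below converts the cumulative ENDPOINT_NUMBER values of SYS_STUF3GLKIOP5F4B0BTTCFTMX0W into bucket sizes. Because ENDPOINT_VALUE is a hash of the (C1, C2) pair, the listing does not show which bucket belongs to (1, 'AA'); the sketch assumes it is one of the four 5000-row buckets, as the data distribution implies.

```python
num_rows = 20004

# Cumulative ENDPOINT_NUMBER values of the column group histogram listed above.
cumulative = [1, 5001, 5002, 10002, 10003, 15003, 20003, 20004]

# Rows per bucket: difference between consecutive cumulative values.
bucket_sizes = [b - a for a, b in zip([0] + cumulative[:-1], cumulative)]
print(bucket_sizes)  # [1, 5000, 1, 5000, 1, 5000, 5000, 1]

# Assumption: (C1=1, C2='AA') falls into a 5000-row bucket.
rows_in_bucket = 5000
estimate = round(num_rows * rows_in_bucket / num_rows)
print(estimate)  # 5000
```

With one selectivity for the pair, nothing is multiplied, so the estimate equals the bucket size.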
With the column group statistics gathered, rerun the original statement "SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';" and check whether they help the optimizer compute a more precise cardinality:
LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
Explained.
LHR@orclasm > SET LINESIZE 150
LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------
Plan hash value: 3668985715
----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |     1 |     6 |    27   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |                   |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ES_20170601_LHR |  5000 | 30000 |    27   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=1 AND "C2"='AA')
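Summary: if the data in a table is heavily skewed, gathering histograms goes a long way toward helping the optimizer compute an accurate cardinality and so avoid poor execution plans. Going a step further, if several skewed columns appear together as equality conditions in the predicate and are strongly correlated with one another, creating multi-column statistics with a histogram is an excellent choice that gives the optimizer the best chance of predicting the cardinality accurately.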
& Notes:
For more on multi-column statistics, see my blog: http://blog.itpub.net/26736162/viewspace-2139297/
This article is excerpted from 《Oracle程序員面試筆試寶典》, by 小麥苗.
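About Me: 小麥苗
● Author: 小麥苗, who focuses solely on database technology and especially on its practical application
● Author's blog: http://blog.itpub.net/26736162/abstract/1/
● The questions in this series come from the author's study notes, and some were compiled from the Internet; please excuse any infringement or inaccuracy
● All rights reserved. Sharing is welcome; please cite the source when reposting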
● If any of the answers are inaccurate, corrections are welcome so that we can improve together