Joining large tables: always a hash join?
Case:
---- I ran this for over an hour and it never finished
SELECT
a.id AS order_id ,b.s_id AS bill_id,
d.id AS sub_order_id,
d.deal_oper_id
FROM EM_ORDER PARTITION(EM_ORDER_201611) A,
M_101_ID_2_GID B,
ER_ORDER_ORDER C,
EM_ORDER D,
EE_ORDER_PF_WORK E
WHERE A.SPEC_ID = 3010200004
AND A.ID = B.T_ID
AND A.STATUS_ID = 1000007
AND A.COMPLETE_TIME >= TO_DATE('2016-11-14 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and A.COMPLETE_TIME <= TO_DATE('2016-11-14 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
AND A.ID = C.A_ORDER_ID
AND C.B_ORDER_ID = D.ID
AND D.ID = E.ORDER_ID
AND e.work_type_id = 1001411
AND ( d.deal_oper_id IS NULL
OR (SELECT f_chk_idcard(x.identity_number)
FROM dm_staff x
WHERE x.id = d.deal_oper_id) = 0
);
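Table sizes are listed below. Almost all of these tables are partitioned. Unfortunately, the partitioning is not well designed: the query gets no benefit from partition pruning, and the partitions are scanned in full.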
   size_mb  segment_name
 4595.0625  EE_ORDER_PF_WORK
40159.0625  EM_ORDER
20059.0625  ER_ORDER_ORDER
20770.0625  M_101_ID_2_GID
DM_STAFF is a small table with about 200,000 rows.
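The partition-pruning remark above can be illustrated with a minimal Python sketch of range-partition pruning. It assumes hypothetical monthly partitions for 2016; the real partition keys and bounds of these tables are not given. When the predicate constrains the partition key, only the matching partitions are visited. With no such predicate, every partition is scanned, which is what PARTITION RANGE ALL means in the plans.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional

# Hypothetical monthly range partitions for 2016. Like Oracle's
# VALUES LESS THAN, partition i holds keys < BOUNDS[i] (and >= BOUNDS[i-1]).
BOUNDS = [date(2016, m, 1) for m in range(2, 13)] + [date(2017, 1, 1)]


def partition_of(d: date) -> int:
    """0-based index of the partition whose range contains d."""
    return bisect_right(BOUNDS, d)


def prune(lo: Optional[date] = None, hi: Optional[date] = None) -> List[int]:
    """Partitions a scan must visit for the predicate lo <= key <= hi.

    With no predicate on the partition key (lo and hi both None), nothing
    can be pruned and every partition is read (PARTITION RANGE ALL)."""
    first = 0 if lo is None else partition_of(lo)
    last = len(BOUNDS) - 1 if hi is None else partition_of(hi)
    return list(range(first, last + 1))


print(prune(date(2016, 11, 14), date(2016, 11, 14)))  # [10]: November only
print(len(prune()))                                     # 12: nothing pruned
```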
Execution plan:
Plan hash value: 309883988

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name                   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                        |     1 |   130 |       |   1218K (1)| 04:03:44 |       |       |
|*  1 |  FILTER                                   |                        |       |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                            |                        | 21005 | 2666K |       |   1166K (1)| 03:53:19 |       |       |
|   3 |    NESTED LOOPS                           |                        | 21791 | 2666K |       |   1166K (1)| 03:53:19 |       |       |
|*  4 |     HASH JOIN                             |                        | 21791 | 2234K |   17M |   1079K (1)| 03:35:53 |       |       |
|   5 |      NESTED LOOPS                         |                        |  183K |   15M |       |    907K (1)| 03:01:32 |       |       |
|   6 |       NESTED LOOPS                        |                        |  183K |   15M |       |    907K (1)| 03:01:32 |       |       |
|   7 |        NESTED LOOPS                       |                        |  183K |   11M |       |    358K (1)| 01:11:37 |       |       |
|*  8 |         TABLE ACCESS BY GLOBAL INDEX ROWID| EM_ORDER               |  106K | 3631K |       |   39284 (1)| 00:07:52 |     8 |     8 |
|*  9 |          INDEX RANGE SCAN                 | IDX_EM_ORDER_COMP_TIME | 44669 |       |       |     330 (0)| 00:00:04 |       |       |
|* 10 |         INDEX RANGE SCAN                  | IDX_ER_ORDER_ORDER_OO  |     2 |    56 |       |       3 (0)| 00:00:01 |       |       |
|* 11 |        INDEX UNIQUE SCAN                  | PK_EM_ORDER            |     1 |       |       |       2 (0)| 00:00:01 |       |       |
|  12 |       TABLE ACCESS BY GLOBAL INDEX ROWID  | EM_ORDER               |     1 |    26 |       |       3 (0)| 00:00:01 | ROWID | ROWID |
|  13 |      PARTITION RANGE ALL                  |                        |   10M |  161M |       |    156K (1)| 00:31:22 |     1 |    10 |
|* 14 |       TABLE ACCESS FULL                   | EE_ORDER_PF_WORK       |   10M |  161M |       |    156K (1)| 00:31:22 |     1 |    10 |
|* 15 |     INDEX RANGE SCAN                      | IDX_101_T_ID           |     1 |       |       |       3 (0)| 00:00:01 |       |       |
|  16 |    TABLE ACCESS BY GLOBAL INDEX ROWID     | M_101_ID_2_GID         |     1 |    25 |       |       4 (0)| 00:00:01 | ROWID | ROWID |
|  17 |   TABLE ACCESS BY INDEX ROWID             | DM_STAFF               |     1 |    20 |       |       3 (0)| 00:00:01 |       |       |
|* 18 |    INDEX UNIQUE SCAN                      | PK_DM_STAFF            |     1 |       |       |       2 (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT "F_CHK_IDCARD"("X"."IDENTITY_NUMBER") FROM "QWZW_ER"."DM_STAFF" "X" WHERE "X"."ID"=:B1)=0)
   4 - access("D"."ID"="E"."ORDER_ID")
   8 - filter("A"."SPEC_ID"=3010200004 AND "A"."STATUS_ID"=1000007)
   9 - access("A"."COMPLETE_TIME">=TO_DATE(' 2016-11-16 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "A"."COMPLETE_TIME"<=TO_DATE(' 2016-11-16 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter(TBL$OR$IDX$PART$NUM("QWZW_ER"."EM_ORDER",0,1,0,ROWID)=8)
  10 - access("A"."ID"="C"."A_ORDER_ID")
  11 - access("C"."B_ORDER_ID"="D"."ID")
  14 - filter("E"."WORK_TYPE_ID"=1001411)
  15 - access("A"."ID"="B"."T_ID")
       filter("B"."T_ID" IS NOT NULL)
  18 - access("X"."ID"=:B1)
Analysis:
1. Reading the execution plan:
At first glance, I suspected two parts of this plan: id=7 through id=9, and id=14.
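id=9: the table is 40 GB. Although only one partition is read, the index is global, so it is very large, and fetching rows from a big table by rowid and then filtering them is a potential problem. But look at the predicate on id=9: the query asks for one day of data. The table is partitioned by month and holds one year of data, so 40 GB / 365 ≈ 0.1 GB per day, which means this step is not a big problem.
A local index would still be somewhat faster. The current index design is not ideal, but its impact is small, so I set it aside for now.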
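The per-day estimate is simple arithmetic. This Python snippet reproduces it from the EM_ORDER segment size in the table above, assuming, as the text does, one year of data spread evenly over 12 monthly partitions and 365 days.

```python
# Back-of-envelope sizing for EM_ORDER from its segment size (MB).
# Assumes one year of data spread evenly over 12 monthly partitions / 365 days.
EM_ORDER_MB = 40159.0625


def per_partition_mb(total_mb: float, partitions: int = 12) -> float:
    return total_mb / partitions


def per_day_mb(total_mb: float, days: int = 365) -> float:
    return total_mb / days


print(f"one monthly partition: ~{per_partition_mb(EM_ORDER_MB):.0f} MB")  # ~3347 MB
print(f"one day of data:       ~{per_day_mb(EM_ORDER_MB):.0f} MB")        # ~110 MB
```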
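Now id=14: table E is about 4.5 GB, and a full scan of it is slow. Many people's first thought is to add an index. The answer is no. Many rows satisfy "E"."WORK_TYPE_ID"=1001411 (the plan estimates about 10M), so the optimizer would not use such an index, and even if it did, the index path would probably be slower than the full table scan.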
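The hash join at id=4, whose probe input is this full scan of E (id=13/id=14), is a problem: a full scan of a 4.5 GB table is bound to take a lot of time.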
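Why an index on a low-selectivity column loses can be seen with a deliberately crude I/O model. The Python sketch below is not Oracle's cost formula; the 8 KB block size, the multiblock read count of 128 and the index depth of 3 are assumptions chosen for illustration. It compares roughly one random table-block read per matching row against reading the whole segment in large multiblock chunks.

```python
# Crude I/O model for "index or full scan?" (assumptions, not Oracle's costing):
#  - 8 KB blocks; a full scan fetches MBRC blocks per I/O call
#  - an index path descends the B-tree once, then does about one random
#    table-block read per matching row (worst case: poorly clustered data)
#  - every I/O call counts the same, and index leaf-block reads are ignored
BLOCK_KB = 8
MBRC = 128  # hypothetical multiblock read count


def full_scan_ios(table_mb: float) -> float:
    blocks = table_mb * 1024 / BLOCK_KB
    return blocks / MBRC


def index_access_ios(matching_rows: int, index_depth: int = 3) -> int:
    return index_depth + matching_rows


def prefer_index(table_mb: float, matching_rows: int) -> bool:
    return index_access_ios(matching_rows) < full_scan_ios(table_mb)


E_MB = 4595.0625  # EE_ORDER_PF_WORK segment size from the table above
print(round(full_scan_ios(E_MB)))        # 4595 I/O calls for the full scan
print(prefer_index(E_MB, 10_000_000))    # False: ~10M rows match the filter
print(prefer_index(E_MB, 1_000))         # True: a selective filter would suit an index
```

Under these assumptions the index only wins below roughly 4,600 matching rows, far from the 10M the plan estimates for WORK_TYPE_ID = 1001411.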
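The FILTER at id=1 also caught my eye at first: its predicate runs a query against a small table, so I was sure it was a performance problem. But after tuning the rest of the SQL, I removed this filter as a test and, to my surprise, the run time was about the same.
So I left it alone.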
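One way to see why the FILTER at id=1 costs little is to count how often its subquery actually runs. The Python sketch below mimics the predicate "deal_oper_id IS NULL OR (scalar subquery) = 0", assuming the OR is evaluated left to right; fake_chk_idcard is a hypothetical stand-in for f_chk_idcard, whose real logic is not shown. The subquery runs once per incoming row with a non-NULL deal_oper_id, each time as a primary-key lookup on the small DM_STAFF table, which helps explain why removing the filter barely changed the run time.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[int]]


def apply_filter(rows: List[Row], staff: Dict[int, Optional[str]],
                 chk_idcard: Callable[[Optional[str]], int]) -> Tuple[List[Row], int]:
    """Keep rows where deal_oper_id IS NULL OR (scalar subquery) = 0.

    staff maps DM_STAFF.id -> identity_number. Returns the kept rows and the
    number of subquery lookups that were run."""
    kept: List[Row] = []
    lookups = 0
    for row in rows:
        oper = row["deal_oper_id"]
        if oper is None:          # left side of the OR is TRUE: no lookup needed
            kept.append(row)
            continue
        lookups += 1              # one primary-key lookup on DM_STAFF
        if oper not in staff:     # no staff row: subquery yields NULL, NULL = 0 is not TRUE
            continue
        if chk_idcard(staff[oper]) == 0:
            kept.append(row)
    return kept, lookups


def fake_chk_idcard(number: Optional[str]) -> int:
    """Hypothetical stand-in for f_chk_idcard: 1 for an 18-character ID, else 0."""
    return 1 if number is not None and len(number) == 18 else 0


rows = [{"deal_oper_id": None}, {"deal_oper_id": 1},
        {"deal_oper_id": 2}, {"deal_oper_id": 3}]
staff = {1: "110101199001011234", 2: "bad-id"}
kept, lookups = apply_filter(rows, staff, fake_chk_idcard)
print([r["deal_oper_id"] for r in kept], lookups)  # [None, 2] 3
```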
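2. Reading the SQL: this is a five-table join. In the WHERE clause, table E is fairly large (about 4.5 GB), yet it has only one filter, e.work_type_id = 1001411.
None of E's columns appear in the SELECT list, so the join to E can be turned into a semi-join, which is somewhat more efficient. I therefore rewrote it with EXISTS. (This returns the same rows only if each D.ID matches at most one qualifying E row. The INDEX UNIQUE SCAN on PK_EE_ORDER_PF_WORK with access "D"."ID"="E"."ORDER_ID" in the rewritten plan suggests ORDER_ID is unique in E, so that holds here.)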
Rewritten as:
exists (select 1 from EE_ORDER_PF_WORK E where D.ID = E.ORDER_ID and e.work_type_id = 1001411)
In practice, I more often rewrite it with IN.
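A quick way to sanity-check the EXISTS and IN rewrites is to run all three forms against toy data. This Python sketch uses an in-memory SQLite database, not Oracle, so it checks semantics only, not performance; the tables d and e are hypothetical stand-ins for EM_ORDER D and EE_ORDER_PF_WORK E. The forms agree while each order has at most one qualifying E row, and the join starts duplicating rows once an order has two.

```python
import sqlite3

# Toy stand-ins: d plays EM_ORDER D, e plays EE_ORDER_PF_WORK E.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE d (id INTEGER PRIMARY KEY);
    CREATE TABLE e (order_id INTEGER, work_type_id INTEGER);
    INSERT INTO d VALUES (1), (2), (3);
    INSERT INTO e VALUES (1, 1001411), (2, 999);
""")

JOIN_SQL = """SELECT d.id FROM d, e
              WHERE d.id = e.order_id AND e.work_type_id = 1001411"""
EXISTS_SQL = """SELECT d.id FROM d
                WHERE EXISTS (SELECT 1 FROM e
                              WHERE d.id = e.order_id AND e.work_type_id = 1001411)"""
IN_SQL = """SELECT d.id FROM d
            WHERE d.id IN (SELECT e.order_id FROM e WHERE e.work_type_id = 1001411)"""


def run(sql: str) -> list:
    return sorted(conn.execute(sql).fetchall())


# At most one qualifying e row per order: all three forms agree.
assert run(JOIN_SQL) == run(EXISTS_SQL) == run(IN_SQL) == [(1,)]

# A second qualifying e row for order 1: the join now returns order 1 twice,
# while the semi-join forms (EXISTS, IN) still return it once.
conn.execute("INSERT INTO e VALUES (1, 1001411)")
print(run(JOIN_SQL))    # [(1,), (1,)]
print(run(EXISTS_SQL))  # [(1,)]
print(run(IN_SQL))      # [(1,)]
```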
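3. Add hints so that the whole plan uses nested loops (NL): let the small, filtered row set from A drive the join (leading(a)) and carry it through every following table with NL. Why force NL throughout this SELECT? As analyzed in the plan above,
id=5 through id=12 all touch big tables, but every access goes through an index. Combined with the time-range condition and the other filters on A, the driving row set is not large, and neither is the final result.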
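The trade-off behind forcing NL can be sketched in Python by counting work. The two functions below are simplified in-memory versions of the algorithms: a dict stands in for the B-tree index, and probes or scanned rows stand in for I/O. The row counts are made up for illustration. With a small driving set, indexed nested loops do one probe per driving row, while a hash join must read every row of the large input no matter how few rows match.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, int]


def nested_loops(outer: List[Row], inner_index: Dict[int, List[Row]],
                 key: str) -> Tuple[List[Tuple[Row, Row]], int]:
    """Indexed nested loops: one index probe per outer (driving) row.
    inner_index maps join key -> inner rows and stands in for a B-tree."""
    out, probes = [], 0
    for o in outer:
        probes += 1
        for i in inner_index.get(o[key], []):
            out.append((o, i))
    return out, probes


def hash_join(outer: List[Row], inner: List[Row], key: str,
              inner_key: str) -> Tuple[List[Tuple[Row, Row]], int]:
    """Hash join: build a hash table on the smaller (outer) input, then
    read every row of the inner input to probe it."""
    table: Dict[int, List[Row]] = defaultdict(list)
    for o in outer:
        table[o[key]].append(o)
    out, scanned = [], 0
    for i in inner:
        scanned += 1
        for o in table.get(i[inner_key], []):
            out.append((o, i))
    return out, scanned


# Made-up sizes: 3,000 driving rows after filtering, 200,000-row inner table.
outer = [{"id": k} for k in range(0, 300_000, 100)]
inner = [{"order_id": k} for k in range(200_000)]
inner_index = {r["order_id"]: [r] for r in inner}  # pre-existing index

nl_rows, probes = nested_loops(outer, inner_index, "id")
hj_rows, scanned = hash_join(outer, inner, "id", "order_id")
print(len(nl_rows), probes)   # 2000 3000
print(len(hj_rows), scanned)  # 2000 200000
```

Both return the same 2,000 matches. The balance flips when the driving set is large, because each index probe then costs several random block reads, while the hash join's single pass over the big input stays the same.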
The rewritten SQL:
SELECT /*+ use_nl(a,b) use_nl(a,c) use_nl(c,d) leading(a) */
'開通單' AS spec_name,  -- '開通單' = "service activation order"
a.id AS order_id ,b.s_id AS bill_id,
d.id AS sub_order_id,
---x.s_id task_id,
d.deal_oper_id
FROM EM_ORDER PARTITION(EM_ORDER_201611) A,
M_101_ID_2_GID B,
ER_ORDER_ORDER C,
EM_ORDER D
WHERE A.SPEC_ID = 3010200004
AND A.ID = B.T_ID
AND A.STATUS_ID = 1000007
AND A.COMPLETE_TIME >= TO_DATE('2016-11-18 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and A.COMPLETE_TIME <= TO_DATE('2016-11-18 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
AND A.ID = C.A_ORDER_ID
AND C.B_ORDER_ID = D.ID
AND exists( select /*+ nl_sj */ 1 from EE_ORDER_PF_WORK E where D.ID = E.ORDER_ID AND e.work_type_id = 1001411 )
AND ( d.deal_oper_id IS NULL
OR (SELECT f_chk_idcard(x.identity_number)
FROM dm_staff x
WHERE x.id = d.deal_oper_id) = 0
) ;
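4. The plan after the rewrite: table E, which the old plan read with a full scan (id=14), is now joined through NESTED LOOPS SEMI (id=4), and its access (id=13) has automatically switched to the index PK_EE_ORDER_PF_WORK.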
Plan hash value: 1576200999

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                        |  3537 |  449K |   1372K (1)| 04:34:26 |       |       |
|*  1 |  FILTER                                   |                        |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                            |                        | 19483 | 2473K |   1372K (1)| 04:34:26 |       |       |
|   3 |    NESTED LOOPS                           |                        | 20212 | 2473K |   1372K (1)| 04:34:26 |       |       |
|   4 |     NESTED LOOPS SEMI                     |                        | 20212 | 2072K |   1291K (1)| 04:18:16 |       |       |
|   5 |      NESTED LOOPS                         |                        |  169K |   14M |    844K (1)| 02:48:57 |       |       |
|   6 |       NESTED LOOPS                        |                        |  169K |   10M |    334K (1)| 01:07:00 |       |       |
|*  7 |        TABLE ACCESS BY GLOBAL INDEX ROWID | EM_ORDER               | 98540 | 3368K |   39284 (1)| 00:07:52 |     8 |     8 |
|*  8 |         INDEX RANGE SCAN                  | IDX_EM_ORDER_COMP_TIME | 44669 |       |     330 (0)| 00:00:04 |       |       |
|*  9 |        INDEX RANGE SCAN                   | IDX_ER_ORDER_ORDER_OO  |     2 |    56 |       3 (0)| 00:00:01 |       |       |
|  10 |       TABLE ACCESS BY GLOBAL INDEX ROWID  | EM_ORDER               |     1 |    26 |       3 (0)| 00:00:01 | ROWID | ROWID |
|* 11 |        INDEX UNIQUE SCAN                  | PK_EM_ORDER            |     1 |       |       2 (0)| 00:00:01 |       |       |
|* 12 |      TABLE ACCESS BY GLOBAL INDEX ROWID   | EE_ORDER_PF_WORK       | 1257K |   19M |       3 (0)| 00:00:01 | ROWID | ROWID |
|* 13 |       INDEX UNIQUE SCAN                   | PK_EE_ORDER_PF_WORK    |     1 |       |       2 (0)| 00:00:01 |       |       |
|* 14 |     INDEX RANGE SCAN                      | IDX_101_T_ID           |     1 |       |       3 (0)| 00:00:01 |       |       |
|  15 |    TABLE ACCESS BY GLOBAL INDEX ROWID     | M_101_ID_2_GID         |     1 |    25 |       4 (0)| 00:00:01 | ROWID | ROWID |
|  16 |   TABLE ACCESS BY INDEX ROWID             | DM_STAFF               |     1 |    20 |       3 (0)| 00:00:01 |       |       |
|* 17 |    INDEX UNIQUE SCAN                      | PK_DM_STAFF            |     1 |       |       2 (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("D"."DEAL_OPER_ID" IS NULL OR (SELECT "F_CHK_IDCARD"("X"."IDENTITY_NUMBER") FROM "QWZW_ER"."DM_STAFF" "X" WHERE "X"."ID"=:B1)=0)
   7 - filter("A"."SPEC_ID"=3010200004 AND "A"."STATUS_ID"=1000007)
   8 - access("A"."COMPLETE_TIME">=TO_DATE(' 2016-11-18 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "A"."COMPLETE_TIME"<=TO_DATE(' 2016-11-18 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))
       filter(TBL$OR$IDX$PART$NUM("QWZW_ER"."EM_ORDER",0,1,0,ROWID)=8)
   9 - access("A"."ID"="C"."A_ORDER_ID")
  11 - access("C"."B_ORDER_ID"="D"."ID")
  12 - filter("E"."WORK_TYPE_ID"=1001411)
  13 - access("D"."ID"="E"."ORDER_ID")
  14 - access("A"."ID"="B"."T_ID")
       filter("B"."T_ID" IS NOT NULL)
  17 - access("X"."ID"=:B1)
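NESTED LOOPS SEMI differs from a plain nested-loops join in that it stops probing as soon as one qualifying inner row is found, and it emits each outer row at most once. Here is a minimal Python sketch, with a dict of lists as a hypothetical stand-in for the index on E.ORDER_ID. In this plan the probe is a unique scan, so at most one E row is checked per order; the early exit matters when the inner key is not unique.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


def nested_loops_semi(outer: List[Row], inner_index: Dict[int, List[Row]], key: str,
                      pred: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """EXISTS-style semi-join: each outer row is emitted at most once, and the
    probe of its inner rows stops at the first one satisfying pred."""
    out: List[Row] = []
    checked = 0
    for o in outer:
        for i in inner_index.get(o[key], []):
            checked += 1
            if pred(i):
                out.append(o)
                break  # one match is enough; skip the remaining inner rows
    return out, checked


outer = [{"id": 1}, {"id": 2}, {"id": 3}]
index_on_order_id = {  # hypothetical: order_id -> E rows
    1: [{"work_type_id": 1001411}, {"work_type_id": 1001411}],
    2: [{"work_type_id": 5}, {"work_type_id": 1001411}],
}
rows, checked = nested_loops_semi(outer, index_on_order_id, "id",
                                  lambda r: r["work_type_id"] == 1001411)
print([r["id"] for r in rows], checked)  # [1, 2] 3
```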
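5. After tuning, the first execution took about 4 minutes, and repeat executions only about 47 s. The original plan never came close to returning in 47 s (it ran for over an hour without finishing). The result is a little over 3,000 rows.
Summary: when a large table is joined to another large table, the conventional approach is a hash join. In this scenario, however, the driving table returns only a few thousand rows after filtering, and the driven tables are accessed through primary-key unique scans, so nested loops are more efficient.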