DTStack SQL Optimization Cases: Optimizing OR Conditions

Date: 2021-04-08

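DTStack is a cloud-native, one-stop data middle platform PaaS. We maintain an interesting open-source project on GitHub: https://github.com/DTStack/flinkx

FlinkX is a Flink-based data synchronization tool that unifies batch and stream processing. It can collect static data, such as MySQL and HDFS, as well as data that changes in real time, such as MySQL binlog and Kafka. It is a global, heterogeneous data synchronization engine that integrates batch and stream. If you are interested, come and find us in the GitHub community.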

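In MySQL, the same query conditions can produce different results if the position of OR in the SQL statement changes. In more complex cases, OR can also lead the optimizer to a poor index choice, which is a hidden performance risk. To avoid a sharp drop in execution efficiency, consider using UNION ALL to split SQL whose query logic is complex.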
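As a small illustration of why the position of OR matters: AND binds more tightly than OR, so moving the OR inside or outside a pair of parentheses changes which rows match. The table or_demo and its rows below are made up for this sketch and are not part of the original cases.

```sql
-- Hypothetical demo table and data, for illustration only.
CREATE TABLE or_demo (
  id     INT PRIMARY KEY,
  token  VARCHAR(20),
  uid    INT,
  status INT
);

INSERT INTO or_demo VALUES
  (1, 'A', 10, 1),
  (2, 'A', 20, 0),
  (3, 'B', 30, 0);

-- Parsed as: (token = 'A' AND status = 1) OR uid = 30
SELECT id FROM or_demo
 WHERE token = 'A' AND status = 1 OR uid = 30;     -- matches ids 1 and 3

-- Parentheses pull the OR under the AND: token = 'A' AND (status = 1 OR uid = 30)
SELECT id FROM or_demo
 WHERE token = 'A' AND (status = 1 OR uid = 30);   -- matches id 1 only
```

Writing the parentheses explicitly, even where precedence would already give the intended result, makes the grouping visible to the next reader.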

The following cases cover common scenarios that use OR.

Case 1: OR conditions on different columns

1. The query to optimize

SELECT
..
..
  FROM `t1` a
 WHERE a.token = '16149684'
   AND a.store_id = '242950'
   AND (a.registrationId IS NOT NULL
   AND a.registrationId <> '')
    OR a.uid = 308475
   AND a.registrationId IS NOT NULL
   AND a.registrationId <> ''

Execution plan

+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| id           | select_type           | table           | type           | key               | key_len           | ref           | rows           | Extra                                       |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| 1            | SIMPLE                | a               | range          |idx_registrationid | 99                |               | 100445         | Using index condition; Using where          |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+

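1 row returned in 5 ms.

2. Analysis

The query conditions show that both token and uid are highly selective. Because the two groups of conditions are joined with OR, however, the query needs index merge to perform well. In practice, the MySQL optimizer chose the index on registrationId by default, which made the SQL perform poorly.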
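One way to check whether index merge is available and whether the optimizer actually picks it is sketched below. The table t1_demo and its index names are assumptions made for this example; the real definition of t1 is not shown in the article, and the plan MySQL chooses depends on the data and statistics.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE t1_demo (
  uid            INT PRIMARY KEY,
  token          VARCHAR(20),
  store_id       VARCHAR(20),
  registrationId VARCHAR(32),
  KEY idx_token (token),
  KEY idx_registrationid (registrationId)
);

-- The index_merge flag must be "on" (the default) for index merge to be considered.
SELECT @@optimizer_switch;

-- If the optimizer picks index merge, EXPLAIN reports type = index_merge
-- and lists the merged indexes in the key column.
EXPLAIN
SELECT a.uid
  FROM t1_demo a
 WHERE (a.token = '16149684'
        AND a.store_id = '242950'
        AND a.registrationId IS NOT NULL
        AND a.registrationId <> '')
    OR (a.uid = 308475
        AND a.registrationId IS NOT NULL
        AND a.registrationId <> '');
```

If the type column shows range on a single index instead, as in the plan above, the OR is not being served by index merge. MySQL 8.0 also documents an INDEX_MERGE optimizer hint that may help in that situation, though its effect should be confirmed with EXPLAIN rather than assumed.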

3. Optimization

We rewrite the SQL in UNION ALL form.

SELECT
...
...
FROM `t1` a
-- b is a joined table whose JOIN clause is omitted here (it appears in the plan below)
WHERE a.token = '16054473'
AND a.store_id = '138343'
AND b.is_refund = 1
AND (a.registrationId IS NOT NULL
AND a.registrationId <> '')
UNION ALL
SELECT
...
...
FROM `t1` a
WHERE a.uid = 181579
AND a.registrationId IS NOT NULL
AND a.registrationId <> ''

+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| id           | select_type           | table           | type           | possible_keys                | key           | key_len           | ref                          | rows           | Extra                              |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| 1            | PRIMARY               | a               | ref            | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN     | 63                | const                        | 1              | Using index condition; Using where |
| 1            | PRIMARY               | b               | eq_ref         | PRIMARY                      | PRIMARY       | 4                 | youdian_life_sewsq.a.role_id | 1              | Using where                        |
| 2            | UNION                 | a               | const          | PRIMARY                      | PRIMARY       | 4                 | const                        | 1              |                                    |
| 2            | UNION                 | b               | const          | PRIMARY                      | PRIMARY       | 4                 | const                        | 0              | unique row not found               |
|              | UNION RESULT          | <union1,2>      | ALL            |                              |               |                   |                              |                | Using temporary                    |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+

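5 rows returned in 5 ms.

Comparing the execution plans before and after the rewrite makes the difference clear: the original plan scans an estimated 100,445 rows through idx_registrationid, while each branch of the rewritten query reads about 1 row through IDX_TOKEN or the primary key. Splitting the SQL into two subqueries and merging their results with a union gives better stability and safety, and higher performance.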
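One point worth checking when applying this kind of rewrite: an OR returns a row once even when it satisfies both sides, while UNION ALL returns it once per branch. The sketch below uses a made-up table, t1_union_demo, to show the difference and one possible way to keep the two forms equivalent.

```sql
-- Hypothetical demo table and data, not taken from the original case.
CREATE TABLE t1_union_demo (
  uid            INT PRIMARY KEY,
  token          VARCHAR(20),
  registrationId VARCHAR(32),
  KEY idx_token (token)
);

INSERT INTO t1_union_demo VALUES
  (1, 'T1', 'r1'),   -- matches both the token branch and the uid branch
  (2, 'T1', 'r2'),   -- matches the token branch only
  (3, 'T2', 'r3');   -- matches neither branch

-- OR form: uid 1 appears once (2 rows in total).
SELECT uid FROM t1_union_demo
 WHERE token = 'T1' OR uid = 1;

-- Plain UNION ALL: uid 1 appears twice (3 rows in total).
SELECT uid FROM t1_union_demo WHERE token = 'T1'
UNION ALL
SELECT uid FROM t1_union_demo WHERE uid = 1;

-- Excluding the first branch's rows from the second branch keeps UNION ALL
-- equivalent to the OR form (2 rows). The NULL-safe operator <=> makes the
-- exclusion behave correctly even when token is NULL.
SELECT uid FROM t1_union_demo WHERE token = 'T1'
UNION ALL
SELECT uid FROM t1_union_demo WHERE uid = 1 AND NOT (token <=> 'T1');
```

Plain UNION would also remove the duplicate, at the cost of a deduplication step over the combined result.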

Case 2: OR conditions on the same column

1. The query to optimize

select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci.category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci.category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci.category_no3
left join(
  select product_id,
  count(0) count
  from t2 pprod
  inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
  and pprod.is_enable =1
  and pinfo.is_enable=1
  and pinfo.belong_t0 =1
  and pinfo.end_time >=now()
  and not (
   pinfo.onshelv_time>'2019-06-30 00:00:00'
   or pinfo.end_time>'2018-12-05 00:00:00'
  )group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and (pc.count =0 or pc.count is null ) limit 0,5;

Execution plan (figure not available)

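2. Analysis

The SQL in this case contains a subquery. The subquery is treated as the driving table and an auto_key is generated. Testing the SQL piece by piece confirms that the condition (pc.count = 0 OR pc.count IS NULL) is what mainly affects the performance of the whole statement, so it needs to be rewritten.

3. Optimization

First, consider on its own how to optimize (pc.count = 0 OR pc.count IS NULL). Start with a similar SQL statement: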

Select col from test where col =100 or col is null;
+--------+
| col    |
+--------+
|    100 |
|   NULL |
+--------+
2 rows in set (0.00 sec)

Here the condition is on a single column that is compared against different values. In this situation it can be converted with CASE WHEN.

Select col From test where case when col is null then 100 else col end = 100;
+--------+
| col    |
+--------+
|    100 |
|   NULL |
+--------+
2 rows in set (0.00 sec)
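The CASE expression works by mapping NULL onto the value being searched for, so a single equality test covers both situations. A minimal sketch of the same idea with COALESCE and IFNULL follows; the definition and contents of the test table are assumptions, since the article does not show how it was created.

```sql
-- Hypothetical reconstruction of the test table; the extra row (7) is there
-- only to show that non-matching values are still filtered out.
CREATE TABLE test (col INT);
INSERT INTO test VALUES (100), (NULL), (7);

-- Original OR form: returns 100 and NULL.
SELECT col FROM test WHERE col = 100 OR col IS NULL;

-- CASE form: NULL is mapped to 100 before the comparison.
SELECT col FROM test
 WHERE CASE WHEN col IS NULL THEN 100 ELSE col END = 100;

-- COALESCE (standard SQL) and IFNULL (MySQL) express the same mapping.
SELECT col FROM test WHERE COALESCE(col, 100) = 100;
SELECT col FROM test WHERE IFNULL(col, 100) = 100;
```

All four statements return the rows 100 and NULL. Wrapping a column in an expression generally prevents MySQL from using an ordinary index on that column for the comparison, so this form fits best where the column has no useful index anyway, such as an aggregate computed in a derived table. It is worth confirming the effect with EXPLAIN.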

Now return to the original SQL and rewrite it.

select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci.category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci.category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci.category_no3
left join(
  select product_id,
  count(0) count
  from t2 pprod
  inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
  and pprod.is_enable =1
  and pinfo.is_enable=1
  and pinfo.belong_t0 =1
  and pinfo.end_time >=now()
  and not (
   pinfo.onshelv_time>'2019-06-30 00:00:00'
   or pinfo.end_time>'2018-12-05 00:00:00'
  )group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and case when pc.count is null then 0 else pc.count end=0 limit 0,5;

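The optimized SQL runs about 30 seconds faster than the original, roughly a 50-fold improvement in execution efficiency.

Case 3: OR conditions in a join query

1. The query to optimize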

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
OR user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
OR `group`.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

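2. Analysis

A close look at the query above shows that although the business logic only needs data modified within the last half minute, execution has to join all of the data, which causes unnecessary performance overhead.

3. Optimization

We split the original SQL into parts. The first part, sql-01, is as follows: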

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

sql-01 uses user_msg as the driving table and filters for the latest data with the gmt_modified index.

The second part, sql-02, is as follows:

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

sql-02 uses user as the driving table, and the index on user_msg.user_id filters rows well.

The third part, sql-03, is as follows:

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE `group`.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

sql-03 uses group as the driving table and filters for the latest data with the gmt_modified index.
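To return the same rows as the original OR query, the three parts still have to be combined. The sketch below shows one possible way to do that; the schema, the index names, and the scratch database or_case3_demo are assumptions made for illustration, and the select list is cut down to the two columns the article shows. Because a message can qualify in more than one part, the sketch uses UNION rather than UNION ALL so that such rows are returned once. This assumes msg_id is unique and each message joins to at most one user and one group.

```sql
-- Hypothetical schema in a scratch database, for illustration only.
CREATE DATABASE IF NOT EXISTS or_case3_demo;
USE or_case3_demo;

CREATE TABLE `user` (
  user_id      INT PRIMARY KEY,
  gmt_modified DATETIME,
  KEY idx_gmt_modified (gmt_modified)
);

CREATE TABLE `group` (
  group_id     INT PRIMARY KEY,
  gmt_modified DATETIME,
  KEY idx_gmt_modified (gmt_modified)
);

CREATE TABLE user_msg (
  msg_id       INT PRIMARY KEY,
  user_id      INT,
  group_id     INT,
  content      VARCHAR(255),
  gmt_modified DATETIME,
  KEY idx_gmt_modified (gmt_modified),
  KEY idx_user_id (user_id),
  KEY idx_group_id (group_id)
);

-- sql-01: recently modified messages
SELECT user_msg.msg_id AS msg_id, user_msg.content AS msg_content
  FROM user_msg
  LEFT JOIN `user`  ON user_msg.user_id  = `user`.user_id
  LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
 WHERE user_msg.gmt_modified >= DATE_SUB('2018-03-29 09:31:44', INTERVAL 30 SECOND)
UNION
-- sql-02: messages whose user was recently modified
SELECT user_msg.msg_id AS msg_id, user_msg.content AS msg_content
  FROM user_msg
  LEFT JOIN `user`  ON user_msg.user_id  = `user`.user_id
  LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
 WHERE `user`.gmt_modified >= DATE_SUB('2018-03-29 09:31:44', INTERVAL 30 SECOND)
UNION
-- sql-03: messages whose group was recently modified
SELECT user_msg.msg_id AS msg_id, user_msg.content AS msg_content
  FROM user_msg
  LEFT JOIN `user`  ON user_msg.user_id  = `user`.user_id
  LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
 WHERE `group`.gmt_modified >= DATE_SUB('2018-03-29 09:31:44', INTERVAL 30 SECOND);
```

Running EXPLAIN on the combined statement should show each branch filtering on its own gmt_modified index, which is the point of the split.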