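DTStack (數棧) is a cloud-native, one-stop data middle-platform PaaS. We have an interesting open-source project on GitHub: https://github.com/DTStack/flinkx
FlinkX is a Flink-based data synchronization tool that unifies batch and streaming. It can collect static data, such as MySQL and HDFS, as well as data that changes in real time, such as MySQL binlog and Kafka. It is an all-domain, heterogeneous data synchronization engine that unifies batch and streaming. If you are interested, come find us in the GitHub community!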
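In MySQL, the same set of query conditions can return different results depending on where OR appears in the SQL statement. In more complex cases, OR can also lead the optimizer to a poor index choice, a hidden performance risk. To avoid a sharp drop in execution efficiency, consider splitting SQL with complex query logic into parts combined with UNION ALL.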
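The result difference comes from operator precedence: AND binds more tightly than OR, so moving an OR regroups the predicates around it. A minimal sketch, using Python's built-in `sqlite3` module as a stand-in for MySQL (both follow this standard precedence); the table `t` and columns `a`, `b`, `c` are hypothetical:

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; AND binds tighter than OR in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t (id, a, b, c) VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 0, 1, 1), (3, 0, 1, 0), (4, 1, 1, 0)],
)


def ids(where):
    """Return the sorted ids of the rows matching a WHERE clause."""
    return [row[0] for row in conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")]


# Same three predicates, OR in two different positions.
or_first = ids("a = 1 OR b = 1 AND c = 1")  # parsed as a = 1 OR (b = 1 AND c = 1)
or_last = ids("a = 1 AND b = 1 OR c = 1")   # parsed as (a = 1 AND b = 1) OR c = 1
# Writing the parentheses explicitly gives the same rows as the implicit grouping.
explicit = ids("a = 1 OR (b = 1 AND c = 1)")

print(or_first, or_last, explicit)
```

The first case below has no parentheses around its OR either, so its WHERE clause is evaluated as two disjuncts: the token/store_id group OR the uid group.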
For common OR usage scenarios, see the following cases:
1. Scenario to optimize
SELECT .. ..
FROM `t1` a
WHERE a.token = '16149684'
  AND a.store_id = '242950'
  AND (a.registrationId IS NOT NULL AND a.registrationId <> '')
   OR a.uid = 308475
  AND a.registrationId IS NOT NULL
  AND a.registrationId <> ''
Execution plan:
+----+-------------+-------+-------+--------------------+---------+-----+--------+------------------------------------+
| id | select_type | table | type  | key                | key_len | ref | rows   | Extra                              |
+----+-------------+-------+-------+--------------------+---------+-----+--------+------------------------------------+
|  1 | SIMPLE      | a     | range | idx_registrationid | 99      |     | 100445 | Using index condition; Using where |
+----+-------------+-------+-------+--------------------+---------+-----+--------+------------------------------------+
1 row returned in 5 ms.
2. Analysis
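The query conditions show that token and uid are both highly selective, but because they are connected by OR, good performance requires an index merge (scanning an index for each OR branch and merging the results). In actual execution, however, the MySQL optimizer chose the index on registrationId and did a range scan estimated at 100,445 rows, so the SQL performs very poorly.
3. Optimization
We rewrite the SQL in UNION ALL form. (The statement below is abridged: it filters on b.is_refund, and its execution plan joins a table b through a.role_id, but that join is omitted from the text. Its literal values also differ from the query above.)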
SELECT ... ...
FROM `t1` a
WHERE a.token = '16054473'
  AND a.store_id = '138343'
  AND b.is_refund = 1
  AND (a.registrationId IS NOT NULL AND a.registrationId <> '')
UNION ALL
SELECT ... ...
FROM `t1` a
WHERE a.uid = 181579
  AND a.registrationId IS NOT NULL
  AND a.registrationId <> ''
+----+--------------+------------+--------+------------------------------+-----------+---------+------------------------------+------+------------------------------------+
| id | select_type  | table      | type   | possible_keys                | key       | key_len | ref                          | rows | Extra                              |
+----+--------------+------------+--------+------------------------------+-----------+---------+------------------------------+------+------------------------------------+
|  1 | PRIMARY      | a          | ref    | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN | 63      | const                        |    1 | Using index condition; Using where |
|  1 | PRIMARY      | b          | eq_ref | PRIMARY                      | PRIMARY   | 4       | youdian_life_sewsq.a.role_id |    1 | Using where                        |
|  2 | UNION        | a          | const  | PRIMARY                      | PRIMARY   | 4       | const                        |    1 |                                    |
|  2 | UNION        | b          | const  | PRIMARY                      | PRIMARY   | 4       | const                        |    0 | unique row not found               |
|    | UNION RESULT | <union1,2> | ALL    |                              |           |         |                              |      | Using temporary                    |
+----+--------------+------------+--------+------------------------------+-----------+---------+------------------------------+------+------------------------------------+
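5 rows returned in 5 ms.
Comparing the execution plans before and after the rewrite: the original query did a range scan on idx_registrationid with an estimated 100,445 rows, while each branch of the rewritten query is resolved through a selective index (IDX_TOKEN via ref, PRIMARY via const) with an estimated 1 row. Splitting the SQL into two subqueries and merging their results with UNION ALL is therefore more stable and safer, and performs better.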
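One detail the rewrite depends on: OR returns each matching row once, whereas UNION ALL returns a row once per branch it satisfies. The sketch below compares the two forms and shows one way to keep UNION ALL without duplicates. It uses SQLite through Python's `sqlite3` as a stand-in for MySQL, a hypothetical reduced `t1` with made-up values, and assumes `token` and `store_id` are never NULL.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; t1 is a hypothetical, reduced table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY, token TEXT, store_id TEXT,"
    " uid INTEGER, registrationId TEXT)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "T1", "S1", 10, "r1"),  # matches the token branch only
        (2, "T9", "S9", 20, "r2"),  # matches the uid branch only
        (3, "T1", "S1", 20, "r3"),  # matches BOTH branches
        (4, "T1", "S1", 20, ""),    # empty registrationId: matches neither
        (5, "T1", "S1", 30, None),  # NULL registrationId: matches neither
    ],
)

REG_OK = "registrationId IS NOT NULL AND registrationId <> ''"
BRANCH_TOKEN = f"token = 'T1' AND store_id = 'S1' AND {REG_OK}"
BRANCH_UID = f"uid = 20 AND {REG_OK}"


def ids(sql):
    """Return the ids produced by a query, sorted, keeping duplicates."""
    return sorted(row[0] for row in conn.execute(sql))


with_or = ids(f"SELECT id FROM t1 WHERE ({BRANCH_TOKEN}) OR ({BRANCH_UID})")
union_all = ids(
    f"SELECT id FROM t1 WHERE {BRANCH_TOKEN} "
    f"UNION ALL SELECT id FROM t1 WHERE {BRANCH_UID}"
)
# Exclude from the second branch the rows the first branch already returns.
# Both branches share the registrationId check, so negating the token/store_id
# part is enough (assumes token and store_id are NOT NULL).
union_all_disjoint = ids(
    f"SELECT id FROM t1 WHERE {BRANCH_TOKEN} "
    f"UNION ALL SELECT id FROM t1 WHERE {BRANCH_UID} "
    "AND NOT (token = 'T1' AND store_id = 'S1')"
)

print(with_or, union_all, union_all_disjoint)
```

Plain UNION would also drop the duplicate, at the cost of a de-duplication step, and only matches the OR result when the selected columns identify a row uniquely.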
1. Scenario to optimize
select .... ....
from t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci.category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci.category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci.category_no3
left join (
    select product_id, count(0) count
    from t2 pprod
    inner join t3 pinfo
        on pinfo.promotion_id = pprod.promotion_id
       and pprod.is_enable = 1
       and pinfo.is_enable = 1
       and pinfo.belong_t0 = 1
       and pinfo.end_time >= now()
       and not (pinfo.onshelv_time > '2019-06-30 00:00:00'
                or pinfo.end_time > '2018-12-05 00:00:00')
    group by pprod.product_id
) as pc on pc.product_id = mci.product_id
where mci.is_enable = 0
  and mci.comodifty_type in ('1', '5', '6')
  and (pc.count = 0 or pc.count is null)
limit 0, 5;
[Figure: execution plan]
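2. Analysis
The SQL in this case contains a subquery. The subquery is treated as the driving table, and an auto_key (an index MySQL generates automatically on the derived table) is created for it. Testing the SQL piece by piece showed that the condition (pc.count = 0 OR pc.count IS NULL) is what mainly drags down the performance of the whole statement, so that condition needs to be rewritten.
3. Optimization
First, let's consider in isolation how to optimize (pc.count = 0 OR pc.count IS NULL), starting from a similar, simpler SQL: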
SELECT col FROM test WHERE col = 100 OR col IS NULL;
+--------+
| col    |
+--------+
| 100    |
| NULL   |
+--------+
2 rows in set (0.00 sec)
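Both predicates refer to the same column, just with different values (one of them NULL). In this situation the condition can be converted with CASE WHEN: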
SELECT col FROM test WHERE CASE WHEN col IS NULL THEN 100 ELSE col = 100 END;
+--------+
| col    |
+--------+
| 100    |
| NULL   |
+--------+
2 rows in set (0.00 sec)
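A quick check that the two predicates select the same rows, sketched with SQLite through Python's `sqlite3` (like MySQL, SQLite treats a non-zero number in WHERE as true). The extra values 5 and 0 are made-up data. A second CASE form, which maps NULL to the target value and then compares once, is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col INTEGER)")
conn.executemany("INSERT INTO test (col) VALUES (?)", [(100,), (None,), (5,), (0,)])


def matching(where):
    """Return matching col values, NULL (None) first, for stable comparison."""
    values = [row[0] for row in conn.execute(f"SELECT col FROM test WHERE {where}")]
    return sorted(values, key=lambda v: (v is not None, v or 0))


with_or = matching("col = 100 OR col IS NULL")
# CASE form from the text: NULL yields 100 (truthy); otherwise the comparison result.
case_predicate = matching("CASE WHEN col IS NULL THEN 100 ELSE col = 100 END")
# Alternative: replace NULL with the target value, then compare once.
case_value = matching("CASE WHEN col IS NULL THEN 100 ELSE col END = 100")

print(with_or, case_predicate, case_value)
```

The second form matters when the target is 0: `CASE WHEN col IS NULL THEN 0 ELSE col = 0 END` yields 0 (false) for NULL rows and silently drops them. The rewrite of pc.count that follows uses the value form with `= 0`.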
Now go back and rewrite the original SQL in the same way:
select .... ....
from t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci.category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci.category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci.category_no3
left join (
    select product_id, count(0) count
    from t2 pprod
    inner join t3 pinfo
        on pinfo.promotion_id = pprod.promotion_id
       and pprod.is_enable = 1
       and pinfo.is_enable = 1
       and pinfo.belong_t0 = 1
       and pinfo.end_time >= now()
       and not (pinfo.onshelv_time > '2019-06-30 00:00:00'
                or pinfo.end_time > '2018-12-05 00:00:00')
    group by pprod.product_id
) as pc on pc.product_id = mci.product_id
where mci.is_enable = 0
  and mci.comodifty_type in ('1', '5', '6')
  and case when pc.count is null then 0 else pc.count end = 0
limit 0, 5;
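The same equivalence can be checked on the shape of the real query: a LEFT JOIN to a grouped derived table, where products with no matching promotion rows get a NULL count. A minimal sketch in SQLite via `sqlite3`; the tables `product` and `promo` are hypothetical stand-ins for `mci` and the `t2`/`t3` join, and the alias `cnt` replaces `count`.

```python
import sqlite3

# Hypothetical tables: product stands in for mci, promo for the t2/t3 join.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY);
    CREATE TABLE promo (product_id INTEGER);
    INSERT INTO product VALUES (1), (2), (3);
    INSERT INTO promo VALUES (2), (2), (3);
    """
)

QUERY = """
SELECT p.product_id
FROM product AS p
LEFT JOIN (
    SELECT product_id, COUNT(0) AS cnt FROM promo GROUP BY product_id
) AS pc ON pc.product_id = p.product_id
WHERE {cond}
ORDER BY p.product_id
"""


def product_ids(cond):
    """Return product ids that satisfy the given condition on pc.cnt."""
    return [row[0] for row in conn.execute(QUERY.format(cond=cond))]


with_or = product_ids("(pc.cnt = 0 OR pc.cnt IS NULL)")
with_case = product_ids("CASE WHEN pc.cnt IS NULL THEN 0 ELSE pc.cnt END = 0")

print(with_or, with_case)
```

In this sketch every group in the derived table has at least one row, so `pc.cnt = 0` never matches; only product 1, which has no promotions, comes back through the NULL branch.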
The optimized SQL runs about 30 seconds faster than the original, improving execution efficiency by roughly 50x.
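Read together, the two figures imply roughly the following timings. This is only arithmetic on the stated numbers (treating the speedup as exactly 50x), not an additional measurement:

```latex
T_{\text{orig}} - \frac{T_{\text{orig}}}{50} = 30\ \text{s}
\;\Longrightarrow\;
T_{\text{orig}} = \frac{50 \times 30}{49}\ \text{s} \approx 30.6\ \text{s},
\qquad
T_{\text{opt}} = \frac{T_{\text{orig}}}{50} \approx 0.61\ \text{s}
```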
1. Scenario to optimize
SELECT user_msg.msg_id AS `msg_id`,
       user_msg.content AS `msg_content`,
       …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
   OR user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
   OR `group`.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
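2. Analysis
Looking at this query closely: the business logic only needs rows modified in the last 30 seconds, but because the OR conditions reference three different tables, none of them can be used on its own to filter a single table before the joins. MySQL therefore has to join all the data first and filter afterward, which wastes a lot of work.
3. Optimization
We split the original SQL into parts. The first part, sql-01, is: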
SELECT user_msg.msg_id AS `msg_id`,
       user_msg.content AS `msg_content`,
       …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-01 is driven by the user_msg table and uses the index on gmt_modified to filter the most recent rows.
The second part, sql-02, is:
SELECT user_msg.msg_id AS `msg_id`,
       user_msg.content AS `msg_content`,
       …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
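sql-02 uses user as the driving table. Because the WHERE condition on user.gmt_modified rejects the NULL rows produced by the LEFT JOIN, the join can be treated as an inner join starting from user, and the index on user_msg.user_id then filters rows well.
The third part, sql-03, is:
SELECT user_msg.msg_id AS `msg_id`,
       user_msg.content AS `msg_content`,
       …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
WHERE `group`.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-03 uses group as the driving table and the index on gmt_modified to filter the most recent rows.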
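The three parts still have to be recombined, which the steps above do not show. A sketch assuming they are combined with UNION, so that a message qualifying through more than one table appears once (UNION ALL would return it once per part). SQLite via `sqlite3` stands in for MySQL, the schema and data are made up, and integer timestamps with a hypothetical `CUTOFF` replace the `date_sub(...)` expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; `group` is quoted because GROUP is a reserved word.
conn.executescript(
    """
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, gmt_modified INTEGER);
    CREATE TABLE `group` (group_id INTEGER PRIMARY KEY, gmt_modified INTEGER);
    CREATE TABLE user_msg (msg_id INTEGER PRIMARY KEY, user_id INTEGER,
                           group_id INTEGER, gmt_modified INTEGER);
    INSERT INTO user VALUES (1, 0), (2, 100);
    INSERT INTO `group` VALUES (10, 0), (20, 100);
    INSERT INTO user_msg VALUES
        (1, 1, 10, 100),  -- the message itself is recent
        (2, 2, 10, 0),    -- its user is recent
        (3, 1, 20, 0),    -- its group is recent
        (4, 2, 20, 100),  -- recent through all three tables
        (5, 1, 10, 0),    -- nothing recent
        (6, 9, 99, 0);    -- no matching user or group
    """
)

CUTOFF = 50  # hypothetical stand-in for date_sub(..., INTERVAL 30 SECOND)

SELECT_JOIN = """
SELECT user_msg.msg_id
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN `group` ON user_msg.group_id = `group`.group_id
"""


def msg_ids(sql):
    """Run a query with :cutoff bound and return the sorted msg_ids."""
    return sorted(row[0] for row in conn.execute(sql, {"cutoff": CUTOFF}))


original = msg_ids(
    SELECT_JOIN
    + "WHERE user_msg.gmt_modified >= :cutoff"
    " OR user.gmt_modified >= :cutoff"
    " OR `group`.gmt_modified >= :cutoff"
)

parts = [
    SELECT_JOIN + "WHERE user_msg.gmt_modified >= :cutoff",  # sql-01
    SELECT_JOIN + "WHERE user.gmt_modified >= :cutoff",      # sql-02
    SELECT_JOIN + "WHERE `group`.gmt_modified >= :cutoff",   # sql-03
]
split_union = msg_ids(" UNION ".join(parts))
split_union_all = msg_ids(" UNION ALL ".join(parts))

print(original, split_union, split_union_all)
```

Message 4 qualifies through all three tables, so UNION ALL returns it three times while UNION matches the original OR query.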
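The common scenarios for optimizing OR conditions in MySQL are:
1. OR conditions on the same column can be replaced with IN.
2. For different columns or more complex conditions, the query can be split with UNION ALL.
3. OR conditions across joined tables can be split into one query per table, each driven by the table that carries the filter.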
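For the first item, the rewrite is mechanical when every branch is an equality on the same column. The sketch below (SQLite through Python's `sqlite3` standing in for MySQL; the table and data are hypothetical) checks that, and also shows a limit of IN that is one reason it does not fit a condition like `= 0 OR IS NULL`: IN cannot express IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col INTEGER)")
conn.executemany("INSERT INTO test (col) VALUES (?)", [(1,), (2,), (3,), (None,)])


def n_rows(where):
    """Count the rows of test that satisfy a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM test WHERE {where}").fetchone()[0]


or_count = n_rows("col = 1 OR col = 2")
in_count = n_rows("col IN (1, 2)")
# IN compares with =, so a NULL in the list never matches a NULL column value.
in_with_null = n_rows("col IN (1, NULL)")
or_with_null = n_rows("col = 1 OR col IS NULL")

print(or_count, in_count, in_with_null, or_with_null)
```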
In every case, analyze the actual scenario before choosing an optimization.