A Brief Look at MySQL Subqueries and Their Optimization

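DBAs and developers who have worked with Oracle or other relational databases tend to assume that the database already optimizes subqueries and picks a sensible driving table, and they carry that assumption over to MySQL. Unfortunately, MySQL's handling of subqueries may disappoint you. We have run into several such cases on our production systems, for example: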

SELECT i_id,
       sum(i_sell) AS i_sell
FROM table_data
WHERE i_id IN
    (SELECT i_id
     FROM table_data
     WHERE Gmt_create >= '2011-10-07 00:00:00')
GROUP BY i_id;
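(Note: as an analogy for the business logic of this SQL, first find the 100 books newly sold since October 7, then look up how those 100 books sold over the whole year.)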

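The performance problem with this SQL comes from a weakness in how the MySQL optimizer handles subqueries: it rewrites the subquery. Normally we expect evaluation from the inside out, that is, the subquery result is computed first and then used to drive the outer table. MySQL instead scans every row of the outer table first and passes each row into the subquery to be correlated with it. If the outer table is large, performance suffers.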
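As a rough picture of what this rewrite amounts to, the IN subquery from the example behaves much like the correlated EXISTS form below. This is only an illustrative sketch of the IN-to-EXISTS style transformation in MySQL versions of that period, not output captured from the optimizer; the aliases o and s are made up for illustration.

```sql
-- Illustrative only: roughly how the optimizer treats the IN subquery.
-- Aliases "o" (outer) and "s" (subquery) are hypothetical.
SELECT o.i_id,
       SUM(o.i_sell) AS i_sell
FROM table_data o
WHERE EXISTS
    (SELECT 1
     FROM table_data s
     WHERE s.gmt_create >= '2011-10-07 00:00:00'
       AND s.i_id = o.i_id)   -- correlated: evaluated for each outer row
GROUP BY o.i_id;
```

Written this way, it is easier to see why the cost grows with the size of the outer table: the inner lookup runs once per outer row.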
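In this example, table_data holds about 700,000 rows, and the subquery returns a large number of rows, many of them duplicates, so nearly 700,000 correlations are required. That volume of correlation kept the SQL running for several hours without finishing, so we need to rewrite it: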

SELECT t2.i_id,
       SUM(t2.i_sell) AS sold
FROM
  (SELECT DISTINCT i_id
   FROM table_data
   WHERE gmt_create >= '2011-10-07 00:00:00') t1,
  table_data t2
WHERE t1.i_id = t2.i_id
GROUP BY t2.i_id;
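We changed the subquery into a join and added DISTINCT inside the derived table to reduce the number of times t1 is joined to t2.
After the rewrite, the execution time dropped to under 100 ms.
MySQL's subquery optimization has never been very friendly and has long been criticized in the industry; it is also one of the problems I run into most often in SQL tuning. When MySQL processes a subquery, it rewrites it. We usually expect inside-out evaluation, with the subquery completed first and its result driving the outer table, but the opposite happens: the subquery is not executed first. This article uses a real case to deepen the understanding of MySQL subqueries. What follows is a complete case together with its analysis and the tuning process.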

I. Case

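Users reported that the database was responding slowly and that many business updates were stuck. After logging in to the database, we found a long-running SQL statement: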

| 10437 | usr0321t9m9 | 10.242.232.50:51201 | oms | Execute | 1179 | Sending

The SQL was:

SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN
         (SELECT orderdto1_.tradeoid
          FROM a2 orderdto1_
          WHERE orderdto1_.proname LIKE '%??%'
            OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

II. Symptom: table updates were blocked

UPDATE a1
SET tradesign='DAB67634-795C-4EAC-B4A0-78F0D531D62F',
    markColor=' #CD5555',
    memotime='2012-09- 22',
    markPerson='??'
WHERE tradeoid IN ('gy2012092204495100032') ;
To restore the application as quickly as possible, we killed the long-running SQL, after which the application returned to normal.
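For reference, a minimal sketch of how such a statement can be located and terminated. The 600-second threshold is an arbitrary assumption, and the thread id 10437 is the one from the process list line shown earlier. `information_schema.PROCESSLIST` is available in MySQL 5.1 and later.

```sql
-- List non-idle sessions that have been running longer than 10 minutes
-- (the threshold is an assumption; adjust as needed).
SELECT id, user, host, db, command, time, state, info
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep'
  AND time > 600
ORDER BY time DESC;

-- Terminate the offending thread by its id.
KILL 10437;
```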

III. Analyzing the execution plan:

db@3306 :explain
SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN
         (SELECT orderdto1_.tradeoid
          FROM a2 orderdto1_
          WHERE orderdto1_.proname LIKE '%??%'
            OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ALL  | NULL          | NULL | NULL    | NULL | 27454 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | ALL  | NULL          | NULL | NULL    | NULL | 40998 | Using where                 |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
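Starting from the execution plan, we optimize step by step.
First, look at the second row of the plan, the subquery part: orderdto1_ is read with a full table scan. Let's see whether a suitable index can be added: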

A. Use a covering index:

db@3306:alter table a2 add index ind_a2(proname,procode,tradeoid);
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
Adding the composite index fails because it exceeds the maximum key length.

B. Check the column definitions of the table:

db@3306 :DESC  a2 ;
+---------------------+---------------+------+-----+---------+-------+
| FIELD               | TYPE          | NULL | KEY | DEFAULT | Extra |
+---------------------+---------------+------+-----+---------+-------+
| OID                 | VARCHAR(50)   | NO   | PRI | NULL    |       |
| TRADEOID            | VARCHAR(50)   | YES  |     | NULL    |       |
| PROCODE             | VARCHAR(50)   | YES  |     | NULL    |       |
| PRONAME             | VARCHAR(1000) | YES  |     | NULL    |       |
| SPCTNCODE           | VARCHAR(200)  | YES  |     | NULL    |       |

C. Check the maximum and average length of the column:

db@3306 :SELECT MAX(LENGTH(PRONAME)),avg(LENGTH(PRONAME)) FROM a2;
+----------------------+----------------------+
| MAX(LENGTH(PRONAME)) | avg(LENGTH(PRONAME)) |
+----------------------+----------------------+
|                   95 |              24.5588 |
+----------------------+----------------------+

D. Shrink the column length

ALTER TABLE a2 MODIFY COLUMN PRONAME VARCHAR(156);
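The choice of 156 can be checked with some quick arithmetic. The sketch below assumes a 3-byte utf8 character set, 2 length bytes per VARCHAR key part and 1 extra byte per nullable column, which is how EXPLAIN accounts for key_len. The 1000-byte limit in the error message suggests a MyISAM table, but the source does not state the engine. Under those assumptions the estimated key is far too long before the change and fits afterwards, so the covering index from step A can be retried.

```sql
-- Key bytes per part = chars * 3 (utf8, assumed) + 2 (length) + 1 (nullable)
SELECT (1000 * 3 + 2 + 1)   -- proname VARCHAR(1000)
     + (50 * 3 + 2 + 1)     -- procode VARCHAR(50)
     + (50 * 3 + 2 + 1)     -- tradeoid VARCHAR(50)
       AS key_bytes_before; -- 3309, well over the 1000-byte limit

SELECT (156 * 3 + 2 + 1)    -- proname VARCHAR(156)
     + (50 * 3 + 2 + 1)     -- procode VARCHAR(50)
     + (50 * 3 + 2 + 1)     -- tradeoid VARCHAR(50)
       AS key_bytes_after;  -- 777, within the limit

-- Retry the covering index from step A.
ALTER TABLE a2 ADD INDEX ind_a2 (proname, procode, tradeoid);
```

The figure 777 agrees with the key_len reported for ind_a2 in the execution plans later in this article.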
With the column shortened and the ind_a2 index in place, analyze the execution plan again:

db@3306 :explain
SELECT tradedto0_.*
FROM a1 tradedto0_
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid IN
         (SELECT orderdto1_.tradeoid
          FROM a2 orderdto1_
          WHERE orderdto1_.proname LIKE '%??%'
            OR orderdto1_.procode LIKE '%??%'))
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;


+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
| id | select_type        | table      | type  | possible_keys   | key             | key_len | ref                     | rows  | Extra                       |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ref   | ind_tradestatus | ind_tradestatus | 345     | const,const,const,const |  8962 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | index | NULL            | ind_a2          | 777     | NULL                    | 41005 | Using where; Using index    |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
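Performance still does not improve. The key point is that the number of rows scanned across the two tables has not dropped (8962*41005), so the index added above has little effect. Now look at what the subquery on a2 returns by itself:

db@3306 :
SELECT orderdto1_.tradeoid
FROM a2 orderdto1_
WHERE orderdto1_.proname LIKE '%??%'
  OR orderdto1_.procode LIKE '%??%';

Empty set (0.05 sec)
The result set is empty, so the subquery's result set should serve as the driving table.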

IV. Rewriting the subquery:

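The tests above confirm that the ordinary MySQL subquery form performs very poorly. This is an inherent weakness of MySQL subqueries, so the SQL needs to be rewritten as a join: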

SELECT tradedto0_.*
FROM a1 tradedto0_ ,
  (SELECT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
     OR orderdto1_.procode LIKE '%??%')t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;
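One caveat with this rewrite: an IN subquery returns each matching a1 row at most once, whereas a join repeats the a1 row for every matching row in a2. If tradeoid can repeat in a2, adding DISTINCT to the derived table, as in the first example of this article, keeps the two forms equivalent. A sketch, assuming duplicates are possible (the source does not say either way) and using explicit JOIN syntax:

```sql
SELECT tradedto0_.*
FROM a1 tradedto0_
JOIN
  (SELECT DISTINCT orderdto1_.tradeoid   -- one row per tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
      OR orderdto1_.procode LIKE '%??%') t2
  ON tradedto0_.tradeoid = t2.tradeoid
WHERE tradedto0_.tradestatus = '1'
  AND tradedto0_.undefine4 = '1'
  AND tradedto0_.invoicetype = '1'
  AND tradedto0_.tradestep = '0'
  AND tradedto0_.orderCompany LIKE '0002%'
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC
LIMIT 15;
```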

V. Checking the execution plan:

db@3306 :explain
SELECT tradedto0_.*
FROM a1 tradedto0_ ,
  (SELECT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
     OR orderdto1_.procode LIKE '%??%')t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
| id | select_type | table      | type  | possible_keys | key    | key_len | ref  | rows  | Extra                                               |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
|  1 | PRIMARY     | NULL       | NULL  | NULL          | NULL   | NULL    | NULL |  NULL | Impossible WHERE noticed after reading const tables |
|  2 | DERIVED     | orderdto1_ | index | NULL          | ind_a2 | 777     | NULL | 41005 | Using where; Using index                            |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
The derived table is materialized first. Because it turns out to be empty, the optimizer can tell that the outer WHERE clause will never match, so a1 is not scanned at all.

VI. Execution time:

db@3306 :
SELECT tradedto0_.*
FROM a1 tradedto0_ ,
  (SELECT orderdto1_.tradeoid
   FROM a2 orderdto1_
   WHERE orderdto1_.proname LIKE '%??%'
     OR orderdto1_.procode LIKE '%??%')t2
WHERE tradedto0_.tradestatus='1'
  AND (tradedto0_.tradeoid=t2.tradeoid)
  AND tradedto0_.undefine4='1'
  AND tradedto0_.invoicetype='1'
  AND tradedto0_.tradestep='0'
  AND (tradedto0_.orderCompany LIKE '0002%')
ORDER BY tradedto0_.tradesign ASC,
         tradedto0_.makertime DESC LIMIT 15;

Empty set (0.03 sec)
The execution time dropped to the millisecond range.

VII. Summary:

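1. MySQL subqueries have an obvious weakness in their execution plans, so such subqueries need to be rewritten.
See also:
a. MySQL subqueries encountered in a production database: http://hidba.org/?p=412
b. The built-in InnoDB, subqueries blocking updates: http://hidba.org/?p=456
2. In table design, do not casually declare large VARCHAR(N) columns, since they can make it impossible to build the index you need.
See also:
a. JDBC memory management, the impact of varchar2(4000): http://hidba.org/?p=31
b. Limits on large columns in InnoDB: http://hidba.org/?p=144
c. Some optimization suggestions for large TEXT and BLOB columns in InnoDB: http://hidba.org/?p=551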

VIII. References:

[1] MySQL subqueries encountered in a production database  http://hidba.org/?p=412

[2] A brief look at MySQL subqueries  http://hidba.org/?p=624

[3] The weakness of MySQL subqueries  http://hidba.org/?p=260
