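DBAs and developers who have worked with Oracle or other relational databases tend to assume that the database already optimizes subqueries and picks a sensible driving table, and they carry that assumption over to MySQL. Unfortunately, MySQL's handling of subqueries can be a big disappointment. We have hit several such cases in our production systems, for example: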
SELECT i_id, sum(i_sell) AS i_sell FROM table_data WHERE i_id IN (SELECT i_id FROM table_data WHERE Gmt_create >= '2011-10-07 00:00:00') GROUP BY i_id;
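(Note: as an analogy for the business logic, first find the 100 books newly sold on or after October 7, then look up the full-year sales of those 100 books.)
The performance problem with this SQL comes from a weakness in how the MySQL optimizer handles subqueries: it rewrites them. We normally expect execution from the inside out, where the subquery is completed first and its result drives the outer table. MySQL instead scans every row of the outer table and passes each row into the subquery to be correlated with it, so a large outer table causes a performance problem.
In the query above, table_data holds about 700,000 rows, and the subquery returns many rows, a large share of them duplicates, so nearly 700,000 correlations are needed. Because of this, the SQL ran for several hours without finishing, so we had to rewrite it: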
SELECT t2.i_id, SUM(t2.i_sell) AS sold FROM (SELECT DISTINCT i_id FROM table_data WHERE gmt_create >= '2011-10-07 00:00:00') t1, table_data t2 WHERE t1.i_id = t2.i_id GROUP BY t2.i_id;
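We changed the subquery into a join and added DISTINCT inside the derived table, which reduces the number of times t1 is joined to t2.
After the rewrite, the execution time of the SQL dropped to under 100 ms.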
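The rewrite only preserves the result because of the DISTINCT. The sketch below checks this on a tiny data set. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL (an assumption made so the example is runnable anywhere); the rows are invented, and the comparison says nothing about MySQL's execution plan or timings.

```python
import sqlite3

# SQLite stands in for MySQL here only to compare result sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_data (i_id INTEGER, i_sell INTEGER, gmt_create TEXT)")
conn.executemany(
    "INSERT INTO table_data VALUES (?, ?, ?)",
    [
        (1, 10, "2011-03-01 09:00:00"),
        (1, 5, "2011-10-07 10:00:00"),
        (1, 7, "2011-10-08 11:00:00"),  # item 1 sold twice after the cutoff
        (2, 3, "2011-05-02 12:00:00"),  # item 2 never sold after the cutoff
        (3, 4, "2011-10-09 13:00:00"),
    ],
)
cutoff = "2011-10-07 00:00:00"

# Original form: IN subquery.
in_subquery = conn.execute(
    "SELECT i_id, SUM(i_sell) FROM table_data "
    "WHERE i_id IN (SELECT i_id FROM table_data WHERE gmt_create >= ?) "
    "GROUP BY i_id ORDER BY i_id", (cutoff,)).fetchall()

# Rewritten form: join against a DISTINCT derived table.
join_distinct = conn.execute(
    "SELECT t2.i_id, SUM(t2.i_sell) FROM "
    "(SELECT DISTINCT i_id FROM table_data WHERE gmt_create >= ?) t1, table_data t2 "
    "WHERE t1.i_id = t2.i_id GROUP BY t2.i_id ORDER BY t2.i_id", (cutoff,)).fetchall()

# Same join without DISTINCT: duplicate i_id values multiply the sums.
join_plain = conn.execute(
    "SELECT t2.i_id, SUM(t2.i_sell) FROM "
    "(SELECT i_id FROM table_data WHERE gmt_create >= ?) t1, table_data t2 "
    "WHERE t1.i_id = t2.i_id GROUP BY t2.i_id ORDER BY t2.i_id", (cutoff,)).fetchall()

print(in_subquery, join_distinct, join_plain)
```

With these rows the first two queries agree, while the join without DISTINCT doubles the total for item 1, so DISTINCT is needed for correctness as well as for reducing the join work.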
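MySQL's subquery optimization has never been friendly and has long drawn criticism from the industry; it is also one of the problems I run into most often in SQL tuning. When MySQL handles a subquery, it rewrites it. We normally expect inside-out execution, that is, the subquery is completed first and its result drives the outer table, but the opposite happens: the subquery is not executed first. This post uses a real case to deepen the understanding of MySQL subqueries, walking through the analysis, the tuning process, and the reasoning behind it.
Users reported that the database was responding slowly and that many business updates were stuck. Logging in to the database, we found a long-running SQL statement:
| 10437 | usr0321t9m9 | 10.242.232.50:51201 | oms | Execute | 1179 | Sending
The SQL is:
SELECT tradedto0_.* FROM a1 tradedto0_ WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid IN (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
An example of an update that was stuck:
UPDATE a1 SET tradesign='DAB67634-795C-4EAC-B4A0-78F0D531D62F', markColor=' #CD5555', memotime='2012-09-22', markPerson='??' WHERE tradeoid IN ('gy2012092204495100032') ;
To restore the application as quickly as possible, we killed the long-running SQL, and the application returned to normal.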
db@3306 :explain SELECT tradedto0_.* FROM a1 tradedto0_ WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid IN (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ALL  | NULL          | NULL | NULL    | NULL | 27454 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | ALL  | NULL          | NULL | NULL    | NULL | 40998 | Using where                 |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
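Starting from the execution plan, we optimize step by step.
First, look at the second row of the plan, the subquery part: orderdto1_ is scanned in full. Let's see whether a suitable index can be added:
db@3306:alter table a2 add index ind_a2(proname,procode,tradeoid);
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
The composite index exceeds the maximum key length: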
db@3306 :DESC a2 ;
+---------------------+---------------+------+-----+---------+-------+
| FIELD               | TYPE          | NULL | KEY | DEFAULT | Extra |
+---------------------+---------------+------+-----+---------+-------+
| OID                 | VARCHAR(50)   | NO   | PRI | NULL    |       |
| TRADEOID            | VARCHAR(50)   | YES  |     | NULL    |       |
| PROCODE             | VARCHAR(50)   | YES  |     | NULL    |       |
| PRONAME             | VARCHAR(1000) | YES  |     | NULL    |       |
| SPCTNCODE           | VARCHAR(200)  | YES  |     | NULL    |       |
db@3306 :SELECT MAX(LENGTH(PRONAME)),avg(LENGTH(PRONAME)) FROM a2;
+----------------------+----------------------+
| MAX(LENGTH(PRONAME)) | avg(LENGTH(PRONAME)) |
+----------------------+----------------------+
|                   95 |              24.5588 |
The longest PRONAME value is only 95, so the column can safely be shortened, after which the ind_a2 index fits within the key length limit:
ALTER TABLE a2 MODIFY COLUMN PRONAME VARCHAR(156);
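The arithmetic behind shortening the column can be sketched as follows. The article does not state the character set, so the 3 bytes per character used here (MySQL's legacy utf8) is an assumption. It is at least consistent with the key_len of 777 that the next execution plan reports for ind_a2, once the 2 length bytes and 1 NULL-flag byte that key_len counts for each nullable VARCHAR part are included.

```python
# Assumption: 3 bytes per character; the charset is not given in the article.
BYTES_PER_CHAR = 3
MAX_KEY_BYTES = 1000  # limit quoted in ERROR 1071 above


def index_data_bytes(column_chars):
    """Worst-case bytes of character data for an index over VARCHAR columns."""
    return sum(chars * BYTES_PER_CHAR for chars in column_chars)


# Column order: proname, procode, tradeoid
before = index_data_bytes([1000, 50, 50])
after = index_data_bytes([156, 50, 50])

# key_len in EXPLAIN adds 2 length bytes and 1 NULL-flag byte
# for each nullable VARCHAR part of the index.
key_len_after = after + 3 * (2 + 1)

print(before, before > MAX_KEY_BYTES)   # far over the limit
print(after, after <= MAX_KEY_BYTES)    # fits
print(key_len_after)
```

Under this assumption the original definition needs 3300 bytes, well over the limit, while VARCHAR(156) brings the index down to 768 bytes of character data.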
Then analyze the execution plan again:
db@3306 :explain SELECT tradedto0_.* FROM a1 tradedto0_ WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid IN (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
| id | select_type        | table      | type  | possible_keys   | key             | key_len | ref                     | rows  | Extra                       |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
|  1 | PRIMARY            | tradedto0_ | ref   | ind_tradestatus | ind_tradestatus | 345     | const,const,const,const |  8962 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | orderdto1_ | index | NULL            | ind_a2          | 777     | NULL                    | 41005 | Using where; Using index    |
+----+--------------------+------------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
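Performance still did not improve. The key point is that the number of rows scanned across the two tables has not really shrunk (8962*41005), so the new indexes have little effect. Now look at what the subquery returns on its own:
db@3306 : SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%';
Empty SET (0.05 sec)
The result set is empty, so the subquery's result set should be used as the driving table.
The tests above confirm that the plain subquery form performs very poorly in MySQL. This is an inherent weakness of MySQL subqueries, and the SQL needs to be rewritten as a join: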
SELECT tradedto0_.* FROM a1 tradedto0_ , (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2 WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid=t2.tradeoid) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
db@3306 :explain SELECT tradedto0_.* FROM a1 tradedto0_ , (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2 WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid=t2.tradeoid) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
| id | select_type | table      | type  | possible_keys | key    | key_len | ref  | rows  | Extra                                               |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
|  1 | PRIMARY     | NULL       | NULL  | NULL          | NULL   | NULL    | NULL |  NULL | Impossible WHERE noticed after reading const tables |
|  2 | DERIVED     | orderdto1_ | index | NULL          | ind_a2 | 777     | NULL | 41005 | Using where; Using index                            |
+----+-------------+------------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
db@3306 : SELECT tradedto0_.* FROM a1 tradedto0_ , (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2 WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid=t2.tradeoid) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
Empty SET (0.03 sec)
The execution time dropped to milliseconds.
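When one query serves as a condition of another query, it is called a subquery. Subqueries let you build powerful compound statements out of a few simple ones.
Subqueries are most often used in the WHERE clause, and they can also appear in the SELECT and FROM clauses. An example of each follows.
1. Subquery in the WHERE clause.
Example: list the employees in the emp table whose job is CLERK or SALESMAN. (The statement below uses a literal IN list; a subquery returning the job values would take the place of that list.)
SQL> SELECT * FROM emp WHERE job in('CLERK','SALESMAN');

EMPNO ENAME      JOB         MGR HIREDATE          SAL      COMM DEPTNO
----- ---------- --------- ----- ----------- --------- --------- ------
 7369 SMITH      CLERK      7902 1980/12/17     800.00               20
 7499 ALLEN      SALESMAN   7698 1981/2/20     1600.00    300.00     30
 7521 WARD       SALESMAN   7698 1981/2/22     1250.00    500.00     30
 7654 MARTIN     SALESMAN   7698 1981/9/28     1250.00   1400.00     30
 7844 TURNER     SALESMAN   7698 1981/9/8      1500.00      0.00     30
 7876 ADAMS      CLERK      7788 1987/5/23     1100.00               20
 7900 JAMES      CLERK      7698 1981/12/3      950.00               30
 7934 MILLER     CLERK      7782 1982/1/23     1300.00               10

8 rows selected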
2. Subquery in the FROM clause.
Example: show rows 5 through 9 of the emp table.
SQL> SELECT empno,ename,job,hiredate,sal,comm,deptno
  2  FROM (SELECT ROWNUM r,emp.* FROM emp ) T
  3  WHERE T.r>=5 AND T.r<10;

EMPNO ENAME      JOB       HIREDATE          SAL      COMM DEPTNO
----- ---------- --------- ----------- --------- --------- ------
 7654 MARTIN     SALESMAN  1981/9/28     1250.00   1400.00     30
 7698 BLAKE      MANAGER   1981/5/1      2850.00               30
 7782 CLARK      MANAGER   1981/6/9      2450.00               10
 7788 SCOTT      ANALYST   1987/4/19     3000.00               20
 7839 KING       PRESIDENT 1981/11/17    5000.00               10

5 rows selected
3. Subquery in the SELECT clause.
Example: show all employees in the emp table together with the name of their department.
SQL> SELECT e.*,
  2  (SELECT dname FROM dept WHERE deptno=e.deptno) as dname
  3  FROM EMP e;

EMPNO ENAME      JOB         MGR HIREDATE          SAL      COMM DEPTNO dname
----- ---------- --------- ----- ----------- --------- --------- ------ --------
 7369 SMITH      CLERK      7902 1980/12/17     800.00               20 RESEARCH
 7499 ALLEN      SALESMAN   7698 1981/2/20     1600.00    300.00     30 SALES
 7521 WARD       SALESMAN   7698 1981/2/22     1250.00    500.00     30 SALES
 7566 JONES      MANAGER    7839 1981/4/2      2975.00               20 RESEARCH
 7654 MARTIN     SALESMAN   7698 1981/9/28     1250.00   1400.00     30 SALES
 7698 BLAKE      MANAGER    7839 1981/5/1      2850.00               30 SALES
……

14 rows selected
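Preface:
Why MySQL subqueries are sometimes so bad
Introduction: why is this subquery so slow?
The example below is a slow query whose execution time in production was outrageous. Why?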
SELECT gid,COUNT(id) as count
FROM shop_goods g1
WHERE status =0 and gid IN (
SELECT gid FROM shop_goods g2 WHERE sid IN (1519066,1466114,1466110,1466102,1466071,1453929)
)
GROUP BY gid;
Its execution plan is shown below. Note the keyword "DEPENDENT SUBQUERY":
id  select_type         table  type            possible_keys                  key      key_len  ref     rows    Extra
--  ------------------  -----  --------------  -----------------------------  -------  -------  ------  ------  -----------
1   PRIMARY             g1     index           (NULL)                         idx_gid  5        (NULL)  850672  Using where
2   DEPENDENT SUBQUERY  g2     index_subquery  id_shop_goods,idx_sid,idx_gid  idx_gid  5        func    1       Using where
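Background: what DEPENDENT SUBQUERY means
The official definitions are:
SUBQUERY: the first SELECT in a subquery;
DEPENDENT SUBQUERY: the first SELECT in a subquery, dependent on the outer query.
In other words, the way the subquery reads g2 depends on the outer query over g1.
What does that mean? It implies two steps:
Step one: MySQL evaluates select gid,count(id) from shop_goods where status=0 group by gid; and gets a large result set t1, whose size is the rows=850672 shown in the plan above.
Step two: each record in the large result set t1 is combined with the subquery SQL to form a new statement: select gid from shop_goods where sid in (15...blabla..29) and gid=%t1.gid%. In other words, the subquery is executed about 850,000 times. Even though both steps use indexes, it would be strange if this were not slow.
As a result, the cost of the subquery is bound to the number of rows in the outer query. It would be better to split it into two independent queries and run them in sequence.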
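The two steps above can be modeled in a few lines of Python. This is only a toy simulation of the two strategies, not MySQL's actual executor; the function names and the in-memory rows are invented for illustration.

```python
def dependent_subquery(outer_gids, inner_rows, sids):
    """Probe the inner table once per outer row, like a DEPENDENT SUBQUERY plan."""
    probes = 0
    matched = []
    for gid in outer_gids:
        probes += 1  # one execution of the subquery per outer row
        if any(g == gid and s in sids for g, s in inner_rows):
            matched.append(gid)
    return matched, probes


def materialize_first(outer_gids, inner_rows, sids):
    """Run the inner query once, then filter the outer rows against its result."""
    wanted = {g for g, s in inner_rows if s in sids}  # single pass over inner rows
    return [gid for gid in outer_gids if gid in wanted], 1


# Hypothetical data: 1000 outer rows, inner rows are (gid, sid) pairs.
outer = list(range(1000))
inner = [(g, g % 10) for g in range(1000)]

slow_result, slow_probes = dependent_subquery(outer, inner, {3})
fast_result, fast_probes = materialize_first(outer, inner, {3})
print(slow_probes, fast_probes, slow_result == fast_result)
```

Both strategies return the same rows, but the first runs the inner lookup once per outer row while the second runs it once in total, which is the gap between the slow plan and the split or derived-table rewrite.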
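Optimization strategy 1:
If you do not want to split it into two independent queries, you can also join against a temporary (derived) table, as shown below: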
SELECT g1.gid,count(1)
FROM shop_goods g1,(select gid from shop_goods WHERE sid in (1519066,1466114,1466110,1466102,1466071,1453929)) g2
where g1.status=0 and g1.gid=g2.gid
GROUP BY g1.gid;
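This returns the same result, in milliseconds. (The counts match only as long as the derived table returns each gid once; if it can return duplicates, add DISTINCT inside it.)
Its execution plan is: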
id  select_type  table       type   possible_keys          key            key_len  ref     rows  Extra
--  -----------  ----------  -----  ---------------------  -------------  -------  ------  ----  -------------------------------
1   PRIMARY      <derived2>  ALL    (NULL)                 (NULL)         (NULL)   (NULL)  30    Using temporary; Using filesort
1   PRIMARY      g1          ref    idx_gid                idx_gid        5        g2.gid  1     Using where
2   DERIVED      shop_goods  range  id_shop_goods,idx_sid  id_shop_goods  5        (NULL)  30    Using where; Using index
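The official definition of DERIVED is:
DERIVED: used when the FROM clause contains a subquery. MySQL executes these subqueries recursively and puts the results in a temporary table.
A DBA's view: the weakness of MySQL subqueries
hidba argues (reference 3):
When MySQL handles a subquery, it rewrites the subquery.
We normally expect execution from the inside out: the subquery is completed first, and its result is then used to drive the outer table and finish the query.
For example: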
select * from test where tid in(select fk_tid from sub_test where gid=10)
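Intuitively, we assume this SQL is executed in the following order:
fetch the fk_tid values (2,3,4,5,6) from sub_test by gid,
then go to test with tid=2,3,4,5,6 and fetch the matching rows.
What MySQL actually does, however, is: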
select * from test where exists (
select * from sub_test where gid=10 and sub_test.fk_tid=test.tid
)
MySQL scans all rows in test and passes each row into the subquery to be correlated with sub_test. The subquery is not executed first, so if test is large, performance suffers.
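The IN form and the EXISTS form that MySQL rewrites it into return the same rows; only the execution order differs. A small check, again using Python's sqlite3 module as a stand-in for MySQL (an assumption for runnability), with invented rows that follow the tid and fk_tid values in the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (tid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_test (fk_tid INTEGER, gid INTEGER);
    INSERT INTO test VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f'),(7,'g');
    INSERT INTO sub_test VALUES (2,10),(3,10),(4,10),(5,10),(6,10),(6,10),(7,20);
""")

# The query as the developer writes it.
in_form = conn.execute(
    "SELECT tid FROM test WHERE tid IN "
    "(SELECT fk_tid FROM sub_test WHERE gid = 10) ORDER BY tid").fetchall()

# The correlated form described above: one probe of sub_test per row of test.
exists_form = conn.execute(
    "SELECT tid FROM test WHERE EXISTS "
    "(SELECT * FROM sub_test WHERE gid = 10 AND sub_test.fk_tid = test.tid) "
    "ORDER BY tid").fetchall()

print(in_form, exists_form)
```

Because the results are identical, the rewrite is semantically safe; the cost comes entirely from evaluating the correlated subquery once per row of test.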
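The view from High Performance MySQL
Section 4.4 of High Performance MySQL, "Limitations of the MySQL Query Optimizer", makes a similar point in subsection 4.4.1, "Correlated Subqueries":
MySQL sometimes optimizes subqueries very badly, especially IN() subqueries in the WHERE clause. ……
For example, to find all films in the sakila.film table of the sakila database whose cast includes the actress Penelope Guiness (actor_id = 1), you could write: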
mysql> SELECT * FROM sakila.film
-> WHERE film_id IN(
-> SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);
mysql> EXPLAIN SELECT * FROM sakila.film ...;
+----+--------------------+------------+--------+------------------------+
| id | select_type | table | type | possible_keys |
+----+--------------------+------------+--------+------------------------+
| 1 | PRIMARY | film | ALL | NULL |
| 2 | DEPENDENT SUBQUERY | film_actor | eq_ref | PRIMARY,idx_fk_film_id |
+----+--------------------+------------+--------+------------------------+
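According to the EXPLAIN output, MySQL does a full table scan of film and executes the subquery for every row it finds, which performs very badly. Fortunately, it is easy to rewrite as a join: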
mysql> SELECT film.* FROM sakila.film
-> INNER JOIN sakila.film_actor USING(film_id)
-> WHERE actor_id = 1;
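Another approach is to run the subquery as a separate query with GROUP_CONCAT() and build the IN() list by hand. Sometimes this is even faster than a join. (Note: you could try this on our own database:
SELECT goods_id,GROUP_CONCAT(cast(id as char))
FROM bee_shop_goods
WHERE shop_id IN (1519066,1466114,1466110,1466102,1466071,1453929)
GROUP BY goods_id;)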
MySQL has been criticized for this particular type of subquery execution plan.
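The idea of running the subquery separately and building the IN() list by hand can also be done from application code. A minimal sketch, assuming Python's sqlite3 module as a stand-in for a MySQL client and a few invented shop_goods rows; it builds the list with bound placeholders instead of GROUP_CONCAT(), which is a substitution made here for safety, not the book's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_goods (id INTEGER PRIMARY KEY, gid INTEGER, sid INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO shop_goods VALUES (?, ?, ?, ?)",
    [(1, 100, 1519066, 0), (2, 100, 999, 0), (3, 200, 1466114, 1),
     (4, 200, 999, 0), (5, 300, 999, 0)],
)
sids = [1519066, 1466114]

# Step 1: run the former subquery on its own.
marks = ",".join("?" * len(sids))
gids = [row[0] for row in conn.execute(
    f"SELECT DISTINCT gid FROM shop_goods WHERE sid IN ({marks})", sids)]

# Step 2: feed its result into the outer query as a literal IN list.
if gids:
    marks = ",".join("?" * len(gids))
    counts = conn.execute(
        f"SELECT gid, COUNT(id) FROM shop_goods "
        f"WHERE status = 0 AND gid IN ({marks}) GROUP BY gid ORDER BY gid",
        gids).fetchall()
else:
    counts = []  # an empty IN () list is not valid SQL, so skip the query

# Reference: the single-statement IN subquery form.
marks = ",".join("?" * len(sids))
single = conn.execute(
    f"SELECT gid, COUNT(id) FROM shop_goods WHERE status = 0 AND gid IN "
    f"(SELECT gid FROM shop_goods WHERE sid IN ({marks})) GROUP BY gid ORDER BY gid",
    sids).fetchall()
print(counts, single)
```

The two-step version returns the same counts as the single statement; the empty-list branch matters because a subquery may legitimately return no rows.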
When subqueries are good
MySQL does not always optimize subqueries badly. Sometimes it does quite well. Here is an example:
mysql> EXPLAIN SELECT film_id, language_id FROM sakila.film
-> WHERE NOT EXISTS(
-> SELECT * FROM sakila.film_actor
-> WHERE film_actor.film_id = film.film_id
-> )\G
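…… (Note: for the details, please read High Performance MySQL.)
Yes, subqueries are not always optimized badly. Analyze each case on its own terms, and don't forget EXPLAIN.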
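Checking EXPLAIN output for this pattern can be automated. A minimal sketch, assuming the plan rows have already been fetched as dictionaries keyed by EXPLAIN column name (how they are fetched depends on the client library and is not shown); the function name is hypothetical and the sample rows are copied from the shop_goods plan earlier in this article.

```python
def find_dependent_subqueries(explain_rows):
    """Return the table names of plan rows whose select_type is DEPENDENT SUBQUERY."""
    return [
        row["table"]
        for row in explain_rows
        if row["select_type"].upper() == "DEPENDENT SUBQUERY"
    ]


# Two plan rows, reduced to the columns the check needs.
plan = [
    {"id": 1, "select_type": "PRIMARY", "table": "g1", "rows": 850672},
    {"id": 2, "select_type": "DEPENDENT SUBQUERY", "table": "g2", "rows": 1},
]

flagged = find_dependent_subqueries(plan)
print(flagged)
```

A flagged table is a prompt to look at the outer row estimate, not proof of a problem: as the NOT EXISTS example shows, a dependent subquery can still be the right plan.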