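Introduction:
When optimizing a MySQL database, if only single-table queries are involved, roughly 95% of slow SQL statements can be fixed from the index side alone: adding an index to eliminate a full table scan or a sort operation will very likely give the statement a dramatic speed-up. Most DBAs, and even most developers, can handle single-table optimization.
In real production systems, however, queries that join several tables are more common than single-table operations. Such queries are where slow SQL concentrates: they run slowly and consume a lot of server resources (high CPU, high I/O). This article examines how table joins behave and the theory behind them, and considers how to optimize join operations.
This article is based on MySQL 5.7. Because MySQL 8 introduces a new join method, hash join, the article may not apply to MySQL 8.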
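(1)The seven MySQL join forms
In MySQL, the common table joins fall into 4 categories with 7 forms in total:
Shown graphically, each join form selects the following part of the two sets; the blue area represents the data that satisfies the join condition:
Next, we use examples to understand what each JOIN means.
First, create the test data: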
-- 1. Create the department table
-- The department table records department information. The company has 4 departments: finance (FINANCE), human resources (HR), sales (SALES), and R&D (RD).
-- Not every department has staff. For example, the company has an R&D department but nobody on its payroll.
create table dept (deptno int,dname varchar(14),loc varchar(20));
insert into dept values(10,'FINANCE','BEIJING');
insert into dept values(20,'HR','BEIJING');
insert into dept values(30,'SALES','SHANGHAI');
insert into dept values(40,'RD','CHENGDU');

-- 2. Create the employee table
-- The employee table records each employee's number, name, and department number.
-- Not every employee has a department. For example, the contractor dd has none.
create table emp (empno int,ename varchar(14),deptno int);
insert into emp values(1,'aa',10);
insert into emp values(2,'bb',20);
insert into emp values(3,'cc',30);
insert into emp values(4,'dd',null);
insert into emp values(5,'ee',30);
insert into emp values(6,'ff',20);
The ER diagram is as follows:
(1.1)INNER JOIN
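Business scenario: view the details of the company's regular employees, including employee number, name, and department name.
Requirement analysis: every regular employee belongs to a department, so use INNER JOIN and take the intersection of departments and employees by joining on the department number.
SQL statement: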
mysql> select e.empno,e.ename,d.dname from emp e inner join dept d on e.deptno = d.deptno;
+-------+-------+---------+
| empno | ename | dname   |
+-------+-------+---------+
|     1 | aa    | FINANCE |
|     2 | bb    | HR      |
|     3 | cc    | SALES   |
|     5 | ee    | SALES   |
|     6 | ff    | HR      |
+-------+-------+---------+
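INNER JOIN retrieves the data the 2 tables have in common (the intersection). We can understand the INNER JOIN process like this:
For the SQL above, the execution process can be described with pseudocode:
// Note: which table drives the outer loop and which table the inner loop is decided by the execution plan, which can be inspected with explain. Usually the table with the smaller result set is used as the driving table.
// In this example the order in the SQL is emp, dept, but in the execution plan it is dept, emp. The inner/outer table order therefore has to be read from MySQL's execution plan.
for (i = 1; i <= d.counts; i++) {
    for (j = 1; j <= e.counts; j++) {
        if (d[i].key == e[j].key) {
            // emit one result row and keep scanning; the loops do not stop here
            output(d[i].dname, e[j].empno, e[j].ename);
        }
    }
}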
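The pseudocode above can be tried out directly. The following is a minimal Python sketch, not MySQL's implementation: the lists `DEPT` and `EMP` simply copy the sample rows, and the function name `nested_loop_inner_join` is made up for illustration.

```python
# Sample rows copied from the dept and emp tables created earlier.
DEPT = [(10, "FINANCE", "BEIJING"), (20, "HR", "BEIJING"),
        (30, "SALES", "SHANGHAI"), (40, "RD", "CHENGDU")]
EMP = [(1, "aa", 10), (2, "bb", 20), (3, "cc", 30),
       (4, "dd", None), (5, "ee", 30), (6, "ff", 20)]


def nested_loop_inner_join(dept_rows, emp_rows):
    """Return (empno, ename, dname) for every emp/dept pair with equal deptno."""
    result = []
    for deptno, dname, _loc in dept_rows:          # outer loop: driving table
        for empno, ename, emp_deptno in emp_rows:  # inner loop: driven table
            # In SQL, NULL = x is never true, so a NULL deptno never matches.
            if emp_deptno is not None and emp_deptno == deptno:
                result.append((empno, ename, dname))
    return result


rows = nested_loop_inner_join(DEPT, EMP)
for row in sorted(rows):
    print(row)
```

Employee dd (NULL department) and department RD (no employees) both drop out, which matches the 5 rows returned by the SQL statement.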
(1.2)LEFT JOIN
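Business scenario: view the details of every department, including employee number, employee name, and department name.
Requirement analysis: since every department must be included, use the department table as the left table of a LEFT JOIN, joining departments and employees on the department number.
SQL statement: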
mysql> select d.dname,e.empno,e.ename from dept d left join emp e on e.deptno = d.deptno;
+---------+-------+-------+
| dname   | empno | ename |
+---------+-------+-------+
| FINANCE |     1 | aa    |
| HR      |     2 | bb    |
| SALES   |     3 | cc    |
| SALES   |     5 | ee    |
| HR      |     6 | ff    |
| RD      |  NULL | NULL  |
+---------+-------+-------+
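LEFT JOIN retrieves the data the 2 tables have in common (the intersection) plus the remaining rows of the left table. In other words, every row of the left table is kept, while only the rows of the right table that satisfy the join condition are kept.
We can understand the LEFT JOIN process like this:
For the SQL above, the execution process can be described with pseudocode:
/*
Outer join algorithm, as described in https://dev.mysql.com/doc/refman/5.7/en/nested-join-optimization.html:
In general, for any nested loop for the first inner table in an outer join operation, a flag is introduced that is turned off before the loop and checked after the loop. The flag is turned on when, for the current row of the outer table, a match is found in the table representing the inner operand. If the flag is still off at the end of the loop cycle, no match was found for the current row of the outer table. In that case the row is complemented with NULL values for the columns of the inner tables. The result row is passed to the final check for the output or into the next nested loop, but only if the row satisfies the join condition of all embedded outer joins.
*/
for (i = 1; i <= d.counts; i++) {
    var is_success = false;  // whether d[i] has matched at least 1 row; not matched by default
    for (j = 1; j <= e.counts; j++) {
        if (d[i].key == e[j].key) {
            output(d[i].dname, e[j].empno, e[j].ename);
            is_success = true;
        }
    }
    if (is_success == false) {
        // the left-table row matched nothing: still output it, with null for the right-table columns
        output(d[i].dname, null, null);
    }
}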
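The match-flag idea from the pseudocode can be sketched in Python as follows. This is an illustration under simple assumptions, not the server's code: rows are plain tuples, the `loc` column is left out, and `nested_loop_left_join` with its parameters is a hypothetical helper.

```python
# Sample rows from the dept and emp tables (loc column omitted for brevity).
DEPT = [(10, "FINANCE"), (20, "HR"), (30, "SALES"), (40, "RD")]
EMP = [(1, "aa", 10), (2, "bb", 20), (3, "cc", 30),
       (4, "dd", None), (5, "ee", 30), (6, "ff", 20)]


def nested_loop_left_join(left_rows, right_rows, left_key, right_key, right_width):
    """Left join two lists of tuples on left[left_key] = right[right_key]."""
    result = []
    for left in left_rows:
        matched = False                        # flag is off before the inner loop
        for right in right_rows:
            lk, rk = left[left_key], right[right_key]
            if lk is not None and lk == rk:    # SQL: NULL = x is never true
                result.append(left + right)
                matched = True                 # a match turns the flag on
        if not matched:                        # flag still off after the loop:
            result.append(left + (None,) * right_width)  # pad with NULLs
    return result


joined = nested_loop_left_join(DEPT, EMP, left_key=0, right_key=2, right_width=3)
for deptno, dname, empno, ename, _emp_deptno in joined:
    print(dname, empno, ename)
```

The RD department has no matching employee, so it is the only row that reaches the NULL-padding branch.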
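More on LEFT JOIN: using LEFT JOIN to get the rows that exist only in the left table
Business scenario: find the departments that have no employees.
Requirement analysis: to find the departments without employees, first retrieve all department-employee relationship rows, then filter out the rows that have an employee.
SQL statement:
mysql> select d.dname,e.empno,e.ename from dept d left join emp e on d.deptno = e.deptno where e.deptno is null;
+-------+-------+-------+
| dname | empno | ename |
+-------+-------+-------+
| RD    |  NULL | NULL  |
+-------+-------+-------+
This uses LEFT JOIN to get the intersection of the 2 tables plus the remaining left-table rows, and then removes the intersection.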
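The "left join, then keep only the unmatched rows" pattern is often called an anti-join. A small Python sketch of the same idea, using the sample rows and a made-up function name `departments_without_employees`:

```python
DEPT = [(10, "FINANCE"), (20, "HR"), (30, "SALES"), (40, "RD")]
EMP = [(1, "aa", 10), (2, "bb", 20), (3, "cc", 30),
       (4, "dd", None), (5, "ee", 30), (6, "ff", 20)]


def departments_without_employees(dept_rows, emp_rows):
    """Anti-join: keep each department for which no employee row matches."""
    unmatched = []
    for deptno, dname in dept_rows:
        has_match = any(emp_deptno == deptno for _no, _name, emp_deptno in emp_rows)
        if not has_match:          # plays the role of "where e.deptno is null"
            unmatched.append(dname)
    return unmatched


print(departments_without_employees(DEPT, EMP))  # prints ['RD']
```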
(1.3)RIGHT JOIN
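Business scenario: view the details of every employee, including employee number, name, and department name.
Requirement analysis: since every employee must be included, use the employee table as the right table of a RIGHT JOIN, joining departments and employees on the department number.
SQL statement:
mysql> select d.dname,e.empno,e.ename from dept d right join emp e on e.deptno = d.deptno;
+---------+-------+-------+
| dname   | empno | ename |
+---------+-------+-------+
| FINANCE |     1 | aa    |
| HR      |     2 | bb    |
| HR      |     6 | ff    |
| SALES   |     3 | cc    |
| SALES   |     5 | ee    |
| NULL    |     4 | dd    |
+---------+-------+-------+
Note that right joins and left joins are interchangeable: swap the table positions and change the join keyword to left join to get an equivalent statement. The equivalent left join for the SQL above is:
mysql> select d.dname,e.empno,e.ename from emp e left join dept d on e.deptno = d.deptno;
+---------+-------+-------+
| dname   | empno | ename |
+---------+-------+-------+
| FINANCE |     1 | aa    |
| HR      |     2 | bb    |
| HR      |     6 | ff    |
| SALES   |     3 | cc    |
| SALES   |     5 | ee    |
| NULL    |     4 | dd    |
+---------+-------+-------+
In fact, at the SQL parsing stage MySQL automatically converts a right outer join into the equivalent left outer join (documentation: https://dev.mysql.com/doc/refman/5.7/en/outer-join-simplification.html), so there is no need to study right joins in depth.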
(1.4)FULL JOIN
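Business scenario: view the details of all departments and all employees, including employee number, name, and department name.
Requirement analysis: since every department and every employee must be included, a full join gives the data. However, MySQL has no keyword for a full join; we can obtain the same data by combining a left join and a right join with union.
SQL statement:
mysql> select d.dname,e.empno,e.ename from dept d left join emp e on e.deptno = d.deptno
       union
       select d.dname,e.empno,e.ename from dept d right join emp e on e.deptno = d.deptno;
+---------+-------+-------+
| dname   | empno | ename |
+---------+-------+-------+
| FINANCE |     1 | aa    |
| HR      |     2 | bb    |
| SALES   |     3 | cc    |
| SALES   |     5 | ee    |
| HR      |     6 | ff    |
| RD      |  NULL | NULL  |
| NULL    |     4 | dd    |
+---------+-------+-------+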
More on FULL JOIN:
To find the departments without employees or the employees without a department, i.e. the rows that exist in only one of the two tables:
SQL statement:
mysql> select d.dname,e.empno,e.ename from dept d left join emp e on e.deptno = d.deptno where e.deptno is null
       union
       select d.dname,e.empno,e.ename from dept d right join emp e on e.deptno = d.deptno where d.deptno is null;
+-------+-------+-------+
| dname | empno | ename |
+-------+-------+-------+
| RD    |  NULL | NULL  |
| NULL  |     4 | dd    |
+-------+-------+-------+
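To make the "left join union right join" construction concrete, here is a small Python sketch. It is only a model of the semantics: `left_join`, `full_join` and `same_dept` are hypothetical helpers, and the duplicate removal stands in for what union does.

```python
DEPT = [(10, "FINANCE"), (20, "HR"), (30, "SALES"), (40, "RD")]
EMP = [(1, "aa", 10), (2, "bb", 20), (3, "cc", 30),
       (4, "dd", None), (5, "ee", 30), (6, "ff", 20)]


def left_join(left_rows, right_rows, on):
    """Return (left, right) pairs; unmatched left rows are paired with None."""
    pairs = []
    for left in left_rows:
        matches = [right for right in right_rows if on(left, right)]
        if matches:
            pairs.extend((left, right) for right in matches)
        else:
            pairs.append((left, None))
    return pairs


def full_join(left_rows, right_rows, on):
    """Emulate FULL JOIN as: left join UNION right join."""
    left_part = left_join(left_rows, right_rows, on)
    # A right join is a left join with the tables swapped; flip each pair back.
    swapped = left_join(right_rows, left_rows, lambda r, l: on(l, r))
    right_part = [(l, r) for r, l in swapped]
    # UNION (unlike UNION ALL) drops duplicate rows; keep the first occurrence.
    seen, result = set(), []
    for pair in left_part + right_part:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


def same_dept(dept, emp):
    # A NULL deptno never satisfies the join condition.
    return emp[2] is not None and dept[0] == emp[2]


full_rows = full_join(DEPT, EMP, same_dept)
only_one_side = [pair for pair in full_rows if pair[0] is None or pair[1] is None]
print(len(full_rows), only_one_side)
```

`full_rows` corresponds to the 7-row FULL JOIN result, and `only_one_side` to the 2 rows that exist in only one of the tables.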
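(2)MySQL join algorithms
MySQL 5.7 supports only the Nested-Loop Join algorithm and its improved variant, the Block Nested-Loop Join algorithm. Version 8.0 adds the Hash Join algorithm; only the 5.7 join methods are discussed here.
(2.1)Nested-Loop Join algorithm
The nested-loop join algorithm (NLJ) reads 1 row from the table in the first loop and passes it to the next table for the join operation. If the row qualifies, it goes on to be joined with the rows of the following table, until all tables have been joined; then the process repeats. Simply put, a Nested-Loop Join is a multi-level for loop in programming. Suppose 3 tables are joined using the following join types: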
table join type
------ -------------
t1 range
t2 ref
t3 ALL
With the NLJ algorithm, the join is processed like this in pseudocode:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}
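(2.2)Block Nested-Loop Join algorithm
The block nested-loop (BNL) join algorithm buffers rows of the outer table to reduce the number of times the inner table is read. For example, if 10 rows of the outer table are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 buffered rows, so the number of reads of the inner table drops to 1/10.
With the BNL algorithm, the pseudocode for the join above becomes:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions, send to client
        }
      }
      empty join buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions, send to client
    }
  }
}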
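The effect of the join buffer on the number of inner-table scans can be checked with a short Python sketch. It is a simplified two-table model, not MySQL's code: the buffer is sized in rows rather than bytes, and the function names are illustrative.

```python
def nested_loop_join(outer_rows, inner_rows, match):
    """Plain NLJ: scan the whole inner table once for every outer row."""
    inner_scans, result = 0, []
    for outer in outer_rows:
        inner_scans += 1
        for inner in inner_rows:
            if match(outer, inner):
                result.append((outer, inner))
    return result, inner_scans


def block_nested_loop_join(outer_rows, inner_rows, match, buffer_rows):
    """BNL: collect outer rows in a buffer, scan the inner table once per buffer."""
    inner_scans, result, join_buffer = 0, [], []

    def flush():
        nonlocal inner_scans
        inner_scans += 1
        for inner in inner_rows:            # one pass over the inner table ...
            for outer in join_buffer:       # ... compared with every buffered row
                if match(outer, inner):
                    result.append((outer, inner))
        join_buffer.clear()

    for outer in outer_rows:
        join_buffer.append(outer)
        if len(join_buffer) == buffer_rows:
            flush()
    if join_buffer:                         # leftover rows after the last full buffer
        flush()
    return result, inner_scans


def same_value(outer, inner):
    return outer == inner


outer_table = list(range(100))       # 100 outer rows
inner_table = list(range(0, 100, 5)) # 20 inner rows

nlj_rows, nlj_scans = nested_loop_join(outer_table, inner_table, same_value)
bnl_rows, bnl_scans = block_nested_loop_join(outer_table, inner_table, same_value, buffer_rows=10)
print(nlj_scans, bnl_scans)  # prints 100 10
```

Both variants return the same matches; only the number of passes over the inner table changes, from one per outer row to one per buffer of 10 rows.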
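The MySQL join buffer has the following characteristics:
(3)Table join order
In a relational database, for a multi-table join, the table in the outer loop of the nested loop is called the driving table and the table in the inner loop is called the driven table. The order of driving and driven tables has a very large impact on join performance, so next we explore the table join order in MySQL. Because MySQL converts RIGHT JOIN into LEFT JOIN, and FULL JOIN has to be written as a combination of the two, we only need to discuss INNER JOIN and LEFT JOIN.
To keep the tests accurate, we use the employees test database for MySQL, which can be downloaded from https://github.com/datacharmer/test_db. Its ER diagram is as follows: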
(3.1)INNER JOIN
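For INNER JOIN, the MySQL optimizer chooses, based on its estimates, the table with the smaller result set as the driving table.
Example 1: view the employee-to-department information
-- Inner join the 3 tables employees, dept_manager and departments
select e.emp_no,e.first_name,e.last_name,d.dept_name
from employees e
inner join dept_manager dm on e.emp_no = dm.emp_no
inner join departments d on dm.dept_no = d.dept_no;
Let's look at the sizes of the 3 tables. Note that these are only MySQL's rough row-count statistics; in this example the actual row counts differ somewhat:
+--------------+------------+
| table_name   | table_rows |
+--------------+------------+
| departments  |          9 |
| dept_manager |         24 |
| employees    |     299468 |
+--------------+------------+
The final execution plan is: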
+----+-------------+-------+------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys   | key       | key_len | ref                 | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
|  1 | SIMPLE      | d     | NULL       | index  | PRIMARY         | dept_name | 42      | NULL                |    9 |   100.00 | Using index |
|  1 | SIMPLE      | dm    | NULL       | ref    | PRIMARY,dept_no | dept_no   | 4       | employees.d.dept_no |    2 |   100.00 | Using index |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY         | PRIMARY   | 4       | employees.dm.emp_no |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
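As shown, in an INNER JOIN MySQL does not process the tables in the order they appear in the statement. It first estimates the result-set size of each table, then uses the small one as the driving table and the large one as the driven table. However we reorder the tables in the SQL, the optimizer chooses the same table order as above.
One point deserves special mention: the common saying "the small table drives the large table" is quite imprecise. In an INNER JOIN, MySQL picks the table with the smaller result set as the driving table, not the smaller table. What is the difference? The result set is the temporary table formed after the table's rows have been filtered, and its size is less than or equal to that of the original table. From here on, "small table" and "large table" both refer to result-set size.
Example 2: view the employee-to-department information for employee number 110567
select e.emp_no,e.first_name,e.last_name,d.dept_name
from employees e
inner join dept_manager dm on e.emp_no = dm.emp_no and e.emp_no = 110567
inner join departments d on dm.dept_no = d.dept_no;
The final execution plan is: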
+----+-------------+-------+------------+--------+-----------------+---------+---------+----------------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys   | key     | key_len | ref                  | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------+---------+---------+----------------------+------+----------+-------------+
|  1 | SIMPLE      | e     | NULL       | const  | PRIMARY         | PRIMARY | 4       | const                |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | dm    | NULL       | ref    | PRIMARY,dept_no | PRIMARY | 4       | const                |    1 |   100.00 | Using index |
|  1 | SIMPLE      | d     | NULL       | eq_ref | PRIMARY         | PRIMARY | 4       | employees.dm.dept_no |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+-----------------+---------+---------+----------------------+------+----------+-------------+
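As shown, the driving table here is employees, the table with the most rows. Why is it chosen as the driving table? Because its result set is the smallest: when executing the query, MySQL first selects the rows of employees with emp_no=110567, and there is only 1 such row, so its result set is the smallest and the optimizer picks employees as the driving table.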
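Why the result-set size matters more than the table size can be illustrated with a toy cost model in Python. Everything here is an assumption made for illustration: the row counts are invented, a dict stands in for an index, the only cost counted is the number of index probes into the driven table, and the cost of locating the filtered driving rows is ignored (in the plan above MySQL finds that row through the primary key).

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an index: map each key value to the rows that hold it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop(driving_rows, driving_filter, driven_index, key):
    """Join with one index probe per driving row that passes the filter."""
    probes, result = 0, []
    for row in driving_rows:
        if not driving_filter(row):
            continue
        probes += 1
        for match in driven_index.get(row[key], []):
            result.append((row, match))
    return result, probes


# Hypothetical data: a large employees table and a 24-row dept_manager table.
employees = [{"emp_no": n} for n in range(1, 10001)]
dept_manager = [{"emp_no": n * 400, "dept_no": "d%03d" % n} for n in range(1, 25)]
TARGET = 4000

# Order A: the large table drives, but its filter leaves a 1-row result set.
rows_a, probes_a = index_nested_loop(
    employees, lambda e: e["emp_no"] == TARGET,
    build_index(dept_manager, "emp_no"), "emp_no")

# Order B: the small table drives unfiltered; the filter is applied afterwards.
joined_b, probes_b = index_nested_loop(
    dept_manager, lambda dm: True,
    build_index(employees, "emp_no"), "emp_no")
rows_b = [(dm, e) for dm, e in joined_b if e["emp_no"] == TARGET]

print(probes_a, probes_b)  # prints 1 24
```

Both orders return the same single match, but driving with the filtered large table needs 1 probe while driving with the unfiltered small table needs 24.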
(3.2)LEFT JOIN
For LEFT JOIN, the execution order is left to right, as the following example shows.
Example 1: testing the table order chosen for LEFT JOIN
-- Table order: e --> dm --> d
mysql> explain select e.emp_no,e.first_name,e.last_name,d.dept_name from employees e left join dept_manager dm on e.emp_no = dm.emp_no left join departments d on dm.dept_no = d.dept_no;
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                  | rows   | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
|  1 | SIMPLE      | e     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                 | 299468 |   100.00 | NULL        |
|  1 | SIMPLE      | dm    | NULL       | ref    | PRIMARY       | PRIMARY | 4       | employees.e.emp_no   |      1 |   100.00 | Using index |
|  1 | SIMPLE      | d     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.dm.dept_no |      1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+

-- Table order: dm --> e --> d
mysql> explain select e.emp_no,e.first_name,e.last_name,d.dept_name from dept_manager dm left join employees e on e.emp_no = dm.emp_no left join departments d on dm.dept_no = d.dept_no;
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                  | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+------+----------+-------------+
|  1 | SIMPLE      | dm    | NULL       | index  | NULL          | dept_no | 4       | NULL                 |   24 |   100.00 | Using index |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.dm.emp_no  |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | d     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.dm.dept_no |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+------+----------+-------------+
If the right table has a filter predicate in the where clause, MySQL converts the left join into an inner join; see (5.3)left join optimization in this article.
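(4)ON versus WHERE
In a table join, a filter condition can be written in 2 places: after ON, or after WHERE. What is the difference between writing a filter predicate in these two places? Let's look at INNER JOIN and LEFT JOIN in turn.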
(4.1)INNER JOIN
With INNER JOIN, the result is the same whether the predicate is written in the ON clause or in the WHERE clause.
-- Filter condition in the ON clause
mysql> select e.empno,e.ename,d.dname from emp e inner join dept d on e.deptno = d.deptno and d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
|     2 | bb    | HR    |
|     6 | ff    | HR    |
+-------+-------+-------+

-- Filter condition in the WHERE clause
mysql> select e.empno,e.ename,d.dname from emp e inner join dept d on e.deptno = d.deptno where d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
|     2 | bb    | HR    |
|     6 | ff    | HR    |
+-------+-------+-------+

-- Non-standard form: both the join condition and the filter condition in the WHERE clause
mysql> select e.empno,e.ename,d.dname from emp e inner join dept d where e.deptno = d.deptno and d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
|     2 | bb    | HR    |
|     6 | ff    | HR    |
+-------+-------+-------+
In fact, the optimizer trace shows that for an inner join, wherever the predicate is written, MySQL rewrites the statement equivalently so that the predicates end up after where.
(4.2)LEFT JOIN
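Next, let's look at the difference between ON and WHERE in a LEFT JOIN.
Using ON for the filter predicate:
mysql> select e.empno,e.ename,d.dname from emp e left join dept d on e.deptno = d.deptno and d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
|     1 | aa    | NULL  |
|     2 | bb    | HR    |
|     3 | cc    | NULL  |
|     4 | dd    | NULL  |
|     5 | ee    | NULL  |
|     6 | ff    | HR    |
+-------+-------+-------+
The ON case can be described by the following diagram: the ON condition is used for the join, and the data is filtered while joining:
Now look at the result when using where:
mysql> select e.empno,e.ename,d.dname from emp e left join dept d on e.deptno = d.deptno where d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
|     2 | bb    | HR    |
|     6 | ff    | HR    |
+-------+-------+-------+
The where case can be described by the following diagram: the ON condition is used for the join first, and the joined result is filtered afterwards:
As shown, in a LEFT JOIN the results differ depending on whether the filter condition is placed in ON or in WHERE. In ON, the filter is applied while joining, so every left-table row is still returned (padded with NULL when nothing matches); in WHERE, the filter is applied to the joined result, so the unmatched rows are removed.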
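The two placements can be compared side by side in a small Python model of the sample data. This is only a sketch of the semantics: `left_join` is a hypothetical helper, and a list comprehension plays the role of the WHERE clause.

```python
EMP = [(1, "aa", 10), (2, "bb", 20), (3, "cc", 30),
       (4, "dd", None), (5, "ee", 30), (6, "ff", 20)]
DEPT = [(10, "FINANCE"), (20, "HR"), (30, "SALES"), (40, "RD")]


def left_join(left_rows, right_rows, on):
    """Return (left, right) pairs; unmatched left rows are paired with None."""
    pairs = []
    for left in left_rows:
        matches = [right for right in right_rows if on(left, right)]
        if matches:
            pairs.extend((left, right) for right in matches)
        else:
            pairs.append((left, None))
    return pairs


# Filter inside ON: it only decides which dept rows count as a match.
on_rows = left_join(EMP, DEPT, lambda e, d: e[2] == d[0] and d[1] == "HR")

# Filter in WHERE: join on deptno first, then filter the joined rows.
joined = left_join(EMP, DEPT, lambda e, d: e[2] == d[0])
where_rows = [(e, d) for e, d in joined if d is not None and d[1] == "HR"]

print([(e[1], d[1] if d else None) for e, d in on_rows])
print([(e[1], d[1]) for e, d in where_rows])
```

The first list keeps all 6 employees, with None for everyone outside HR; the second keeps only bb and ff.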
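(5)JOIN optimization
JOIN statements are relatively complex. We consider optimization methods according to the structure of the SQL statement; the main JOIN-related SQL structures are as follows:
(5.1)inner join optimization
The general SQL syntax for an inner join is: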
SELECT <select_list> FROM <left_table> inner join <right_table> ON <join_condition> WHERE <where_condition>
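Optimization methods:
1. For an inner join, the usual approach is to let the small table drive the large table: the small table is the driving table and the large table is the driven table (the small table sits in the outer for loop and the large table in the inner one). The MySQL optimizer already does this for us, so manual intervention is usually unnecessary (the exception is when table statistics are inaccurate). Note that "small table" here means the table with the smaller result set.
2. For an inner join, create an index on the join condition of the driven table.
3. For an inner join, consider creating a composite index on the join condition and the filter condition (ON, WHERE).
Example 1: for an inner join, create an index on the join condition of the driven table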
-- ---------- Build the test tables --------------------------
-- Create the new table employees_new
mysql> create table employees_new like employees;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into employees_new select * from employees;
Query OK, 300024 rows affected (2.69 sec)
Records: 300024  Duplicates: 0  Warnings: 0

-- Create the new table salaries_new
mysql> create table salaries_new like salaries;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into salaries_new select * from salaries;
Query OK, 2844047 rows affected (13.00 sec)
Records: 2844047  Duplicates: 0  Warnings: 0

-- Drop the primary keys
mysql> alter table employees_new drop primary key;
Query OK, 300024 rows affected (1.84 sec)
Records: 300024  Duplicates: 0  Warnings: 0

mysql> alter table salaries_new drop primary key;
Query OK, 2844047 rows affected (9.58 sec)
Records: 2844047  Duplicates: 0  Warnings: 0

-- Table sizes
mysql> select table_name,table_rows from information_schema.tables a where a.table_schema = 'employees' and a.table_name in ('employees_new','salaries_new');
+---------------+------------+
| table_name    | table_rows |
+---------------+------------+
| employees_new |     299389 |
| salaries_new  |    2837194 |
+---------------+------------+
The ER relationship of the test tables is now:
Run a join query with the following statement:
select e.emp_no,e.first_name,e.last_name,s.salary,s.from_date,s.to_date
from employees_new e
inner join salaries_new s on e.emp_no = s.emp_no;
The results are:
-- 1. No index on the driven table. Execution time: more than 800s (not finished after 800s)
-- Execution plan:
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  299389 |   100.00 | NULL                                               |
|  1 | SIMPLE      | s     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2837194 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------+

-- 2. Index created on the join condition of the driven table. Execution time: 37s
-- Index creation statement
create index idx_empno on salaries_new(emp_no);
-- Execution plan:
+----+-------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref                | rows   | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+-------+
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL      | NULL    | NULL               | 299389 |   100.00 | NULL  |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno     | idx_empno | 4       | employees.e.emp_no |      9 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+-------+

-- 3. Going one step further, index also created on the join condition of the driving table. Execution time: 40s
-- Index creation statement
create index idx_employees_new_empno on employees_new(emp_no);
-- Execution plan:
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys           | key       | key_len | ref                | rows   | filtered | Extra |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------+
|  1 | SIMPLE      | e     | NULL       | ALL  | idx_employees_new_empno | NULL      | NULL    | NULL               | 299389 |   100.00 | NULL  |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno               | idx_empno | 4       | employees.e.emp_no |      9 |   100.00 | NULL  |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------+
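These tests show that creating an index on the join condition of the driven table is essential, whereas creating an index on the join condition of the driving table does not noticeably improve speed.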
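The size of the gap is easy to see by counting work in a small Python model. The data below is invented, and a dict is used as a stand-in for the index (MySQL's idx_empno is a B-tree, so this only illustrates the scan-versus-lookup difference, not the real cost).

```python
from collections import defaultdict

# Hypothetical miniature tables: 200 employees, 5 salary rows per employee.
employees = [{"emp_no": n} for n in range(1, 201)]
salaries = [{"emp_no": n, "salary": 40000 + k} for n in range(1, 201) for k in range(5)]


def join_by_scan(driving, driven):
    """No index on the driven table: compare every pair of rows."""
    comparisons, result = 0, []
    for e in driving:
        for s in driven:
            comparisons += 1
            if e["emp_no"] == s["emp_no"]:
                result.append((e["emp_no"], s["salary"]))
    return result, comparisons


def join_by_index(driving, driven):
    """Index on the driven table's join column: one lookup per driving row."""
    index = defaultdict(list)            # built once, like create index
    for s in driven:
        index[s["emp_no"]].append(s)
    lookups, result = 0, []
    for e in driving:
        lookups += 1
        for s in index.get(e["emp_no"], []):
            result.append((e["emp_no"], s["salary"]))
    return result, lookups


scan_rows, scan_comparisons = join_by_scan(employees, salaries)
index_rows, index_lookups = join_by_index(employees, salaries)
print(scan_comparisons, index_lookups)  # prints 200000 200
```

An index on the driving table would not change either count, since the driving table is read in full in both variants.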
Example 2: for an inner join, consider creating a composite index on the join condition and the filter condition (ON, WHERE)
Run a join query with the following statements (the MySQL optimizer resolves the 2 statements below to the same SQL):
select e.emp_no,e.first_name,e.last_name,s.salary,s.from_date,s.to_date
from employees_new e
inner join salaries_new s on e.emp_no = s.emp_no and e.first_name = 'Georgi'
-- or
select e.emp_no,e.first_name,e.last_name,s.salary,s.from_date,s.to_date
from employees_new e
inner join salaries_new s on e.emp_no = s.emp_no
where e.first_name = 'Georgi'
The results are:
-- 1. No composite index on the join and filter conditions. Execution time: 0.162s
-- Execution plan:
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys           | key       | key_len | ref                | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------------+
|  1 | SIMPLE      | e     | NULL       | ALL  | idx_employees_new_empno | NULL      | NULL    | NULL               | 299389 |    10.00 | Using where |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno               | idx_empno | 4       | employees.e.emp_no |      9 |   100.00 | NULL        |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------------+

-- 2. Composite indexes created on the join and filter conditions. Execution time: 0.058s
-- Index creation statements
create index idx_employees_first_name_emp_no on employees_new(first_name,emp_no);
create index idx_employees_emp_no_first_name on employees_new(emp_no,first_name);
-- Execution plan:
+----+-------------+-------+------------+------+-----------------------------------------------------------------------------------------+---------------------------------+---------+--------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys                                                                           | key                             | key_len | ref                | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------------------------------------------------------------------------+---------------------------------+---------+--------------------+------+----------+-------+
|  1 | SIMPLE      | e     | NULL       | ref  | idx_employees_new_empno,idx_employees_first_name_emp_no,idx_employees_emp_no_first_name | idx_employees_first_name_emp_no | 16      | const              |  253 |   100.00 | NULL  |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno                                                                               | idx_empno                       | 4       | employees.e.emp_no |    9 |   100.00 | NULL  |
+----+-------------+-------+------------+------+-----------------------------------------------------------------------------------------+---------------------------------+---------+--------------------+------+----------+-------+
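These tests show that a composite index on a table's join and filter conditions can speed up the query. In this example the gain is small because a full scan of employees_new is already fast, but on very large tables a composite index can improve speed substantially.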
(5.2)inner join + sorting (group by or order by) optimization
The general SQL syntax for inner join + sorting is:
SELECT <select_list> FROM <left_table> inner join <right_table> ON <join_condition> WHERE <where_condition>
GROUP BY <group_by_list>
ORDER BY <order_by_list>
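Optimization methods:
1. As with inner join, create an index on the join condition of the driven table.
2. inner join + sorting often comes with Using temporary and Using filesort in the execution plan. If the temporary table or the data to be sorted is large, the query becomes very slow and deserves special attention. Conversely, if the temporary table or the sorted data is small, say a few hundred rows, the query speed is hardly affected even when the plan shows Using temporary and Using filesort. If the sort takes most of the time, consider using the ordering of an index to eliminate it; this optimization is discussed next.
Both group by and order by sort on the relevant columns. Depending on whether the SQL contains GROUP BY or ORDER BY, there are 3 cases: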
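| Case | SQL contains group by | SQL contains order by | Sort columns to consider | Explanation |
|---|---|---|---|---|
| Case 1 | Yes | No | Only the group by columns | If the statement contains only group by, only the sorting of the columns after group by needs to be considered |
| Case 2 | No | Yes | Only the order by columns | If the statement contains only order by, only the sorting of the columns after order by needs to be considered |
| Case 3 | Yes | Yes | Only the group by columns | If the statement contains both group by and order by, only the sorting after group by needs to be considered. MySQL executes group by first and order by afterwards, and after group by the data volume is usually already small, so the subsequent order by is cheap |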
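For the 3 cases above:
1. If the sort columns under consideration all come from the driving table, consider creating a composite index on the equality filter columns plus the sort columns, so that the index is used first to filter the data and then to fetch it in order.
2. If the sort columns under consideration all come from one driven table, consider using the join hint (STRAIGHT_JOIN) to control the join order so that the table with the sort columns becomes the driving table, then create the composite index as in 1.
3. If the sort columns under consideration come from several tables, there seems to be no good solution; readers with ideas are welcome to leave a comment so we can all improve.
Example 1: if the sort columns under consideration all come from the driving table, consider creating a composite index on the equality filter columns plus the sort columns, so that the index is used first to filter the data and then to fetch it in order.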
-- 1. The sort is on the driving table e
mysql> explain select e.first_name,sum(salary) from employees_new e inner join salaries_new s on e.emp_no = s.emp_no where e.last_name = 'Aamodt' group by e.first_name;
+----+-------------+-------+------------+------+------------------------------------------------------+------------------------------+---------+--------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys                                        | key                          | key_len | ref                | rows | filtered | Extra                                                     |
+----+-------------+-------+------------+------+------------------------------------------------------+------------------------------+---------+--------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | ref  | idx_employees_new_empno,idx_lastname_empno_firstname | idx_lastname_empno_firstname | 18      | const              |  205 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno                                            | idx_empno                    | 4       | employees.e.emp_no |    9 |   100.00 | NULL                                                      |
+----+-------------+-------+------------+------+------------------------------------------------------+------------------------------+---------+--------------------+------+----------+-----------------------------------------------------------+

-- 2. Create an index on the driving table e covering the equality filter column last_name and the sort column first_name
mysql> create index idx_lastname_firstname on employees_new (last_name,first_name);

-- 3. As shown, the sort is eliminated
mysql> explain select e.first_name,sum(salary) from employees_new e inner join salaries_new s on e.emp_no = s.emp_no where e.last_name = 'Aamodt' group by e.first_name;
+----+-------------+-------+------------+------+----------------------------------------------------------------------------------+------------------------+---------+--------------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys                                                                    | key                    | key_len | ref                | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+----------------------------------------------------------------------------------+------------------------+---------+--------------------+------+----------+-----------------------+
|  1 | SIMPLE      | e     | NULL       | ref  | idx_employees_new_empno,idx_employees_new_empno_firstname,idx_lastname_firstname | idx_lastname_firstname | 18      | const              |  205 |   100.00 | Using index condition |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno                                                                        | idx_empno              | 4       | employees.e.emp_no |    9 |   100.00 | NULL                  |
+----+-------------+-------+------------+------+----------------------------------------------------------------------------------+------------------------+---------+--------------------+------+----------+-----------------------+
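Note that eliminating the sort is only one way to optimize. After the sort is eliminated, the query is not necessarily faster than before, so each case needs to be analyzed and tested on its own merits.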
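Why an index on (equality column, sort column) removes the sort can be seen in a small Python model. This is a simplification made for illustration: a sorted list stands in for the composite index on (last_name, first_name), the rows are invented, and the salary is stored on the same row instead of coming from a joined table.

```python
import bisect
from itertools import groupby

# Hypothetical rows: (last_name, first_name, salary).
ROWS = [("Aamodt", "Zed", 100), ("Baba", "Ann", 50), ("Aamodt", "Ann", 70),
        ("Aamodt", "Zed", 30), ("Aamodt", "Bob", 20)]

# Stand-in for an index on (last_name, first_name): sorted entries that point
# back to the row by its position.
INDEX = sorted((last, first, row_id) for row_id, (last, first, _s) in enumerate(ROWS))


def salary_by_first_name(last_name):
    """Sum salary per first_name for one last_name, without sorting anything."""
    start = bisect.bisect_left(INDEX, (last_name,))   # seek to the first entry
    matching = []
    for last, first, row_id in INDEX[start:]:
        if last != last_name:                         # left the equality range
            break
        matching.append((first, ROWS[row_id][2]))
    # Entries arrive already ordered by first_name, so equal names are adjacent
    # and groupby can aggregate them in a single pass.
    return [(first, sum(salary for _f, salary in group))
            for first, group in groupby(matching, key=lambda item: item[0])]


print(salary_by_first_name("Aamodt"))  # prints [('Ann', 70), ('Bob', 20), ('Zed', 130)]
```

The equality column comes first in the index so that all matching entries form one contiguous range, and within that range the second column is already in order.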
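Example 2: if the sort columns under consideration all come from one driven table, consider using the join hint (STRAIGHT_JOIN) to control the join order so that the table with the sort columns becomes the driving table, then create the composite index as in 1.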
-- 1. The sort is on the driven table s
mysql> explain select s.from_date,sum(salary) from employees_new e inner join salaries_new s on e.emp_no = s.emp_no where e.last_name = 'Aamodt' and s.salary = 40000 group by s.from_date;
+----+-------------+-------+------------+------+------------------...-------+------------------------+---------+--------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys    ...       | key                    | key_len | ref                | rows | filtered | Extra                           |
+----+-------------+-------+------------+------+------------------...-------+------------------------+---------+--------------------+------+----------+---------------------------------+
|  1 | SIMPLE      | e     | NULL       | ref  | idx_employees_new...stname | idx_lastname_firstname | 18      | const              |  205 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno        ...       | idx_empno              | 4       | employees.e.emp_no |    9 |    10.00 | Using where                     |
+----+-------------+-------+------------+------+------------------...-------+------------------------+---------+--------------------+------+----------+---------------------------------+

-- 2. Use STRAIGHT_JOIN to change the table join order
mysql> explain select s.from_date,sum(salary) from salaries_new s STRAIGHT_JOIN employees_new e on e.emp_no = s.emp_no where e.last_name = 'Aamodt' and s.salary = 40000 group by s.from_date;
+----+-------------+-------+------------+------+-----------------...----------+-------------------------+---------+--------------------+---------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys   ...          | key                     | key_len | ref                | rows    | filtered | Extra                                        |
+----+-------------+-------+------------+------+-----------------...----------+-------------------------+---------+--------------------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | s     | NULL       | ALL  | idx_empno       ...          | NULL                    | NULL    | NULL               | 2837194 |    10.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | e     | NULL       | ref  | idx_employees_ne...firstname | idx_employees_new_empno | 4       | employees.s.emp_no |       1 |     5.00 | Using where                                  |
+----+-------------+-------+------------+------+-----------------...----------+-------------------------+---------+--------------------+---------+----------+----------------------------------------------+

-- 3. Create an index on the new driving table covering the equality predicate column and the sort column
mysql> create index idx_salary_fromdate on salaries_new(salary,from_date);
Query OK, 0 rows affected (5.39 sec)
Records: 0  Duplicates: 0  Warnings: 0

-- 4. As shown, the sort is eliminated
mysql> explain select s.from_date,sum(salary) from salaries_new s STRAIGHT_JOIN employees_new e on e.emp_no = s.emp_no where e.last_name = 'Aamodt' and s.salary = 40000 group by s.from_date;
+----+-------------+-------+------------+------+---------------------------------...--+-------------------------+---------+--------------------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys                   ... | key                     | key_len | ref                | rows   | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------------------------...--+-------------------------+---------+--------------------+--------+----------+-----------------------+
|  1 | SIMPLE      | s     | NULL       | ref  | idx_empno,idx_salary_fromdate   ... | idx_salary_fromdate     | 4       | const              | 199618 |   100.00 | Using index condition |
|  1 | SIMPLE      | e     | NULL       | ref  | idx_employees_new_empno,idx_empl...e | idx_employees_new_empno | 4       | employees.s.emp_no |      1 |     5.00 | Using where           |
+----+-------------+-------+------------+------+---------------------------------...--+-------------------------+---------+--------------------+--------+----------+-----------------------+
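Note that in most cases the MySQL optimizer chooses the best join order on its own. Introducing STRAIGHT_JOIN often makes a large table the driving table: the sort is eliminated, but a new problem is introduced. Whether the sort or the poorly ordered nested loop costs more has to be analyzed case by case.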
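(5.3)left join optimization
In MySQL, the optimizer converts a right join into a left join, and a full join has to be written as a combination of the two, so for outer joins only left join needs to be discussed. The general SQL syntax for a left join is: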
SELECT <select_list> FROM <left_table> left join <right_table> ON <join_condition> WHERE <where_condition> GROUP BY <group_by_list> ORDER BY <order_by_list>
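Optimization methods:
1. As with inner join, create an index on the join condition of the driven table.
2. The table join order of a left join is left to right, and we cannot change it. However, if the where condition contains a filter predicate on the right table (one that cannot be true for the NULL-padded rows), MySQL automatically converts the left join into an inner join. The principle is illustrated below:
Example 1: if the where condition contains a filter predicate on the right table, MySQL automatically converts the left join into an inner join
Create the test tables: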
create table dept
(
  deptno int,
  dname  varchar(20)
);
insert into dept values (10, 'sales'),(20, 'hr'),(30, 'product'),(40, 'develop');

create table emp
(
  empno  int,
  ename  varchar(20),
  deptno varchar(20)
);
insert into emp values (1,'aa',10),(2,'bb',10),(3,'cc',20),(4,'dd',30),(5,'ee',30);
Run the left join and check its execution plan: the left table is not the driving table.
mysql> explain select d.dname,e.ename from dept d left join emp e on d.deptno = e.deptno where e.deptno = 30;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |    20.00 | Using where                                        |
|  1 | SIMPLE      | d     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    25.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
Tracing the optimizer shows that MySQL rewrote the statement equivalently, turning the outer join into an inner join.
mysql> set optimizer_trace="enabled=on",end_markers_in_JSON=on;
Query OK, 0 rows affected (0.00 sec)

mysql> select d.dname,e.ename from dept d left join emp e on d.deptno = e.deptno where e.deptno = 30;
+---------+-------+
| dname   | ename |
+---------+-------+
| product | dd    |
| product | ee    |
+---------+-------+
2 rows in set (0.03 sec)

mysql> select * from information_schema.optimizer_trace;
| MISSING_BYTES_BEYOND_MAX_MEM_SIZE | INSUFFICIENT_PRIVILEGES |
| select d.dname,e.ename from dept d left join emp e on d.deptno = e.deptno where e.deptno = 30 | {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `d`.`dname` AS `dname`,`e`.`ename` AS `ename` from (`dept` `d` left join `emp` `e` on((`d`.`deptno` = `e`.`deptno`))) where (`e`.`deptno` = 30)"
          },
          {
            "transformations_to_nested_joins": {
              "transformations": [
                "outer_join_to_inner_join",
                "JOIN_condition_to_WHERE",
                "parenthesis_removal"
              ] /* transformations */,
              "expanded_query": "/* select#1 */ select `d`.`dname` AS `dname`,`e`.`ename` AS `ename` from `dept` `d` join `emp` `e` where ((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
            } /* transformations_to_nested_joins */
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "substitute_generated_columns": {
            } /* substitute_generated_columns */
          },
          {
            "table_dependencies": [
              {
                "table": "`dept` `d`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              },
              {
                "table": "`emp` `e`",
                "row_may_be_null": true,
                "map_bit": 1,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [
              {
                "table": "`dept` `d`",
                "table_scan": {
                  "rows": 4,
                  "cost": 1
                } /* table_scan */
              },
              {
                "table": "`emp` `e`",
                "table_scan": {
                  "rows": 5,
                  "cost": 1
                } /* table_scan */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`dept` `d`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 4,
                      "access_type": "scan",
                      "resulting_rows": 4,
                      "cost": 1.8,
                      "chosen": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 4,
                "cost_for_plan": 1.8,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`dept` `d`"
                    ] /* plan_prefix */,
                    "table": "`emp` `e`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 5,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 1,
                          "cost": 2.6007,
                          "chosen": true
                        }
                      ] /* considered_access_paths */
                    } /* best_access_path */,
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 4,
                    "cost_for_plan": 4.4007,
                    "chosen": true
                  }
                ] /* rest_of_plan */
              },
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`emp` `e`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 5,
                      "access_type": "scan",
                      "resulting_rows": 1,
                      "cost": 2,
                      "chosen": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "condition_filtering_pct": 100,
                "rows_for_plan": 1,
                "cost_for_plan": 2,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`emp` `e`"
                    ] /* plan_prefix */,
                    "table": "`dept` `d`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 4,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 4,
                          "cost": 1.8002,
                          "chosen": true
                        }
                      ] /* considered_access_paths */
                    } /* best_access_path */,
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 4,
                    "cost_for_plan": 3.8002,
                    "chosen": true
                  }
                ] /* rest_of_plan */
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`emp` `e`",
                  "attached": "(`e`.`deptno` = 30)"
                },
                {
                  "table": "`dept` `d`",
                  "attached": "(`d`.`deptno` = `e`.`deptno`)"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "refine_plan": [
              {
                "table": "`emp` `e`"
              },
              {
                "table": "`dept` `d`"
              }
            ] /* refine_plan */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
} | 0 | 0 |
+----------------------------------------------------------------------------
mysql>
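The rewrite is only valid because the two forms return the same rows. The sketch below checks that equivalence on the dept and emp test data created at the beginning of the article. It is an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so it compares result sets only and says nothing about what the MySQL optimizer does internally. The helper name rows is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
# Only the result sets are compared, not the execution plans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table dept (deptno int, dname varchar(14), loc varchar(20));
insert into dept values (10,'FINANCE','BEIJING'), (20,'HR','BEIJING'),
                        (30,'SALES','SHANGHAI'), (40,'RD','CHENGDU');
create table emp (empno int, ename varchar(14), deptno int);
insert into emp values (1,'aa',10), (2,'bb',20), (3,'cc',30),
                       (4,'dd',null), (5,'ee',30), (6,'ff',20);
""")


def rows(sql):
    """Run a query and return its rows in a stable order (hypothetical helper)."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


# Outer join with a WHERE filter on the inner table: NULL-extended rows
# cannot satisfy e.deptno = 30, so they are discarded.
left_where = rows(
    "select d.dname, e.ename from dept d "
    "left join emp e on d.deptno = e.deptno where e.deptno = 30"
)

# The form the optimizer rewrote the statement into.
inner = rows(
    "select d.dname, e.ename from dept d "
    "join emp e on d.deptno = e.deptno where e.deptno = 30"
)

# Same filter moved into the ON clause: every dept row is preserved,
# so this is NOT equivalent to the inner join.
left_on = rows(
    "select d.dname, e.ename from dept d "
    "left join emp e on d.deptno = e.deptno and e.deptno = 30"
)

print("left join + where :", left_where)
print("inner join        :", inner)
print("left join, filter in ON:", left_on)
print("equivalent:", left_where == inner)
```

The third query is included for contrast: when the filter sits in the ON clause instead of the WHERE clause, departments without a matching employee still appear with a NULL ename, so the join cannot be simplified to an inner join.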
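[End]
References:
1. Nested-loop join algorithms: https://dev.mysql.com/doc/refman/5.7/en/nested-loop-joins.html
2. Outer join optimization: https://dev.mysql.com/doc/refman/5.7/en/outer-join-optimization.html
Note: I am still a MySQL beginner, and this article reflects only my personal views. If you spot a mistake, please point it out so that we can improve together. Thank you.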