MySQL: A First Look at Join Optimization

Date: 2019-11-06

Tags: mysql, join, preliminary optimization · Category: MySQL


First, a word on a GROUP BY pitfall in MySQL 5.7 ψ(*｀ー´)ψ

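1. Caveats when using MAX() together with GROUP BY in MySQL
Setup: the same user_role table.
Requirement: group the rows by role and, for each role, fetch the row with the largest user_id.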

MySQL 5.5:

mysql> select * from user_role;
+----+---------+---------+
| id | user_id | role_id |
+----+---------+---------+
|  3 |       3 |       3 |
|  4 |       4 |       4 |
|  5 |       0 |       0 |
| 11 |      17 |       1 |
| 13 |      19 |       3 |
| 16 |      22 |       2 |
| 18 |      24 |       2 |
| 19 |      25 |       3 |
| 22 |      31 |       1 |
+----+---------+---------+
9 rows in set (0.00 sec)

This is how we would usually write the query:

mysql> select id,role_id,max(user_id) from user_role group by role_id;
+----+---------+--------------+
| id | role_id | max(user_id) |
+----+---------+--------------+
|  5 |       0 |            0 |
| 11 |       1 |           31 |
| 16 |       2 |           24 |
|  3 |       3 |           25 |
|  4 |       4 |            4 |
+----+---------+--------------+
5 rows in set (0.00 sec)

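The result is wrong, though. Look up the row with id 11 and you will see that its user_id is not 31. MAX(user_id) is computed per group, but id is not aggregated, so MySQL returns the id of some row in the group. That value is indeterminate and is not necessarily the row that holds the maximum.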

mysql> select * from user_role where id = 11;
+----+---------+---------+
| id | user_id | role_id |
+----+---------+---------+
| 11 |      17 |       1 |
+----+---------+---------+
1 row in set (0.00 sec)

So watch out here!!! Rewrite the SQL as follows:

select id,role_id,user_id from (
    select * from user_role order by user_id desc
) as a group by role_id;
+----+---------+---------+
| id | role_id | user_id |
+----+---------+---------+
|  5 |       0 |       0 |
| 22 |       1 |      31 |
| 18 |       2 |      24 |
| 19 |       3 |      25 |
|  4 |       4 |       4 |
+----+---------+---------+
5 rows in set (0.00 sec)

Check the row with id = 22:

mysql> select * from user_role where id = 22;
+----+---------+---------+
| id | user_id | role_id |
+----+---------+---------+
| 22 |      31 |       1 |
+----+---------+---------+
1 row in set (0.00 sec)

Run the same test on MySQL 5.7 !!!∑(ﾟДﾟノ)ノ and you hit another trap (*゜Д゜)σ凸←self-destruct button

mysql> select id,role_id,user_id from (
    -> select * from user_role order by user_id desc
    -> ) as a group by role_id;
+----+---------+---------+
| id | role_id | user_id |
+----+---------+---------+
|  5 |       0 |       0 |
| 11 |       1 |      17 |
| 16 |       2 |      22 |
|  3 |       3 |       3 |
|  4 |       4 |       4 |
+----+---------+---------+
5 rows in set (0.00 sec)

The data is wrong again (〃＞皿＜). This happens because MySQL 5.7 rewrites the SQL:

mysql> explain select id,role_id,user_id from (select * from user_role order by user_id desc ) as a group by role_id;
+----+-------------+-----------+------------+------+------+------+----------+---------------------------------+
| id | select_type | table     | partitions | type | ref  | rows | filtered | Extra                           |
+----+-------------+-----------+------------+------+------+------+----------+---------------------------------+
|  1 | SIMPLE      | user_role | NULL       | ALL  | NULL |    9 |   100.00 | Using temporary; Using filesort |
+----+-------------+-----------+------------+------+------+------+----------+---------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings \G;
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `mysql12`.`user_role`.`id` AS `id`,`mysql12`.`user_role`.`role_id` AS `role_id`,`mysql12`.`user_role`.`user_id` AS `user_id` from `mysql12`.`user_role` group by `mysql12`.`user_role`.`role_id`
1 row in set (0.00 sec)

The optimizer has merged the derived table into the outer query (the derived_merge optimization, on by default in 5.7), and the inner ORDER BY was discarded along with it. GROUP BY therefore returns non-aggregated columns from arbitrary rows again.

The official explanation: https://bugs.mysql.com/bug.php?id=80131

Workable approaches:

Approach 1:

select u.id,u.role_id,u.user_id
from user_role u,
     (select max(user_id) max_user_id,role_id from user_role group by role_id ) as a
where a.max_user_id = u.user_id and a.role_id = u.role_id;
+----+---------+---------+
| id | role_id | user_id |
+----+---------+---------+
|  4 |       4 |       4 |
|  5 |       0 |       0 |
| 18 |       2 |      24 |
| 19 |       3 |      25 |
| 22 |       1 |      31 |
+----+---------+---------+
5 rows in set (0.00 sec)

Approach 2:

select u.id,u.role_id,u.user_id
from user_role u
left join user_role a on u.role_id = a.role_id and u.user_id < a.user_id
where a.user_id is null;

I recommend approach 1. It is easier to understand and relatively more efficient.

There is one more workaround for the MySQL 5.7 GROUP BY / ORDER BY problem:

mysql> select id,role_id,user_id from (select * from user_role order by user_id desc limit 0,100 ) as a group by role_id;
+----+---------+---------+
| id | role_id | user_id |
+----+---------+---------+
|  5 |       0 |       0 |
| 22 |       1 |      31 |
| 18 |       2 |      24 |
| 19 |       3 |      25 |
|  4 |       4 |       4 |
+----+---------+---------+
5 rows in set (0.00 sec)

The LIMIT is required. A derived table that contains LIMIT cannot be merged into the outer query, so its ORDER BY is preserved. However, this workaround still depends on GROUP BY picking the first row it meets, which MySQL does not guarantee. Also, limit 0,100 only considers the first 100 rows.
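You can check the two approaches without a MySQL server. Below is a minimal sketch that uses Python's built-in sqlite3 module and the nine user_role rows above. SQLite is only a stand-in and its optimizer differs from MySQL's. The point is that both queries select the max-user_id row of each role by construction, rather than relying on whichever row GROUP BY happens to return. The ORDER BY u.role_id is added only to make the output order deterministic.

```python
import sqlite3

# The nine rows of user_role shown above, as (id, user_id, role_id)
ROWS = [(3, 3, 3), (4, 4, 4), (5, 0, 0), (11, 17, 1), (13, 19, 3),
        (16, 22, 2), (18, 24, 2), (19, 25, 3), (22, 31, 1)]

# Approach 1: join each row to its group's MAX(user_id)
APPROACH_1 = """
SELECT u.id, u.role_id, u.user_id
FROM user_role u,
     (SELECT MAX(user_id) AS max_user_id, role_id
      FROM user_role GROUP BY role_id) AS a
WHERE a.max_user_id = u.user_id AND a.role_id = u.role_id
ORDER BY u.role_id
"""

# Approach 2: anti-join; keep rows for which no larger user_id exists in the same role
APPROACH_2 = """
SELECT u.id, u.role_id, u.user_id
FROM user_role u
LEFT JOIN user_role a
       ON u.role_id = a.role_id AND u.user_id < a.user_id
WHERE a.user_id IS NULL
ORDER BY u.role_id
"""


def run(sql):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_role (id INTEGER PRIMARY KEY, user_id INT, role_id INT)")
    conn.executemany("INSERT INTO user_role VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(APPROACH_1))
    print(run(APPROACH_2))
```

If several rows in one role tie on the maximum user_id, both approaches return all of them.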

Back to the exercise. As the task is stated, we actually need two SQL statements combined with UNION ALL (this is the simple version):

explain select a.name, b.income
FROM customers1 a,
     ( select city,gender,min(monthsalary * 12 + yearbonus) as income
       from customers1 ignore index(idx_gender_city_monthsalary)
       group by city,gender ) b
where a.city = b.city and a.gender = b.gender
  and (a.monthsalary * 12 + a.yearbonus) = b.income

select a.name, b.income
FROM customers1 a,
     ( select city,gender,max(monthsalary * 12 + yearbonus) as income
       from customers1 ignore index(idx_gender_city_monthsalary)
       group by city,gender ) b
where a.city = b.city and a.gender = b.gender
  and (a.monthsalary * 12 + a.yearbonus) = b.income

The optimization is simple. Create the following indexes beforehand:

alter table customers1 add index idx_gender_city_name_monthsalary_yearbonus(gender, city, name, monthsalary, yearbonus);
alter table customers1 add index idx_gender_city_monthsalary_yearbonus(gender, city, monthsalary, yearbonus);
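Here is a minimal sketch of the UNION ALL version mentioned above. It assumes the task is "for each (city, gender) group, list the customers with the lowest and the highest annual income (monthsalary * 12 + yearbonus)". The rows are hypothetical. SQLite is used as a stand-in, so MySQL's ignore index(...) hint is dropped. The index copies the column order of idx_gender_city_monthsalary_yearbonus.

```python
import sqlite3

# Hypothetical rows: (name, gender, city, monthsalary, yearbonus)
CUSTOMERS = [
    ("Ann", 0, "Beijing", 100, 50),     # income 1250
    ("Bob", 0, "Beijing", 200, 0),      # income 2400
    ("Cid", 1, "Beijing", 150, 10),     # income 1810
    ("Dee", 0, "Shanghai", 300, 100),   # income 3700
    ("Eve", 0, "Shanghai", 120, 0),     # income 1440
]

# One half of the UNION ALL; {agg} is MIN or MAX
HALF = """
SELECT a.name, b.income
FROM customers1 a,
     (SELECT city, gender, {agg}(monthsalary * 12 + yearbonus) AS income
      FROM customers1 GROUP BY city, gender) b
WHERE a.city = b.city AND a.gender = b.gender
  AND (a.monthsalary * 12 + a.yearbonus) = b.income
"""

QUERY = HALF.format(agg="MIN") + "UNION ALL" + HALF.format(agg="MAX")


def min_max_income(rows):
    """Load rows into an in-memory customers1 table and run the UNION ALL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers1 (id INTEGER PRIMARY KEY, name TEXT, gender INT,"
                 " city TEXT, monthsalary INT, yearbonus INT)")
    conn.executemany("INSERT INTO customers1 (name, gender, city, monthsalary, yearbonus)"
                     " VALUES (?, ?, ?, ?, ?)", rows)
    # Same column order as idx_gender_city_monthsalary_yearbonus in the article
    conn.execute("CREATE INDEX idx_gender_city_monthsalary_yearbonus"
                 " ON customers1 (gender, city, monthsalary, yearbonus)")
    result = sorted(conn.execute(QUERY).fetchall())   # sorted only for stable output
    conn.close()
    return result


if __name__ == "__main__":
    for name, income in min_max_income(CUSTOMERS):
        print(name, income)
```

A group with a single customer (Cid here) shows up twice, once from each half. That is how UNION ALL behaves; UNION would collapse the duplicate row.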

Some readers may wonder (*•ω•): is optimization really just creating a composite index for each SQL statement?

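If the only goal is to speed up this one statement, then yes, that is about it. What matters more is how we decide which index to build, and that is what we should focus on. Throughout this optimization we mostly relied on covering indexes to speed up the SQL.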

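The key points of index design, as illustrated by our case, are: ... wait, let me first restate how MySQL chooses an index ლ(⁰⊖⁰ლ), because some students still don't get it. MySQL picks an index mainly from the columns in the WHERE condition. It does not give priority to the columns you want to retrieve.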

For example:

show index from customers1;
+------------+------------+---------------------------------------+--------------+-------------+-----------+-------------+------------+
| Table      | Non_unique | Key_name                              | Seq_in_index | Column_name | Collation | Cardinality | Index_type |
+------------+------------+---------------------------------------+--------------+-------------+-----------+-------------+------------+
| customers1 |          0 | PRIMARY                               |            1 | id          | A         |      577859 | BTREE      |
| customers1 |          1 | idx_monthsalary_yearbonus_birthdate   |            1 | monthsalary | A         |      450702 | BTREE      |
| customers1 |          1 | idx_monthsalary_yearbonus_birthdate   |            2 | yearbonus   | A         |      555676 | BTREE      |
| customers1 |          1 | idx_monthsalary_yearbonus_birthdate   |            3 | birthdate   | A         |      577859 | BTREE      |
| customers1 |          1 | idx_gender_city_monthsalary_yearbonus |            1 | gender      | A         |           1 | BTREE      |
| customers1 |          1 | idx_gender_city_monthsalary_yearbonus |            2 | city        | A         |          21 | BTREE      |
| customers1 |          1 | idx_gender_city_monthsalary_yearbonus |            3 | monthsalary | A         |      577859 | BTREE      |
| customers1 |          1 | idx_gender_city_monthsalary_yearbonus |            4 | yearbonus   | A         |      577859 | BTREE      |
+------------+------------+---------------------------------------+--------------+-------------+-----------+-------------+------------+
8 rows in set (0.01 sec)

explain select count(monthsalary) from customers1;
+----+-------------+------------+-------+-------------------------------------+---------+--------+----------+-------------+
| id | select_type | table      | type  | key                                 | key_len | rows   | filtered | Extra       |
+----+-------------+------------+-------+-------------------------------------+---------+--------+----------+-------------+
|  1 | SIMPLE      | customers1 | index | idx_monthsalary_yearbonus_birthdate | 13      | 577859 |   100.00 | Using index |
+----+-------------+------------+-------+-------------------------------------+---------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

explain select count(monthsalary) from customers1 where photo = "xxx";
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | customers1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 577859 |    10.00 | Using where |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

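Of the two statements above, the first uses idx_monthsalary_yearbonus_birthdate and the second does not. The only difference is the added WHERE clause. For MySQL, the main job of an index is to filter rows, while count(monthsalary) is just the data to return. The first query has nothing to filter, so MySQL can read the smaller secondary index instead of the table (type = index, Using index, i.e. a covering full-index scan). The second query must filter on the WHERE condition, and idx_monthsalary_yearbonus_birthdate does not contain photo, so that index cannot be used.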
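The rule above can be reduced to a toy check. An index can answer a query by itself (a covering index, shown as Using index) only if every column the query touches, in SELECT and in WHERE, is part of that index. This is a deliberate simplification and not MySQL's cost-based optimizer. index_covers is a hypothetical helper.

```python
def index_covers(index_columns, select_columns, where_columns):
    """Toy rule: True if every column the query touches is in the index."""
    needed = set(select_columns) | set(where_columns)
    return needed <= set(index_columns)


IDX = ("monthsalary", "yearbonus", "birthdate")   # idx_monthsalary_yearbonus_birthdate

if __name__ == "__main__":
    # select count(monthsalary) from customers1;
    print(index_covers(IDX, ["monthsalary"], []))          # True
    # select count(monthsalary) from customers1 where photo = "xxx";
    print(index_covers(IDX, ["monthsalary"], ["photo"]))   # False
```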

Note!!! What happens once idx_gender_city_monthsalary_yearbonus is in place:

explain select a.name, b.income
FROM customers1 a,
     ( select city,gender,min(monthsalary * 12 + yearbonus) as income
       from customers1 group by gender,city ) b
where a.gender = b.gender and a.city = b.city
  and (a.monthsalary * 12 + a.yearbonus) = b.income\G;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
   partitions: NULL
         type: ALL
possible_keys: idx_gender_city_monthsalary_yearbonus
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 577859
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 40
          ref: mysql12.a.city,mysql12.a.gender,func
         rows: 10
     filtered: 100.00
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: customers1
   partitions: NULL
         type: index
possible_keys: idx_gender_city_monthsalary_yearbonus
          key: idx_gender_city_monthsalary_yearbonus
      key_len: 43
          ref: NULL
         rows: 577859
     filtered: 100.00
        Extra: Using index
3 rows in set, 1 warning (0.04 sec)

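MySQL uses idx_gender_city_monthsalary_yearbonus only once, for the derived table. Yet the outer join's WHERE clause refers to exactly the columns in that index, so shouldn't the index be used there too? ∑(´△｀)？！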

Note: the cause is not the computation in the final condition (a.monthsalary * 12 + a.yearbonus) = b.income. It is something else. Run show warnings \G; to see how MySQL rewrote the statement we executed:

show warnings \G;
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `mysql12`.`a`.`name` AS `name`,`b`.`income` AS `income` from `mysql12`.`customers1` `a` join (/* select#2 */ select `mysql12`.`customers1`.`city` AS `city`,`mysql12`.`customers1`.`gender` AS `gender`,min(((`mysql12`.`customers1`.`monthsalary` * 12) + `mysql12`.`customers1`.`yearbonus`)) AS `income` from `mysql12`.`customers1` group by `mysql12`.`customers1`.`gender`,`mysql12`.`customers1`.`city`) `b` where ((`b`.`city` = `mysql12`.`a`.`city`) and (`b`.`gender` = `mysql12`.`a`.`gender`) and (((`mysql12`.`a`.`monthsalary` * 12) + `mysql12`.`a`.`yearbonus`) = `b`.`income`))
1 row in set (0.01 sec)

ERROR:
No query specified

Prettified ヾ(=･ω･=)o:

SELECT `mysql12`.`a`.`name` AS `name`, `b`.`income` AS `income`
FROM `mysql12`.`customers1` `a`
JOIN (
    SELECT `mysql12`.`customers1`.`city` AS `city`,
           `mysql12`.`customers1`.`gender` AS `gender`,
           min((`mysql12`.`customers1`.`monthsalary` * 12) + `mysql12`.`customers1`.`yearbonus`) AS `income`
    FROM `mysql12`.`customers1`
    GROUP BY `mysql12`.`customers1`.`gender`, `mysql12`.`customers1`.`city`
) `b`
WHERE `b`.`city` = `mysql12`.`a`.`city`
  AND `b`.`gender` = `mysql12`.`a`.`gender`
  AND ((`mysql12`.`a`.`monthsalary` * 12) + `mysql12`.`a`.`yearbonus`) = `b`.`income`

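After the rewrite, MySQL has swapped the two sides of the conditions (b.city = a.city and so on), so the indexed columns are no longer used as standalone columns. MySQL does not use an index on a column that is not standalone, for example one that appears inside an expression, as this example shows: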

explain select * from customers1 where gender - 1 = 0;
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | customers1 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 577859 |   100.00 | Using where |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

explain select * from customers1 where gender = 0;
+----+-------------+------------+------+---------------------------------------+---------------------------------------+---------+-------+--------+----------+
| id | select_type | table      | type | possible_keys                         | key                                   | key_len | ref   | rows   | filtered |
+----+-------------+------------+------+---------------------------------------+---------------------------------------+---------+-------+--------+----------+
|  1 | SIMPLE      | customers1 | ref  | idx_gender_city_monthsalary_yearbonus | idx_gender_city_monthsalary_yearbonus | 1       | const | 288929 |   100.00 |
+----+-------------+------------+------+---------------------------------------+---------------------------------------+---------+-------+--------+----------+
1 row in set, 1 warning (0.00 sec)
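The same effect can be reproduced with SQLite as a stand-in. This minimal sketch compares the query plan for a bare indexed column with the plan for the same column inside an expression. The table is a trimmed, hypothetical version of customers1, and idx_gender_city is a made-up index name. EXPLAIN QUERY PLAN is SQLite's own command, and its wording differs from MySQL's EXPLAIN.

```python
import sqlite3


def plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for sql as one string."""
    # Each plan row is (id, parent, notused, detail); detail is the readable part
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers1 (id INTEGER PRIMARY KEY, name TEXT,"
             " gender INT, city TEXT)")
conn.execute("CREATE INDEX idx_gender_city ON customers1 (gender, city)")

bare = plan(conn, "SELECT * FROM customers1 WHERE gender = 0")      # indexed column alone
expr = plan(conn, "SELECT * FROM customers1 WHERE gender - 1 = 0")  # column inside an expression

if __name__ == "__main__":
    print(bare)   # a SEARCH that names idx_gender_city
    print(expr)   # a full SCAN of customers1
```

Rewriting the condition as gender = 1 keeps the column standalone, so the index becomes usable again.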

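That is why the problem above occurs with this subquery; it is how MySQL handles subqueries after the version change. Also note in the EXPLAIN output above that a is the first (driving) table and is read with a full scan, while its join columns are used to probe <derived2> through <auto_key0>.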

2. Differences in subquery handling between 5.5 and 5.7

MySQL 5.6/5.7 added optimizations for subqueries. In MySQL 5.5 and earlier, a subquery was merely a feature: it performed poorly and was best avoided in development.

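Up to 5.5, a subquery was executed by reading the outer table first and then querying the inner table for each outer row, so the outer table drove the inner one. From 5.6 on, the optimizer rewrites such subqueries into joins.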
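The rewrite is legal because, when role.id is unique, the IN subquery and an inner join return the same rows. Here is a minimal sketch with hypothetical rows. It uses SQLite only to check that the two forms return identical results; it says nothing about MySQL's plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yd_admin_user (user_id INTEGER PRIMARY KEY, user_name TEXT, role_id INT);
CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO yd_admin_user VALUES (1, 'admin', 1), (2, 'alice', 2), (3, 'bob', 1);
INSERT INTO role VALUES (1, 'super admin'), (2, 'editor');
""")

# The article's query, written with an IN subquery
in_subquery = conn.execute(
    "SELECT user_id, user_name FROM yd_admin_user"
    " WHERE user_id IN (SELECT id FROM role WHERE id = 1)").fetchall()

# Hand-written join form; equivalent here because role.id is a primary key,
# so each user row can match at most one role row (no duplicates)
as_join = conn.execute(
    "SELECT u.user_id, u.user_name FROM yd_admin_user u"
    " JOIN role r ON u.user_id = r.id WHERE r.id = 1").fetchall()

if __name__ == "__main__":
    print(in_subquery, as_join)
```

If the subquery column were not unique, a plain join could duplicate outer rows. That is why the optimizer uses a semi-join rather than an ordinary join in general.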

Test table yd_admin_user, 49 rows:

desc yd_admin_user;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| user_id   | smallint(5)  | NO   | PRI | NULL    | auto_increment |
| user_name | varchar(60)  | NO   |     |         |                |
| password  | varchar(60)  | NO   |     |         |                |
| role_id   | smallint(5)  | YES  |     | 0       |                |
| real_name | varchar(30)  | YES  |     |         |                |
| mobile    | char(20)     | YES  |     |         |                |
| email     | varchar(100) | YES  |     |         |                |
+-----------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)

Table role, 13 rows:

desc role;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | smallint(5)  | NO   | PRI | NULL    | auto_increment |
| name           | varchar(20)  | YES  |     |         |                |
| department_ids | varchar(200) | YES  |     |         |                |
| action_list    | text         | YES  |     | NULL    |                |
| do_list        | text         | YES  |     | NULL    |                |
| add_time       | int(10)      | YES  |     | 0       |                |
+----------------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

I'm not using salary and customers because they hold too much data, and exporting and importing them is a hassle (ﾟДﾟ*)ﾉ

The task: query the two tables together, using a subquery to find the users that have the given role.

The query itself may not be very realistic (ー`´ー), so bear with it. It still illustrates the point (・ω<) ﾃﾍﾍﾟﾛ

In MySQL 5.5:

explain select * from yd_admin_user where user_id in (select id from role where id = 1);
+----+--------------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table         | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+--------------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | yd_admin_user | ALL   | NULL          | NULL    | NULL    | NULL  |   49 | Using where |
|  2 | DEPENDENT SUBQUERY | role          | const | PRIMARY,id    | PRIMARY | 2       | const |    1 | Using index |
+----+--------------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.07 sec)

In MySQL 5.7:

explain select * from yd_admin_user where user_id in (select id from role where id = 1);
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table         | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | yd_admin_user | const | PRIMARY       | PRIMARY | 2       | const |    1 |   100.00 | NULL        |
|  1 | SIMPLE      | role          | const | PRIMARY,id    | PRIMARY | 2       | const |    1 |   100.00 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

show warnings \G;
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select '1' AS `user_id`,'admin' AS `user_name`,'14e1b600b1fd579f47433b88e8d85291' AS `password`,'1' AS `role_id`,'9' AS `d_id`,'1515330851' AS `last_login`,'127.0.0.1' AS `last_ip`,'Admin' AS `real_name`,'0' AS `add_time`,'0' AS `is_disable`,'13632483323' AS `mobile`,'' AS `email`,'超級管理員' AS `job`,'0' AS `is_responsible` from `mysql12`.`role` join `mysql12`.`yd_admin_user` where 1
1 row in set (0.00 sec)

The main thing to look at is how the EXPLAIN output shows each MySQL version accessing the tables for the same query.

The official documentation explains joins in terms of nested loops. Let's use that model to explain the two statements above.

MySQL 5.5:

table	type
yd_admin_user	ALL
role	const

// loop over yd_admin_user
for each row in yd_admin_user {
    // loop over role, matching by const
    for each row in role matching const {
        // return the row if it satisfies the join condition and the WHERE on id
        if row satisfies join conditions and where id, send to client
    }
}

MySQL 5.7:

table	type
yd_admin_user	const
role	const

// first loop over role and collect the rows that satisfy WHERE into role_data
for each row in role matching const {
    if row satisfies where id, add row to role_data
}
// then use those results to look up yd_admin_user through the join condition
for each row in role_data {
    for each row in yd_admin_user matching const {
        if row satisfies join conditions, send to client
    }
}
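The two loop orders can be simulated to count how many rows each plan reads. This is a minimal Python sketch that assumes 49 users keyed by user_id and 13 roles keyed by id, with dictionaries standing in for primary-key lookups. plan_55 and plan_57 are hypothetical names. The code models the pseudocode above and is not MySQL's executor.

```python
# Hypothetical stand-ins: 49 users and 13 roles, keyed by primary key
USERS = {uid: f"user{uid}" for uid in range(1, 50)}
ROLES = {rid: f"role{rid}" for rid in range(1, 14)}


def plan_55(target_id):
    """5.5 style: yd_admin_user (type ALL) drives; the subquery is re-run per row."""
    user_reads = role_reads = 0
    out = []
    for uid, name in USERS.items():        # full scan of yd_admin_user
        user_reads += 1
        role_reads += 1                    # DEPENDENT SUBQUERY: role looked up by const
        if target_id in ROLES and uid == target_id:
            out.append((uid, name))
    return out, user_reads, role_reads


def plan_57(target_id):
    """5.7 style: role is read once by const, and its result drives a PK lookup."""
    role_reads = 1                         # role: const
    role_data = [target_id] if target_id in ROLES else []
    user_reads = 0
    out = []
    for rid in role_data:
        user_reads += 1                    # yd_admin_user: const via PRIMARY
        if rid in USERS:
            out.append((rid, USERS[rid]))
    return out, user_reads, role_reads


if __name__ == "__main__":
    print(plan_55(1))   # same rows, 49 reads of each table
    print(plan_57(1))   # same rows, 1 read of each table
```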

3. Q&A session

Skipped....

show warnings: a handy trick in MySQL 5.7. Run it after EXPLAIN to see the SQL as rewritten by the optimizer.

4. Join algorithms

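MySQL implements joins mainly with the Nested-Loop Join (NLJ) algorithm, at least in the 5.x versions discussed here; hash join was added in MySQL 8.0.18. Other databases may also use hash join and sort-merge join. NLJ takes the result set of the driving table as the base data for the loop. It then uses each row of that result set, one at a time, as a filter condition to query the next table, and finally merges the results. If a third table takes part in the join, the join result of the first two tables becomes the base data for the loop, and the third table is queried the same way, and so on.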
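Here is a minimal sketch of the multi-table nested-loop idea. Tables are Python lists of dicts, and there is one match function per join step. The result of each step becomes the driving data for the next table. Table and column names in the usage example are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def nested_loop_join(tables: List[List[Row]],
                     conditions: List[Callable[[Row, Row], bool]]) -> List[Row]:
    """Join tables left to right; conditions[i] matches the running result
    against tables[i + 1]."""
    result = [dict(row) for row in tables[0]]          # driving table's rows
    for inner, cond in zip(tables[1:], conditions):
        next_result = []
        for outer_row in result:                       # loop over the current result
            for inner_row in inner:                    # probe the next table row by row
                if cond(outer_row, inner_row):
                    next_result.append({**outer_row, **inner_row})
        result = next_result                           # base data for the next table
    return result


if __name__ == "__main__":
    users = [{"uid": 1, "rid": 1}, {"uid": 2, "rid": 2}]
    roles = [{"role_id": 1, "dept": 10}, {"role_id": 2, "dept": 20}]
    depts = [{"dept_id": 10, "dname": "ops"}]
    print(nested_loop_join(
        [users, roles, depts],
        [lambda o, i: o["rid"] == i["role_id"],
         lambda o, i: o["dept"] == i["dept_id"]]))
```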

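Optimization approach: minimize the total number of Nested Loop iterations in a join. How? The single most effective way is to keep the driving table's result set as small as possible. This is one of the basic optimization principles from Section 2 of this chapter: "always let the small result set drive the large one".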

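Optimize the inner loop of the Nested Loop first;
Make sure the join-condition columns on the driven table are indexed;
When the driven table's join columns cannot be indexed and memory is sufficient, don't be stingy with the join buffer (join_buffer_size).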
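Here is a rough arithmetic sketch of why the small result set should drive. With an index on the driven table, each driving row costs one index probe, modeled here as about log2(n) comparisons (binary search as a stand-in for a B-tree). The row counts are hypothetical.

```python
import math


def index_nlj_cost(driver_rows: int, driven_rows: int) -> int:
    """Rough comparison count: one index probe per driving row,
    each probe ~ceil(log2(driven_rows)) comparisons (binary search model)."""
    return driver_rows * math.ceil(math.log2(driven_rows))


small, large = 100, 1_000_000

if __name__ == "__main__":
    print(index_nlj_cost(small, large))   # small drives: 100 * 20 = 2000
    print(index_nlj_cost(large, small))   # large drives: 1_000_000 * 7 = 7000000
```

Without any index, a plain nested loop compares 100 × 1,000,000 pairs in either order. The gain comes from probing an index on the driven table once per driving row, which is why the first two rules above go together.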

4.1 Nested-Loop Join algorithms explained

MySQL 5.7 reference manual on nested-loop joins: https://www.docs4dev.com/docs/zh/mysql/5.7/reference/nested-loop-joins.html

Simple Nested-Loop Join

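As shown in the figure below, r is the driving table and s is the matched table. Rows r1, r2, ..., rn are taken from r one by one and compared against all rows of s, and the matches are then merged. s is therefore read n times, which is expensive for the database.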
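Here is a minimal sketch of Simple Nested-Loop Join in Python. It counts how many times s is scanned and how many row comparisons are made. The key column names are parameters, and the data in the example is made up.

```python
def simple_nlj(r, s, key_r, key_s):
    """Compare every row of r with every row of s."""
    out, scans_of_s, comparisons = [], 0, 0
    for row_r in r:
        scans_of_s += 1                      # s is read in full once per row of r
        for row_s in s:
            comparisons += 1
            if row_r[key_r] == row_s[key_s]:
                out.append((row_r, row_s))
    return out, scans_of_s, comparisons


if __name__ == "__main__":
    r = [{"a": i} for i in range(5)]
    s = [{"b": i} for i in range(0, 10, 2)]
    matches, scans, comps = simple_nlj(r, s, "a", "b")
    print(len(matches), scans, comps)   # 3 5 25
```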

Index Nested-Loop Join:

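This requires an index on the non-driving table (the matched table s), which reduces comparisons and speeds up the query. For each row of the driving table r, MySQL looks up the join value in s's index. Only when a matching index entry is found does it go back to the table for the full row. If s's join key is the primary key, performance is very high. If it is not, every match costs an extra lookup: first the secondary index is searched, then the row is fetched by the primary key ID stored in the secondary index entry. That is slower than joining on the primary key.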
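Here is a minimal sketch of Index Nested-Loop Join. It assumes a sorted list of (key, primary_key) pairs searched with bisect as a stand-in for a B-tree secondary index. Each probe that hits is followed by a fetch of the full row by primary key, which models the extra "back to the table" step described above. With InnoDB, a join on the primary key skips that step because the clustered index stores the row itself.

```python
import bisect


def build_index(rows, col):
    """Sorted (key, primary_key) pairs: a stand-in for a secondary B-tree index."""
    return sorted((row[col], pk) for pk, row in rows.items())


def index_nlj(r, key_r, s_rows, s_index):
    """For each driving row, binary-search s's index, then fetch matches by primary key."""
    keys = [k for k, _ in s_index]
    out, probes, row_lookups = [], 0, 0
    for row_r in r:
        probes += 1
        i = bisect.bisect_left(keys, row_r[key_r])
        while i < len(keys) and keys[i] == row_r[key_r]:
            pk = s_index[i][1]
            row_lookups += 1                 # back to the table by primary key
            out.append((row_r, s_rows[pk]))
            i += 1
    return out, probes, row_lookups


if __name__ == "__main__":
    s_rows = {1: {"k": 10, "v": "a"}, 2: {"k": 20, "v": "b"}, 3: {"k": 10, "v": "c"}}
    idx = build_index(s_rows, "k")           # [(10, 1), (10, 3), (20, 2)]
    r = [{"k": 10}, {"k": 30}]
    matches, probes, lookups = index_nlj(r, "k", s_rows, idx)
    print([m[1]["v"] for m in matches], probes, lookups)   # ['a', 'c'] 2 2
```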

Block Nested-Loop Join:

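If an index is available, MySQL uses the second method. If the join column has no index, it falls back to Block Nested-Loop Join. As the figure shows, a join buffer sits in the middle. The driving table's join-related columns are first cached in the join buffer and then matched against the matched table in batches. The many scans of the first method become one scan per buffer fill, so the non-driving table s is accessed far less often. By default join_buffer_size = 256K. MySQL caches every column it needs in the join buffer, including the SELECT columns, not just the join columns. A statement that joins N tables allocates N-1 join buffers at execution time.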
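Here is a minimal sketch of Block Nested-Loop Join. Driving rows are copied into a join buffer in batches, and s is scanned once per batch instead of once per row. Whole rows are buffered, mirroring the point that all needed columns are cached. For simplicity the buffer size is counted in rows here, while MySQL's join_buffer_size is measured in bytes.

```python
def block_nlj(r, s, key_r, key_s, buffer_rows):
    """Buffer up to buffer_rows driving rows, then scan s once per buffer fill."""
    out, scans_of_s = [], 0
    for start in range(0, len(r), buffer_rows):
        join_buffer = r[start:start + buffer_rows]     # cached rows of the driving table
        scans_of_s += 1
        for row_s in s:                                # one pass over s for the whole buffer
            for row_r in join_buffer:
                if row_r[key_r] == row_s[key_s]:
                    out.append((row_r, row_s))
    return out, scans_of_s


if __name__ == "__main__":
    r = [{"a": i} for i in range(10)]
    s = [{"b": i} for i in range(0, 10, 3)]
    matches, scans = block_nlj(r, s, "a", "b", buffer_rows=4)
    print(len(matches), scans)   # 4 3
```

With 10 driving rows and room for 4 per fill, s is scanned 3 times instead of 10. A larger buffer reduces the number of scans further, which is the reasoning behind not being stingy with join_buffer_size.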

Official description of join_buffer_size: https://www.docs4dev.com/docs/zh/mysql/5.7/reference/server-system-variables.html