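First, let's talk about queries that use IN(). High Performance MySQL notes that IN() can effectively replace certain range queries and improve query efficiency, because within an index the columns that follow a range column cannot be used. With IN(), the MySQL optimizer actually expands the conditions into n*m combinations, runs them, and merges the results; this is somewhat like a UNION but more efficient.
It also has some problems: in older MySQL versions, too many IN() combinations cause trouble. Query optimization can take a long time and consume a lot of memory. Newer MySQL versions stop evaluating plans once the number of combinations exceeds a certain threshold, which can prevent MySQL from making good use of indexes.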
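To make the n*m expansion concrete, here is a minimal sketch. The table t, its columns, and the index idx_a_b are hypothetical names introduced only for illustration; they do not come from the case discussed in this article.

```sql
-- Hypothetical table with a composite index on (a, b).
CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  KEY idx_a_b (a, b)
);

-- Two values for a and three values for b give 2 * 3 = 6 equality ranges
-- on (a, b) for the range optimizer to consider:
--   (a=1,b=10) (a=1,b=20) (a=1,b=30) (a=2,b=10) (a=2,b=20) (a=2,b=30)
EXPLAIN SELECT * FROM t WHERE a IN (1, 2) AND b IN (10, 20, 30);
```

The number of ranges is the product of the list lengths, which is why long IN() lists on several columns can make row estimation expensive.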
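In MySQL 5.6.5 and later, this "certain threshold" is controlled by the eq_range_index_dive_limit variable. The default is 10 in 5.6 and was changed to 200 in 5.7; it can of course be set manually. The 5.6 manual describes it as follows: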
The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To disable use of statistics and always use index dives, set eq_range_index_dive_limit to 0. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. eq_range_index_dive_limit is available as of MySQL 5.6.5. Before 5.6.5, the optimizer uses index dives, which is equivalent to eq_range_index_dive_limit=0.
In other words, where N is the number of equality ranges in the query:
1. eq_range_index_dive_limit = 0: index dives are always used
2. 0 < eq_range_index_dive_limit <= N: index statistics are used
3. eq_range_index_dive_limit > N: index dives are used
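MySQL 5.7 raised the default from 10 to 200 so that execution plans for equality range comparisons (IN()) are as accurate as possible, because IN() lists very often contain more than 10 values.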
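As a small sketch of setting the variable manually: the statements below inspect the limit and change it for the current session only. The value 200 simply mirrors the 5.7 default mentioned above and is not a recommendation.

```sql
-- Inspect the current global and session values.
SELECT @@global.eq_range_index_dive_limit  AS global_limit,
       @@session.eq_range_index_dive_limit AS session_limit;

-- Permit index dives for up to 199 equality ranges (limit = N + 1),
-- in this session only.
SET SESSION eq_range_index_dive_limit = 200;

-- Always use index dives, which matches the behaviour before 5.6.5.
SET SESSION eq_range_index_dive_limit = 0;
```

Changing the session value first is a low-risk way to compare plans before touching the global setting.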
The official MySQL manual contains this sentence:
the optimizer can estimate the row count for each range using dives into the index or index statistics.
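Roughly: the optimizer estimates the number of rows in each range. For example, "a IN (10, 20, 30)" is treated as a set of equality comparisons, so its three ranges reduce to three single values: 10, 20, and 30. The term "range" is used because MySQL's range access method mostly performs range scans, and a single value can be seen as a special case of a range.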
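There are two estimation methods:
1. index dive: the optimizer probes the index at both ends of each range and uses the number of rows in between as the estimate
2. index statistics: the optimizer estimates from the index's stored statistics instead of probing the index
Comparing the two: index dives give accurate row estimates, but their cost grows with the number of values in the IN() list; index statistics are faster to evaluate for long lists but less accurate.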
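Put simply, eq_range_index_dive_limit sets an upper limit on the number of values in the IN() list; once the list reaches that limit, row estimation switches from method 1 (index dive) to method 2 (index statistics).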
Why distinguish between the two methods? Take the following SQL:
SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
The indexes are:
+----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pre_forum_post |          0 | PRIMARY      |            1 | tid         | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          0 | PRIMARY      |            2 | position    | A         |    25521392 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          0 | pid          |            1 | pid         | A         |    25521392 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | fid          |            1 | fid         | A         |        1490 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | displayorder |            1 | tid         | A         |      880048 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | displayorder |            2 | invisible   | A         |      945236 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | displayorder |            3 | dateline    | A         |    25521392 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | first        |            1 | tid         | A         |      880048 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | first        |            2 | first       | A         |     1215304 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | new_auth     |            1 | authorid    | A         |     1963184 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | new_auth     |            2 | invisible   | A         |     1963184 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | new_auth     |            3 | tid         | A         |    12760696 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | idx_dt       |            1 | dateline    | A         |    25521392 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | mul_test     |            1 | tid         | A         |      880048 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | mul_test     |            2 | invisible   | A         |      945236 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | mul_test     |            3 | dateline    | A         |    25521392 |     NULL | NULL   |      | BTREE      |         |               |
| pre_forum_post |          1 | mul_test     |            4 | pid         | A         |    25521392 |     NULL | NULL   |      | BTREE      |         |               |
+----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Now look at the execution plan:
root@localhost 16:08:27 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2')
    -> ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
| id | select_type | table          | type  | possible_keys                             | key          | key_len | ref  | rows | Extra                                 |
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | pre_forum_post | range | PRIMARY,displayorder,first,mul_test,idx_1 | displayorder | 4       | NULL |   54 | Using index condition; Using filesort |
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)
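The MySQL optimizer treats this as a range query, so in the (tid, invisible, dateline) index the dateline column cannot be used. That means the final sort has to build a temporary result set and sort it there, rather than completing the sort directly within the index. So we tried adding another index.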
root@localhost 16:09:06 [ultrax]> alter table pre_forum_post add index idx_1 (tid,dateline);
Query OK, 20374596 rows affected, 0 warning (600.23 sec)
Records: 0 Duplicates: 0 Warnings: 0

root@localhost 16:20:22 [ultrax]> explain SELECT * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+
| id | select_type | table          | type | possible_keys | key   | key_len | ref   | rows   | Extra       |
+----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+
|  1 | SIMPLE      | pre_forum_post | ref  | idx_1         | idx_1 | 3       | const | 120646 | Using where |
+----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+
1 row in set (0.00 sec)

root@localhost 16:22:06 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
...
10 rows in set (0.40 sec)

root@localhost 16:23:55 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
...
10 rows in set (0.00 sec)
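The experiment shows that the effect is excellent. This is not hard to understand: as noted above, the optimizer handles IN() by retrieving data for multiple combinations, so when sorting or grouping is added, it can only be done on a temporary result set. In other words, even if the index contains the sort or group columns, they are of no use. The only complaint is that the optimizer's choice is still not reliable enough.
To summarize: when using IN() in a MySQL query, besides watching the number of values in the IN() list and the value of eq_range_index_dive_limit (details below), also pay attention to index usage whenever the SQL includes sorting, grouping, deduplication, and so on.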
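One possible alternative, which was not part of the experiment above and is offered only as a sketch, is to split the IN() list into one equality branch per value and merge the branches with UNION ALL. Each branch is then an equality lookup on (tid, invisible), so an index such as displayorder (tid, invisible, dateline) could in principle return rows already ordered by dateline. Whether the optimizer actually picks that plan on this table is an assumption to verify with EXPLAIN.

```sql
-- Sketch: one branch per invisible value, each limited to the top 10 rows,
-- followed by a final sort over at most 20 rows.
(SELECT * FROM pre_forum_post
  WHERE tid = 7932552 AND `invisible` = '0'
  ORDER BY dateline DESC LIMIT 10)
UNION ALL
(SELECT * FROM pre_forum_post
  WHERE tid = 7932552 AND `invisible` = '-2'
  ORDER BY dateline DESC LIMIT 10)
ORDER BY dateline DESC
LIMIT 10;
```

The trade-off is a longer statement that grows with the IN() list, so it suits short, fixed lists like the two values here.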
Staying with the case above: why is idx_1 not chosen on its own, so that a hint is needed to force it? First, look at the value of eq_range_index_dive_limit.
root@localhost 22:38:05 [ultrax]> show variables like 'eq_range_index_dive_limit';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 2     |
+---------------------------+-------+
1 row in set (0.00 sec)
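This is the 0 < eq_range_index_dive_limit <= N case described above (here N = 2, the two values in the IN() list), so index statistics are used. Next, let's use OPTIMIZER_TRACE to see what is going on.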
{
  "index": "displayorder",
  "ranges": [
    "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2",
    "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0"
  ],
  "index_dives_for_eq_ranges": false,
  "rowid_ordered": false,
  "using_mrr": false,
  "index_only": false,
  "rows": 54,
  "cost": 66.81,
  "chosen": true
}
// index dives are not used (false), and this index ends up chosen (true)
...
{
  "index": "idx_1",
  "ranges": [
    "7932552 <= tid <= 7932552"
  ],
  "index_dives_for_eq_ranges": true,
  "rowid_ordered": false,
  "using_mrr": false,
  "index_only": false,
  "rows": 120646,
  "cost": 144776,
  "chosen": false,
  "cause": "cost"
}
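We can see that the cost of the displayorder index is 66.81, while the cost of idx_1 is 144776 (for an estimated 120646 rows), so the MySQL optimizer chose displayorder. If we set eq_range_index_dive_limit > N, will the optimizer switch to index dives and arrive at a more accurate execution plan?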
root@localhost 22:52:52 [ultrax]> set eq_range_index_dive_limit = 3;
Query OK, 0 rows affected (0.00 sec)

root@localhost 22:55:38 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+
| id | select_type | table          | type | possible_keys                             | key   | key_len | ref   | rows   | Extra       |
+----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+
|  1 | SIMPLE      | pre_forum_post | ref  | PRIMARY,displayorder,first,mul_test,idx_1 | idx_1 | 3       | const | 120646 | Using where |
+----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
The optimizer_trace output is as follows:
{
  "index": "displayorder",
  "ranges": [
    "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2",
    "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0"
  ],
  "index_dives_for_eq_ranges": true,
  "rowid_ordered": false,
  "using_mrr": false,
  "index_only": false,
  "rows": 188193,
  "cost": 225834,
  "chosen": true
}
...
{
  "index": "idx_1",
  "ranges": [
    "7932552 <= tid <= 7932552"
  ],
  "index_dives_for_eq_ranges": true,
  "rowid_ordered": false,
  "using_mrr": false,
  "index_only": false,
  "rows": 120646,
  "cost": 144776,
  "chosen": true
}
...
"cost_for_plan": 144775,
"rows_for_plan": 120646,
"chosen": true
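Both indexes were selected as candidates, and in the final optimization step the cheapest one, idx_1, was chosen.
That is how, for equality range queries, the value of eq_range_index_dive_limit affects the optimizer's cost calculation and therefore its choice of index. We can also use profiling to see how long the optimizer's statistics stage takes: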
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000048 |
| checking permissions | 0.000004 |
| Opening tables       | 0.000015 |
| init                 | 0.000044 |
| System lock          | 0.000009 |
| optimizing           | 0.000014 |
| statistics           | 0.032089 |
| preparing            | 0.000022 |
| Sorting result       | 0.000003 |
| executing            | 0.000003 |
| Sending data         | 0.000101 |
| end                  | 0.000004 |
| query end            | 0.000002 |
| closing tables       | 0.000009 |
| freeing items        | 0.000013 |
| cleaning up          | 0.000012 |
+----------------------+----------+
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000045 |
| checking permissions | 0.000003 |
| Opening tables       | 0.000014 |
| init                 | 0.000040 |
| System lock          | 0.000008 |
| optimizing           | 0.000014 |
| statistics           | 0.000086 |
| preparing            | 0.000016 |
| Sorting result       | 0.000002 |
| executing            | 0.000002 |
| Sending data         | 0.000016 |
| Creating sort index  | 0.412123 |
| end                  | 0.000012 |
| query end            | 0.000004 |
| closing tables       | 0.000013 |
| freeing items        | 0.000023 |
| cleaning up          | 0.000015 |
+----------------------+----------+
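We can see that when eq_range_index_dive_limit is raised so that index dives are used, the optimizer's statistics stage takes noticeably longer than with index statistics, but it ends up with a more reasonable execution plan. The statistics stage takes 0.032089s vs 0.000086s, while the overall SQL execution time is roughly 0.03s vs 0.41s.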
Appendix:
How to use optimizer_trace:
set optimizer_trace='enabled=on';
select * from information_schema.optimizer_trace\G
// Note: it is recommended to enable optimizer_trace only at session level, for debugging
How to use profiling:
set profiling=ON;
(run the SQL)
show profiles;
show profile for query 2;
show profile block io,cpu for query 2;
You can also view memory, swaps, context switches, source, and other information.
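As a sketch of how the two traces shown earlier could be captured back to back, the session below runs the same query under both limits. It assumes the same ultrax schema and uses the limit values 2 and 3 from this article; everything stays at session scope so other connections are unaffected.

```sql
-- Enable tracing for this session only.
SET SESSION optimizer_trace = 'enabled=on';

-- Limit 2 with a two-value IN() list: index statistics are used.
SET SESSION eq_range_index_dive_limit = 2;
SELECT * FROM pre_forum_post
 WHERE tid = 7932552 AND `invisible` IN ('0', '-2')
 ORDER BY dateline DESC LIMIT 10;
SELECT trace FROM information_schema.optimizer_trace;

-- Limit 3 (greater than N = 2): index dives are used.
SET SESSION eq_range_index_dive_limit = 3;
SELECT * FROM pre_forum_post
 WHERE tid = 7932552 AND `invisible` IN ('0', '-2')
 ORDER BY dateline DESC LIMIT 10;
SELECT trace FROM information_schema.optimizer_trace;

-- Turn tracing off and reset the session limit to the global value.
SET SESSION optimizer_trace = 'enabled=off';
SET SESSION eq_range_index_dive_limit = DEFAULT;
```

In each trace, the index_dives_for_eq_ranges, rows, and cost fields of the candidate indexes are the ones to compare.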