How data volume affects index usage in WHERE ... IN queries

Date: 2019-11-10

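We often run into this question on forums and in interviews: in MySQL, does WHERE ... IN use an index?

To settle the question I ran some tests and found that the number of rows in the table affects whether the index is used. Let's take a look.

The tests use MySQL 5.7 with the default InnoDB storage engine and the default B+ tree index type; EXPLAIN is used to confirm whether each query uses the index.

First, we create a table: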

create table staffs(
    id int primary key auto_increment,
    name varchar(24) not null default '' comment 'name',
    age int not null default 0 comment 'age',
    pos varchar(20) not null default '' comment 'position',
    add_time timestamp not null default current_timestamp comment 'hire time'
)charset utf8 comment 'staff records';

1. First case: a small table

Insert three rows first:

insert into staffs(name,age,pos,add_time) values('z3',22,'manager',now());
insert into staffs(name,age,pos,add_time) values('July',23,'dev',now());
insert into staffs(name,age,pos,add_time) values('2000',23,'dev',now());

1.1 Effect on a single-column index, using name as the example

alter table staffs add index idx_staffs_name(name);

mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys   | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_name | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

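As the plan shows, the index is not used (type is ALL, key is NULL). rows is 3 and filtered is 66.67: the storage engine is expected to return 3 rows and the MySQL server layer is expected to filter out 1 of them, leaving 2, hence 66.67%. Both values are optimizer estimates. (For more on EXPLAIN, see the earlier post: http://www.javashuo.com/article/p-nawevcyl-ds.html)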
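As a quick check of the arithmetic, filtered is simply the share of examined rows that is expected to pass the WHERE clause. A minimal sketch, with the 2 and the 3 taken from the plan above:

```sql
-- 3 rows examined, 2 of them expected to survive the WHERE clause
SELECT ROUND(100 * 2 / 3, 2) AS filtered_pct;  -- 66.67
```

Read the other way round, rows * filtered / 100 (3 * 66.67 / 100, about 2) is the number of rows the server expects to keep.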

1.2 Effect on a composite index

Prepare the index:

alter table staffs drop index idx_staffs_name;
alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

1.2.1 Effect on the leftmost column of the composite index

mysql> explain select * from staffs where name = 'z3';
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.04 sec)

With =, the query uses the index thanks to the leftmost-prefix rule; with IN, the index is not used.

1.2.2 Effect on a middle column of the composite index

mysql> explain select * from staffs where name = 'z3' and age = 22;
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from staffs where name = 'z3' and age in (22, 23);
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

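Likewise, with = on both columns the composite index is used column by column (key_len grows from 74 to 78), but once the second column is queried with IN, not even the first column uses the index and the query falls back to a full table scan.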
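One way to check that the index is still usable here, and that the optimizer simply preferred the scan, is to compare the default plan with a forced one. This is a sketch rather than part of the original test run, so no output is shown; it assumes the staffs table and the idx_staffs_nameAgePos index created above.

```sql
-- Default: the optimizer is free to choose (a full scan in the runs above).
EXPLAIN SELECT * FROM staffs WHERE name IN ('z3', '2000');

-- FORCE INDEX makes a table scan look very expensive to the optimizer,
-- so the named index is used whenever it can satisfy the condition.
EXPLAIN SELECT * FROM staffs FORCE INDEX (idx_staffs_nameAgePos)
WHERE name IN ('z3', '2000');

-- Selecting only indexed columns makes the index covering (no lookups back
-- into the primary key), which may also change the optimizer's choice.
EXPLAIN SELECT name, age, pos FROM staffs WHERE name IN ('z3', '2000');
```

If the forced plan reports the index in the key column, the full scan in the default plan is a cost decision, not a limitation of IN.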

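2. Second case: a large table

To insert a lot of data quickly and build the index afterwards, we drop the original table and recreate the same table without any secondary index, so that no time is spent maintaining the index during the inserts. A stored procedure does the inserting.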

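DELIMITER $$
CREATE PROCEDURE test_insert()
BEGIN
    DECLARE i INT;
    SET i = 1;
    -- Inserts rows named a1 .. a9999 (9999 rows in total).
    WHILE (i < 10000) DO
        -- Note: (100 - i + 1) turns negative once i > 101, so ages for larger i can be negative.
        INSERT INTO staffs(`name`,`age`,`pos`) VALUES(CONCAT('a', i), FLOOR(20 + RAND() * (100 - i + 1)),'dev');
        SET i = i + 1;
    END WHILE;
    COMMIT;
END$$
DELIMITER ;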

CALL test_insert();

Query OK, 0 rows affected (8 min 7.84 sec)

Inserting 9999 rows took a little over 8 minutes, which is rather slow.
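A likely reason for the slow run is that, with autocommit enabled (the default), every INSERT in the loop is committed as its own transaction, and the COMMIT at the end of the procedure has nothing left to do. A hedged variant that wraps the whole loop in one explicit transaction is sketched below; the procedure name test_insert_tx is made up for illustration, and the variant was not timed as part of the original test.

```sql
DROP PROCEDURE IF EXISTS test_insert_tx;

DELIMITER $$
CREATE PROCEDURE test_insert_tx()
BEGIN
    DECLARE i INT DEFAULT 1;
    -- Open one transaction for the whole loop instead of one per row.
    START TRANSACTION;
    WHILE i < 10000 DO
        -- Same row-generating expression as test_insert() above.
        INSERT INTO staffs(`name`, `age`, `pos`)
        VALUES (CONCAT('a', i), FLOOR(20 + RAND() * (100 - i + 1)), 'dev');
        SET i = i + 1;
    END WHILE;
    -- A single commit makes all 9999 rows durable at once.
    COMMIT;
END$$
DELIMITER ;

CALL test_insert_tx();
```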

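2.1 Effect on a single-column index, using name as the example

As before, create the index (the command is the same as above and is omitted to save space; likewise below), then run the query.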

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys   | key             | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_name | idx_staffs_name | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

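The index is used: type is range, with an estimated 2 rows and filtered at 100%.

2.2 Effect on a composite index

As before, drop the single-column index first, then create the composite index.

2.2.1 Effect on the leftmost column of the composite index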

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The index is used.

mysql> explain select * from staffs where name in ('a1', 'a2000') and age = 23;
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

Adding another condition after the IN column still uses the index.
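The key_len column shows how many index columns the plan actually uses. A rough sketch of the arithmetic for this table, assuming the utf8 charset from the table definition (up to 3 bytes per character), a 2-byte length prefix for each varchar, 4 bytes for int, and no extra NULL byte because every column is NOT NULL:

```sql
SELECT 24 * 3 + 2                  AS name_only,      -- 74
       24 * 3 + 2 + 4              AS name_and_age,   -- 78
       24 * 3 + 2 + 4 + 20 * 3 + 2 AS name_age_pos;   -- 140
```

A key_len of 74 therefore means only name is used, while 78 in the plan above means the age condition is applied through the index as well.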

2.2.2 Effect on a middle column of the composite index

mysql> explain select * from staffs where name = 'a1' and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)

mysql> explain select * from staffs where name in ('a1', 'a2000') and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    4 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

A middle column is not affected either; the index is still used.

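3. Summary

3.1 With little data, equality conditions still use the composite index column by column, but IN conditions used neither the single-column index nor the composite index. A likely reason is that MySQL considers the table so small that a full table scan is cheaper than going through the index.

3.2 With more data, IN used the single-column index in these tests, and the composite index was likewise used column by column. The optimizer's choice is cost-based, so this is an observation from the tests rather than a guarantee.

3.3 The IN lists here are short. With a very long list, where the matching rows make up a large share of the table, the optimizer would presumably skip the index again and do a full table scan.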
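To see why the optimizer picks one plan over the other, MySQL 5.6 and later can record the cost comparison in the optimizer trace. The statements below are a sketch against the staffs table from this post and were not part of the original test, so no output is shown.

```sql
-- Number of equality ranges from which the optimizer stops doing index dives
-- and estimates rows from index statistics instead (documented default in 5.7: 200).
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';

-- Record the optimizer's reasoning for the next statement in this session.
SET optimizer_trace = 'enabled=on';
SELECT * FROM staffs WHERE name IN ('a1', 'a2000');

-- The trace is a JSON document; the range analysis part lists the estimated
-- cost of the table scan next to the cost of each candidate index range scan.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
```

Running the same trace on the 3-row table and on the 9999-row table should make the difference between sections 1 and 2 visible as two different cost comparisons.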

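3.4 The case where the IN condition is a subquery has not been tested here, and neither have versions below 5.7. To quote another blogger on this:

Versions before 5.5 indeed do not use the index; versions after 5.5 are optimized. In MySQL 5.5, released in 2010, the optimizer can optimize the IN operator automatically: columns that have an index can use it, while columns without an index still get a full table scan.

For example, in versions before 5.5 (everything below refers to pre-5.5 versions), the execution plan for select * from a where id in (select id from b); does not first fetch all the ids from table b and then compare them with the ids in table a. MySQL converts the IN subquery into a correlated EXISTS subquery, so the statement is effectively equivalent to: select * from a where exists(select * from b where b.id=a.id);

A correlated EXISTS subquery is executed like this: loop over every row of table a and compare it against table b with the condition a.id=b.id, checking whether that row's id exists in table b; if it does, that row from table a is returned.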
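To check how a given server version actually handles the subquery form, EXPLAIN can be followed by SHOW WARNINGS, which prints the statement as the optimizer rewrote it. This is a sketch only: the tables a and b are the placeholder names from the quote above and were not created or tested in this post. As far as I know, the semijoin strategies for IN subqueries are documented from MySQL 5.6 onward.

```sql
-- Plan for the IN-subquery form.
EXPLAIN SELECT * FROM a WHERE id IN (SELECT id FROM b);

-- The note attached to the EXPLAIN shows the rewritten query
-- (for example as a semi join, or as an EXISTS-style subquery).
SHOW WARNINGS;

-- The semijoin-related flags that control this rewrite.
SELECT @@optimizer_switch;
```

The "1 warning" reported after each EXPLAIN earlier in this post is the same kind of note.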