How Data Volume Affects Index Usage for WHERE IN

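A question that comes up often on forums and in interviews: in MySQL, does WHERE IN use an index?

To settle it, I ran some tests and found that the number of rows in the table affects whether the index is used. Let's take a look.

The MySQL version is 5.7, the storage engine is the default InnoDB, and the index type is the default B+ tree index. EXPLAIN is used to confirm whether the index is used.

First we create a table: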

create table staffs(
    id int primary key auto_increment,
    name varchar(24) not null default '' comment 'name',
    age int not null default 0 comment 'age',
    pos varchar(20) not null default '' comment 'position',
    add_time timestamp not null default current_timestamp comment 'hire time'
)charset utf8 comment 'staff records';

1, The first case: a small table

Insert three rows first:

insert into staffs(name,age,pos,add_time) values('z3',22,'manager',now());
insert into staffs(name,age,pos,add_time) values('July',23,'dev',now());
insert into staffs(name,age,pos,add_time) values('2000',23,'dev',now());

1.1 Effect on a single-column index, using name as the example

alter table staffs add index idx_staffs_name(name);
mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys   | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_name | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

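As shown, the index is not used and rows is 3. The filtered value of 66.67 is the estimated percentage of rows that remain after the server layer filters what the storage engine returns: the storage engine returns 3 rows, the MySQL server layer discards 1, and 2 remain. (For details on EXPLAIN, see the earlier post: http://www.javashuo.com/article/p-nawevcyl-ds.html)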
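The plan above lists idx_staffs_name under possible_keys, which suggests the index is usable and the optimizer simply chose not to use it. One way to check this, sketched here on the assumption that idx_staffs_name still exists at this point, is to add MySQL's FORCE INDEX hint and compare the resulting plan with the one above:

```sql
-- FORCE INDEX tells the optimizer to treat a table scan as very expensive,
-- so it picks the named index whenever that index can be used for the query.
EXPLAIN SELECT *
FROM staffs FORCE INDEX (idx_staffs_name)
WHERE name IN ('z3', '2000');
```

If the type column changes from ALL to range, the full scan in the unhinted query was a cost-based decision rather than a limitation of IN.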

1.2 Effect on a composite index

Prepare the index:

alter table staffs drop index idx_staffs_name;
alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

1.2.1 Effect on the leftmost column of the composite index

mysql> explain select * from staffs where name = 'z3';
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from staffs where name in ('z3', '2000');
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.04 sec)

As shown, the = query uses the index thanks to the leftmost-prefix rule, while the IN query does not use it.
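To see why the optimizer preferred the full scan for the IN query, the optimizer trace (available since MySQL 5.6) can be enabled for the session. This is only a sketch of the steps, run from the mysql command-line client; the exact content of the trace depends on the server version and table statistics.

```sql
-- Turn on tracing for the current session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement to be analyzed.
SELECT * FROM staffs WHERE name IN ('z3', '2000');

-- Read the JSON trace of the last traced statement.
-- \G is a mysql client shortcut for vertical output.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE\G

-- Turn tracing off again.
SET optimizer_trace = 'enabled=off';
```

The range analysis part of the trace lists the estimated cost of the table scan next to the cost of each candidate range scan, which is where a three-row table tends to favor the scan.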

1.2.2 Effect on the middle column of the composite index

mysql> explain select * from staffs where name = 'z3' and age = 22;
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys         | key                   | key_len | ref         | rows | filtered | Extra |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | staffs | NULL       | ref  | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+------+-----------------------+-----------------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from staffs where name = 'z3' and age in (22, 23);
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys         | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | staffs | NULL       | ALL  | idx_staffs_nameAgePos | NULL | NULL    | NULL |    3 |    66.67 | Using where |
+----+-------------+--------+------------+------+-----------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

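Likewise, with = on both columns the composite index is used column by column, but once the second column is queried with IN, even the first column is dragged down and the index is not used at all.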
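The key_len column in the plans above shows how many columns of the composite index are actually used: 74 when only name is matched, 78 when name and age are both matched. The arithmetic can be worked out from the table definition (utf8 columns, all NOT NULL). The column aliases below are made up for illustration.

```sql
-- utf8 (utf8mb3) needs up to 3 bytes per character, and a VARCHAR key part
-- adds 2 bytes for the length. INT takes 4 bytes. All three columns are
-- NOT NULL, so no extra NULL-flag byte is counted.
SELECT
    24 * 3 + 2                       AS name_key_len,         -- 74
    (24 * 3 + 2) + 4                 AS name_age_key_len,     -- 78
    (24 * 3 + 2) + 4 + (20 * 3 + 2)  AS name_age_pos_key_len; -- 140
```

A key_len of 140 would therefore mean all three columns (name, age, pos) take part in the index lookup.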

 

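2, The large-table case

To insert a large amount of data quickly and build the index afterwards, we first drop the original table and recreate the same table without any secondary index, so no time is spent maintaining the index during the inserts. The rows are inserted with a stored procedure.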

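DELIMITER $$
CREATE PROCEDURE test_insert()
BEGIN
    -- Loop counter; the loop runs for i = 1..9999, so 9999 rows are inserted.
    DECLARE i INT;
    SET i = 1;
    WHILE (i < 10000) DO
        -- Note: (100 - i + 1) is negative once i > 101, so ages can drop below 20
        -- and become negative for larger i.
        INSERT INTO staffs(`name`,`age`,`pos`) VALUES(CONCAT('a', i), FLOOR(20 + RAND() * (100 - i + 1)),'dev');
        SET i = i + 1;
    END WHILE;
    -- With autocommit on (the default), each INSERT above was already committed on its own.
    COMMIT;
END$$
DELIMITER ;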

CALL test_insert();
Query OK, 0 rows affected (8 min 7.84 sec)

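Inserting 9999 rows took more than 8 minutes, which is rather slow. A likely cause is that, with autocommit on, every INSERT is committed as its own transaction.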
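A common way to speed up this kind of bulk load is to wrap the whole loop in a single transaction, so the rows are committed once at the end instead of once per INSERT. The following is a sketch meant to be run instead of test_insert, not in addition to it. The procedure name test_insert_batched and the simplified age expression (20 to 99) are assumptions made for this example and are not part of the original test, so the generated data will differ from the data used in the plans below.

```sql
DELIMITER $$
CREATE PROCEDURE test_insert_batched()
BEGIN
    DECLARE i INT DEFAULT 1;
    -- Inside a stored procedure, BEGIN opens a block, so the transaction
    -- has to be started with START TRANSACTION.
    START TRANSACTION;
    WHILE i < 10000 DO
        -- Assumed age range for illustration: FLOOR(20 + RAND() * 80) yields 20..99.
        INSERT INTO staffs(`name`, `age`, `pos`)
        VALUES (CONCAT('a', i), FLOOR(20 + RAND() * 80), 'dev');
        SET i = i + 1;
    END WHILE;
    -- One commit for all 9999 rows.
    COMMIT;
END$$
DELIMITER ;

CALL test_insert_batched();
```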

 

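2.1 Effect on a single-column index, using name as the example

As before, create the index (the command is the same as above and is omitted to save space, here and below), then run the query.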

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys   | key             | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_name | idx_staffs_name | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

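The index is used; the estimated row count is 2 and filtered is 100%.

2.2 Effect on a composite index

Again, drop the single-column index first and create the composite index.

2.2.1 Effect on the leftmost column of the composite index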

mysql> explain select * from staffs where name in ('a1', 'a2000');
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 74      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

The index is used.

mysql> explain select * from staffs where name in ('a1', 'a2000') and age = 23;
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

Adding another condition after the IN column still uses the index.

2.2.2 Effect on the middle column of the composite index

mysql> explain select * from staffs where name = 'a1' and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    2 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.01 sec)
mysql> explain select * from staffs where name in ('a1', 'a2000') and age in (22, 23);
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys         | key                   | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | staffs | NULL       | range | idx_staffs_nameAgePos | idx_staffs_nameAgePos | 78      | NULL |    4 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+-----------------------+-----------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

IN on the middle column makes no difference either; the index is still used.

 

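3, Summary

3.1 When the table is small, = predicates still use the composite index column by column in index order, but IN predicates use neither the single-column index nor the composite index. The probable reason is that MySQL considers the table so small that a full table scan is cheaper.

3.2 When the table is large, IN used the single-column index in every test here, and the composite index was likewise used column by column in order.

3.3 Of course, the IN lists here are short. If the list is very long, so that the matching rows make up a large share of the table, MySQL will probably skip the index and do a full table scan.

3.4 The case where the IN condition contains a subquery has not been tested here, and neither have versions below 5.7. To quote another blogger:

Versions before 5.5 indeed do not use the index; from 5.5 on, MySQL optimized this. In version 5.5, released in 2010, the optimizer can optimize the IN operator automatically: columns that have an index can use it, and columns without an index still go through a full table scan.

For example, in versions before 5.5 (everything below refers to pre-5.5 versions), take select * from a where id in (select id from b); The execution plan for this statement does not actually fetch all ids from table b first and then compare them with the ids in table a. MySQL converts the IN subquery into a correlated EXISTS subquery, so it is effectively equivalent to: select * from a where exists(select * from b where b.id=a.id);

A correlated EXISTS subquery is executed as follows: each row of table a is fetched in a loop and compared against table b on the condition a.id=b.id. For each row of a, MySQL checks whether its id exists in b, and if it does, that row of a is returned.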
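The subquery case left untested in 3.4 can be examined the same way as the earlier tests. The sketch below assumes two minimal tables named a and b, as in the quoted example; their column definitions are invented for illustration. Comparing the two plans on a given server shows whether that version treats the IN form and the EXISTS form differently.

```sql
-- Hypothetical minimal tables matching the names used in the quote.
CREATE TABLE a (id INT PRIMARY KEY);
CREATE TABLE b (id INT PRIMARY KEY);

-- Plan for the IN subquery form.
EXPLAIN SELECT * FROM a WHERE id IN (SELECT id FROM b);

-- Plan for the correlated EXISTS form.
EXPLAIN SELECT * FROM a WHERE EXISTS (SELECT * FROM b WHERE b.id = a.id);
```

As with the tests above, the plans depend on how many rows the tables hold, so it is worth running them on both nearly empty and well-populated tables.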
