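We often hear people say "add an index on every column in the WHERE clause". This advice is badly wrong. In most cases, creating separate single-column indexes on several columns does not improve MySQL query performance. MySQL 5.0 and later versions introduced a strategy called "index merge", which to some extent lets a query use multiple single-column indexes on a table to locate the requested rows. But when the server combines multiple indexes, it usually spends a lot of CPU and memory on the algorithm's buffering, sorting, and merging operations, especially when some of the indexes are not very selective and a large amount of data has to be scanned and merged.
In this situation, what we need is a multi-column index.
Create a test database and table: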
    CREATE DATABASE IF NOT EXISTS db_test DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    USE db_test;
    CREATE TABLE payment (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        staff_id INT UNSIGNED NOT NULL,
        customer_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Use a stored procedure to insert 10 million rows of random data (you can set the table engine to MyISAM first and switch it to InnoDB afterwards):
    DROP PROCEDURE IF EXISTS add_payment;
    DELIMITER //
    CREATE PROCEDURE add_payment(IN num INT)
    BEGIN
        DECLARE rowid INT DEFAULT 0;
        -- Prepare the INSERT once, outside the loop, and reuse it for every row.
        SET @exesql = 'INSERT INTO payment(staff_id, customer_id) VALUES (?, ?)';
        PREPARE stmt FROM @exesql;
        WHILE rowid < num DO
            -- staff_id falls in 1..5000, customer_id in 1..500000.
            SET @staff_id = (1 + FLOOR(5000*RAND()));
            SET @customer_id = (1 + FLOOR(500000*RAND()));
            SET rowid = rowid + 1;
            EXECUTE stmt USING @staff_id, @customer_id;
        END WHILE;
        DEALLOCATE PREPARE stmt;
    END //
    DELIMITER ;
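As a rough sketch of the load step, assuming the procedure has been created in db_test and the table is empty beforehand. The engine switch follows the hint in the text and is optional; the row count of 10,000,000 is simply the figure quoted above.

```sql
USE db_test;

-- Optional, per the hint above: load into MyISAM first.
ALTER TABLE payment ENGINE=MyISAM;

-- Insert 10 million random rows; expect this to run for a long time.
CALL add_payment(10000000);

-- Switch back to InnoDB once the load has finished.
ALTER TABLE payment ENGINE=InnoDB;

-- Sanity check: on a previously empty table this should equal the argument above.
SELECT COUNT(*) FROM payment;
```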
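Alternatively, you can download and use my test data directly (also generated with the stored procedure above, though I adjusted the data afterwards):
Test data
Add two single-column indexes (this takes a while, so it is best to run the statements one at a time):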
    ALTER TABLE `payment` ADD INDEX idx_customer_id(`customer_id`);
    ALTER TABLE `payment` ADD INDEX idx_staff_id(`staff_id`);
Run a query that can use the indexes on both columns:
    select count(*) from payment where staff_id = 2205 AND customer_id = 93112;
Look at the execution plan:
    mysql> explain select count(*) from payment where staff_id = 2205 AND customer_id = 93112;
    +----+-------------+---------+-------------+------------------------------+------------------------------+---------+------+-------+-------------------------------------------------------------------------+
    | id | select_type | table   | type        | possible_keys                | key                          | key_len | ref  | rows  | Extra                                                                   |
    +----+-------------+---------+-------------+------------------------------+------------------------------+---------+------+-------+-------------------------------------------------------------------------+
    |  1 | SIMPLE      | payment | index_merge | idx_customer_id,idx_staff_id | idx_staff_id,idx_customer_id | 4,4     | NULL | 11711 | Using intersect(idx_staff_id,idx_customer_id); Using where; Using index |
    +----+-------------+---------+-------------+------------------------------+------------------------------+---------+------+-------+-------------------------------------------------------------------------+
    1 row in set (0.00 sec)
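We can see that type is index_merge, and Extra shows Using intersect(idx_staff_id,idx_customer_id).
This is index merge: the server uses both indexes and then merges the two results (by intersection, by union, or by a combination of the two).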
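To see what the optimizer would do without the intersection strategy, one option is to turn it off for the current session and run EXPLAIN again. This is only a sketch: it assumes a MySQL version that has the optimizer_switch system variable with its index_merge_intersection flag (5.1 or later), and the resulting plan depends on your data and version, so no output is shown.

```sql
-- Show the current optimizer flags, including the index_merge* ones.
SELECT @@optimizer_switch;

-- Disable only the intersection variant of index merge for this session.
SET SESSION optimizer_switch = 'index_merge_intersection=off';

EXPLAIN SELECT COUNT(*) FROM payment WHERE staff_id = 2205 AND customer_id = 93112;

-- Put the flag back to its default value for this session.
SET SESSION optimizer_switch = 'index_merge_intersection=default';
```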
Query result:
    mysql> select count(*) from payment where staff_id = 2205 AND customer_id = 93112 ;
    +----------+
    | count(*) |
    +----------+
    |   178770 |
    +----------+
    1 row in set (0.12 sec)
Then drop the indexes above and add a multi-column index:
    ALTER TABLE payment DROP INDEX idx_customer_id;
    ALTER TABLE payment DROP INDEX idx_staff_id;
    ALTER TABLE `payment` ADD INDEX idx_customer_id_staff_id(`customer_id`, `staff_id`);
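Note that column order matters a great deal in a multi-column index (customer_id is more selective, so it goes first).
Query:
    mysql> select count(*) from payment where staff_id = 2205 AND customer_id = 93112;
    +----------+
    | count(*) |
    +----------+
    |   178770 |
    +----------+
    1 row in set (0.05 sec)
The multi-column index made the query faster: 0.05 sec, against 0.12 sec with index merge. (The data set here is still fairly small; the gap becomes more obvious as the data grows.)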
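To confirm that the new index is the one being used, EXPLAIN can be run again. A sketch, assuming idx_customer_id_staff_id is now the only secondary index on the table. I would expect the key column to name it and the type column to no longer read index_merge, but the exact plan is up to the optimizer, so no output is shown.

```sql
EXPLAIN SELECT COUNT(*) FROM payment WHERE staff_id = 2205 AND customer_id = 93112;

-- List the indexes defined on the table; Seq_in_index shows the column order.
SHOW INDEX FROM payment;
```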
Column order is crucial in a multi-column index. A rule of thumb for choosing it: put the most selective column first (though this is not absolute). The rule of thumb considers global cardinality and selectivity, not any one specific query:
    mysql> select count(DISTINCT staff_id) / count(*) AS staff_id_selectivity,
        ->        count(DISTINCT customer_id) / count(*) AS customer_id_selectivity,
        ->        count(*)
        -> from payment\G
    *************************** 1. row ***************************
       staff_id_selectivity: 0.0005
    customer_id_selectivity: 0.0500
                   count(*): 10000000
    1 row in set (6.29 sec)
customer_id is more selective, so it goes first in the index.
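The global figures can be complemented by a look at the specific values a query uses. A sketch, reusing the values 2205 and 93112 from the query above; the column aliases are made up for illustration. In MySQL a comparison evaluates to 1 or 0, so SUM() over a comparison counts the matching rows.

```sql
SELECT SUM(staff_id = 2205)     AS staff_id_matches,
       SUM(customer_id = 93112) AS customer_id_matches,
       COUNT(*)                 AS total_rows
FROM payment;
```

The column with fewer matches is the more selective one for this particular query, which may or may not agree with the global ratio.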
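A multi-column index can only match a leftmost prefix of its columns. That is:
    select * from payment where staff_id = 2205 AND customer_id = 93112 ;
    select count(*) from payment where customer_id = 93112 ;
can use the index (the order in which the conditions are written in the WHERE clause does not matter, only which columns they constrain), but
    select * from payment where staff_id = 2205 ;
cannot use the index to locate rows, because the WHERE clause does not constrain the leading column customer_id.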
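If queries that filter on staff_id alone also need to be fast, one common option is a second index that starts with staff_id. A sketch, assuming the name idx_staff_id is free again (it was dropped earlier); EXPLAIN can then be used to check which index each query picks.

```sql
ALTER TABLE payment ADD INDEX idx_staff_id (staff_id);

-- The leading column of the multi-column index is constrained, so it is usable.
EXPLAIN SELECT * FROM payment WHERE customer_id = 93112;

-- Only staff_id is constrained; the new single-column index is the candidate here.
EXPLAIN SELECT * FROM payment WHERE staff_id = 2205;
```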