Index Practice and Tuning (2)

Date: 2019-11-12

Tags: index practice


Ⅰ. Another role of indexes

A B+ tree is kept sorted, so queries that sort on an indexed column are also very fast.

(root@localhost) [dbt3]> explain select * from orders order by o_totalprice desc limit 10;
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | orders | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1489118 |   100.00 | Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

No index is used (type ALL, a full table scan), and the sort relies on sort_buffer_size (Using filesort).

Add an index:

(root@localhost) [dbt3]> alter table orders add index idx_o_totalprice(o_totalprice);
Query OK, 0 rows affected (6.81 sec)
Records: 0  Duplicates: 0  Warnings: 0

(root@localhost) [dbt3]> explain select * from orders order by o_totalprice desc limit 10;
+----+-------------+--------+------------+-------+---------------+------------------+---------+------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys | key              | key_len | ref  | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------------------+---------+------+------+----------+-------+
|  1 | SIMPLE      | orders | NULL       | index | NULL          | idx_o_totalprice | 9       | NULL |   10 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+---------------+------------------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

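Now the new index is used and Extra is NULL. No sorting is needed at all, so no sort memory is used either. In other words, there is no need to raise sort_buffer_size for this query.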

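In summary: the first role of an index is fast lookup. The second is speeding up ORDER BY, because the rows are read straight from data that is already sorted.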
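To see the same effect without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is SQLite, not MySQL/InnoDB, so the plan text differs. SQLite reports the extra sort step as USE TEMP B-TREE FOR ORDER BY, which plays the role of MySQL's Using filesort. The table is a cut-down, hypothetical version of orders filled with made-up prices.

```python
import sqlite3

# Sketch only: SQLite, not MySQL. "USE TEMP B-TREE FOR ORDER BY" is SQLite's
# counterpart of MySQL's "Using filesort".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, "
    "o_totalprice REAL, o_comment TEXT)"
)
# Hypothetical data: 1000 rows with scrambled prices.
conn.executemany(
    "INSERT INTO orders (o_totalprice, o_comment) VALUES (?, ?)",
    [((i * 7919) % 10007 + 0.5, f"comment {i}") for i in range(1000)],
)

QUERY = "SELECT * FROM orders ORDER BY o_totalprice DESC LIMIT 10"


def plan(sql):
    """Return the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)  # full scan, then an explicit sort step
conn.execute("CREATE INDEX idx_o_totalprice ON orders (o_totalprice)")
after = plan(QUERY)   # walk the index backwards, no sort step

print(before)
print(after)
```

The exact wording of the plan rows varies between SQLite versions. The presence or absence of the temp B-tree step is the part to look at.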

Ⅱ. When to create an index

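cardinality: simply put, count(distinct column)
the count of unique records: the number of distinct (non-duplicate) values
high selectivity: a gender column has low selectivity, a name column has high selectivity
using B+ tree to access fewer records: use the index to fetch a small fraction of the rows out of a large table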

How can you tell whether an index is highly selective?
Look at the cardinality column of the statistics table in the information_schema database.

(root@localhost) [information_schema]> select cardinality from statistics where table_schema='dbt3' and table_name = 'customer' limit 1;
+-------------+
| cardinality |
+-------------+
|      147674 |
+-------------+
1 row in set (0.00 sec)

(root@localhost) [information_schema]> select table_rows from tables where table_schema='dbt3' and table_name = 'customer' limit 1;
+------------+
| table_rows |
+------------+
|     147674 |
+------------+
1 row in set (0.00 sec)

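The row count and the cardinality are identical. Why? Because the row returned here belongs to the primary key, which is unique. Other indexes will not have such a high cardinality.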

In summary: high selectivity means that cardinality/table_rows is close to 1. The closer to 1, the better.
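Here is a minimal Python sketch of the two numbers. It uses hypothetical in-memory rows rather than a real table. Cardinality is the count of distinct values, and selectivity is cardinality divided by the row count. The gender and name columns mirror the examples above.

```python
# Hypothetical rows: gender has 2 distinct values, id and name are unique.
rows = [
    {"id": i, "gender": "M" if i % 2 else "F", "name": f"user{i}"}
    for i in range(1, 1001)
]


def cardinality(rows, column):
    """count(distinct column)."""
    return len({r[column] for r in rows})


def selectivity(rows, column):
    """cardinality / table_rows: the closer to 1, the better."""
    return cardinality(rows, column) / len(rows)


for col in ("id", "name", "gender"):
    print(col, cardinality(rows, col), selectivity(rows, col))
```

With 1000 rows, id and name reach selectivity 1.0, while gender only reaches 0.002. That makes gender a poor candidate for a standalone index.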

Find all inefficient (low-selectivity) indexes in one go:

SELECT 
    t.TABLE_SCHEMA,
    t.TABLE_NAME,
    INDEX_NAME,
    CARDINALITY,
    TABLE_ROWS,
    CARDINALITY / TABLE_ROWS AS SELECTIVITY
FROM
    information_schema.tables t,
    (SELECT 
        table_schema, table_name, index_name, cardinality
    FROM
        information_schema.STATISTICS
    WHERE
        (table_schema , table_name, index_name, seq_in_index) IN (SELECT 
                table_schema, table_name, index_name, MAX(seq_in_index)
            FROM
                information_schema.STATISTICS
            GROUP BY table_schema , table_name , index_name)) s
WHERE
    t.table_schema = s.table_schema
        AND t.table_name = s.table_name
        AND t.table_rows != 0
        AND t.table_schema NOT IN ('mysql' , 'performance_schema',
        'information_schema',
        'sys')
HAVING SELECTIVITY < 0.1
ORDER BY SELECTIVITY DESC;
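The same filtering logic, sketched in Python over hypothetical rows shaped like STATISTICS and TABLES, may make the query easier to follow. The key step is what happens to a composite index. Only its row with the largest seq_in_index is kept, because that row's cardinality estimates the number of distinct values of the whole key. The schema, table and index names and all numbers below are made up for illustration.

```python
# Hypothetical rows shaped like information_schema.STATISTICS:
# (table_schema, table_name, index_name, seq_in_index, cardinality)
STATISTICS = [
    ("dbt3", "customer", "PRIMARY", 1, 147674),
    ("dbt3", "customer", "i_c_nationkey", 1, 25),
    ("dbt3", "orders", "idx_status_date", 1, 3),
    ("dbt3", "orders", "idx_status_date", 2, 7200),
    ("mysql", "user", "PRIMARY", 1, 4),
]
# Hypothetical table_rows values from information_schema.TABLES.
TABLE_ROWS = {
    ("dbt3", "customer"): 147674,
    ("dbt3", "orders"): 1489118,
    ("mysql", "user"): 4,
}
SYSTEM_SCHEMAS = {"mysql", "performance_schema", "information_schema", "sys"}


def low_selectivity_indexes(statistics, table_rows, threshold=0.1):
    # For each index keep the row with the largest seq_in_index: its
    # cardinality covers the full (possibly composite) key.
    last = {}
    for schema, table, index, seq, card in statistics:
        key = (schema, table, index)
        if key not in last or seq > last[key][0]:
            last[key] = (seq, card)

    result = []
    for (schema, table, index), (_, card) in last.items():
        rows = table_rows.get((schema, table), 0)
        if rows == 0 or schema in SYSTEM_SCHEMAS:
            continue  # same filters as the WHERE clause
        sel = card / rows
        if sel < threshold:  # HAVING SELECTIVITY < 0.1
            result.append((schema, table, index, card, rows, sel))
    # ORDER BY SELECTIVITY DESC
    return sorted(result, key=lambda r: r[5], reverse=True)


for row in low_selectivity_indexes(STATISTICS, TABLE_ROWS):
    print(row)
```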

Tips:
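① Should a categories column get an index?
Queries that filter on category alone are rare. Things like "the products in a given category" usually end up in a cache. More often the column goes into a composite index together with another column.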

② cardinality and table_rows are estimated by sampling. They are not exact.

③ The slow query log and the calculated size of each row also rely on sampling.

Ⅲ. Composite indexes

Generally speaking, a composite index is a secondary index.
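How it works: the same as before. Leaf nodes store the key values plus the primary key value, and non-leaf nodes store keys and pointers. The difference is that here the key is made up of several columns.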

For an index on (a,b), both a and (a,b) are sorted, but b on its own is not necessarily sorted.
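A small Python sketch with hypothetical (a, b) pairs shows what this means. Sorting the tuples, as the index does, leaves a fully sorted, while b is sorted only within each value of a.

```python
# Hypothetical (a, b) pairs, sorted the way an index on (a, b) keeps them:
# by a first, then by b among entries with equal a.
entries = sorted([(2, 9), (1, 7), (2, 3), (1, 4), (3, 1), (1, 8)])
a_values = [a for a, _ in entries]
b_values = [b for _, b in entries]

print(entries)                        # [(1, 4), (1, 7), (1, 8), (2, 3), (2, 9), (3, 1)]
print(a_values == sorted(a_values))   # True: a is sorted
print(b_values == sorted(b_values))   # False: b alone is not

b_for_a2 = [b for a, b in entries if a == 2]
print(b_for_a2 == sorted(b_for_a2))   # True: b is sorted within a = 2
```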
Use cases:

select * from t where a = ?   
select * from t where a = ? and b = ?

Since a and (a,b) are sorted, both queries above can use the index.

select * from t where b = ?

This one cannot use the composite index, because b is not sorted on its own.
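This can be checked with a sketch in SQLite (Python's built-in sqlite3). SQLite's planner is not MySQL's, but for these simple predicates it follows the same leftmost-prefix rule. The table t and the index name idx_a_b are hypothetical. EXPLAIN QUERY PLAN shows which predicates the composite index can serve.

```python
import sqlite3

# Sketch in SQLite, not MySQL; table t and index idx_a_b are hypothetical.
# Column c is not in the index, so SELECT * cannot be covered by idx_a_b.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")


def plan(where):
    """EXPLAIN QUERY PLAN details for SELECT * FROM t WHERE <where>."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where
    return " | ".join(row[3] for row in conn.execute(sql))


for where in ("a = 1", "a = 1 AND b = 2", "b = 2"):
    print(f"{where:16} -> {plan(where)}")
```

The first two predicates should show a SEARCH using idx_a_b. The last one falls back to scanning the whole table.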

The most common case is the following one, and it matters a lot:

select * from t where a = ? order by b

The index serves this perfectly: once a is located, the b values are already in order.

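Many people create an index on column a only. The query then needs a filesort, yet it will not be recorded in slow.log. It already uses an index, and it may even run fairly fast, because only a few rows are fetched, sorted, and then cut down by the LIMIT.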

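But if you do this, the sorting costs a lot of CPU. A single execution is no problem. On an internet site, though, a particular query can be very hot, and when a thousand users run it at the same time, you are done for.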
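Here is a sketch of the difference, again in SQLite rather than MySQL, with USE TEMP B-TREE FOR ORDER BY standing in for Using filesort. The table t and the index name idx are hypothetical. With an index on a alone, the rows for a = 1 still have to be sorted. With (a, b), they come out of the index already ordered by b.

```python
import sqlite3


def plan_for(index_columns):
    """Build a fresh in-memory table t with one index; return the query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c TEXT)")
    conn.execute(f"CREATE INDEX idx ON t ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM t WHERE a = 1 ORDER BY b LIMIT 10"
    )
    return [r[3] for r in rows]


only_a = plan_for("a")       # index finds a = 1, then a separate sort on b
a_then_b = plan_for("a, b")  # index returns the a = 1 rows already ordered by b

print(only_a)
print(a_then_b)
```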

How do you find the SQL statements that sort the most?
Look at the exec_count column of the statements_with_sorting view in the sys schema.

A question:
A composite index over the three columns a, b, c, for these two queries:

a = ? and b = ? order by c
a = ? and b = ? and c = ?

How should the composite index be created in this case?

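Principle: put the more selective columns first, ordering the columns by cardinality from high to low. For these two queries the ordering applies to the equality columns a and b. c has to come last, so that the c values are already sorted within each (a, b) group and the first query's order by c needs no filesort.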
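Here is a minimal sketch of the principle as described above, over hypothetical sample rows. The equality columns are ordered by cardinality, highest first, and the ORDER BY column is appended last. The helper name composite_index_order is made up for illustration. In practice the cardinality figures would come from information_schema.STATISTICS instead of scanning data.

```python
# Hypothetical sample rows: a has 5 distinct values, b has 200.
rows = [{"a": i % 5, "b": i % 200, "c": i} for i in range(1000)]


def cardinality(rows, column):
    """count(distinct column)."""
    return len({r[column] for r in rows})


def composite_index_order(rows, equality_columns, order_by_column):
    # Equality columns first, highest cardinality (most selective) first;
    # the ORDER BY column goes last so it is sorted within each equality group.
    eq = sorted(equality_columns, key=lambda col: cardinality(rows, col), reverse=True)
    return eq + [order_by_column]


print(composite_index_order(rows, ["a", "b"], "c"))  # ['b', 'a', 'c']
```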

Tips:
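Take four columns a, b, c, d, where d is the primary key.
If a composite index is created on (a, b), its leaf nodes store a, b, d, and the entries are sorted on (a, b, d), i.e. (key, pk).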

Ⅳ. Covering indexes

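index coverage: whenever the data can be fetched without going back to the table (the clustered primary key index), this is called a covering index.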

A covering index is usually a composite index, although a single-column index can cover a query too.

The sign of a covering index is Using index in the Extra column of the execution plan.

explain select a from t where c between  '2018-06-12 00:00:00' and '2018-06-12 23:59:59';
+----+-------------+----------------+------------+-------+---------------+---------------+---------+------+-------+----------+-----------------------+
| id | select_type | table          | partitions | type  | possible_keys | key           | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+----------------+------------+-------+---------------+---------------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | t              | NULL       | range | idx_c         | idx_c         | 5       | NULL | 33560 |   100.00 | Using index condition |
+----+-------------+----------------+------------+-------+---------------+---------------+---------+------+-------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

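This query uses idx_c, but idx_c does not cover column a, which the query needs. So MySQL first finds the pk through idx_c, then goes back to the table with that pk to read a.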

Add an index:

(root@localhost) [j8]> alter table t add index idx_c_a(c,a);
Query OK, 0 rows affected (14.54 sec)
Records: 0  Duplicates: 0  Warnings: 0

(root@localhost) [j8]> explain select a from t where c between  '2018-06-12 00:00:00' and '2018-06-12 23:59:59';
+----+-------------+----------------+------------+-------+----------------------------------+--------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table          | partitions | type  | possible_keys                    | key                | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+----------------+------------+-------+----------------------------------+--------------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | t              | NULL       | range | idx_c,idx_c_a                    | idx_c_a            | 5       | NULL | 34650 |   100.00 | Using where; Using index |
+----+-------------+----------------+------------+-------+----------------------------------+--------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

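The new index contains column a, so the plan shows Using index and there is no lookup back to the table. To compare the cost of the two statements, run explain format=json and look at the cost. In this example the difference is admittedly not obvious from the plain execution plan, but comparing the cost shows that the second case is much cheaper. I won't paste the JSON here; take my word for it, I'm dead serious.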
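The same experiment can be sketched in SQLite (Python's sqlite3), with a hypothetical table layout standing in for t. SQLite marks a covering index with USING COVERING INDEX in EXPLAIN QUERY PLAN, which corresponds to MySQL's Using index. Its cost model is its own, so this only illustrates the idea, not MySQL's numbers.

```python
import sqlite3

# Sketch in SQLite, not MySQL. Hypothetical table t: a is the selected column,
# c is the datetime column in the WHERE clause, b is just another column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)")
conn.execute("CREATE INDEX idx_c ON t (c)")

QUERY = (
    "SELECT a FROM t "
    "WHERE c BETWEEN '2018-06-12 00:00:00' AND '2018-06-12 23:59:59'"
)


def plan():
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


before = plan()  # range search on idx_c, then a rowid lookup to read a
conn.execute("CREATE INDEX idx_c_a ON t (c, a)")
after = plan()   # idx_c_a already holds a: no lookup back to the table

print(before)
print(after)
```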

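If there is already a composite index on (a,b), it is common to also create a separate index on b. Does column a need a separate index as well? A lookup on a = ? through a single-column (a) index is a bit faster than through (a,b). But the B+ tree height is about the same either way, unless column b is very large, and even at 1 KB you will hardly notice. Meanwhile, every extra index makes DML operations more expensive to maintain.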

When (a,b) and (a) both exist, (a) is called a redundant index.

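How do you find redundant indexes?
Query the schema_redundant_indexes view in the sys schema. It shows which index makes which one redundant, and even includes a suggested drop statement (the sql_drop_index column).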
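Here is a simplified Python sketch of the prefix rule, over hypothetical index definitions. An index is reported as redundant when its column list is a strict left prefix of another index's. This is not how the sys view is implemented. It also ignores things a real check has to consider, such as uniqueness and column prefix lengths.

```python
# Hypothetical index definitions: index name -> ordered column list.
indexes = {
    "PRIMARY": ["d"],
    "idx_a_b": ["a", "b"],
    "idx_a": ["a"],
    "idx_b": ["b"],
}


def redundant_indexes(indexes):
    """Return (redundant, dominant) pairs where the redundant index's columns
    are a strict left prefix of the dominant index's columns.

    Simplified: ignores uniqueness, column prefix lengths and index types.
    """
    pairs = []
    for small, small_cols in indexes.items():
        for big, big_cols in indexes.items():
            if (
                small != big
                and len(small_cols) < len(big_cols)
                and big_cols[: len(small_cols)] == small_cols
            ):
                pairs.append((small, big))
    return pairs


print(redundant_indexes(indexes))  # [('idx_a', 'idx_a_b')]
```

Note that idx_b is not flagged: (b) is not a left prefix of (a, b). This matches the point above that a separate index on b is common.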

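Everything discussed so far is about OLTP (online transaction processing), the typical internet workload: operations are fast and user concurrency is high.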

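OLAP (online analytical processing) covers data warehouses and big data, where real-time performance requirements are lower. Updates are rare and queries dominate, so creating redundant indexes matters much less there. MySQL is not well suited to that kind of workload.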