Practical Notes | Understanding a New MySQL 8.0 Feature: Skip Scan Range

Date: 2019-11-19

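Starting with version 8.0.13, MySQL supports a new way of doing range scans called Loose Skip Scan, a feature contributed by Facebook. In earlier versions, an index could be used for a range scan only if the conditions covered its leading (prefix) columns. For example, given an index idx(col1, col2), a WHERE clause that references only col2 cannot use idx efficiently: every entry in the index has to be scanned and then filtered by the condition on col2.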

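The new optimization avoids this full index scan. Instead, it starts several range scans, one for each distinct value of col1 combined with the condition on col2. Each range scan positions directly on the index with a key built from those values, skipping records that cannot satisfy the condition.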
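To make the difference concrete, here is a minimal Python sketch, assuming a sorted list of (col1, col2) tuples stands in for the B-tree index and the query is `WHERE col2 > lo`. The function names are hypothetical, and `bisect` positioning only approximates a B-tree lookup; each function returns the matching entries and how many index entries it read.

```python
import math
from bisect import bisect_right


def full_index_scan(index, lo):
    """Baseline: read every (col1, col2) entry, then filter on col2 > lo."""
    result = [entry for entry in index if entry[1] > lo]
    return result, len(index)  # every entry is examined


def skip_scan(index, lo):
    """Skip-scan sketch: one range scan per distinct col1 value.

    `index` is a sorted list of (col1, col2) tuples standing in for a B-tree.
    """
    result = []
    examined = 0
    pos = 0
    while pos < len(index):
        prefix = index[pos][0]  # reading this entry reveals the next col1 value
        examined += 1
        # Jump straight to the first entry greater than (prefix, lo).
        start = bisect_right(index, (prefix, lo))
        # The range ends at the first entry whose col1 is greater than prefix.
        end = bisect_right(index, (prefix, math.inf))
        result.extend(index[start:end])
        examined += end - start
        pos = end  # skip to the next distinct col1 value
    return result, examined
```

With two col1 values and col2 running from 1 to 80, both functions return the same 80 entries for `lo = 40`, but the full scan reads all 160 entries while the sketch reads 82: one entry to discover each col1 value plus the 80 matches.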

Example

The following example is taken from the official documentation:

root@test 11:03:28>CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2));
Query OK, 0 rows affected (0.00 sec)

root@test 11:03:29>INSERT INTO t1 VALUES
    ->   (1,1), (1,2), (1,3), (1,4), (1,5),
    ->   (2,1), (2,2), (2,3), (2,4), (2,5);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 5 FROM t1;
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 10 FROM t1;
Query OK, 20 rows affected (0.00 sec)
Records: 20  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 20 FROM t1;
Query OK, 40 rows affected (0.00 sec)
Records: 40  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 40 FROM t1;
Query OK, 80 rows affected (0.00 sec)
Records: 80  Duplicates: 0  Warnings: 0

root@test 11:03:29>ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

root@test 11:03:29>EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                                  |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL |   53 |   100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)
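The INSERT ... SELECT statements above double the table each time. The sketch below replays them in Python (a hypothetical helper, not part of the documentation) to show what t1 ends up holding: 160 rows, with f2 running from 1 to 80 for each of f1 = 1 and f1 = 2, so `f2 > 40` actually matches 80 rows. The `rows: 53` shown by EXPLAIN is the optimizer's estimate, not an exact count.

```python
def build_t1():
    """Replay the INSERT statements for t1 and return its rows in PRIMARY KEY order."""
    # INSERT INTO t1 VALUES (1,1), ..., (2,5): ten initial rows
    rows = [(f1, f2) for f1 in (1, 2) for f2 in range(1, 6)]
    for delta in (5, 10, 20, 40):
        # INSERT INTO t1 SELECT f1, f2 + delta FROM t1;
        # the comprehension is built from the current rows before they are extended
        rows += [(f1, f2 + delta) for f1, f2 in rows]
    return sorted(rows)  # InnoDB keeps rows clustered in (f1, f2) order
```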

The optimizer trace also shows how skip scan was chosen:

"skip_scan_range": {
  "potential_skip_scan_indexes": [
    {
      "index": "PRIMARY",
      "tree_travel_cost": 0.4,
      "num_groups": 3,
      "rows": 53,
      "cost": 10.625
    }
  ]
},
"best_skip_scan_summary": {
  "type": "skip_scan",
  "index": "PRIMARY",
  "key_parts_used_for_access": [
    "f1",
    "f2"
  ],
  "range": [
    "40 < f2"
  ],
  "chosen": true
},

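Now let's look at how this query executes from InnoDB's point of view. Every index scan goes through ha_innobase::index_read, which builds the search tuple. The query above runs in the following steps: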

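1. The first scan starts from the left edge of the index, with no search key, and reads the first f1 value (1).
2. The second scan uses key (1,40) and scans the index until the end of the first range.
3. Key (1) with find_flag = HA_READ_AFTER_KEY locates the next key value, 2.
4. Key (2,40) scans the index until the end of that range.
5. Key (2) looks for a key value greater than 2; there is none in this example, so the scan ends.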
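The five steps can be replayed with a small Python model, assuming a sorted list of (f1, f2) keys in place of the InnoDB index and `bisect_right` in place of a read that positions strictly after the search key (as HA_READ_AFTER_KEY does). The function name is hypothetical and this is not InnoDB code; it records the search key used at each step (None for the initial read from the left edge) and collects the rows returned by each range.

```python
import math
from bisect import bisect_right


def skip_scan_trace(index, lo):
    """Replay the skip-scan steps for `f2 > lo` on a sorted list of (f1, f2) keys.

    Returns (search_keys, rows). bisect_right stands in for a read that positions
    on the first entry strictly after the search key (HA_READ_AFTER_KEY style).
    """
    search_keys, rows = [], []
    if not index:
        return search_keys, rows
    search_keys.append(None)       # step 1: start at the left edge, no search tuple
    prefix = index[0][0]           # first f1 value
    while True:
        # Range scan: start after (prefix, lo), stop when f1 changes.
        search_keys.append((prefix, lo))
        start = bisect_right(index, (prefix, lo))
        end = bisect_right(index, (prefix, math.inf))
        rows.extend(index[start:end])
        # Look for the next key after the prefix alone.
        search_keys.append((prefix,))
        if end == len(index):
            break                  # no larger f1 value: the scan ends
        prefix = index[end][0]
    return search_keys, rows
```

For t1 and `lo = 40`, the recorded keys are None, (1, 40), (1,), (2, 40), (2,), which lines up with steps 1 to 5.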

I added logging to the code to print the search_tuple at each step with dtuple_print():

STEP 1: no search_tuple

STEP 2:
DATA TUPLE: 2 fields;
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000028; asc    (;;

STEP 3:
DATA TUPLE: 1 fields;
 0: len 4; hex 80000001; asc     ;;
 
STEP 4:
DATA TUPLE: 2 fields;
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000028; asc    (;;
 
STEP 5:
DATA TUPLE: 1 fields;
 0: len 4; hex 80000002; asc     ;;
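The hex values in the log are InnoDB's storage format for a signed INT: 4 bytes, big-endian, with the sign bit flipped so that byte order matches numeric order. That is why 1 appears as 80000001 and 40 as 80000028 (0x28 = 40, shown as `(` in the asc column). The Python sketch below reproduces those lines; the function names are hypothetical, and the rule of printing non-printable bytes as spaces is inferred from the log rather than taken from the InnoDB source.

```python
import struct


def innodb_int_bytes(value):
    """Encode a signed 32-bit INT: big-endian with the sign bit flipped."""
    raw = bytearray(struct.pack(">i", value))  # two's complement, big-endian
    raw[0] ^= 0x80                             # flip the sign bit so byte order == numeric order
    return bytes(raw)


def dump_field(value):
    """Render one field like the log lines above: length, hex, printable ASCII."""
    b = innodb_int_bytes(value)
    # Assumption: non-printable bytes are shown as spaces, as in the log.
    asc = "".join(chr(c) if 32 <= c < 127 else " " for c in b)
    return f"len {len(b)}; hex {b.hex()}; asc {asc};"
```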

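As the steps above show, skip scan avoids a full index scan and so improves performance, especially when the leading column of the index has low cardinality (few distinct values).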

Conditions

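Skip scan can be controlled with optimizer hints (SKIP_SCAN / NO_SKIP_SCAN) or with the skip_scan flag of optimizer_switch, and it is enabled by default. According to the worklog, for a query of the form: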

SELECT A_1,...,A_k, B_1,...,B_m, C
      FROM T
    WHERE
      EQ(A_1,...,A_k)
      AND RNG(C);

The following conditions must all hold for skip scan to be used:

A) Table T has at least one compound index I of the form:
   I = <A_1,...,A_k, B_1,..., B_m, C ,[D_1,...,D_n]>
   Key parts A and D may be empty, but B and C must be non-empty.
B) Only one table referenced.
C) Cannot have group by/select distinct
D) Query must reference fields in the index only.
E) The predicates on A_1...A_k must be equality predicates and they need
   to be constants. This includes the 'IN' operator.
F) The query must be a conjunctive query.
   In other words, it is an AND of ORs:
   (COND1(kp1) OR COND2(kp1)) AND (COND1(kp2) OR ...) AND ...
G) There must be a range condition on C.
H) Conditions on D columns are allowed. Conditions on D must be in
   conjunction with range condition on C.

ref: get_best_skip_scan()
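The <A, B, C, D> shape can be illustrated with a simplified Python check. It is a hypothetical helper, not get_best_skip_scan(): it only models the index-shape, index-only, equality-on-A and range-on-C conditions (A, D, E and G above), and assumes the single-table, no GROUP BY/DISTINCT and conjunctive-WHERE conditions are already satisfied.

```python
def split_skip_scan_index(index_cols, eq_cols, range_col, referenced_cols):
    """Try to map an index onto the <A..., B..., C, D...> shape from the worklog.

    Hypothetical, simplified check. Returns (A, B, C, D) or None.
    """
    if not set(referenced_cols) <= set(index_cols):
        return None                  # D) query must reference index columns only
    if range_col not in index_cols:
        return None                  # G) need a range condition on an index column C
    c_pos = index_cols.index(range_col)
    prefix = index_cols[:c_pos]
    # A = leading key parts that carry equality predicates (E)
    a_len = 0
    while a_len < len(prefix) and prefix[a_len] in eq_cols:
        a_len += 1
    a, b = prefix[:a_len], prefix[a_len:]
    if not b:
        return None                  # A) B must be non-empty
    if any(col in eq_cols for col in b):
        return None                  # an equality inside B breaks the EQ(A) AND RNG(C) shape
    d = index_cols[c_pos + 1:]       # H) D columns may carry extra conditions
    return a, b, range_col, d
```

For the example table, index (f1, f2) with only `f2 > 40` splits into A = [], B = [f1], C = f2, D = []. Adding an equality on f1 leaves B empty, so the check returns None; in that case an ordinary range scan on (f1, f2) already applies.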

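Skip scan is chosen when its cost is lower than the alternatives. The cost is computed in cost_skip_scan(). Because the index statistics (rec_per_key, the average number of records per distinct value of each key prefix) already let the optimizer estimate how many distinct values each prefix has, it can use them to estimate how many rows may need to be read. For more detail, see the description in WL#11322; I am not very familiar with this part, so I won't go into it here.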
ref: cost_skip_scan()
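The idea of deriving the number of groups from rec_per_key can be sketched as follows. Both functions are hypothetical: `estimate_num_groups` divides the row count by the rec_per_key value of the skipped prefix, and `toy_skip_scan_cost` is a toy cost shape (one index dive per group plus the rows read in the ranges) with caller-supplied unit costs. It is not the formula used by cost_skip_scan().

```python
import math


def estimate_num_groups(table_rows, rec_per_key_prefix):
    """Distinct values of the skipped prefix, from rec_per_key (average rows per value)."""
    return max(1, math.ceil(table_rows / rec_per_key_prefix))


def toy_skip_scan_cost(num_groups, rows_in_ranges, dive_cost, row_cost):
    """Hypothetical cost shape: one index dive per group plus the rows read in the ranges.

    NOT cost_skip_scan(); dive_cost and row_cost are caller-supplied unit costs.
    """
    return num_groups * dive_cost + rows_in_ranges * row_cost
```

With exact statistics for t1 (160 rows, 80 rows per f1 value), `estimate_num_groups(160, 80)` gives 2 groups. Under any positive unit costs, this toy shape favors skip scan when the groups are few and the ranges are narrow compared with reading the whole index.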

This article is original content from the Yunqi Community (雲棲社區); reproduction without permission is prohibited.
