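Starting with version 8.0.13, MySQL supports a new range scan method called Loose Skip Scan, a feature contributed by Facebook. In earlier versions, an index could only be used for a range scan if the conditions covered its prefix columns. For example, given an index idx(col1, col2), a WHERE clause that only references col2 cannot use idx effectively. The server has to scan every entry in the index and then filter the rows on the col2 condition.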
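The new optimization avoids this full index scan. Instead, it starts multiple range scans, one for each value of col1 combined with the condition on col2. Each range scan builds a key from these values and seeks directly to it in the index, skipping records that cannot satisfy the condition.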
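The following minimal Python sketch illustrates the difference. It models idx(col1, col2) as a sorted list of (col1, col2) tuples, which stands in for the index's leaf entries and is not InnoDB's actual structure. The function names full_index_scan() and loose_skip_scan() are hypothetical. Both answer WHERE col2 > bound. The skip-scan version uses binary search (bisect) to seek to the start of each (col1 = prefix, col2 > bound) range and then past the end of that col1 group, so it never reads entries with col2 <= bound.

```python
import math
from bisect import bisect_right

# Hypothetical model of idx(col1, col2): a sorted list of (col1, col2) tuples
# standing in for the index's leaf entries. Query: WHERE col2 > bound.

def full_index_scan(index, bound):
    """Pre-8.0.13 plan: read every entry, then filter on col2."""
    result, examined = [], 0
    for key in index:
        examined += 1
        if key[1] > bound:
            result.append(key)
    return result, examined

def loose_skip_scan(index, bound):
    """One range scan per distinct col1 value, skipping entries with col2 <= bound."""
    result, examined, seeks = [], 0, 0
    pos = 0
    while pos < len(index):
        prefix = index[pos][0]                         # current distinct col1 value
        # Seek to the first entry of the range (col1 = prefix, col2 > bound).
        start = bisect_right(index, (prefix, bound), lo=pos)
        # Seek past the last entry with col1 = prefix; the entry there (if any)
        # carries the next distinct col1 value.
        end = bisect_right(index, (prefix, math.inf), lo=pos)
        seeks += 2
        for key in index[start:end]:                   # read only rows inside the range
            examined += 1
            result.append(key)
        pos = end
    return result, examined, seeks

index = sorted((c1, c2) for c1 in (1, 2) for c2 in range(1, 81))
full, full_examined = full_index_scan(index, 40)
skip, skip_examined, seeks = loose_skip_scan(index, 40)
print(full == skip, full_examined, skip_examined, seeks)
```

Consider data shaped like the t1 table in the official example that follows: col1 in {1, 2} and col2 from 1 to 80. Both functions return the same 80 rows. The full scan examines all 160 entries, while the skip scan reads only the 80 matching ones, at the price of two seeks per distinct col1 value.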
The following example is taken from the official documentation:
root@test 11:03:28>CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2));
Query OK, 0 rows affected (0.00 sec)

root@test 11:03:29>INSERT INTO t1 VALUES
    -> (1,1), (1,2), (1,3), (1,4), (1,5),
    -> (2,1), (2,2), (2,3), (2,4), (2,5);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 5 FROM t1;
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 10 FROM t1;
Query OK, 20 rows affected (0.00 sec)
Records: 20  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 20 FROM t1;
Query OK, 40 rows affected (0.00 sec)
Records: 40  Duplicates: 0  Warnings: 0

root@test 11:03:29>INSERT INTO t1 SELECT f1, f2 + 40 FROM t1;
Query OK, 80 rows affected (0.00 sec)
Records: 80  Duplicates: 0  Warnings: 0

root@test 11:03:29>ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

root@test 11:03:29>EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                                  |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------+
| 1  | SIMPLE      | t1    | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL |   53 |   100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)
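The repeated INSERT ... SELECT statements double the table each time. This is easy to lose track of, so the small Python sketch below replays them on an in-memory set. The helper build_t1() is hypothetical and only mirrors the SQL above. It checks the "rows affected" counts in the transcript and counts how many rows actually satisfy f2 > 40.

```python
def build_t1():
    """Replay the INSERT statements from the transcript; return rows and per-statement counts."""
    rows = {(f1, f2) for f1 in (1, 2) for f2 in range(1, 6)}
    inserted = [len(rows)]                      # INSERT INTO t1 VALUES (... 10 rows ...)
    for delta in (5, 10, 20, 40):
        # INSERT INTO t1 SELECT f1, f2 + delta FROM t1;
        new_rows = {(f1, f2 + delta) for f1, f2 in rows}
        assert not (new_rows & rows)            # PRIMARY KEY(f1, f2): no duplicates
        inserted.append(len(new_rows))
        rows |= new_rows
    return sorted(rows), inserted

rows, inserted = build_t1()
print(inserted)                                  # rows affected by each INSERT
print(len(rows), sum(1 for _, f2 in rows if f2 > 40))
```

The table ends up with 160 rows, with f2 running from 1 to 80 for each of the two f1 values. Exactly 80 of those rows satisfy f2 > 40, so the rows value of 53 shown by EXPLAIN is the optimizer's estimate, not an exact count.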
The optimizer trace also shows how skip scan was chosen:
"skip_scan_range": {
  "potential_skip_scan_indexes": [
    {
      "index": "PRIMARY",
      "tree_travel_cost": 0.4,
      "num_groups": 3,
      "rows": 53,
      "cost": 10.625
    }
  ]
},
"best_skip_scan_summary": {
  "type": "skip_scan",
  "index": "PRIMARY",
  "key_parts_used_for_access": [
    "f1",
    "f2"
  ],
  "range": [
    "40 < f2"
  ],
  "chosen": true
},
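Now let's look at how this SQL statement is executed from InnoDB's point of view. Every index scan goes through ha_innobase::index_read, which builds the search tuple. The query above executes in the following steps.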
The author added logging to the code that prints search_tuple (using dtuple_print()):
STEP 1: no search_tuple
STEP 2: DATA TUPLE: 2 fields;
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000028; asc    (;;
STEP 3: DATA TUPLE: 1 fields;
 0: len 4; hex 80000001; asc     ;;
STEP 4: DATA TUPLE: 2 fields;
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000028; asc    (;;
STEP 5: DATA TUPLE: 1 fields;
 0: len 4; hex 80000002; asc     ;;
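The hex values become readable once decoded. InnoDB stores a signed INT as 4 big-endian bytes with the sign bit flipped, so 80000001 is 1 and 80000028 is 40. The byte 0x28 is also the ASCII code for '(', which is why the asc column shows '('. The Python sketch below decodes the log; the helper names are hypothetical, not InnoDB functions.

Our reading of the five steps follows. It is based on how skip scan is described in this article, not on a line-by-line trace of the source:

- STEP 1 positions on the first index entry without a key to discover the first f1 value.
- STEP 2 seeks to the start of the range f1 = 1 AND f2 > 40.
- STEP 3 uses the one-field key (1) to jump past all f1 = 1 entries and find the next distinct f1.
- STEPS 4 and 5 repeat this for f1 = 2. The last jump finds no further f1 value, and the scan ends.

```python
import re

# The search tuples printed by dtuple_print() in the log above.
LOG = """\
STEP 1: no search_tuple
STEP 2: DATA TUPLE: 2 fields;
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000028; asc    (;;
STEP 3: DATA TUPLE: 1 fields;
 0: len 4; hex 80000001; asc     ;;
STEP 4: DATA TUPLE: 2 fields;
 0: len 4; hex 80000002; asc     ;;
 1: len 4; hex 80000028; asc    (;;
STEP 5: DATA TUPLE: 1 fields;
 0: len 4; hex 80000002; asc     ;;
"""

def decode_innodb_int(hex_str):
    """Decode a 4-byte signed INT in InnoDB's stored format (big-endian, sign bit flipped)."""
    raw = int(hex_str, 16) ^ 0x80000000
    # Reinterpret the 32-bit unsigned result as a two's-complement signed value.
    return raw - (1 << 32) if raw >= (1 << 31) else raw

def parse_steps(log):
    """Return one tuple of decoded key fields per STEP in the log."""
    steps = []
    for line in log.splitlines():
        if line.startswith("STEP"):
            steps.append([])
        else:
            m = re.search(r"hex ([0-9a-f]+);", line)
            if m:
                steps[-1].append(decode_innodb_int(m.group(1)))
    return [tuple(s) for s in steps]

for n, key in enumerate(parse_steps(LOG), start=1):
    print(f"STEP {n}: search key {key}")
```

This prints the keys (), (1, 40), (1,), (2, 40) and (2,), one per step.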
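As the steps above show, skip scan avoids a full index scan and therefore improves performance. The gain is largest when the index prefix column has low cardinality (few distinct values).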
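Skip scan can be controlled with optimizer hints (SKIP_SCAN / NO_SKIP_SCAN) or with the skip_scan flag of optimizer_switch, and it is enabled by default. According to the worklog, take a query of the form: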
SELECT A_1,...,A_k, B_1,...,B_m, C FROM T WHERE EQ(A_1,...,A_k) AND RNG(C);
For such a query, all of the following conditions must hold before skip scan can be used:
A) Table T has at least one compound index I of the form:
   I = <A_1,...,A_k, B_1,...,B_m, C, [D_1,...,D_n]>
   Key parts A and D may be empty, but B and C must be non-empty.
B) Only one table referenced.
C) Cannot have group by/select distinct.
D) Query must reference fields in the index only.
E) The predicates on A_1...A_k must be equality predicates and they need to be constants. This includes the 'IN' operator.
F) The query must be a conjunctive query. In other words, it is a AND of ORs:
   (COND1(kp1) OR COND2(kp1)) AND (COND1(kp2) OR ...) AND ...
G) There must be a range condition on C.
H) Conditions on D columns are allowed. Conditions on D must be in conjunction with range condition on C.
ref: get_best_skip_scan()
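The Python sketch below makes the index-shape requirement in condition A concrete. skip_scan_split() is a hypothetical helper, not get_best_skip_scan() itself. It tries to split an index's columns into the A, B, C and D groups for a query with constant equality predicates on eq_cols and a range predicate on range_col. It only checks conditions A, D and G. The single-table, GROUP BY/DISTINCT, constant-equality and conjunctive-form requirements are assumed to be checked elsewhere, and predicates on B columns are ignored.

```python
def skip_scan_split(index_cols, eq_cols, range_col, referenced_cols):
    """Hypothetical helper: split index_cols into (A, B, C, D), or return None."""
    # Condition D: the query may only reference columns of the index.
    if not set(referenced_cols) <= set(index_cols):
        return None
    # Condition G: there must be a range condition on an index column C.
    if range_col not in index_cols:
        return None
    c_pos = index_cols.index(range_col)
    # Conditions A/E: A is the leading run of columns with equality predicates.
    a_len = 0
    while a_len < c_pos and index_cols[a_len] in eq_cols:
        a_len += 1
    a_part = index_cols[:a_len]
    b_part = index_cols[a_len:c_pos]
    d_part = index_cols[c_pos + 1:]
    # Condition A: B must be non-empty. If it is empty, EQ(A) AND RNG(C)
    # is already an ordinary range scan on an index prefix.
    if not b_part:
        return None
    return a_part, b_part, range_col, d_part

print(skip_scan_split(["f1", "f2"], [], "f2", ["f1", "f2"]))
```

Take the t1 example: PRIMARY KEY(f1, f2) with only f2 > 40 splits into A = [], B = [f1], C = f2, D = []. This is exactly the case skip scan targets. With a range on f1 instead, B would be empty and an ordinary range scan applies.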
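Skip scan is chosen when it has a lower cost than the alternatives; the cost is computed by cost_skip_scan(). The index statistics already estimate the number of distinct values for each prefix of the index columns (rec_per_key), and these estimates can be used to predict how many rows are likely to be read. For more detail, see the description in WL#11322. The author is not very familiar with it and will not go into it here.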
ref: cost_skip_scan()
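The idea behind the estimate can be shown with a deliberately simplified model. The Python sketch below is NOT the formula in cost_skip_scan(). seek_cost and row_cost are made-up constants, and the model does not reproduce the num_groups, rows or cost values in the trace above. It only illustrates the trade-off:

- The number of groups (distinct prefix values) is estimated as table rows divided by rec_per_key for the prefix.
- Each group costs one extra B-tree descent.
- Only rows inside the ranges are read.

The example call assumes rec_per_key = 80 for t1's f1 prefix (160 rows, 2 distinct values).

```python
import math

def estimate_groups(table_rows, rec_per_key):
    """Distinct prefix values ~= table rows / average rows per prefix value (rec_per_key)."""
    return max(1, math.ceil(table_rows / rec_per_key))

def toy_costs(table_rows, rec_per_key, range_selectivity, seek_cost=1.0, row_cost=0.1):
    """Hypothetical cost model for illustration only; not MySQL's cost_skip_scan()."""
    groups = estimate_groups(table_rows, rec_per_key)
    full_scan = table_rows * row_cost                          # read every index entry
    skip_scan = (groups * seek_cost                            # one B-tree descent per group
                 + table_rows * range_selectivity * row_cost)  # rows inside the ranges
    return groups, full_scan, skip_scan

# Low-cardinality prefix (like t1): few groups, so skip scan is cheaper.
print(toy_costs(160, 80, 0.5))
# Nearly unique prefix: one group per row, so the seeks dominate.
print(toy_costs(160, 1, 0.5))
```

Under these assumptions, skip scan wins when the prefix has few distinct values. When the prefix is nearly unique, the per-group seeks outweigh the rows saved, and a full index scan is cheaper.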
Official documentation: Skip Scan Range Access Method
WL#11322: SUPPORT LOOSE INDEX RANGE SCANS FOR LOW CARDINALITY
Bug#88103
Related code
Author: zhaiwx_yinfeng
This article is original content from the Yunqi Community and may not be reproduced without permission.