Part 21: Index Design (Functional Indexes)

Date: 2021-04-20


This article introduces MySQL functional indexes (also called expression indexes).

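Generally, an index is built on a column itself or on a prefix of the column (see Part 20), whereas a functional index is built on the result of applying a function, operator, or expression to the column. If we treat operators and expressions as functions too, all such indexes can simply be called functional indexes.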

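MySQL implements functional indexes internally on top of virtual generated columns. Unlike a generated column that you define yourself, the virtual column created automatically for a functional index is hidden; its value is computed on the fly and is not stored with the row, so only the functional index itself is stored on disk.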

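Functional indexes are not supported before MySQL 8.0.13. Older versions, including the still widely deployed MySQL 5.7, therefore lack them, and you have to emulate them by hand or rewrite the SQL.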
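On versions without functional indexes, the manual emulation mentioned above can be sketched with an explicit generated column plus an ordinary index on it, or by rewriting the predicate as a range. The table `t_log`, the column `log_date`, and the index name `idx_log_date` are hypothetical names used only for illustration; this is a minimal sketch, not the only way to do it.

```sql
-- Hypothetical table for illustration; not part of the original article.
CREATE TABLE t_log (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  log_time DATETIME(6) DEFAULT NULL,
  -- Explicit virtual generated column standing in for the expression DATE(log_time).
  log_date DATE GENERATED ALWAYS AS (DATE(log_time)) VIRTUAL,
  PRIMARY KEY (id),
  KEY idx_log_date (log_date)
) ENGINE=InnoDB;

-- Option 1: filter on the generated column so that idx_log_date can be used.
SELECT id, log_time FROM t_log WHERE log_date = '2100-02-02';

-- Option 2: rewrite the predicate as a half-open range on the raw column,
-- which an ordinary index on log_time (if one exists) can serve.
SELECT id, log_time
FROM t_log
WHERE log_time >= '2100-02-02' AND log_time < '2100-02-03';
```

Option 1 keeps the query close to its original intent at the cost of a visible extra column; option 2 needs no schema change but only works when the expression can be turned into a range on the column.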

This part looks at functional indexes from the following angles:

1. Use cases for functional indexes

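The most classic use case for a functional index is date handling, especially when the table defines a single datetime column and later queries filter on only part of its value. For example, "2100-02-02 08:09:09.123972" contains the date "2100-02-02", the time "08:09:09", and the fractional seconds "123972", and a query may target any one of these parts.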

A simple example: table t1 has two columns, a primary key and a datetime column, and holds just under 400,000 rows.

<localhost|mysql>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `log_time` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_log_time` (`log_time`)
) ENGINE=InnoDB AUTO_INCREMENT=524268 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)


<localhost|mysql>select count(*) from t1;
+----------+
| count(*) |
+----------+
|   393216 |
+----------+
1 row in set (0.07 sec)

Run SQL 1 below, which filters on the date part alone; it takes 0.09 seconds.

# SQL 1
<localhost|mysql>select * from t1 where date(log_time) = '2100-02-02';
+--------+----------------------------+
| id     | log_time                   |
+--------+----------------------------+
| 524267 | 2100-02-02 08:09:09.123972 |
+--------+----------------------------+
1 row in set (0.09 sec)

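Look at its execution plan. Although an index is used, type: index means a full index scan: the estimated row count is roughly the total number of rows, so the cost is on the order of a full table scan. The scan only avoids going back to the clustered index because idx_log_time happens to cover both columns (Using index); the function wrapped around log_time rules out any ref or range access on that index.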

<localhost|mysql>explain select * from t1 where date(log_time) = '2100-02-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index
possible_keys: NULL
          key: idx_log_time
      key_len: 9
          ref: NULL
         rows: 392413
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

The best fix here is to add a new index on log_time: a functional index based on the date function.

<localhost|mysql>alter table t1 add key idx_func_index_1((date(log_time)));
Query OK, 0 rows affected (2.76 sec)
Records: 0  Duplicates: 0  Warnings: 0

Run SQL 1 again; it now completes instantly.

<localhost|mysql>select * from t1 where date(log_time) = '2100-02-02';
+--------+----------------------------+
| id     | log_time                   |
+--------+----------------------------+
| 524267 | 2100-02-02 08:09:09.123972 |
+--------+----------------------------+
1 row in set (0.00 sec)

Next, check the execution plan. It uses the functional index idx_func_index_1 and the estimated number of rows to scan is just one, so the plan is optimal.

<localhost|mysql>explain select * from t1 where date(log_time) = '2100-02-02'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_func_index_1
          key: idx_func_index_1
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

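If you want to see the column MySQL created internally for the functional index, a plain show create table does not reveal it. As shown below, only the new index appears.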

<localhost|mysql>show create table t1\G
...
  KEY `idx_func_index_1` ((cast(`log_time` as date)))
) ENGINE=InnoDB AUTO_INCREMENT=524268 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

The MySQL 8.0 statement show extended columns reveals hidden columns. The result below confirms that a virtual column was indeed added.

<localhost|mysql>show extended columns from t1;
...
| bbd3daff935e7a4d0991a8393ec03728 | date            | YES  | MUL | NULL    | VIRTUAL GENERATED |
...
5 rows in set (0.03 sec)
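Another way to inspect a functional index is through the data dictionary. The sketch below assumes MySQL 8.0.13 or later, where `information_schema.STATISTICS` has an `EXPRESSION` column, and assumes the current default database is the one that holds t1.

```sql
-- List the key parts of every index on t1.
-- For a functional key part, COLUMN_NAME is NULL and EXPRESSION holds the
-- indexed expression; for an ordinary key part it is the other way round.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, EXPRESSION
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't1';
```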

2. Caveats when using functional indexes with JSON

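For example, suppose you want to index a value extracted from a JSON column. Using the path operator ->> directly in the index definition raises an error.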

<localhost|mysql>create table t2 (id int primary key, r1 json);
Query OK, 0 rows affected (0.09 sec)

<localhost|mysql>alter table t2 add key idx_func_index_2((r1->>'$.x'));
ERROR 3757 (HY000): Cannot create a functional index on an expression that returns a BLOB or TEXT. Please consider using CAST.

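The ->> operator extracts the value at the given path from a JSON document; internally MySQL rewrites it as json_unquote(json_extract(...)). The result of json_unquote has the following properties:

Its data type is longtext. In MySQL a longtext value can only be indexed by prefix, so the result must be converted to another type with cast.
Its collation is utf8mb4_bin, whereas the collation of the cast result follows the current session's collation, which may not be utf8mb4_bin. The collation therefore has to be specified explicitly in the functional index.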
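The two collations described above can be checked directly in a session. This is a small sketch that uses a JSON literal instead of a table, so it makes no assumption about existing data; the column aliases are arbitrary.

```sql
-- Collation of the value produced by JSON_UNQUOTE(JSON_EXTRACT(...)),
-- which is what the ->> operator expands to.
SELECT COLLATION(JSON_UNQUOTE(JSON_EXTRACT('{"x": "a"}', '$.x'))) AS unquote_collation;

-- Collation after CAST, shown next to the session's connection collation.
SELECT COLLATION(CAST(JSON_UNQUOTE(JSON_EXTRACT('{"x": "a"}', '$.x')) AS CHAR(1))) AS cast_collation,
       @@collation_connection AS connection_collation;
```

If the two results differ in your session, that difference is the reason the index definition needs an explicit collate clause.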

So the functional index on the JSON column is created like this:

<localhost|mysql>alter table t2 add key idx_func_index_2((cast(r1->>'$.x' as char(1)) collate utf8mb4_bin));
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

Look at the table definition: the ->> operator has been converted to json_unquote(json_extract(...)), and the collation is utf8mb4_bin.

<localhost|mysql>show create table t2\G
*************************** 1. row ***************************
      Table: t2
...
 KEY `idx_func_index_2` (((cast(json_unquote(json_extract(`r1`,_utf8mb4'$.x')) as char(1) charset utf8mb4) collate utf8mb4_bin)))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

Next, insert a few rows and see how this functional index is used.

<localhost|mysql>select * from t2;
+----+---------------------+
| id | r1                  |
+----+---------------------+
|  1 | {"x": "1", "y": 10} |
|  2 | {"x": "2", "y": 20} |
|  3 | {"x": "a", "y": 20} |
|  4 | {"x": "A", "y": 20} |
+----+---------------------+
4 rows in set (0.00 sec)
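The article does not show the insert statements themselves. A minimal sketch that would produce the four rows listed above, assuming t2 is still empty:

```sql
-- Values copied from the result set shown above.
INSERT INTO t2 (id, r1) VALUES
  (1, '{"x": "1", "y": 10}'),
  (2, '{"x": "2", "y": 20}'),
  (3, '{"x": "a", "y": 20}'),
  (4, '{"x": "A", "y": 20}');
```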

Run SQL 2 and look at its execution plan: it goes straight to the functional index just created.

# SQL 2
<localhost|mysql>select * from t2 where r1->>'$.x'='a';
+----+---------------------+
| id | r1                  |
+----+---------------------+
|  3 | {"x": "a", "y": 20} |
+----+---------------------+
1 row in set (0.00 sec)

<localhost|mysql>explain select * from t2 where r1->>'$.x'='a'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t2
  partitions: NULL
        type: ref
possible_keys: idx_func_index_2
         key: idx_func_index_2
     key_len: 7
         ref: const
        rows: 1
    filtered: 100.00
       Extra: NULL
1 row in set, 1 warning (0.00 sec)

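A question should arise here. A query has to use exactly the expression the functional index was defined with, otherwise the index is not used. So how can SQL 2 use the index directly?
MySQL has in fact rewritten the statement internally. Check the warnings produced by the EXPLAIN above: SQL 2 was rewritten into a form that matches the functional index definition.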

<localhost|mysql>show warnings\G
*************************** 1. row ***************************
 Level: Note
  Code: 1003
Message: /* select#1 */ select `ytt`.`t2`.`id` AS `id`,`ytt`.`t2`.`r1` AS `r1` from `ytt`.`t2` where ((cast(json_unquote(json_extract(`ytt`.`t2`.`r1`,_utf8mb4'$.x')) as char(1) charset utf8mb4) collate utf8mb4_bin) = 'a')
1 row in set (0.00 sec)
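For comparison, the same lookup can be written out explicitly so that the predicate spells out the indexed expression instead of relying on the internal rewrite. This is a sketch that assumes the connection character set is utf8mb4, as in the session that created the index.

```sql
-- Explicit form: the same CAST ... COLLATE expression that
-- idx_func_index_2 was defined with, compared against a constant.
SELECT *
FROM t2
WHERE CAST(r1->>'$.x' AS CHAR(1)) COLLATE utf8mb4_bin = 'a';
```

Because the comparison uses utf8mb4_bin, 'a' and 'A' are distinct values, which is consistent with SQL 2 returning only the row with id 3.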

3. Can a functional index replace a prefix index?

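Having covered prefix indexes earlier, you may wonder whether a prefix index can be replaced by a functional index. It cannot. A functional index requires the query condition to match the index definition exactly. MySQL can rewrite the query internally in some cases, but it cannot rewrite every function into an equivalent, optimal form; substring, left, and right are examples.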

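The example below tests whether a functional index can stand in for a prefix index. Table t3 has one prefix index and two functional indexes that all serve the same purpose, yet the SQL statements that can use each of them differ.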

<localhost|mysql>show create table t3\G
*************************** 1. row ***************************
      Table: t3
Create Table: CREATE TABLE `t3` (
 `id` bigint unsigned NOT NULL AUTO_INCREMENT,
 `r1` char(36) DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 KEY `idx_r1_prefix` (`r1`(8)),
 KEY `idx_func_index_3` ((left(`r1`,8))),
 KEY `idx_func_index_4` ((substr(`r1`,1,8)))
) ENGINE=InnoDB AUTO_INCREMENT=249 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

SQL 3, SQL 4, and SQL 5 below are written differently and return the same result, but each uses a different index.

# SQL 3
select * from t3 where r1 like 'de45c7d9%';

# SQL 4
select * from t3 where left(r1,8) ='de45c7d9';

# SQL 5
select * from t3 where substring(r1,1,8) ='de45c7d9';

<localhost|mysql>select * from t3 where r1 like 'de45c7d9%';
+-----+--------------------------------------+
| id  | r1                                   |
+-----+--------------------------------------+
| 178 | de45c7d9-935c-11ea-8421-08002753f58d |
+-----+--------------------------------------+
1 row in set (0.00 sec)

<localhost|mysql>select * from t3 where left(r1,8) ='de45c7d9';
+-----+--------------------------------------+
| id  | r1                                   |
+-----+--------------------------------------+
| 178 | de45c7d9-935c-11ea-8421-08002753f58d |
+-----+--------------------------------------+
1 row in set (0.00 sec)

<localhost|mysql>select * from t3 where substring(r1,1,8) ='de45c7d9';
+-----+--------------------------------------+
| id  | r1                                   |
+-----+--------------------------------------+
| 178 | de45c7d9-935c-11ea-8421-08002753f58d |
+-----+--------------------------------------+
1 row in set (0.00 sec)

Their execution plans show that each statement uses a different index.

<localhost|mysql>explain select * from t3 where r1 like 'de45c7d9%'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: range
possible_keys: idx_r1_prefix
         key: idx_r1_prefix
     key_len: 33
         ref: NULL
        rows: 1
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

<localhost|mysql>explain select * from t3 where left(r1,8) ='de45c7d9'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: ref
possible_keys: idx_func_index_3
         key: idx_func_index_3
     key_len: 35
         ref: const
        rows: 1
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

<localhost|mysql>explain select * from t3 where substring(r1,1,8) ='de45c7d9'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: ref
possible_keys: idx_func_index_4
         key: idx_func_index_4
     key_len: 35
         ref: const
        rows: 1
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

Now drop the functional index idx_func_index_3, and SQL 4 can no longer use a suitable index.

<localhost|mysql>alter table t3 drop key idx_func_index_3;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

<localhost|mysql>explain select * from t3 where left(r1,8) ='de45c7d9'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t3
  partitions: NULL
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 128
    filtered: 100.00
       Extra: Using where
1 row in set, 1 warning (0.00 sec)

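Check the warnings: in the SQL as rewritten by the MySQL optimizer, the LEFT function is left unchanged, but the table no longer has an index on the LEFT expression, so the only option is a full table scan.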

<localhost|mysql>show warnings\G
*************************** 1. row ***************************
 Level: Note
  Code: 1003
Message: /* select#1 */ select `ytt`.`t3`.`id` AS `id`,`ytt`.`t3`.`r1` AS `r1` from `ytt`.`t3` where (left(`ytt`.`t3`.`r1`,8) = 'de45c7d9')
1 row in set (0.00 sec)
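Two ways to get SQL 4 back onto an index follow from the example above. This is a sketch of the options rather than a recommendation; which one fits depends on whether the application's SQL can be changed.

```sql
-- Option 1: recreate the functional index so that the LEFT(r1, 8)
-- predicate in SQL 4 matches an index definition again.
ALTER TABLE t3 ADD KEY idx_func_index_3 ((LEFT(r1, 8)));

-- Option 2: express the lookup in the form of SQL 3, which the
-- prefix index idx_r1_prefix can serve without any functional index.
SELECT * FROM t3 WHERE r1 LIKE 'de45c7d9%';
```

Option 2 is the reason the prefix index remains useful: one prefix index serves the LIKE form, whereas each function spelling (left, substring) needs its own functional index.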