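As the name suggests, a functional index is an index on the result of a function applied to a column. The "function" can also be any expression, which is why it is also called an expression index.
MySQL 5.7 introduced virtual (generated) columns, and MySQL 8.0 implements functional indexes internally on top of virtual columns.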
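Consider the following scenarios:
1. Filtering on the date part of a datetime column.
SELECT ... FROM tb1 WHERE date(time_field1) = current_date;
2. Filtering on an arithmetic expression over two columns.
SELECT ... FROM tb1 WHERE field2 + field3 = 5;
3. Matching a substring in the middle of a column.
SELECT ... FROM tb1 WHERE substr(field4, 5, 9) = 'actionsky';
4. Matching a substring at the end of a column.
SELECT ... FROM tb1 WHERE RIGHT(field4, 9) = 'actionsky';
5. Matching a value extracted from a JSON column.
SELECT ... FROM tb1 WHERE CAST(field4 ->> '$.name' AS CHAR(30)) = 'actionsky';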
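Without functional indexes, these five scenarios can be rewritten with varying degrees of difficulty, but every one of them needs some change: either the filter condition is rewritten, or the table structure is altered to add a redundant column plus an extra index.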
For example, scenario 1 can be rewritten as a half-open range on the raw column:
SELECT ... FROM tb1 WHERE time_field1 >= current_date AND time_field1 < current_date + INTERVAL 1 DAY;
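The half-open form (`>= day AND < next day`) is safer than a closed range that ends at `23:59:59` when the column stores fractional seconds (a DATETIME column declared with fsp > 0), because a value such as 23:59:59.5 lies after the closed upper bound yet still belongs to that day. A small sketch of the boundary logic using only Python's standard `datetime` module; the helper names are hypothetical, and this models the predicates rather than MySQL itself:

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Half-open bounds [start, end) that cover every instant of `day`."""
    start = datetime.combine(day, time.min)  # day at 00:00:00
    return start, start + timedelta(days=1)  # next day at 00:00:00


def in_day_half_open(ts: datetime, day: date) -> bool:
    # Mirrors: col >= day AND col < day + INTERVAL 1 DAY
    start, end = day_bounds(day)
    return start <= ts < end


def in_day_closed(ts: datetime, day: date) -> bool:
    # Mirrors: col >= 'day 00:00:00' AND col <= 'day 23:59:59'
    return datetime.combine(day, time.min) <= ts <= datetime.combine(day, time(23, 59, 59))


d = date(2019, 4, 18)
samples = [
    datetime(2019, 4, 18, 0, 0, 0),
    datetime(2019, 4, 18, 10, 4, 53),
    datetime(2019, 4, 18, 23, 59, 59, 500000),  # only possible with DATETIME(fsp)
    datetime(2019, 4, 19, 0, 0, 0),
]
for ts in samples:
    # ts.date() == d is what the predicate date(col) = d tests
    print(ts, ts.date() == d, in_day_half_open(ts, d), in_day_closed(ts, d))
```

For the 23:59:59.5 sample, the half-open check agrees with the `date()` predicate, while the closed range misses it.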
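Scenario 4 is harder to rewrite. Because it matches the trailing substring, a regular index on field4 cannot help, so the only option is to add a new redundant column and keep it up to date, either asynchronously with a scheduled job that runs at some fixed interval or in real time with triggers:
SELECT ... FROM tb1 WHERE field4_suffix = 'actionsky';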
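A runnable sketch of the trigger approach, using Python's built-in `sqlite3` module as a stand-in for MySQL (SQLite also supports triggers). Assumptions: the `id` column and the trigger and index names are hypothetical, SQLite's `substr(x, -9)` plays the role of MySQL's `RIGHT(x, 9)`, and because SQLite cannot assign to `NEW` inside a trigger, AFTER triggers update the row instead:

```python
import sqlite3

# Hypothetical schema for scenario 4: field4_suffix mirrors the last 9 characters of field4.
SCHEMA = """
CREATE TABLE tb1 (
    id            INTEGER PRIMARY KEY,
    field4        TEXT NOT NULL,
    field4_suffix TEXT
);
CREATE INDEX idx_field4_suffix ON tb1 (field4_suffix);

-- Recompute the suffix after every insert.
CREATE TRIGGER tb1_suffix_ins AFTER INSERT ON tb1
BEGIN
    UPDATE tb1 SET field4_suffix = substr(NEW.field4, -9) WHERE id = NEW.id;
END;

-- Recompute it whenever field4 changes (this UPDATE does not touch field4, so no recursion).
CREATE TRIGGER tb1_suffix_upd AFTER UPDATE OF field4 ON tb1
BEGIN
    UPDATE tb1 SET field4_suffix = substr(NEW.field4, -9) WHERE id = NEW.id;
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_by_suffix(conn: sqlite3.Connection, suffix: str) -> list:
    # Equivalent of: SELECT ... FROM tb1 WHERE field4_suffix = 'actionsky';
    return conn.execute(
        "SELECT id, field4 FROM tb1 WHERE field4_suffix = ? ORDER BY id", (suffix,)
    ).fetchall()


conn = open_db()
conn.execute("INSERT INTO tb1 (id, field4) VALUES (1, 'test-actionsky')")
conn.execute("INSERT INTO tb1 (id, field4) VALUES (2, 'hello-world')")
print(find_by_suffix(conn, "actionsky"))  # [(1, 'test-actionsky')]
conn.execute("UPDATE tb1 SET field4 = 'new-actionsky' WHERE id = 2")
print(find_by_suffix(conn, "actionsky"))
```

In MySQL itself this pattern does not carry over directly, since a trigger may not update the table it is defined on; the usual MySQL form is a pair of BEFORE INSERT / BEFORE UPDATE triggers that run `SET NEW.field4_suffix = RIGHT(NEW.field4, 9);`.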
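So the rewrites do work, but the resulting SQL is no longer written in a standard way, and such workarounds make later migration less smooth.
The functional indexes introduced in MySQL 8.0 (as functional key parts, available since 8.0.13) make all of this considerably easier.
Functional indexes have a drawback of their own, though: the syntax is rigid. A query must use exactly the expression the index was defined on; otherwise the optimizer cannot match it to the index.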
Let's turn the scenarios above into concrete examples.
Sample table structure:
Total number of rows:
mysql> SELECT COUNT(*)
FROM t_func;
+----------+
| count(*) |
+----------+
|    16384 |
+----------+
1 row in set (0.01 sec)
Now add a functional index for each of the scenarios above:
mysql> ALTER TABLE t_func
  ADD INDEX idx_log_time ((date(log_time))),
  ADD INDEX idx_u1 ((rank1 + rank2)),
  ADD INDEX idx_suffix_str3 ((RIGHT(str3, 9))),
  ADD INDEX idx_substr_str1 ((substr(str1, 5, 9))),
  ADD INDEX idx_str2 ((CAST(str2 ->> '$.name' AS CHAR(9))));
Query OK, 0 rows affected (1.13 sec)
Records: 0  Duplicates: 0  Warnings: 0
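Reproducing the output above requires MySQL 8.0, but SQLite (bundled with Python) also supports indexes on expressions, so the idea can be tried locally. A sketch under these assumptions: `t_func` here is a cut-down hypothetical table, only scenarios 2 to 4 are covered, and `substr(str3, -9)` stands in for `RIGHT(str3, 9)` because SQLite has no `RIGHT()` function. SQLite also writes expression indexes without MySQL's extra pair of parentheses:

```python
import sqlite3

# Cut-down, hypothetical version of t_func with the columns used by scenarios 2-4.
DDL = """
CREATE TABLE t_func (
    id    INTEGER PRIMARY KEY,
    rank1 INTEGER,
    rank2 INTEGER,
    str1  TEXT,
    str3  TEXT
);
CREATE INDEX idx_u1          ON t_func (rank1 + rank2);
CREATE INDEX idx_substr_str1 ON t_func (substr(str1, 5, 9));
CREATE INDEX idx_suffix_str3 ON t_func (substr(str3, -9));  -- stands in for RIGHT(str3, 9)
"""

# Each query repeats its index expression exactly as it was defined.
QUERIES = {
    "idx_u1": "SELECT * FROM t_func WHERE rank1 + rank2 = 121",
    "idx_substr_str1": "SELECT * FROM t_func WHERE substr(str1, 5, 9) = 'actionsky'",
    "idx_suffix_str3": "SELECT * FROM t_func WHERE substr(str3, -9) = 'actionsky'",
}


def plan_detail(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
for index_name, sql in QUERIES.items():
    print(index_name, "->", plan_detail(conn, sql))
```

The exact wording of the plan text varies between SQLite versions, but each line should name the matching index. SQLite's planner is not MySQL's optimizer, so treat this only as an analogy.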
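Looking at the table definition again, several of the index expressions have been converted into MySQL's own internal form (for example, date(log_time) becomes cast(`log_time` as date)).
MySQL 8.0 also lets you display the columns that the system keeps hidden.
Use SHOW EXTENDED COLUMNS to list the virtual columns created by the functional indexes:
The five columns with random-string names above are the virtual columns created implicitly by the functional indexes.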
First, scenario 2: the sum of two integer columns.
mysql> SELECT COUNT(*)
FROM t_func
WHERE rank1 + rank2 = 121;
+----------+
| count(*) |
+----------+
|      878 |
+----------+
1 row in set (0.00 sec)
The execution plan shows that the functional index idx_u1 is used:
mysql> explain SELECT COUNT(*)
FROM t_func
WHERE rank1 + rank2 = 121\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_func
   partitions: NULL
         type: ref
possible_keys: idx_u1
          key: idx_u1
      key_len: 9
          ref: const
         rows: 878
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)
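If we change the SQL only slightly, into an algebraically equivalent form, and check its plan again, the functional index can no longer be used and the query falls back to a full table scan. The SQL must therefore follow the functional index definition exactly.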
mysql> explain SELECT COUNT(*)
FROM t_func
WHERE rank1 = 121 - rank2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t_func
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16089
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
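SQLite documents the same restriction for its indexes on expressions: the planner only considers such an index when the indexed expression appears in the query exactly as written in CREATE INDEX. A minimal sketch with Python's `sqlite3` module (an analogy, not MySQL's optimizer; the tiny `t_func` table is hypothetical) that contrasts the two predicates above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_func (id INTEGER PRIMARY KEY, rank1 INTEGER, rank2 INTEGER);
CREATE INDEX idx_u1 ON t_func (rank1 + rank2);
""")


def plan_detail(sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Same expression as the index definition: the planner can use idx_u1.
as_defined = plan_detail("SELECT * FROM t_func WHERE rank1 + rank2 = 121")
# Algebraically equivalent but textually different: no match, so a full scan.
rearranged = plan_detail("SELECT * FROM t_func WHERE rank1 = 121 - rank2")

print("as defined:", as_defined)
print("rearranged:", rearranged)
```

The planner does no algebra on the predicate, which is exactly why the rearranged MySQL query above ends up with `type: ALL`.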
Next, a simple performance comparison for scenario 1, with and without the rewrite:
mysql> SELECT *
FROM t_func
WHERE date(log_time) = '2019-04-18'
LIMIT 1\G
*************************** 1. row ***************************
      id: 2
   rank1: 1
    str1: test-actionsky-test
    str2: {"age": 30, "name": "dell"}
   rank2: 120
    str3: test-actionsky
log_time: 2019-04-18 10:04:53
1 row in set (0.01 sec)
Now add a regular index on log_time:
mysql> ALTER TABLE t_func ADD INDEX idx_log_time_normal (log_time);
Query OK, 0 rows affected (0.36 sec)
Records: 0  Duplicates: 0  Warnings: 0
Then rewrite the SQL as a range on log_time and run it again:
mysql> SELECT *
FROM t_func
WHERE log_time >= '2019-04-18 00:00:00' AND log_time < '2019-04-19 00:00:00'
LIMIT 1\G
*************************** 1. row ***************************
      id: 2
   rank1: 1
    str1: test-actionsky-test
    str2: {"age": 30, "name": "dell"}
   rank2: 120
    str3: test-actionsky
log_time: 2019-04-18 10:04:53
1 row in set (0.01 sec)
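The same contrast can be sketched in SQLite through Python's `sqlite3` module (an analogy, not MySQL's optimizer): with only a plain index on `log_time`, wrapping the column in `date()` forces a scan, while the half-open range can use the index and still returns the same rows. The table is a cut-down hypothetical `t_func` with `log_time` stored as ISO-8601 text, since SQLite has no dedicated date/time storage class:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_func (id INTEGER PRIMARY KEY, log_time TEXT, str1 TEXT);
CREATE INDEX idx_log_time_normal ON t_func (log_time);
""")
conn.executemany(
    "INSERT INTO t_func (id, log_time, str1) VALUES (?, ?, ?)",
    [
        (1, "2019-04-17 23:59:59", "a"),
        (2, "2019-04-18 10:04:53", "b"),
        (3, "2019-04-18 23:59:59", "c"),
        (4, "2019-04-19 00:00:00", "d"),
    ],
)

# str1 is selected too, so the index on log_time is not a covering index.
FUNC_SQL = "SELECT id, str1 FROM t_func WHERE date(log_time) = '2019-04-18'"
RANGE_SQL = ("SELECT id, str1 FROM t_func "
             "WHERE log_time >= '2019-04-18 00:00:00' AND log_time < '2019-04-19 00:00:00'")


def ids(sql: str) -> list:
    """Matching ids, sorted in Python so the query plan stays free of ORDER BY."""
    return sorted(row[0] for row in conn.execute(sql))


def plan_detail(sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


print(ids(FUNC_SQL), plan_detail(FUNC_SQL))    # same rows, full scan
print(ids(RANGE_SQL), plan_detail(RANGE_SQL))  # same rows, index range search
```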
The two results look the same, so let's compare their execution plans more closely.
Regular index:
mysql> explain format=json SELECT *
FROM t_func
WHERE log_time >= '2019-04-18 00:00:00' AND log_time < '2019-04-19 00:00:00'
LIMIT 1\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "630.71"
    },
    "table": {
      "table_name": "t_func",
      "access_type": "range",
      "possible_keys": [
        "idx_log_time_normal"
      ],
      "key": "idx_log_time_normal",
      "used_key_parts": [
        "log_time"
      ],
      "key_length": "6",
      "rows_examined_per_scan": 1401,
      "rows_produced_per_join": 1401,
      "filtered": "100.00",
      "index_condition": "((`ytt`.`t_func`.`log_time` >= '2019-04-18 00:00:00') and (`ytt`.`t_func`.`log_time` < '2019-04-19 00:00:00'))",
      "cost_info": {
        "read_cost": "490.61",
        "eval_cost": "140.10",
        "prefix_cost": "630.71",
        "data_read_per_join": "437K"
      },
      "used_columns": [
        "id",
        "rank1",
        "str1",
        "str2",
        "rank2",
        "str3",
        "log_time",
        "cast(`log_time` as date)",
        "(`rank1` + `rank2`)",
        "right(`str3`,9)",
        "substr(`str1`,5,9)",
        "cast(json_unquote(json_extract(`str2`,_utf8mb4'$.name')) as char(9) charset utf8mb4)"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Functional index:
mysql> explain format=json SELECT COUNT(*)
FROM t_func
WHERE date(log_time) = '2019-04-18'
LIMIT 1\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "308.85"
},
"table": {
"table_name": "t_func",
"access_type": "ref",
"possible_keys": [
"idx_log_time"
],
"key": "idx_log_time",
"used_key_parts": [
"cast(`log_time` as date)"
],
"key_length": "4",
"ref": [
"const"
],
"rows_examined_per_scan": 1401,
"rows_produced_per_join": 1401,
"filtered": "100.00",
"cost_info": {
"read_cost": "168.75",
"eval_cost": "140.10",
"prefix_cost": "308.85",
"data_read_per_join": "437K"
},
"used_columns": [
"log_time",
"cast(`log_time` as date)"
]
}
}
}
1 row in set, 1 warning (0.00 sec)
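Judging from these plans, the difference is not large. The notable difference is in the optimizer's cost estimates: the regular-index plan reports query_cost 630.71 (read_cost 490.61), while the functional-index plan reports 308.85 (read_cost 168.75); eval_cost is 140.10 in both. Note that the functional-index EXPLAIN was run on SELECT COUNT(*) rather than SELECT *, so its used_columns lists only log_time and cast(`log_time` as date); the two plans are therefore not a strictly like-for-like comparison.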
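The cost figures can be checked by hand. Assuming MySQL 8.0's default `row_evaluate_cost` of 0.1 (from the `mysql.server_cost` table), eval_cost is 1401 rows × 0.1 = 140.10 for both plans, and for these single-table plans query_cost equals read_cost + eval_cost, so the whole gap comes from read_cost. A small Python check that uses only the numbers copied from the two EXPLAIN outputs:

```python
# EXPLAIN FORMAT=JSON figures copied from the two plans above.
plans = {
    "idx_log_time_normal (range)": {
        "rows": 1401, "read_cost": 490.61, "eval_cost": 140.10, "query_cost": 630.71,
    },
    "idx_log_time (functional, ref)": {
        "rows": 1401, "read_cost": 168.75, "eval_cost": 140.10, "query_cost": 308.85,
    },
}

ROW_EVALUATE_COST = 0.1  # assumption: MySQL 8.0 default in mysql.server_cost

for name, p in plans.items():
    # eval_cost = rows examined x per-row evaluation cost
    expected_eval = round(p["rows"] * ROW_EVALUATE_COST, 2)
    # query_cost (= prefix_cost for a single table) = read_cost + eval_cost
    total = round(p["read_cost"] + p["eval_cost"], 2)
    print(f"{name}: eval_cost {expected_eval} (reported {p['eval_cost']}), "
          f"read + eval = {total} (reported {p['query_cost']})")
```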
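If you are interested, you can benchmark both approaches under high concurrency yourself; this article is only a functional demonstration.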