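This article describes how MySQL hash indexes are implemented and when to use them.
Hash tables are used in several places in MySQL.
Explicit hash indexes are found mainly in in-memory tables, that is, the MEMORY engine, or the TempTable engine in MySQL 8.0. Everything in this article is based on MEMORY tables. The size of a MEMORY table is controlled by the max_heap_table_size parameter, which covers table data, index data, and so on.
As an example, table t1 has six rows and a hash index on its primary key.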
# The primary key of a MySQL MEMORY table uses a hash index by default
mysql> create table t1(id int, name varchar(64), gender char(1), status char(2), primary key (id)) engine memory;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t1 values(101,'張三','男','未婚');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values(102,'小明','男','已婚');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(103,'李四','男','未婚');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(104,'李慶','女','已婚');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values(105,'楊陽','女','未婚');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values(106,'餘歡水','男','二婚');
Query OK, 1 row affected (0.01 sec)
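I drew a simple diagram of how the primary key hash index is laid out; see Figure 1.
Figure 1 shows the distribution of the hashed primary key of a MySQL MEMORY table. MEMORY tables also allow hash indexes on non-primary-key columns. Suppose we add a hash index on the name column: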
mysql> alter table t1 add key idx_name(name) using hash;
Query OK, 6 rows affected (0.04 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> insert into t1 values(107,'楊陽','男','二婚');
Query OK, 1 row affected (0.00 sec)
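The hash index on the name column now looks like Figure 2:
Figure 2 shows that indexing the name column carries some probability of hash collisions. The more such collisions there are, the faster hash index performance degrades, so in this scenario the hash index cannot show its advantage.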
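The effect of collisions can be illustrated with a toy model. The following Python sketch is not MySQL's implementation: the bucket count, the hash function, and the ASCII names standing in for the rows of t1 are all assumptions made for illustration. It only shows that entries sharing a bucket form a chain, and that a lookup has to compare every entry in that chain.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical and deliberately tiny so that chains form


def bucket_of(key: str) -> int:
    """Toy hash function (an assumption): sum of code points modulo bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


def build_hash_index(rows):
    """Map bucket number -> chain of (key, row id) entries stored in that bucket."""
    index = defaultdict(list)
    for row_id, name in rows:
        index[bucket_of(name)].append((name, row_id))
    return index


def lookup(index, name):
    """Return the matching row ids and the number of chain entries compared."""
    chain = index.get(bucket_of(name), [])
    matches = [row_id for key, row_id in chain if key == name]
    return matches, len(chain)


# ASCII stand-ins for the seven rows of t1; "yang" appears twice, like id 105 and 107.
rows = [(101, "zhang"), (102, "ming"), (103, "li"), (104, "qing"),
        (105, "yang"), (106, "yu"), (107, "yang")]

index = build_hash_index(rows)
matches, compared = lookup(index, "yang")
print(matches, compared)
```

Duplicate key values always share a bucket, and unrelated keys may land there too. The longer the chain, the more comparisons a single equality lookup costs.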
Next, let's look at where hash indexes are useful in MySQL. To compare against B-tree indexes, create t2, a clone of t1.
# The data generation for t1 is omitted
mysql> create table t2 like t1;
Query OK, 0 rows affected (0.02 sec)

mysql> alter table t2 drop primary key, drop key idx_name;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table t2 add primary key (id) using btree, add key idx_name (name) using btree;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t2 select * from t1;
Query OK, 50000 rows affected (0.18 sec)
Records: 50000  Duplicates: 0  Warnings: 0
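Hash indexes can be used only with the following operators: "=", "in", and "<=>".
"=" and "in" are familiar to everyone, so here is a brief note on "<=>". It is also an equality operator, but it can compare NULL values: it returns 1 when both operands are NULL, 0 when exactly one operand is NULL, and the same result as "=" when neither is NULL (1 if they are equal, 0 otherwise). See SQL 0 below.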
# SQL 0
mysql> select 10<=>10,null<=>null,null<=>true;
+---------+-------------+-------------+
| 10<=>10 | null<=>null | null<=>true |
+---------+-------------+-------------+
|       1 |           1 |           0 |
+---------+-------------+-------------+
1 row in set (0.00 sec)
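The semantics of the NULL-safe operator can be sketched in Python, with None standing in for SQL NULL. The function name null_safe_eq is hypothetical. It only models the truth table of "<=>" and says nothing about how MySQL evaluates the operator internally.

```python
def null_safe_eq(a, b) -> int:
    """Model of SQL's <=> operator, using None in place of NULL."""
    if a is None and b is None:
        return 1  # NULL <=> NULL is 1, whereas NULL = NULL would be NULL
    if a is None or b is None:
        return 0  # exactly one side is NULL
    return 1 if a == b else 0  # neither side is NULL: behaves like "="


# The three comparisons from SQL 0.
result = (null_safe_eq(10, 10), null_safe_eq(None, None), null_safe_eq(None, True))
print(result)  # (1, 1, 0)
```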
The next few SQL statements are all based on the "=" and "in" operators.
# SQL 1
mysql> select * from t1 where id = 2000;
+------+-----------+--------+--------+
| id   | name      | gender | status |
+------+-----------+--------+--------+
| 2000 | 王牡丹    | 男     | 離異   |
+------+-----------+--------+--------+
1 row in set (0.00 sec)
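SQL 1 is an equality lookup on the primary key, which is ideal for a hash index: MySQL computes the hash of id = 2000 and uses the index to locate the matching record quickly.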
# SQL 2
mysql> select * from t2 where id in (1000,2000);
+------+-----------+--------+--------+
| id   | name      | gender | status |
+------+-----------+--------+--------+
| 1000 | 餘歡水    | 男     | 二婚   |
| 2000 | 王牡丹    | 男     | 離異   |
+------+-----------+--------+--------+
2 rows in set (0.00 sec)
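Although SQL 2 uses IN, the list is very short (only two values), so this kind of query is also well suited to a hash index. The statement is shown against t2, but the same reasoning applies to the hash-indexed t1.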
# SQL 3
mysql> select count(*) from t1 where id between 1000 and 2000;
+----------+
| count(*) |
+----------+
|     1001 |
+----------+
1 row in set (0.02 sec)
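SQL 3 is a range query and, unlike the two statements above, it cannot use a hash index. The reason is clear: the hash value generated from the indexed column bears no relation to the sort order of the column itself, so the hash index has nothing to work with. This scenario calls for a B-tree index, which keeps keys in order by design.
Change the table in SQL 3 to t2, which uses a B-tree index.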
# SQL 4
mysql> select count(*) from t2 where id between 1000 and 2000;
+----------+
| count(*) |
+----------+
|     1001 |
+----------+
1 row in set (0.00 sec)
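Compare the execution plans of SQL 3 and SQL 4: SQL 3 uses no index and scans the whole table, an estimated 50,000 rows; SQL 4 uses the primary key index and scans only about 3,000 rows.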
mysql> explain select count(*) from t1 where id between 1000 and 2000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 50000
     filtered: 11.11
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(*) from t2 where id between 1000 and 2000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3052
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
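The difference between the two plans can be reproduced with a small Python model. This is only a sketch under stated assumptions: the ids 1 to 50000 are made up (the real data in t1 is not shown in this article), a dict stands in for the hash index, and a sorted list with binary search stands in for the B-tree.

```python
import bisect

ids = list(range(1, 50001))  # hypothetical ids; not the real contents of t1

hash_index = {key: pos for pos, key in enumerate(ids)}  # stand-in for a hash index
sorted_index = sorted(ids)                              # stand-in for a B-tree


def range_count_hash(lo, hi):
    """A hash index has no key order, so every key must be examined."""
    examined = matched = 0
    for key in hash_index:
        examined += 1
        if lo <= key <= hi:
            matched += 1
    return matched, examined


def range_count_sorted(lo, hi):
    """An ordered index finds both ends of the range by binary search."""
    left = bisect.bisect_left(sorted_index, lo)
    right = bisect.bisect_right(sorted_index, hi)
    return right - left, right - left  # only the keys inside the range are visited


print(range_count_hash(1000, 2000))    # (1001, 50000)
print(range_count_sorted(1000, 2000))  # (1001, 1001)
```

Both functions return the same count, but the hash-based one touches all 50,000 keys, which mirrors the full scan reported for t1.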
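A hash index is itself unordered, so sorting on a hash-indexed column requires a full table scan plus an extra sort. SQL 5 and SQL 6 run the same descending-order query against t1 and t2 respectively.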
# SQL 5
mysql> select id from t1 where 1 order by id desc limit 1;
+--------+
| id     |
+--------+
| 135107 |
+--------+
1 row in set (0.02 sec)

# SQL 6
mysql> select id from t2 where 1 order by id desc limit 1;
+--------+
| id     |
+--------+
| 135107 |
+--------+
1 row in set (0.00 sec)
Here are the execution plans of the two statements:
mysql> explain select id from t1 where 1 order by id desc limit 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 50000
     filtered: 100.00
        Extra: Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> explain select id from t2 where 1 order by id desc limit 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Backward index scan
1 row in set, 1 warning (0.00 sec)
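SQL 6 retrieves the ID with a backward scan of the B-tree index, whereas SQL 5 has to sort the whole table to get it.
Likewise, because the hash value is produced by a hash function, the index entry does not contain the data itself, so every lookup through a hash index has to go back to the table.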
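The same contrast for "order by id desc limit 1" can be sketched in Python. The key values below are a handful of ids that appear in this article, used purely as sample data, and the two helper functions are hypothetical models rather than what the storage engine actually does.

```python
sample_ids = [949, 135107, 2000, 1000, 101]  # sample keys only, not the full table

hash_keys = set(sample_ids)       # unordered, like the keys of a hash index
sorted_keys = sorted(sample_ids)  # ordered, like the keys of a B-tree index


def top_id_unordered(keys):
    """No order is available: read every key, sort, then take the first."""
    ordered = sorted(keys, reverse=True)  # plays the role of "Using filesort"
    return ordered[0], len(keys)          # (result, keys read)


def top_id_ordered(keys):
    """Keys are already ordered: read one entry from the end."""
    return keys[-1], 1                    # plays the role of "Backward index scan"


print(top_id_unordered(hash_keys))  # (135107, 5)
print(top_id_ordered(sorted_keys))  # (135107, 1)
```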
mysql> flush status;
Query OK, 0 rows affected (0.01 sec)

# SQL 7
mysql> select id from t1 limit 1;
+-----+
| id  |
+-----+
| 949 |
+-----+
1 row in set (0.00 sec)

mysql> show status like 'handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 0     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 849   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.01 sec)

# SQL 8
mysql> select id from t2 limit 1;
+-----+
| id  |
+-----+
| 949 |
+-----+
1 row in set (0.00 sec)

mysql> show status like 'handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 0     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 1     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)
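SQL 7 and SQL 8 above select only the index key. Comparing the MySQL handler status counters, Handler_read_rnd_next shows that SQL 7 read 849 rows before it found the record, whereas SQL 8 read only one.
For example, with a composite hash index on the columns (x1, x2, x3), a single hash value is computed over all three columns together, so the index cannot be used for a lookup on just one of them. To demonstrate, create two more tables whose primary keys are composite indexes: t3 uses a hash index for its primary key and t4 uses a B-tree index.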
mysql> create table t3(id1 int,id2 int,r1 int, primary key(id1,id2)) engine memory;
Query OK, 0 rows affected (0.03 sec)

mysql> create table t4(id1 int,id2 int,r1 int, primary key(id1,id2) using btree) engine memory;
Query OK, 0 rows affected (0.03 sec)

# The data generation is omitted.
mysql> select (select count(*) from t3) t3, (select count(*) from t4) t4;
+-------+-------+
| t3    | t4    |
+-------+-------+
| 16293 | 16293 |
+-------+-------+
1 row in set (0.00 sec)

# SQL 9
mysql> select * from t3 where id1 = 44;
+-----+-----+------+
| id1 | id2 | r1   |
+-----+-----+------+
|  44 |  98 |   29 |
|  44 | 180 |   56 |
|  44 | 130 |   32 |
|  44 | 104 |   65 |
|  44 | 208 |    4 |
|  44 | 154 |  113 |
|  44 |  69 |   84 |
|  44 |  76 |  154 |
|  44 | 132 |   33 |
|  44 | 108 |   79 |
|  44 | 173 |    6 |
+-----+-----+------+
11 rows in set (0.00 sec)

# SQL 10
mysql> select * from t4 where id1 = 44;
+-----+-----+------+
| id1 | id2 | r1   |
+-----+-----+------+
|  44 |  69 |   84 |
|  44 |  76 |  154 |
|  44 |  98 |   29 |
|  44 | 104 |   65 |
|  44 | 108 |   79 |
|  44 | 130 |   32 |
|  44 | 132 |   33 |
|  44 | 154 |  113 |
|  44 | 173 |    6 |
|  44 | 180 |   56 |
|  44 | 208 |    4 |
+-----+-----+------+
11 rows in set (0.00 sec)
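SQL 9 and SQL 10 both query on the first column of the composite primary key. A quick look at the execution plans makes it clear: SQL 9 uses no index, while SQL 10 uses the primary key.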
mysql> explain select * from t3 where id1 = 44\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t3
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16293
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from t4 where id1 = 44\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t4
   partitions: NULL
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 30
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)
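Why a composite hash index cannot serve a lookup on its first column alone can be sketched as follows. This is an illustrative Python model, not MySQL code: the five rows are a made-up subset in the shape of t3 and t4, Python's built-in hash() stands in for the engine's hash function, and a sorted list of tuples stands in for the B-tree.

```python
import bisect
from collections import defaultdict

# Hypothetical (id1, id2, r1) rows in insertion order.
rows = [(44, 98, 29), (44, 180, 56), (7, 1, 3), (44, 69, 84), (50, 2, 9)]

# Composite hash index: one hash value per (id1, id2) pair.
hash_index = defaultdict(list)
for row in rows:
    hash_index[hash((row[0], row[1]))].append(row)

# Composite ordered index: rows sorted by (id1, id2).
sorted_index = sorted(rows)


def hash_lookup_full(id1, id2):
    """With both columns known, the hash can be computed and probed directly."""
    bucket = hash_index.get(hash((id1, id2)), [])
    return [r for r in bucket if (r[0], r[1]) == (id1, id2)]


def hash_lookup_prefix(id1):
    """With only id1 the hash cannot be computed, so every row is scanned."""
    return [r for r in rows if r[0] == id1], len(rows)


def sorted_lookup_prefix(id1):
    """An ordered index locates the id1 prefix with two binary searches."""
    left = bisect.bisect_left(sorted_index, (id1,))
    right = bisect.bisect_left(sorted_index, (id1 + 1,))
    return sorted_index[left:right], right - left


print(hash_lookup_full(44, 98))  # [(44, 98, 29)]
print(hash_lookup_prefix(44))    # ([(44, 98, 29), (44, 180, 56), (44, 69, 84)], 5)
print(sorted_lookup_prefix(44))  # ([(44, 69, 84), (44, 98, 29), (44, 180, 56)], 3)
```

As in SQL 9 and SQL 10, the scan returns rows in storage order while the ordered index returns them sorted by id2, and only the ordered index avoids touching unrelated rows.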
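That concludes this introduction to MySQL hash indexes. This article mainly covered how data is distributed in a MySQL hash index and where hash indexes apply; I hope it helps.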