23. Secondary Index

Date: 2019-12-17

Tags: 23.secondary, secondary index


I. Secondary Index
1.1. Introduction to Secondary Indexes


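• Clustered Index
    ◦ Leaf nodes store the complete rows (all row data)
• Secondary Index
    ◦ Also called a non-clustered index
    ◦ Leaf nodes store the index key and the primary key
    ◦ After finding the index entry, MySQL takes its primary key and goes back to the clustered index to fetch the matching row (row data)
        ◾ This is called a Bookmark Lookup
        ◾ Commonly known as 回表 ("going back to the table")
        ◾ A bookmark lookup costs more than one extra IO:
        ◾ it can cost N extra IOs (N = height of the clustered index B+ tree)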
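Here is a back-of-the-envelope version of this cost as a minimal Python sketch. It assumes a cold buffer pool, so every page touched on the way from root to leaf is one physical IO. The function name and parameters are hypothetical. It counts page reads for one equality lookup through a secondary index, with and without the bookmark lookup.

```python
def point_lookup_ios(secondary_height: int, clustered_height: int, covering: bool = False) -> int:
    """Worst-case page reads for one equality lookup through a secondary index.

    Assumes nothing is cached, so each level of each B+ tree costs one IO.
    """
    ios = secondary_height  # root-to-leaf descent in the secondary index
    if not covering:
        # Bookmark lookup: a second, full root-to-leaf descent in the clustered index.
        ios += clustered_height
    return ios


print(point_lookup_ios(3, 3))                 # 6
print(point_lookup_ios(3, 3, covering=True))  # 3
```

In practice the root and upper levels of both trees are usually cached in the buffer pool. Real lookups therefore cost fewer physical IOs than this upper bound.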

1.2. Secondary Index Bookmark Lookup

create table userinfo (
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid),
unique key idx_username(username),
key idx_registdate(registdate)
);

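1. Suppose we look up the username Tom. MySQL first searches the secondary index idx_username, finds the key Tom, and reads the corresponding primary key: userid_a.
2. With userid_a in hand, it then searches the clustered index for the row (row data) of userid_a.
3. This process of getting the row data through a secondary index is the bookmark lookup (回表).
4. MySQL does all of this for you automatically.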
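The two steps can be mimicked in memory. In this Python sketch, sorted lists searched with bisect stand in for the two B+ trees. The second row (Amy) is made-up sample data, and nothing here reflects InnoDB's actual page layout.

```python
import bisect

# Toy clustered index: leaf entries sorted by primary key, each holding the full row.
clustered = [
    (1, {"userid": 1, "username": "Tom", "registdate": "1990-01-01", "email": "tom@123.com"}),
    (2, {"userid": 2, "username": "Amy", "registdate": "1991-05-20", "email": "amy@123.com"}),  # made-up row
]
clustered_keys = [pk for pk, _ in clustered]

# Toy secondary index idx_username: leaf entries hold only (username, userid).
idx_username = sorted((row["username"], pk) for pk, row in clustered)


def find_by_pk(pk):
    i = bisect.bisect_left(clustered_keys, pk)
    if i < len(clustered_keys) and clustered_keys[i] == pk:
        return clustered[i][1]
    return None


def find_by_username(name):
    # Step 1: search idx_username for the key and read the primary key stored with it.
    i = bisect.bisect_left(idx_username, (name,))
    if i == len(idx_username) or idx_username[i][0] != name:
        return None
    userid = idx_username[i][1]
    # Step 2: bookmark lookup, i.e. search the clustered index by that primary key.
    return find_by_pk(userid)


print(find_by_username("Tom")["email"])  # tom@123.com
```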

You can also split the userinfo table above by hand and perform the bookmark lookup manually. The split looks like this:


-- Table 1: the main table, now keeping only the primary key userid; the original secondary indexes are split out by hand into separate tables
create table userinfo(
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid)
);
-- Table 2: idx_username, with userid and username as columns and a composite primary key (stands in for the original idx_username index)
create table idx_username(
userid int not null,
username varchar(30),
primary key(username, userid)
);
-- Table 3: idx_registdate, with userid and registdate as columns and a composite primary key (stands in for the original idx_registdate index)
create table idx_registdate(
userid int not null,
registdate datetime,
primary key(registdate, userid)
);
-- Table 4: constraint table; its primary key enforces username uniqueness, which Table 2's composite key no longer does
create table idx_username_constraint(
username varchar(30),
primary key(username)
);
-- Insert inside a transaction: either every row is inserted or none is
start transaction;
insert into userinfo values(1, 'Tom', '1990-01-01', 'tom@123.com');
insert into idx_username_constraint values('Tom');
insert into idx_username values(1, 'Tom');
insert into idx_registdate values(1, '1990-01-01');
commit;

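• Suppose we want Tom's email:

1. First find Tom's userid in the idx_username table (before the split, this was the search for Tom in the idx_username index).
2. With the userid, go to the userinfo table and read the email column by userid (before the split, this was the search for that userid's row in the clustered index).
3. These two lookups are a manual bookmark lookup.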

After the split, developers must implement the bookmark lookup logic themselves; with the original single table, MySQL does it automatically.
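To make the application-side lookup concrete, here is a sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, so userid is an INTEGER PRIMARY KEY rather than auto_increment, and idx_registdate is omitted for brevity. The helper names add_user and email_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table userinfo (
  userid integer primary key,
  username varchar(30),
  registdate datetime,
  email varchar(50)
);
create table idx_username (
  userid int not null,
  username varchar(30),
  primary key (username, userid)
);
create table idx_username_constraint (
  username varchar(30) primary key
);
""")


def add_user(userid, username, registdate, email):
    # One transaction: the connection commits if every insert succeeds
    # and rolls all of them back if any insert fails.
    with conn:
        conn.execute("insert into userinfo values (?, ?, ?, ?)",
                     (userid, username, registdate, email))
        conn.execute("insert into idx_username_constraint values (?)", (username,))
        conn.execute("insert into idx_username values (?, ?)", (userid, username))


def email_of(username):
    # Lookup 1: the hand-made index table maps username -> userid.
    hit = conn.execute("select userid from idx_username where username = ?",
                       (username,)).fetchone()
    if hit is None:
        return None
    # Lookup 2: the manual bookmark lookup, by primary key in the main table.
    row = conn.execute("select email from userinfo where userid = ?", hit).fetchone()
    return row[0] if row else None


add_user(1, "Tom", "1990-01-01", "tom@123.com")
print(email_of("Tom"))  # tom@123.com
```

Adding a second "Tom" fails on idx_username_constraint. The transaction then also rolls back the row already written to userinfo, which is the consistency the constraint table is there to provide.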

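1.3. Secondary Indexes on Heap Tables
1. A heap table has no clustered index, so every index on it is a secondary index;
2. The leaf nodes of an index store the key and a pointer to the row in the heap (its physical location).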
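Here is a minimal Python sketch of the heap-table layout, for illustration only. Rows sit in pages in insertion order, and a row pointer is assumed to be a (page number, slot) pair. Plain dicts stand in for the B+ tree indexes. Both indexes, including the one on the primary key userid, map a key straight to the row pointer.

```python
# Hypothetical heap file: a list of pages, each page a list of row slots.
heap_pages = [[], []]

# On a heap table every index, even the one on the primary key, is secondary:
# it maps a key to a row pointer (page_no, slot).
idx_userid = {}
idx_username = {}


def insert_user(row, page_no):
    heap_pages[page_no].append(row)
    rid = (page_no, len(heap_pages[page_no]) - 1)  # physical location of the row
    idx_userid[row["userid"]] = rid
    idx_username[row["username"]] = rid
    return rid


def fetch(rid):
    page_no, slot = rid
    return heap_pages[page_no][slot]  # follow the pointer: no second index search


insert_user({"userid": 1, "username": "Tom", "email": "tom@123.com"}, page_no=0)
print(fetch(idx_username["Tom"])["email"])  # tom@123.com
```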

1.4. Secondary Indexes: Heap Tables vs. Index-Organized Tables (IOT)

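1. A secondary-index lookup on a heap table needs no bookmark lookup and is as fast as a primary-key lookup, because the leaf nodes of both indexes store pointers to the data. A secondary-index lookup on an IOT, by contrast, does need a bookmark lookup.
2. When a row in a heap table is updated and cannot be updated in place, its physical location changes, and every index's pointer to that row must then be updated (which is expensive). In an IOT, by contrast, secondary indexes refer to a row by its primary key, so as long as the primary key is not changed (and primary keys normally are not), no secondary index needs updating because the row moved.
    ◦ In real implementations, when a heap-table row cannot be updated in place but its page still has free space, the original slot is not released; it instead holds a pointer to the row's new location, so none of the row's index entries need to change;
    ◦ If the page has no free space left, every index must still be updated;
3. An IOT is ordered within each page and across pages, so range queries are fast.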
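The update rules in point 2 can be sketched in Python under simplifying assumptions:

- pages hold a hypothetical maximum of three slots;
- a row pointer is a (page, slot) pair;
- dicts stand in for the indexes;
- the old slot of a row that moves within a page with free space becomes a forwarding pointer.

relocate returns how many index entries had to be rewritten.

```python
PAGE_CAPACITY = 3   # hypothetical: at most 3 slots per page
pages = [[], []]    # page 0 holds the rows, page 1 is spare space


def insert(page_no, row):
    pages[page_no].append(row)
    return (page_no, len(pages[page_no]) - 1)  # row pointer (page, slot)


def read(rid):
    item = pages[rid[0]][rid[1]]
    if isinstance(item, tuple) and item[0] == "FWD":
        return read(item[1])  # one extra hop through the forwarding pointer
    return item


def relocate(rid, new_row, indexes, spare_page):
    """Move a row whose update no longer fits in its slot.
    Returns the number of index entries that had to be rewritten."""
    page_no, slot = rid
    if len(pages[page_no]) < PAGE_CAPACITY:
        # Free space on the same page: write the new version there and turn the
        # old slot into a forwarding pointer, so existing index entries stay valid.
        new_rid = insert(page_no, new_row)
        pages[page_no][slot] = ("FWD", new_rid)
        return 0
    # No free space: the row moves to another page and every index entry
    # pointing at the old location must be updated.
    new_rid = insert(spare_page, new_row)
    pages[page_no][slot] = None  # old slot is released
    rewritten = 0
    for index in indexes:
        for key, value in index.items():
            if value == rid:
                index[key] = new_rid
                rewritten += 1
    return rewritten


idx_id, idx_name = {}, {}
rid = insert(0, {"id": 1, "name": "Tom", "note": "short"})
idx_id[1] = idx_name["Tom"] = rid
print(relocate(rid, {"id": 1, "name": "Tom", "note": "much longer"}, [idx_id, idx_name], 1))  # 0
print(read(idx_name["Tom"])["note"])  # much longer
```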

1.5. Index with Included Columns
In the userinfo example above, finding a user's email requires a bookmark lookup. How can we query it without one?


1. Approach 1: composite index
-- table definition
create table userinfo (
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid),
unique key idx_username(username, email), -- email is stored in the index, so it can be read directly without a bookmark lookup
key idx_registdate(registdate)
);

-- query
select email from userinfo where username='Tom';
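This approach gets the user's email with a single index search, but in a composite index both username and email take part in the sort order.
An index with included columns, by contrast, sorts only on username; email is not sorted, it is simply carried along in the index so it can be read directly. Note also that with this definition, uniqueness is enforced on the (username, email) pair rather than on username alone.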

2. Approach 2: split the table
create table userinfo (
userid int not null auto_increment,
username varchar(30),
registdate datetime,
email varchar(50),
primary key(userid),
key idx_registdate(registdate)
);

create table idx_username_include_email (
userid int not null,
username varchar(30),
email varchar(50),
primary key(username, userid),
unique key(username)
);

-- Consistency between the two tables can be guaranteed with a transaction

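With the split, querying idx_username_include_email returns the email directly from the username. Developers must be told, though, that this table is the faster way to get an email by username, rather than userinfo.

For an IOT with several indexes, the indexes can be split out into separate tables to speed up queries.
In practice, though, for this particular example a composite index would not cost much either.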
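One way to confirm that approach 1 avoids the bookmark lookup is to inspect the query plan. This sketch again uses Python's sqlite3 module as a stand-in for MySQL. In MySQL you would run EXPLAIN and look for "Using index" in the Extra column. SQLite instead reports an index that answers the query on its own as a COVERING INDEX.

MySQL has no INCLUDE clause for included columns (SQL Server and PostgreSQL 11+ do), which is why both approaches above are workarounds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Approach 1 schema; SQLite needs the composite unique index as a separate statement.
conn.executescript("""
create table userinfo (
  userid integer primary key,
  username varchar(30),
  registdate datetime,
  email varchar(50)
);
create unique index idx_username on userinfo(username, email);
create index idx_registdate on userinfo(registdate);
""")


def plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable detail.
    return [row[3] for row in conn.execute("explain query plan " + sql, params)]


# email is stored in idx_username, so the index alone answers the query.
print(plan("select email from userinfo where username = ?", ("Tom",)))
# registdate is not in idx_username, so each match needs a trip back to the table.
print(plan("select registdate from userinfo where username = ?", ("Tom",)))
```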


II. Multi-Range Read (MRR)
2.1. The Cost of Bookmark Lookups


mysql> alter table employees add index idx_date (hire_date); -- add an index on hire_date to employees


mysql> show create table employees\G
*************************** 1. row ***************************
Table: employees
Create Table: CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`),
KEY `idx_date` (`hire_date`) -- the newly added index
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

-- Query 1
mysql> select * from employees where emp_no between 10000 and 20000; -- primary-key range lookup of about 10,000 rows

-- Query 2
mysql> select * from employees where hire_date >= '1990-01-01' limit 10000; -- select *: every row found needs a bookmark lookup
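1. For Query 1, assuming 100 rows per page, only about 100 IOs are needed.
2. For Query 2, assume the clustered index and the hire_date index (a secondary index) both have height 3, and 10,000 rows are fetched (more than 10,000 match). The number of IOs is then (3 + N) + 3 × 10,000:
    ◦ 3 is the number of IOs for the first descent to the secondary-index page containing hire_date >= '1990-01-01';
    ◦ N is the number of IOs for the pages read onward from that first page (the secondary index's leaf pages are linked in order, so there is no need to start again from the root);
        ◾ so 3 + N is the number of IOs spent reading the hire_date secondary index;
    ◦ 3 × 10,000 is the cost of the bookmark lookups into the IOT: 10,000 lookups, each descending a clustered index of height 3.
3. Before MySQL 5.6, the optimizer in practice might simply choose a full table scan rather than perform that many bookmark lookups.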
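The arithmetic above can be written as two small Python functions. This is a sketch under the same no-caching assumption. The function names are hypothetical, and keys_per_sec_leaf = 400 is an assumed fan-out for the hire_date leaf pages, used only to put a number on N.

```python
import math


def pk_range_ios(rows: int, rows_per_page: int) -> int:
    """Query 1: a primary-key range scan reads consecutive clustered-index leaves
    (the root-to-leaf descent is ignored, as in the estimate above)."""
    return math.ceil(rows / rows_per_page)


def secondary_range_ios(rows: int, sec_height: int, clustered_height: int,
                        keys_per_sec_leaf: int) -> int:
    """Query 2: (sec_height + N) IOs in the secondary index, plus one full
    clustered-index descent per row for the bookmark lookups (nothing cached)."""
    n = math.ceil(rows / keys_per_sec_leaf) - 1  # leaf pages read after the first
    return (sec_height + n) + rows * clustered_height


print(pk_range_ios(10_000, 100))               # 100
print(secondary_range_ios(10_000, 3, 3, 400))  # 30027 = (3 + 24) + 30000
```

The bookmark-lookup term dominates whatever value N takes, which is why the optimizer may prefer a full scan.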

2.2. Introduction to MRR
MRR: for physical access, turn random reads into sequential reads, trading space for time.


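1. Allocate a block of memory as a cache.
    ◦ In this setup it is 32M (read_rnd_buffer_size below; the MySQL default is 256K). It is allocated per thread, so setting it very large is not recommended;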

mysql> show variables like "%read_rnd%";
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| read_rnd_buffer_size | 33554432 | -- 32M
+----------------------+----------+
1 row in set (0.00 sec)

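2. Put the primary keys that need a bookmark lookup into that memory area (trading space for time); once it is full, sort them (turning random access into sequential access);
3. Perform the bookmark lookups for the sorted primary keys together to improve performance:
    ◦ For IO-bound SQL, MRR can be nearly 10× faster than without it (the slower the disk, the more noticeable the gain);
    ◦ If the data is already in memory, MRR helps little: there is no random physical read left to avoid (random reads matter mainly for physical access).
SSDs should still keep this feature enabled. Random reads on an SSD are indeed fast when many threads issue them, but here a single SQL statement runs in a single thread, so sequential access is still faster than random access.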
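The buffer-and-sort idea can be sketched in Python. This is a simplified model, not InnoDB's implementation:

- buffer_size plays the role of read_rnd_buffer_size, but is counted in keys rather than bytes;
- primary keys are assumed dense, so a row's page is pk // rows_per_page;
- only the most recently read page is assumed to be cached.

```python
def mrr_order(pks, buffer_size):
    """Yield primary keys in the order an MRR-style fetch would use them:
    fill a fixed-size buffer, sort it, emit it, repeat."""
    buf = []
    for pk in pks:
        buf.append(pk)
        if len(buf) == buffer_size:  # buffer full: sort, then flush
            yield from sorted(buf)
            buf.clear()
    yield from sorted(buf)  # flush whatever is left over


def page_reads(pks, rows_per_page):
    """Count clustered-index leaf page reads when only the last page read is cached."""
    reads, current = 0, None
    for pk in pks:
        page = pk // rows_per_page  # assumes dense, sequential primary keys
        if page != current:
            reads += 1
            current = page
    return reads


# Primary keys in the scattered order a secondary-index range scan might return them
# (a permutation of 0..999, since 7919 and 1000 are coprime).
pks = [(i * 7919) % 1000 for i in range(1000)]

print(page_reads(pks, 100))                         # far more than 10
print(page_reads(list(mrr_order(pks, 1000)), 100))  # 10: each page read once
```

A smaller buffer still helps. Each sorted batch touches every page at most once, so with four batches of 250 keys the count is capped at 40.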

mysql> show variables like 'optimizer_switch'\G
*************************** 1. row ***************************
Variable_name: optimizer_switch
Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on
1 row in set (0.00 sec)

-- MRR is on by default (mrr=on); turning it off is not recommended
mysql> explain select * from employees where hire_date >= '1990-01-01';
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_date      | NULL | NULL    | NULL | 298124 |    50.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)


-- Even though mrr=on, MRR was not used
mysql> set optimizer_switch='mrr_cost_based=off'; -- turn this off so MySQL does not cost MRR (i.e. force MRR to be used)
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from employees where hire_date >= '1990-01-01';
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------+
| id | select_type | table     | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra                            |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_date      | idx_date | 3       | NULL | 149062 |   100.00 | Using index condition; Using MRR |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)
-- MRR is used

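III. Computing the Height of a B+ Tree
The Page Header of every page contains a PAGE_LEVEL field giving the page's level in the B+ tree; leaf pages have PAGE_LEVEL 0.
So the height of the tree is the root page's PAGE_LEVEL + 1.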

3.3. PAGE_LEVEL
Starting at byte offset 64 of a page, read 2 bytes to get the value of PAGE_LEVEL.
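The offset 64 is not arbitrary. The first 38 bytes of every InnoDB page are the FIL header, and the page header starts right after it. PAGE_LEVEL sits 26 bytes into the page header, stored as a big-endian 2-byte integer. Below is a minimal Python sketch that decodes it from a raw page buffer. The constant and function names are my own, and the page is fabricated.

```python
import struct

FIL_HEADER_SIZE = 38       # every InnoDB page starts with a 38-byte FIL header
PAGE_LEVEL_IN_HEADER = 26  # PAGE_LEVEL's offset inside the page header that follows
PAGE_LEVEL_OFFSET = FIL_HEADER_SIZE + PAGE_LEVEL_IN_HEADER  # 64


def page_level(page: bytes) -> int:
    """Decode PAGE_LEVEL (unsigned 2 bytes, big-endian) from one raw index page."""
    (level,) = struct.unpack_from(">H", page, PAGE_LEVEL_OFFSET)
    return level


def tree_height(root_page: bytes) -> int:
    # Leaves are level 0, so the height is the root's level plus one.
    return page_level(root_page) + 1


# A fabricated 8 KB page whose PAGE_LEVEL bytes are 00 02.
fake_root = bytearray(8192)
fake_root[PAGE_LEVEL_OFFSET:PAGE_LEVEL_OFFSET + 2] = b"\x00\x02"
print(page_level(fake_root), tree_height(fake_root))  # 2 3
```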

3.4. Getting the root page
mysql> use information_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> desc INNODB_SYS_INDEXES;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| INDEX_ID        | bigint(21) unsigned | NO   |     | 0       |       |
| NAME            | varchar(193)        | NO   |     |         |       |
| TABLE_ID        | bigint(21) unsigned | NO   |     | 0       |       |
| TYPE            | int(11)             | NO   |     | 0       |       |
| N_FIELDS        | int(11)             | NO   |     | 0       |       |
| PAGE_NO         | int(11)             | NO   |     | 0       |       |
| SPACE           | int(11)             | NO   |     | 0       |       |
| MERGE_THRESHOLD | int(11)             | NO   |     | 0       |       |
+-----------------+---------------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

mysql> select * from INNODB_SYS_INDEXES where space<>0 limit 1\G
*************************** 1. row ***************************
       INDEX_ID: 18
           NAME: PRIMARY
       TABLE_ID: 16
           TYPE: 3
       N_FIELDS: 1
        PAGE_NO: 3   -- according to the official documentation, this is the page number of the B+ tree's root page
          SPACE: 5
MERGE_THRESHOLD: 50
1 row in set (0.01 sec)

-- There is no table name here, only the table ID

mysql> select b.name, a.name, index_id, type, a.space, a.PAGE_NO
    -> from INNODB_SYS_INDEXES as a, INNODB_SYS_TABLES as b
    -> where a.table_id = b.table_id
    -> and a.space <> 0 and b.name like "dbt3/%"; -- join the two tables
+----------------------+-----------------------+----------+------+-------+---------+
| name                 | name                  | index_id | type | space | PAGE_NO |
+----------------------+-----------------------+----------+------+-------+---------+
| dbt3/customer        | PRIMARY               |       64 |    3 |    43 |       3 |
| dbt3/customer        | i_c_nationkey         |       65 |    0 |    43 |       4 |
| dbt3/lineitem        | PRIMARY               |       66 |    3 |    44 |       3 |
| dbt3/lineitem        | i_l_shipdate          |       67 |    0 |    44 |       4 |
| dbt3/lineitem        | i_l_suppkey_partkey   |       68 |    0 |    44 |       5 |
| dbt3/lineitem        | i_l_partkey           |       69 |    0 |    44 |       6 |
| dbt3/lineitem        | i_l_suppkey           |       70 |    0 |    44 |       7 |
| dbt3/lineitem        | i_l_receiptdate       |       71 |    0 |    44 |       8 |
| dbt3/lineitem        | i_l_orderkey          |       72 |    0 |    44 |       9 |
| dbt3/lineitem        | i_l_orderkey_quantity |       73 |    0 |    44 |      10 |
| dbt3/lineitem        | i_l_commitdate        |       74 |    0 |    44 |      11 |
| dbt3/nation          | PRIMARY               |       75 |    3 |    45 |       3 |
| dbt3/nation          | i_n_regionkey         |       76 |    0 |    45 |       4 |
| dbt3/orders          | PRIMARY               |       77 |    3 |    46 |       3 |
| dbt3/orders          | i_o_orderdate         |       78 |    0 |    46 |       4 |
| dbt3/orders          | i_o_custkey           |       79 |    0 |    46 |       5 |
| dbt3/part            | PRIMARY               |       80 |    3 |    47 |       3 |
| dbt3/partsupp        | PRIMARY               |       81 |    3 |    48 |       3 |
| dbt3/partsupp        | i_ps_partkey          |       82 |    0 |    48 |       4 |
| dbt3/partsupp        | i_ps_suppkey          |       83 |    0 |    48 |       5 |
| dbt3/region          | PRIMARY               |       84 |    3 |    49 |       3 |
| dbt3/supplier        | PRIMARY               |       85 |    3 |    50 |       3 |
| dbt3/supplier        | i_s_nationkey         |       86 |    0 |    50 |       4 |
| dbt3/time_statistics | GEN_CLUST_INDEX       |       87 |    1 |    51 |       3 |
+----------------------+-----------------------+----------+------+-------+---------+
24 rows in set (0.00 sec)
-- The root page of a clustered index usually has PAGE_NO 3

3.5. Reading PAGE_LEVEL

mysql> select count(*) from dbt3.lineitem;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
1 row in set (5.68 sec)

shell> hexdump -h
hexdump: invalid option -- 'h'
hexdump: [-bcCdovx] [-e fmt] [-f fmt_file] [-n length] [-s skip] [file ...]

shell> hexdump -s 24640 -n 2 -Cv lineitem.ibd
00006040 00 02 |..|
00006042


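1. 24640 = 8192 * 3 + 64
    ◦ 8192 is the page size on my instance (innodb_page_size; the InnoDB default is 16384);
    ◦ the root page's PAGE_NO is 3, which makes it the 4th page since page numbers start at 0, so the first 3 pages must be skipped to reach the root page; hence the * 3;
    ◦ adding the 64-byte offset then lands on PAGE_LEVEL.
2. -n 2 is the number of bytes to read; reading 2 bytes gets PAGE_LEVEL.

From the hexdump output above (00 02), the root page's PAGE_LEVEL is 2, so the index height is 3 (levels are counted from 0).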
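The same read can be done without hexdump. This Python sketch uses index_height, a hypothetical helper. It assumes an uncompressed, unencrypted file-per-table tablespace whose root page has already been flushed to disk. page_size must match the instance's innodb_page_size, which defaults to 16384 but is 8192 in the example above.

```python
import struct


def index_height(ibd_path: str, root_page_no: int, page_size: int = 16384) -> int:
    """Read PAGE_LEVEL from an index's root page in an .ibd file and return the tree height."""
    offset = root_page_no * page_size + 64  # skip root_page_no whole pages, then 64 bytes
    with open(ibd_path, "rb") as f:
        f.seek(offset)
        raw = f.read(2)
    if len(raw) != 2:
        raise ValueError("file too short for this root page number and page size")
    (level,) = struct.unpack(">H", raw)  # big-endian unsigned 16-bit PAGE_LEVEL
    return level + 1
```

For the lineitem example, index_height("lineitem.ibd", 3, 8192) seeks to offset 24640, the same position as the hexdump -s 24640 command above.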
