Exploring How Rows Are Stored Inside an InnoDB Data Page

 

 


Test data

CREATE TABLE `ibd2_test` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
+----+-------+
| id | name  |
+----+-------+
|  1 | test1 | 
|  2 | test2 | 
|  3 | test3 | 
|  4 | test4 | 
|  5 | test5 | 
+----+-------+
5 rows in set (0.00 sec)

Next, delete the row with id 3 and then insert 4 more rows. The final result:

localhost.test>select * from ibd2_test;
+----+-------+
| id | name  |
+----+-------+
|  1 | test1 | 
|  2 | test2 | 
|  4 | test4 | 
|  5 | test5 | 
|  6 | test6 | 
|  7 | test7 | 
|  8 | test8 | 
|  9 | test9 | 
+----+-------+
8 rows in set (0.00 sec)

Analysis tool

Innodb Extract, a tool I wrote myself in Python

Experimental analysis

First, recall how the MySQL source code defines the record format, in the file rem0rec.c (lines 77~104):

/* PHYSICAL RECORD (NEW STYLE)
===========================

The physical record, which is the data type of all the records
found in index pages of the database, has the following format
(lower addresses and more significant bits inside a byte are below
represented on a higher text line):

| length of the last non-null variable-length field of data:
if the maximum length is 255, one byte; otherwise,
0xxxxxxx (one byte, length=0..127), or 1exxxxxxxxxxxxxx (two bytes,
length=128..16383, extern storage flag) |
...
| length of first variable-length field of data |
| SQL-null flags (1 bit per nullable field), padded to full bytes |
| 4 bits used to delete mark a record, and mark a predefined
minimum record in alphabetical order |
| 4 bits giving the number of records owned by this record
(this term is explained in page0page.h) |
| 13 bits giving the order number of this record in the
heap of the index page |
| 3 bits record type: 000=conventional, 001=node pointer (inside B-tree),
010=infimum, 011=supremum, 1xx=reserved |
| two bytes giving a relative pointer to the next record in the page |
ORIGIN of the record
| first field of data |
...
| last field of data |

Drawn as a diagram:
[figure: row_format]

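info bits: the third bit indicates whether the row has been deleted (1 if deleted, 0 if not); the fourth bit indicates whether the record is the predefined minimum record (1 if so)
n_owned: the number of records owned by this record, i.e. the number of records in the page directory slot that this record belongs to
order: the record's position in the index page heap. The infimum pseudo-record is 0 and the supremum pseudo-record is 1, so real records start at 2. This value reflects the physical order of the records, not their logical order; we verify this later on
record type: the type of the record: 0 for a data row, 1 for a node pointer, 2 for the infimum pseudo-record, 3 for the supremum pseudo-record
next record offset: the relative offset of the next record. By following next record offset we can traverse all the records in a page; the records are organized as a linked list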
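To make the field layout concrete, here is a minimal Python sketch that decodes the fixed 5-byte part of a new-style record header (the bytes immediately before the record origin). The function name `parse_record_header` and the dictionary keys are my own choices for illustration, not part of Innodb Extract, and the sketch assumes the next-record pointer can be read as a signed 16-bit big-endian integer, which matches the two's-complement values shown in this post.

```python
import struct

def parse_record_header(header: bytes) -> dict:
    """Decode the 5 fixed header bytes that precede a compact record's origin.

    Hypothetical helper for illustration; layout follows the source comment:
    4 info bits | 4 bits n_owned | 13 bits heap order | 3 bits type | 2 bytes next.
    """
    if len(header) != 5:
        raise ValueError("expected exactly 5 header bytes")
    first, order_and_type, next_rel = struct.unpack(">BHh", header)
    return {
        "deleted": bool(first & 0x20),      # third of the four info bits
        "min_rec": bool(first & 0x10),      # fourth of the four info bits
        "n_owned": first & 0x0F,
        "order": order_and_type >> 3,       # 13-bit heap number
        "record_type": order_and_type & 0x07,
        "next_offset": next_rel,            # relative, may be negative
    }

# Hand-built header: n_owned=4, order=6, type=0, next offset=-68
print(parse_record_header(bytes([0x04, 0x00, 0x30, 0xFF, 0xBC])))
```

The example bytes are constructed by hand from the field widths, not copied from a real page.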

In-depth analysis

Step 1: first, look at the page before the record with id 3 is deleted:

[root@hebe211 ibd]#  python innodb_extract.py ibd2_test.ibd

infimum
row_id:000000000213,info_bits:0000,n_owned:0000,order:2(0000000000010),next offset:34(0000000000100010)
   1 test1 
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:34(0000000000100010)
   2 test2 
row_id:000000000215,info_bits:0000,n_owned:0000,order:4(0000000000100),next offset:34(0000000000100010)
   3 test3 
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
   4 test4 
row_id:000000000217,info_bits:0000,n_owned:0000,order:6(0000000000110),next offset:-150(1111111101101010)
   5 test5

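First, because we did not define a primary key, InnoDB automatically creates a 6-byte row_id as a hidden primary key. The last two bytes of each record header hold the relative offset to the start of the next record's row_id. The linked list is organized by the clustered index, that is, the logical records are chained together in clustered-index order. Looking at the physical order, it is 2->3->4->5->6, which at this point is exactly the same as the clustered-index order! (My tool filters out the infimum and supremum pseudo-records; their order values are 0 and 1 respectively, and I won't go into them here.)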

Step 2: delete the record with id 3 (row_id 000000000215) and look at what changes:

infimum
row_id:000000000213,info_bits:0000,n_owned:0000,order:2(0000000000010),next offset:34(0000000000100010)
1 test1
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:68(0000000001000100)
2 test2
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
4 test4
row_id:000000000217,info_bits:0000,n_owned:0000,order:6(0000000000110),next offset:-150(1111111101101010)
5 test5

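We can see that the record with row_id 000000000215 is gone, which means it has been unlinked from the data linked list. The physical order of the remaining records has not changed: 2->3->5->6. The next-record offset of the second row (row_id 000000000214) is no longer 34 but 68, so it now points to the row with row_id 000000000216. This confirms that the record with id 3 was 'unlinked' from the data linked list rather than physically erased.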
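The change from 34 to 68 is ordinary singly-linked-list removal done with relative offsets. The sketch below models only that pointer arithmetic; `unlink_next` is a hypothetical helper, and the record origins (126, 160, 194, ...) are taken from the tool's debug output shown later in this post. It does not model delete-marking or anything else InnoDB does around a delete.

```python
def unlink_next(next_rel: dict, pred: int) -> int:
    """Unlink the record that follows `pred` and return its origin.

    `next_rel` maps a record origin to its relative next-record offset.
    The predecessor absorbs the victim's offset so that it points past it.
    """
    victim = pred + next_rel[pred]
    next_rel[pred] += next_rel[victim]
    return victim

# Chain before the delete: five 34-byte rows; the last one points back
# towards the start of the page (-150, as printed by the tool).
chain = {126: 34, 160: 34, 194: 34, 228: 34, 262: -150}

removed = unlink_next(chain, 160)   # row_id ...214 sits at origin 160
print(removed, chain[160])          # 194 68
```

The victim's own entry is left in `chain` on purpose: the bytes of an unlinked record stay in the page, they are just no longer reachable from the data list.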

Step 3: insert 4 more rows and look again:

infimum
row_id:000000000213,info_bits:0000,n_owned:0000,order:2(0000000000010),next offset:34(0000000000100010)
   1 test1 
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:68(0000000001000100)
   2 test2 
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
   4 test4 
row_id:000000000217,info_bits:0000,n_owned:0100,order:6(0000000000110),next offset:-68(1111111110111100)
   5 test5 
row_id:000000000218,info_bits:0000,n_owned:0000,order:4(0000000000100),next offset:102(0000000001100110)
   6 test6 
row_id:000000000219,info_bits:0000,n_owned:0000,order:7(0000000000111),next offset:34(0000000000100010)
   7 test7 
row_id:00000000021a,info_bits:0000,n_owned:0000,order:8(0000000001000),next offset:34(0000000000100010)
   8 test8 
row_id:00000000021b,info_bits:0000,n_owned:0000,order:9(0000000001001),next offset:-252(1111111100000100)
   9 test9

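Following the data linked list, the heap order values are now 2->3->5->6->4->7->8->9. Note that the physical storage order no longer follows the clustered-index order! The first newly inserted record, row_id 000000000218, has heap order 4, and the next-record offset of row_id 000000000217 has become negative (negative values are stored in two's complement). That offset of -68 leads exactly to the physical position of the previously deleted record with row_id 000000000215, so we can conclude that the deleted space has been reused.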
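As a quick check of the two's-complement reading, the following sketch converts the 16-bit strings printed by the tool into signed integers. `to_signed16` is a hypothetical helper name; the origin 262 for row_id 000000000217 comes from the debug output shown later in this post.

```python
def to_signed16(bits: str) -> int:
    """Interpret a 16-character bit string as a two's-complement integer."""
    if len(bits) != 16:
        raise ValueError("expected 16 bits")
    value = int(bits, 2)
    return value - 0x10000 if value & 0x8000 else value

print(to_signed16("1111111110111100"))        # -68
print(to_signed16("0000000000100010"))        # 34
print(262 + to_signed16("1111111110111100"))  # 194
```

194 is 126 + 2 * 34, i.e. the third 34-byte slot in the page, which is where the deleted row with id 3 used to live.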

Step 4: delete one more row, the one with id 8 (row_id 00000000021a):

localhost.test>select * from ibd2_test;
+----+-------+
| id | name  |
+----+-------+
|  1 | test1 | 
|  2 | test2 | 
|  4 | test4 | 
|  5 | test5 | 
|  6 | test6 | 
|  7 | test7 | 
|  9 | test9 | 
+----+-------+

Now let's look again, keeping in mind the PAGE HEADER definition in the MySQL source:

/*          PAGE HEADER
            ===========

Index page header starts at the first offset left free by the FIL-module */

typedef byte        page_header_t;

#define PAGE_HEADER FSEG_PAGE_DATA  /* index page header starts at this
                offset */
/*-----------------------------*/
#define PAGE_N_DIR_SLOTS 0  /* number of slots in page directory */
#define PAGE_HEAP_TOP    2  /* pointer to record heap top */
#define PAGE_N_HEAP  4  /* number of records in the heap,
                bit 15=flag: new-style compact page format */
#define PAGE_FREE    6  /* pointer to start of page free record list */
#define PAGE_GARBAGE     8  /* number of bytes in deleted records */

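PAGE_FREE and PAGE_GARBAGE hold, respectively, the pointer to the reusable space and the size of the reusable space. Let's turn on debug output and look at the physical rows again: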
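A minimal sketch of reading these fields out of a raw page buffer is shown below. It assumes that `FSEG_PAGE_DATA` resolves to 38 (the size of the FIL header); that value is not given in the excerpt above, so treat it as my assumption. It also assumes each field is a 2-byte big-endian integer. `read_page_header` is a hypothetical name, and the page used here is fabricated in memory rather than read from a real .ibd file.

```python
import struct

PAGE_HEADER = 38  # assumed value of FSEG_PAGE_DATA, not stated in the excerpt
FIELDS = {
    "PAGE_N_DIR_SLOTS": 0,
    "PAGE_HEAP_TOP": 2,
    "PAGE_N_HEAP": 4,
    "PAGE_FREE": 6,
    "PAGE_GARBAGE": 8,
}

def read_page_header(page: bytes) -> dict:
    """Read the first five index page header fields as 2-byte big-endian ints."""
    out = {}
    for name, offset in FIELDS.items():
        out[name] = struct.unpack_from(">H", page, PAGE_HEADER + offset)[0]
    # Bit 15 of PAGE_N_HEAP flags the new-style compact format.
    out["is_compact"] = bool(out["PAGE_N_HEAP"] & 0x8000)
    out["PAGE_N_HEAP"] &= 0x7FFF
    return out

# Fabricated page carrying the values reported after step 4.
page = bytearray(16384)
struct.pack_into(">H", page, PAGE_HEADER + 4, 0x8000 | 10)
struct.pack_into(">H", page, PAGE_HEADER + 6, 330)
struct.pack_into(">H", page, PAGE_HEADER + 8, 34)
print(read_page_header(bytes(page)))
```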

[root@hebe211 ibd]#  python innodb_extract.py ibd2_test.ibd
PAGE_FREE pointer offset 330,PAGE_GARBAGE size 34
now row begin offset 99
infimum
now row begin offset 126
row_id:000000000213,info_bits:0000,n_owned:0000,order:2(0000000000010),next offset:34(0000000000100010)
   1 test1 
now row begin offset 160
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:68(0000000001000100)
   2 test2 
now row begin offset 228
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
   4 test4 
now row begin offset 262
row_id:000000000217,info_bits:0000,n_owned:0100,order:6(0000000000110),next offset:-68(1111111110111100)
   5 test5 
now row begin offset 194
row_id:000000000218,info_bits:0000,n_owned:0000,order:4(0000000000100),next offset:102(0000000001100110)
   6 test6 
now row begin offset 296
row_id:000000000219,info_bits:0000,n_owned:0000,order:7(0000000000111),next offset:68(0000000001000100)
   7 test7 
now row begin offset 364
row_id:00000000021b,info_bits:0000,n_owned:0000,order:9(0000000001001),next offset:-252(1111111100000100)
   9 test9

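Now the next pointer of row_id 000000000219 points to row_id 00000000021b: its relative offset has changed from 34 to 68, skipping the just-deleted row with row_id 00000000021a. PAGE_FREE points to offset 330, which equals the start offset of row_id 000000000219 (296) plus 34 (the size of one row), i.e. the start of the row that was just deleted. PAGE_GARBAGE is 34 bytes, the size of that row. In other words, the row that was unlinked from the data linked list has been put onto the reusable-space list. This pointer always points to the most recently deleted row. When data is inserted and this reusable space is reused, the row is removed from the reusable-space list and put back into the data linked list.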
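The debug output is enough to replay the traversal by hand. The sketch below walks the chain using the "row begin offset" and "next offset" values printed after step 4, then checks which 34-byte slot is never visited. The `walk` helper is hypothetical; the stop offset 112 is simply where the last pointer lands (364 - 252), and reading it as the supremum record is my assumption, since the tool does not print it.

```python
# origin -> relative next offset, copied from the step 4 debug output
NEXT_REL = {126: 34, 160: 68, 228: 34, 262: -68, 194: 102, 296: 68, 364: -252}
END_OF_CHAIN = 112  # assumed to be the supremum pseudo-record

def walk(start: int, next_rel: dict, stop: int) -> list:
    """Follow relative next pointers from `start` until `stop` is reached."""
    visited = []
    origin = start
    while origin != stop:
        if origin in visited:
            raise ValueError("cycle in record chain")
        visited.append(origin)
        origin += next_rel[origin]
    return visited

chain = walk(126, NEXT_REL, END_OF_CHAIN)
print(chain)   # [126, 160, 228, 262, 194, 296, 364]

# Every 34-byte slot between the first and last origin:
slots = set(range(126, 365, 34))
print(sorted(slots - set(chain)))   # [330]
```

The only slot the data list never reaches is 330, the same offset the tool reports for PAGE_FREE.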

Step 5: to confirm this idea, delete the row with id 1 (row_id 000000000213):

localhost.test>select * from ibd2_test;
+----+-------+
| id | name  |
+----+-------+
|  2 | test2 | 
|  4 | test4 | 
|  5 | test5 | 
|  6 | test6 | 
|  7 | test7 | 
|  9 | test9 | 
+----+-------+
6 rows in set (0.00 sec)

Let's look at how the reusable-space pointer has changed:

[root@hebe211 ibd]#  python innodb_extract.py ibd2_test.ibd
PAGE_FREE pointer offset 126,PAGE_GARBAGE size 68
now row begin offset 99
infimum
now row begin offset 160
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:68(0000000001000100)
   2 test2 
now row begin offset 228
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
   4 test4 
now row begin offset 262
row_id:000000000217,info_bits:0000,n_owned:0000,order:6(0000000000110),next offset:-68(1111111110111100)
   5 test5 
now row begin offset 194
row_id:000000000218,info_bits:0000,n_owned:0000,order:4(0000000000100),next offset:102(0000000001100110)
   6 test6 
now row begin offset 296
row_id:000000000219,info_bits:0000,n_owned:0000,order:7(0000000000111),next offset:68(0000000001000100)
   7 test7 
now row begin offset 364
row_id:00000000021b,info_bits:0000,n_owned:0000,order:9(0000000001001),next offset:-252(1111111100000100)
   9 test9

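After deleting the row with id 1, PAGE_FREE points to offset 126 and the reusable space has grown to 68 bytes. The next pointer of the infimum pseudo-record now points to the row with row_id 000000000214 instead of row_id 000000000213: measured from the infimum at offset 99, its relative offset grows from 27 (126 - 99) to 61 (160 - 99), skipping the deleted row. The offset 126 that PAGE_FREE points to is exactly the start of the deleted row (row_id 000000000213, offset 126), and the reusable space has gone from 34 bytes to 68 bytes. This shows that PAGE_FREE points to the most recently deleted row, and that when new data is inserted, the space of the last deleted row is reused first. This is "last in, first out" behavior, similar to a stack.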
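The stack-like behavior can be modelled in a few lines. This is a toy model under simplifying assumptions: every row is 34 bytes, and an insert always takes the head of the list without any size check. `FreeList` and its method names are hypothetical and do not correspond to InnoDB functions.

```python
class FreeList:
    """Toy model of the per-page list of deleted records (LIFO)."""

    def __init__(self):
        self._origins = []   # deleted record origins, newest last
        self.garbage = 0     # models PAGE_GARBAGE

    @property
    def page_free(self) -> int:
        """Models PAGE_FREE: origin of the newest deleted record, 0 if none."""
        return self._origins[-1] if self._origins else 0

    def delete(self, origin: int, size: int) -> None:
        self._origins.append(origin)
        self.garbage += size

    def reuse(self, size: int) -> int:
        """Hand the most recently freed slot to a new insert."""
        origin = self._origins.pop()
        self.garbage -= size
        return origin

free = FreeList()
free.delete(330, 34)                  # step 4: delete id 8
print(free.page_free, free.garbage)   # 330 34
free.delete(126, 34)                  # step 5: delete id 1
print(free.page_free, free.garbage)   # 126 68
slot = free.reuse(34)                 # an insert takes the newest slot
print(slot, free.page_free, free.garbage)   # 126 330 34
```

If this model holds, an insert after step 5 should land at offset 126 and leave PAGE_FREE at 330 with 34 bytes of garbage.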

Step 6: finally, insert one more row and see whether it reuses the space of the row with row_id 000000000213. If it does, that confirms the idea above:

localhost.test>select * from ibd2_test;
+----+-------+
| id | name  |
+----+-------+
|  2 | test2 | 
|  4 | test4 | 
|  5 | test5 | 
|  6 | test6 | 
|  7 | test7 | 
|  9 | test9 | 
|  3 | testa | 
+----+-------+
7 rows in set (0.00 sec)
[root@hebe211 ibd]#  python innodb_extract.py ibd2_test.ibd
PAGE_FREE pointer offset 330,PAGE_GARBAGE size 34
now row begin offset 99
infimum
now row begin offset 160
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:68(0000000001000100)
   2 test2 
now row begin offset 228
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
   4 test4 
now row begin offset 262
row_id:000000000217,info_bits:0000,n_owned:0000,order:6(0000000000110),next offset:-68(1111111110111100)
   5 test5 
now row begin offset 194
row_id:000000000218,info_bits:0000,n_owned:0000,order:4(0000000000100),next offset:102(0000000001100110)
   6 test6 
now row begin offset 296
row_id:000000000219,info_bits:0000,n_owned:0000,order:7(0000000000111),next offset:68(0000000001000100)
   7 test7 
now row begin offset 364
row_id:00000000021b,info_bits:0000,n_owned:0000,order:9(0000000001001),next offset:-238(1111111100010010)
   9 test9 
now row begin offset 126
row_id:00000000021c,info_bits:0000,n_owned:0000,order:2(0000000000010),next offset:-14(1111111111110010)
   3 testa

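We can see that after inserting the row with id=3 (row_id 00000000021c), the offset PAGE_FREE points to has gone from 126 back to 330, and the reusable space is back to 34 bytes: the space of the most recently deleted row has been removed from the delete list. We can also see that the newly inserted row has order 2, which is the space previously occupied by the deleted row with id=1 (row_id 000000000213). That space has been reused by the newly inserted data.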

The changes to the delete list from step 5 to step 6 are summarized in the figure:

[figure: reuse_list]

Finally, with debug output turned on, let's analyze what the delete list contains now:

[root@hebe211 ibd]#  python innodb_extract.py ibd2_test.ibd

PAGE_FREE pointer offset 330,PAGE_GARBAGE size 34
row_id:00000000021a,info_bits:0010,n_owned:0000,order:8(0000000001000),next offset:0(0000000000000000)

now row begin offset 99
infimum
now row begin offset 160
row_id:000000000214,info_bits:0000,n_owned:0000,order:3(0000000000011),next offset:68(0000000001000100)
   2 test2 
now row begin offset 228
row_id:000000000216,info_bits:0000,n_owned:0000,order:5(0000000000101),next offset:34(0000000000100010)
   4 test4 
now row begin offset 262
row_id:000000000217,info_bits:0000,n_owned:0000,order:6(0000000000110),next offset:-68(1111111110111100)
   5 test5 
now row begin offset 194
row_id:000000000218,info_bits:0000,n_owned:0000,order:4(0000000000100),next offset:102(0000000001100110)
   6 test6 
now row begin offset 296
row_id:000000000219,info_bits:0000,n_owned:0000,order:7(0000000000111),next offset:68(0000000001000100)
   7 test7 
now row begin offset 364
row_id:00000000021b,info_bits:0000,n_owned:0000,order:9(0000000001001),next offset:-238(1111111100010010)
   9 test9 
now row begin offset 126
row_id:00000000021c,info_bits:0000,n_owned:0000,order:2(0000000000010),next offset:-14(1111111111110010)
   3 testa

row_id:00000000021a,info_bits:0010,n_owned:0000,order:8(0000000001000),next offset:0(0000000000000000)
now row begin offset 99

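row_id 00000000021a is the record with id=8 that was deleted earlier
The key point is info_bits:0010: the third bit is the deleted flag, and 1 means the record has been deleted
Because the delete list contains only this one record, its next offset is 0, meaning there is no next record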

Summary

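From the record header combined with the physical storage format above, we can see three lists: logical records, physical records, and deleted records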

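  • Logical records are ordered by the clustered index, while the order of physical records is the order of the rows in the heap. Once data has been deleted and new data has been inserted into the reused space, the physical order no longer matches the logical order
  • When a record is deleted, it is unlinked from the logical record list and added to the delete list; the delete list pointer always points to the space of the most recently deleted record. When space is reused, the space at the top of the stack is removed from the delete list and added to the logical record list
  • After data has been deleted, as long as the record is still in the delete list, the data can in theory be recovered. Once the space has been reused, however, the data is no longer recoverable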