XIV(3)--Read/Write Operations

XIV series:

XIV (1) -- Hardware Overview

XIV (2) -- Logical System Concepts

 

  As mentioned in a previous article, data sent from a host is stored on XIV in two copies: a Primary Copy and a Secondary Copy. Only when both copies exist is the system in a Full Redundancy state. So how does a host read and write data on XIV? Read on.

- Each write is written to the cache of two data modules // every write operation first lands in the caches of two Data Modules

- Host is acknowledged as soon as two cache copies are available // the acknowledge is sent to the host only once both cache copies are written

- De-staging to the disk drives takes place: // each module independently decides when to flush its cached data to disk

– In the background

– Independently on each module
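The cache-mirroring and background-destage behavior above can be sketched as a toy model. This is illustrative only; the class, module names, and data structures are invented for this sketch, not XIV internals:

```python
# Illustrative model (not IBM code): a write is acknowledged only after
# both data modules hold a cache copy; destaging to disk happens later,
# independently on each module.

class DataModule:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # partition -> data, not yet on disk
        self.disk = {}    # partition -> data, persisted

    def write_cache(self, partition, data):
        self.cache[partition] = data

    def destage(self):
        # runs in the background, on each module's own schedule
        self.disk.update(self.cache)
        self.cache.clear()

def host_write(primary, secondary, partition, data):
    primary.write_cache(partition, data)
    secondary.write_cache(partition, data)
    # ack only once both cache copies exist -- before any disk I/O
    return "ACK"

m1, m2 = DataModule("module-4"), DataModule("module-7")
print(host_write(m1, m2, 42, b"payload"))  # ACK
m1.destage()  # each module flushes independently later
```

Note that the host sees the acknowledgment as soon as both cache copies exist; disk latency is entirely off the write path.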

 

  Write Operation Overview

[Figure: write operation overview diagram]

1. Host sends write to interface

2. Interface sends write to primary data module

3. Primary data module sends write to secondary data module

4. Host is acknowledged only after write is completed on both modules

 

The figure above only outlines the broad steps of a host write. What actually happens inside XIV?

Write Operations

1. Host sends a write request to one of the i_nodes

2. i_node consults the Slice Table and determines the primary node ID and disk #

3. i_node forwards the request to the relevant module's primary cache node

4. The primary cache node consults the Slice Table and forwards the request to the secondary cache node

5. Both cache nodes consult their local Partition Table to determine the physical location on their disks

6. Both cache nodes save the written buffer in their memory cache

7. The secondary cache node sends an ack to the primary cache node, which then acks the i_node, which then acks the host
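The seven steps above can be sketched in code. This is a hypothetical model: the table contents, module names, volume IDs, and partition numbers are all invented for illustration:

```python
# Hypothetical sketch of the XIV write path; all IDs are made up.
SLICE_TABLE = {  # slice -> (primary (module, disk), secondary (module, disk))
    17: (("module-4", 2), ("module-9", 5)),
}
PARTITION_TABLE = {  # per module: (vol, slice) -> (disk ID, physical partition)
    "module-4": {("vol1", 17): (2, 880)},
    "module-9": {("vol1", 17): (5, 312)},
}
CACHES = {"module-4": {}, "module-9": {}}

def i_node_write(vol, slice_no, data):
    primary, secondary = SLICE_TABLE[slice_no]               # step 2
    return primary_cache_write(primary, secondary, vol, slice_no, data)  # step 3

def primary_cache_write(primary, secondary, vol, slice_no, data):
    loc = PARTITION_TABLE[primary[0]][(vol, slice_no)]       # step 5
    CACHES[primary[0]][loc] = data                           # step 6
    ack = secondary_cache_write(secondary, vol, slice_no, data)  # step 4
    # step 7: ack flows back only after the secondary has the copy
    return "ack-to-host" if ack == "ack" else None

def secondary_cache_write(secondary, vol, slice_no, data):
    loc = PARTITION_TABLE[secondary[0]][(vol, slice_no)]     # step 5
    CACHES[secondary[0]][loc] = data                         # step 6
    return "ack"

print(i_node_write("vol1", 17, b"x"))  # ack-to-host
```

The key point the sketch captures: the i_node and the primary cache node each do a Slice Table lookup, while only the cache nodes touch the local Partition Tables.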

 

There are two tables here: the Slice Table and the local Partition Table. One holds metadata for the whole XIV system; the other describes what is on the disks. Whenever the system needs to decide which node and which disk a write goes to, it queries the Slice Table; the final placement on a specific block of a disk is resolved through the local Partition Table.


Slice table

-- An index that stores information about all the slices in the whole system

-- A copy exists in every module's memory

-- The i_node and cte can query it to learn which module and which disk store each slice (primary and secondary)


Partition table

-- Each cache node holds a Partition Table that keeps one entry for each physical partition on the module

-- It maps (vol ID, logical partition #) pairs to (disk ID, physical partition #) pairs
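As plain lookup structures, the two tables described above might look like this (a sketch; every ID and entry is invented):

```python
# Slice Table: global, replicated in every module's memory.
# slice # -> where the primary and secondary copies live
slice_table = {
    0: {"primary": ("module-1", 3), "secondary": ("module-6", 8)},
    1: {"primary": ("module-2", 0), "secondary": ("module-5", 4)},
}

# Partition Table: local to each cache node, one entry per physical
# partition on that module.
# (vol ID, logical partition #) -> (disk ID, physical partition #)
partition_table = {
    ("vol-A", 0): (3, 1024),
    ("vol-A", 1): (3, 1025),
}

# Placement query: which module and disk hold the primary copy of slice 1?
module, disk = slice_table[1]["primary"]
# On that module, where exactly does the partition sit on disk?
disk_id, phys = partition_table[("vol-A", 1)]
print(module, disk_id, phys)
```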


 

Having covered writes, let's look at reads.

Read Operations

1. Host sends a read request to one of the i_nodes

2. i_node consults the Slice Table and determines the primary node ID and disk #

– A read request is always directed to the primary copy of the data

3. i_node forwards the request to the relevant module's cache node

4. The cache node consults its local Partition Table and determines the physical location on disk

5. The cache node reads the data from its memory cache if present, otherwise from disk

6. The cache node sends the data to the i_node, which returns it to the host
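The read path above can likewise be sketched (hypothetical names and IDs; the cache-or-disk decision in step 5 becomes a simple dictionary lookup):

```python
# Hypothetical read-path sketch; reads always target the primary copy.
SLICE_TABLE = {17: ("module-4", 2)}           # slice -> primary (module, disk)
PARTITION_TABLE = {("vol1", 17): (2, 880)}    # local to that cache node
MEMORY_CACHE = {(2, 880): b"hot-data"}        # cache node's memory cache
DISK = {(2, 880): b"cold-data"}               # backing disk contents

def i_node_read(vol, slice_no):
    module, disk = SLICE_TABLE[slice_no]      # step 2: locate primary node
    loc = PARTITION_TABLE[(vol, slice_no)]    # step 4: physical location
    # step 5: serve from memory cache if present, otherwise read the disk
    return MEMORY_CACHE.get(loc, DISK.get(loc))

print(i_node_read("vol1", 17))  # cache hit, so b'hot-data' is returned
```

Unlike the write path, there is no secondary cache node involved: a read touches only the module holding the primary copy.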

 

Likewise, read operations involve both the Slice Table and the local Partition Table.

 

Reading to this point, I find this very similar to the distributed file systems I have encountered before, such as MooseFS, Google's GFS, and the Hadoop File System. When I get a chance to study them in more depth, I'll compare how the two approaches differ.
