Continued from the previous post: http://www.javashuo.com/article/p-wzmvbfdx-ka.html
A MOB threshold is configured: if the value length of a cell is larger than this threshold, the cell is regarded as a MOB cell.
When MOB cells are updated in the regions, they are written to the WAL and memstore, just like normal cells. On flush, the MOBs are flushed to MOB files, and the metadata and paths of the MOB files are flushed to store files. Data consistency and HBase replication are native to this design.
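The flush behaviour described above can be sketched in a few lines. This is a minimal in-memory model, not HBase code: `Cell`, `flush`, and the `mobfile-0001` name are hypothetical, the `is_reference` flag stands in for the reference tag, and the 100KB value is the default threshold the article mentions.

```python
from dataclasses import dataclass

MOB_THRESHOLD = 100 * 1024  # bytes; the default threshold of 100KB


@dataclass
class Cell:
    row: bytes
    value: bytes
    is_reference: bool = False  # hypothetical stand-in for the reference tag


def flush(memstore, mob_file_name, threshold=MOB_THRESHOLD):
    """Split flushed cells into MOB file contents and store file contents."""
    mob_file = {}    # row -> value; stands in for the MOB file
    store_file = []  # cells written to the regular store file
    for cell in memstore:
        if len(cell.value) > threshold:
            # The large value goes to the MOB file; the store file keeps
            # only a reference cell whose value is the MOB file path.
            mob_file[cell.row] = cell.value
            store_file.append(
                Cell(cell.row, mob_file_name.encode(), is_reference=True)
            )
        else:
            store_file.append(cell)
    return mob_file, store_file


# One small cell and one 200KB cell in the same flush.
memstore = [Cell(b"r1", b"x" * 10), Cell(b"r2", b"y" * (200 * 1024))]
mob_file, store_file = flush(memstore, "mobfile-0001")
```

In this model only `r2` ends up in the MOB file, while the store file holds the small cell unchanged plus a reference cell for `r2`.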
The MOB edits are larger than usual. In the sync, the corresponding I/O is larger too, which can slow down the sync operations of WAL. If there are other regions that share the same WAL, the write latency of these regions can be affected. However, if the data consistency and non-volatility are needed, WAL is a must.
When the threshold is changed, compactions are permitted to move cells between store files and MOB files. The default threshold is 100KB.
As illustrated below, the cells that contain the paths of MOB files are called reference cells. The tags are retained in the cells, so we can continue to rely on the HBase security mechanism.
The reference cells have reference tags that differentiate them from normal cells. A reference tag implies a MOB cell in a MOB file, so the reference must be resolved further when reading.
In reading, the store scanner opens scanners to the memstore and store files. If it meets a reference cell, the scanner reads the file path from the cell value and seeks the same row key in that file. The block cache can be enabled for the MOB files in scans, which can accelerate seeking.
It is not necessary to open readers to all the MOB files; only one is needed when required. This random read is not impacted by the number of MOB files. So, we don’t need to compact the MOB files over and over again when they are large enough.
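The read path above can be illustrated with a small model. This is a sketch under stated assumptions rather than the HBase implementation: `MobStore`, `read_cell`, and the file names are hypothetical, and a dictionary lookup stands in for seeking a row key inside a MOB file.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    row: bytes
    value: bytes
    is_reference: bool = False  # hypothetical stand-in for the reference tag


class MobStore:
    """Hypothetical stand-in for MOB files: file name -> {row: value}."""

    def __init__(self, files):
        self.files = files
        self.opened = []  # records which MOB files were actually opened

    def seek(self, file_name, row):
        self.opened.append(file_name)
        return self.files[file_name][row]


def read_cell(cell, mob_store):
    """Return the user-visible value, resolving a reference cell if needed."""
    if not cell.is_reference:
        return cell.value
    file_name = cell.value.decode()            # the path stored in the cell value
    return mob_store.seek(file_name, cell.row)  # same row key, in that one file


mob_store = MobStore({
    "mob-a": {b"r1": b"A" * 4},
    "mob-b": {b"r2": b"B" * 4},
    "mob-c": {b"r3": b"C" * 4},
})
resolved = read_cell(Cell(b"r2", b"mob-b", is_reference=True), mob_store)
plain = read_cell(Cell(b"r9", b"small"), mob_store)
```

Although three MOB files exist in this model, resolving the reference touches only `mob-b`, which is why the number of MOB files does not affect a random read.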
The MOB filename is readable, and comprises three parts: the MD5 of the start key, the latest date of cells in this MOB file, and a UUID. The first part identifies, by its start key, the region from which this MOB file was flushed. Usually, the MOBs have a user-defined TTL, so you can find and delete expired MOB files by comparing the second part with the TTL.
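To make the three-part name concrete, here is a sketch that builds and parses such a name. The exact layout is an assumption made for illustration (a 32-character MD5 hex digest, then the date as `yyyymmdd`, then a 32-character UUID without dashes), and `build_mob_file_name` and `parse_mob_file_name` are hypothetical helpers, not HBase functions.

```python
import hashlib
import uuid
from datetime import date, datetime


def build_mob_file_name(start_key: bytes, latest: date) -> str:
    """Concatenate MD5(start key) + latest cell date + UUID (assumed layout)."""
    return (
        hashlib.md5(start_key).hexdigest()  # 32 hex characters
        + latest.strftime("%Y%m%d")         # 8 digits
        + uuid.uuid4().hex                  # 32 hex characters, no dashes
    )


def parse_mob_file_name(name: str):
    """Split a name built above back into its three parts."""
    start_key_md5 = name[:32]
    latest = datetime.strptime(name[32:40], "%Y%m%d").date()
    unique_id = name[40:]
    return start_key_md5, latest, unique_id


name = build_mob_file_name(b"row-0000", date(2015, 6, 1))
start_key_md5, latest, unique_id = parse_mob_file_name(name)
```

Because the date sits at a fixed position in the name, a cleaner can read it without opening the file.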
To be more friendly to the snapshot, the MOB files are stored in a special dummy region, whereby the snapshot, table export/clone, and archive work as expected.
When taking a snapshot of a table, the MOB region is created in the snapshot and the existing MOB files are added to the manifest. When restoring the snapshot, file links are created in the MOB region.
There are two situations when MOB files should be deleted: when the MOB file is expired, and when the MOB file is too small and should be merged into bigger ones to improve HDFS efficiency.
HBase MOB has a chore in master: it scans the MOB files, finds the expired ones determined by the date in the filename, and deletes them. Thus disk space is reclaimed periodically by aging off expired MOB files.
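The selection step of such a chore might look like the sketch below. It assumes the date occupies characters 32 to 39 of the filename as `yyyymmdd`, and that a file counts as expired once its latest cell date plus the TTL lies in the past; `date_in_name`, `expired_mob_files`, and the sample names are hypothetical.

```python
from datetime import date, datetime, timedelta


def date_in_name(name: str) -> date:
    # Assumed layout: 32-char MD5, then yyyymmdd, then the UUID.
    return datetime.strptime(name[32:40], "%Y%m%d").date()


def expired_mob_files(names, ttl_days: int, today: date):
    """Return the file names whose newest cell is already older than the TTL."""
    cutoff = today - timedelta(days=ttl_days)
    return [n for n in names if date_in_name(n) < cutoff]


names = [
    "a" * 32 + "20150101" + "0" * 32,  # newest cell written on 2015-01-01
    "b" * 32 + "20150610" + "1" * 32,  # newest cell written on 2015-06-10
]
to_delete = expired_mob_files(names, ttl_days=30, today=date(2015, 6, 15))
```

With a 30-day TTL and a current date of 2015-06-15, only the January file is selected, since every cell in it is at least as old as the date in its name.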
MOB files may be small relative to an HDFS block if you write rows where only a few entries qualify as MOBs; there might also be deleted cells. You need to drop the deleted cells and merge the small files into bigger ones to improve HDFS utilization. MOB compactions compact only the small files and leave the large files untouched, which avoids repeatedly compacting large files.
Some other things to keep in mind:
· Know which cells are deleted. In every HBase major compaction, the delete markers are written to a del file before they are dropped.
· In the first step of MOB compactions, these del files are merged into bigger ones.
· All the small MOB files are selected. If the number of small files is equal to the number of existing MOB files, this compaction is regarded as a major one and is called an ALL_FILES compaction.
· These selected files are partitioned by the start key and date in the filename. The small files in each partition are compacted together with the del files so that deleted cells can be dropped. Meanwhile, a new HFile with new reference cells is generated; the compactor commits the new MOB file and then bulk loads this HFile into HBase.
· After compactions in all partitions are finished, if an ALL_FILES compaction is involved, the del files are archived.
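The selection and partitioning steps in the list above can be sketched as follows. This models only the bookkeeping, not the merging of del files or the bulk load; `MobFile`, `select_for_compaction`, and the `small_file_threshold` parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MobFile:
    start_key_md5: str  # first part of the filename
    date: str           # second part of the filename, as yyyymmdd
    size: int           # file size in bytes


def select_for_compaction(mob_files, small_file_threshold):
    """Pick the small files, flag ALL_FILES, and partition by (start key, date)."""
    small = [f for f in mob_files if f.size < small_file_threshold]
    # If every existing MOB file was selected, this is an ALL_FILES compaction.
    all_files = len(small) == len(mob_files)
    partitions = defaultdict(list)
    for f in small:
        partitions[(f.start_key_md5, f.date)].append(f)
    return dict(partitions), all_files


files = [
    MobFile("aa", "20150601", 1_000),
    MobFile("aa", "20150601", 2_000),
    MobFile("bb", "20150601", 500),
    MobFile("aa", "20150602", 10_000_000),  # large file, left untouched
]
partitions, all_files = select_for_compaction(files, small_file_threshold=1_000_000)
```

Here the large file is skipped, so the run is not an ALL_FILES compaction, and the three small files fall into two partitions that would each be compacted separately.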
The life cycle of MOB files is illustrated below. Basically, they are created when memstore is flushed, and deleted by HFileCleaner from the filesystem when they are not referenced by the snapshot or expired in the archive.
In summary, the new HBase MOB design moves MOBs out of the main I/O path of HBase while retaining most security, compaction, and snapshotting features. It caters to the characteristics of operations in MOB, makes the write amplification of MOBs more predictable, and keeps low latencies in both reading and writing.
Thanks to the original authors:
Jincheng Du is a Software Engineer at Intel and an HBase contributor.
Jon Hsieh is a Software Engineer at Cloudera and an HBase committer/PMC member. He is also the founder of Apache Flume, and a committer on Apache Sqoop.