HBase的基本原理

時間 2019-11-17

原文原文鏈接

HBase and MapR-DB: Designed for Distribution, Scale, and Speed-HBASE數據模型

Apache HBase is a database that runs on a Hadoop cluster. HBase is not a traditional RDBMS, as it relaxes the ACID (Atomicity, Consistency, Isolation, and Durability) properties of traditional RDBMS systems in order to achieve much greater scalability. Data stored in HBase also does not need to fit into a rigid schema like with an RDBMS, making it ideal for storing unstructured or semi-structured data.html

The MapR Converged Data Platform supports HBase, but also supports MapR-DB, a high performance, enterprise-grade NoSQL DBMS that includes the HBase API to run HBase applications. For this blog, I’ll specifically refer to HBase, but understand that many of the advantages of using HBase in your data architecture apply to MapR-DB. MapR built MapR-DB to take HBase applications to the next level, so if the thought of higher powered, more reliable HBase deployments sound appealing to you, take a look at some of the MapR-DB content here.node

HBase allows you to build big data applications for scaling, but with this comes some different ways of implementing applications compared to developing with traditional relational databases. In this blog post, I will provide an overview of HBase, touch on the limitations of relational databases, and dive into the specifics of the HBase data model.web

Relational Databases vs. HBase – Data Storage Model算法

Why do we need NoSQL/HBase? First, let’s look at the pros of relational databases before we discuss its limitations:sql

Relational databases have provided a standard persistence model
SQL has become a de-facto standard model of data manipulation (SQL)
Relational databases manage concurrency for transactions
Relational database have lots of tools

Relational databases were the standard for years, so what changed? With more and more data came the need to scale. One way to scale is vertically with a bigger server, but this can get expensive, and there are limits as your size increases.express

Relational Databases vs. HBase - Scalingapache

What changed to bring on NoSQL?緩存

An alternative to vertical scaling is to scale horizontally with a cluster of machines, which can use commodity hardware. This can be cheaper and more reliable. To horizontally partition or shard a RDBMS, data is distributed on the basis of rows, with some rows residing on a single machine and the other rows residing on other machines, However, it’s complicated to partition or shard a relational database, and it was not designed to do this automatically. In addition, you lose the querying, transactions, and consistency controls across shards. Relational databases were designed for a single node; they were not designed to be run on clusters.安全

Limitations of a Relational Model服務器

Database normalization eliminates redundant data, which makes storage efficient. However, a normalized schema causes joins for queries, in order to bring the data back together again. While HBase does not support relationships and joins, data that is accessed together is stored together so it avoids the limitations associated with a relational model. See the difference in data storage models in the chart below:

Relational databases vs. HBase - data storage model

HBase Designed for Distribution, Scale, and Speed

HBase was designed to scale due to the fact that data that is accessed together is stored together. Grouping the data by key is central to running on a cluster. In horizontal partitioning or sharding, the key range is used for sharding, which distributes different data across multiple servers. Each server is the source for a subset of data. Distributed data is accessed together, which makes it faster for scaling. HBase is actually an implementation of the BigTable storage architecture, which is a distributed storage system developed by Google that’s used to manage structured data that is designed to scale to a very large size.

HBase is referred to as a column family-oriented data store. It’s also row-oriented: each row is indexed by a key that you can use for lookup (for example, lookup a customer with the ID of 1234). Each column family groups like data (customer address, order) within rows. Think of a row as the join of all values in all column families.

HBase is a column family-oriented database

HBase is also considered a distributed database. Grouping the data by key is central to running on a cluster and sharding. The key acts as the atomic unit for updates. Sharding distributes different data across multiple servers, and each server is the source for a subset of data.

HBase is a distributed database

HBase Data Model

Data stored in HBase is located by its 「rowkey.」 This is like a primary key from a relational database. Records in HBase are stored in sorted order, according to rowkey. This is a fundamental tenet of HBase and is also a critical semantic used in HBase schema design.

HBase data model – row keys

Tables are divided into sequences of rows, by key range, called regions. These regions are then assigned to the data nodes in the cluster called 「RegionServers.」 This scales read and write capacity by spreading regions across the cluster. This is done automatically and is how HBase was designed for horizontal sharding.

Tables are split into regions = contiguous keys

The image below shows how column families are mapped to storage files. Column families are stored in separate files, which can be accessed separately.

The data is stored in HBase table cells. The entire cell, with the added structural information, is called Key Value. The entire cell, the row key, column family name, column name, timestamp, and value are stored for every cell for which you have set a value. The key consists of the row key, column family name, column name, and timestamp.

Logically, cells are stored in a table format, but physically, rows are stored as linear sets of cells containing all the key value information inside them.

In the image below, the top left shows the logical layout of the data, while the lower right section shows the physical storage in files. Column families are stored in separate files. The entire cell, the row key, column family name, column name, timestamp, and value are stored for every cell for which you have set a value.

Logical data model vs. physical data storage

As mentioned before, the complete coordinates to a cell's value are: Table:Row:Family:Column:Timestamp ➔ Value. HBase tables are sparsely populated. If data doesn’t exist at a column, it’s not stored. Table cells are versioned uninterpreted arrays of bytes. You can use the timestamp or set up your own versioning system. For every coordinate row family:column, there can be multiple versions of the value.

Sparse data with cell versions

Versioning is built in. A put is both an insert (create) and an update, and each one gets its own version. Delete gets a tombstone marker. The tombstone marker prevents the data being returned in queries. Get requests return specific version(s) based on parameters. If you do not specify any parameters, the most recent version is returned. You can configure how many versions you want to keep and this is done per column family. The default is to keep up to three versions. When the max number of versions is exceeded, extra records will be eventually removed.

Versioned data

In this blog post, you got an overview of HBase (and implicitly MapR-DB) and learned about the HBase/MapR-DB data model. Stay tuned for the next blog post, where I’ll take a deep dive into the details of the HBase architecture. In the third and final blog post in this series, we’ll take a look at schema design guidelines.

Want to learn more?

深度分析HBase架構

知乎中文版

HBase的架構

物理上看, HBase系統有3種類型的後臺服務程序, 分別是Region server, Master server 和 zookeeper.

Region server負責實際數據的讀寫. 當訪問數據時, 客戶端與HBase的Region server直接通訊.

Master server管理Region的位置, DDL(新增和刪除表結構)

Zookeeper負責維護和記錄整個HBase集羣的狀態.

全部的HBase數據都存儲在HDFS中. 每一個 Region server都把本身的數據存儲在HDFS中. 若是一個服務器既是Region server又是HDFS的Datanode. 那麼這個Region server的數據會在把其中一個副本存儲在本地的HDFS中, 加速訪問速度.

可是, 若是是一個新遷移來的Region server, 這個region server的數據並無本地副本. 直到HBase運行compaction, 纔會把一個副本遷移到本地的Datanode上面.

HDFS的Name node存儲這全部block文件的位置信息

HBase Region server

HBase的表根據Row Key的區域分紅多個Region, 一個Region包含這這個區域內全部數據. 而Region server負責管理多個Region, 負責在這個Region server上的全部region的讀寫操做. 一個Region server最多能夠管理1000個region.

HBase Master server

HBase Maste主要負責分配region和操做DDL(如新建和刪除表)等,

HBase Master的功能:

協調Region server
在集羣處於數據恢復或者動態調整負載時,分配Region到某一個Region Server中
管控集羣, 監控全部RegionServer的狀態
提供DDL相關的API, 新建(create),刪除(delete)和更新(update)表結構.

ZooKeeper: 集羣"物業"管理員

Zookeepper是一個分佈式的無中心的元數據存儲服務. zookeeper探測和記錄HBase集羣中服務器的狀態信息. 若是zookeeper發現服務器宕機, 它會通知Hbase的master節點. 在生產環境部署zookeeper至少須要3臺服務器, 用來知足zookeeper的核心算法Paxos的最低要求.

譯註: 如圖, zookeeper有三臺服務器, region server和master節點都經過heartbeat的方式向zookeeper報告狀態

ZooKeeper, Master和 Region server協同工做

Zookeepr負責維護集羣的memberlist, 哪臺服務器在線,哪臺服務器宕機都由zookeeper探測和管理. Region server, 主備Master節點主動鏈接Zookeeper, 維護一個Session鏈接,

這個session要求定時發送heartbeat, 向zookeeper說明本身在線, 並無宕機.

ZooKeeper有一個Ephemeral Node(臨時節點)的概念, session鏈接在zookeeper中創建一個臨時節點(Ephemeral Node), 若是這個session斷開, 臨時節點被自動刪除.

全部Region server都嘗試鏈接Zookeeper, 並在這個session中創建一個臨時節點(Ephemeral node). HBase的master節點監控這些臨時節點的是否存在, 能夠發現新加入region server和判斷已經存在的region server宕機.

爲了高可用需求, HBase的master也有多個, 這些master節點也同時向Zookeeper註冊臨時節點(Ephemeral Node). Zookeeper把第一個成功註冊的master節點設置成active狀態, 而其餘master node處於inactive狀態.

若是zookeeper規定時間內, 沒有收到active的master節點的heartbeat, 鏈接session超時, 對應的臨時節也自動刪除. 以前處於Inactive的master節點獲得通知, 立刻變成active狀態, 當即提供服務.

一樣, 若是zookeeper沒有及時收到region server的heartbeat, session過時, 臨時節點刪除. HBase master得知region server宕機, 啓動數據恢復方案.

HBase的第一次讀寫流程

HBase把各個region的位置信息存儲在一個特殊的表中, 這個表叫作Meta table.

Zookeeper裏面存儲了這個Meta table的位置信息.

HBase的訪問流程:

客戶端訪問Zookeep, 獲得了具體Meta table的位置
客戶端再訪問真正的Meta table, 從Meta table裏面獲得row key所在的region server
訪問rowkey所在的region server, 獲得須要的真正數據.

客戶端緩存meta table的位置和row key的位置信息, 這樣就不用每次訪問都讀zookeeper.

若是region server因爲宕機等緣由遷移到其餘服務器. Hbase客戶端訪問失敗, 客戶端緩存過時, 再從新訪問zookeeper, 獲得最新的meta table位置, 更新緩存.

HBase Meta Table

Meta table存儲全部region的列表

Meta table用相似於Btree的方式存儲

Meta table的結構以下:

- Key: region的開始row key, region id

- Values: Region server

譯註: 在google的bigtable論文中, bigtable採用了多級meta table, Hbase的Meta table只有2級

Region Server的結構

Region Server運行在HDFS的data node上面, 它有下面4個部分組成:

WAL: 預寫日誌(Write Ahead Log)是一HDFS上的一個文件, 若是region server崩潰後, 日誌文件用來恢復新寫入的的, 可是尚未存儲在硬盤上的數據.
BlockCache: 讀取緩存, 在內存裏緩存頻繁讀取的數據, 若是BlockCache滿了, 會根據LRU算法(Least Recently Used)選出最不活躍的數據, 而後釋放掉
MemStore: 寫入緩存, 在數據真正被寫入硬盤前, Memstore在內存中緩存新寫入的數據. 每一個region的每一個列簇(column family)都有一個memstore. memstore的數據在寫入硬盤前, 會先根據key排序, 而後寫入硬盤.
HFiles: HDFS上的數據文件, 裏面存儲KeyValue對.

HBase的寫入流程(1)

當hbase客戶端發起Put請求, 第一步是將數據寫入預寫日誌(WAL):

將修改的操做記錄在預寫日誌(WAL)的末尾
預寫日誌(WAL)被用來在region server崩潰時, 恢復memstore中的數據

WAL總寫入到文件末尾, 是順序寫入, 寫入速度較快

Hbase的寫入流程(2)

數據寫入預寫日誌(WAL), 並存儲在memstore以後, 向用戶返回寫成功.

HBase MemStore

MemStore在內存按照Key的順序, 存儲Key-Value對, 一個Memstore對應一個列簇(column family). 一樣在HFile裏面, 全部的Key-Value對也是根據Key有序存儲.

HBase Region Flush

譯註: 原文裏面Flush的意識是, 把緩衝的數據從內存轉存到硬盤裏, 這就相似與沖廁所(Flush the toilet) , 把數據比做是水, 一下把積攢的水衝到下水道, 想當於把緩存的數據寫入硬盤. 和Flush很是相似的英文還有un-plug, 好比有一浴缸的水, 只要un-plug浴缸裏面的塞子, 浴缸的水就開始流進下水道, 也類比把緩存數據寫入硬盤

當Memstore累計了足夠多的數據, Region server將Memstore中的數據寫入HDFS, 存儲爲一個HFile. 每一個列簇(column family)對於多個HFile, 每一個HFile裏面就是實際存儲的數據.

這些HFile都是當Memstore滿了之後, Flush到HDFS中的文件. 注意到HBase限制了列簇(column family)的個數. 由於每一個列簇(column family)都對應一個Memstore. [譯註: 太多的memstore佔用過多的內存].

當Memstore的數據Flush到硬盤時, 系統額外保存了最後寫入操做的序列號(last written squence number), 因此HBase知道有多少數據已經成功寫入硬盤. 每一個HFile都記錄這個序號, 代表這個HFile記錄了多少數據和從哪裏繼續寫入數據.

在region server啓動後, 讀取全部HFile中最高的序列號, 新的寫入序列號從這個最高序列號繼續向上累加.

HBase HFile

HFile中存儲有序的Key-Value對. 當Memstore滿了以後, Memstore中的全部數據寫入HDFS中,造成一個新的HFile. 這種大文件寫入是順序寫, 由於避免了機械硬盤的磁頭移動, 因此寫入速度很是快.

HBase HFile Structure

HFile存儲了一個多級索引(multi-layered index), 查詢請求不須要遍歷整個HFile查詢數據, 經過多級索引就能夠快速獲得數據(工做機制相似於b+tree)

Key-Value按照升序排列
Key-Value存儲在以64KB爲單位的Block裏
每一個Block有一個葉索引(leaf-index), 記錄Block的位置
每一個Block的最後一個Key(譯註: 最後一個key也是最大的key), 放入中間索引(intermediate index)
根索引(root index)指向中間索引

尾部指針(trailer pointer)在HFile的最末尾, 它指向元數據塊區(meta block), 布隆過濾器區域和時間範圍區域. 查詢布隆過濾器能夠很快得肯定row key是否在HFile內, 時間範圍區域也能夠幫助查詢跳過不在時間區域的讀請求.

譯註: 布隆過濾器在搜索和文件存儲中有普遍用途, 具體算法請參考https://china.googleblog.com/2007/07/bloom-filter_7469.html

HFile索引

當打開HFile後, 系統自動緩存HFile的索引在Block Cache裏, 這樣後續查找操做只須要一次硬盤的尋道.

HBase的混合讀(Read Merge)

咱們發現HBase中的一個row裏面的數據, 分配在多個地方. 已經持久化存儲的Cell在HFile, 最近寫入的Cell在Memstore裏, 最近讀取的Cell在Block cache裏. 因此當你讀HBase的一行時, 混合了Block cache, memstore和Hfiles的讀操做

首先, 在Block cache(讀cache)裏面查找cell, 由於最近的讀取操做都會緩存在這裏. 若是找到就返回, 沒有找到就執行下一步
其次, 在memstore(寫cache)裏查找cell, memstore裏面存儲裏最近的新寫入, 若是找到就返回, 沒有找到就執行下一步
最後, 在讀寫cache中都查找失敗的狀況下, HBase查詢Block cache裏面的Hfile索引和布隆過濾器, 查詢有可能存在這個cell的HFile, 最後在HFile中找到數據.

HBase Minor Compaction

HBase自動選擇較小的HFile, 將它們合併成更大的HFile. 這個過程叫作minor compaction. Minor compaction經過合併小HFile, 減小HFile的數量.

HFile的合併採用歸併排序的算法.

譯註: 較少的HFile能夠提升HBase的讀性能

HBase Major Compaction

Major compaction指一個region下的全部HFile作歸併排序, 最後造成一個大的HFile. 這能夠提升讀性能. 可是, major compaction重寫全部的Hfile, 佔用大量硬盤IO和網絡帶寬. 這也被稱爲寫放大現象(write amplification)

Major compaction能夠被調度成自動運行的模式, 可是因爲寫放大的問題(write amplification), major compaction一般在一週執行一次或者只在凌晨運行. 此外, major compaction的過程當中, 若是發現region server負責的數據不在本地的HDFS datanode上, major compaction除了合併文件外, 還會把其中一份數據轉存到本地的data node上.

Region = 一組連續key

快速的複習region的概念:

一張表垂直分割成一個或多個region, 一個region包括一組連續而且有序的row key, 每個row key對應一行的數據.
每一個region最大1GB(默認)
region由region server管理
一個region server能夠管理多個region, 最多大約1000個region(這些region能夠屬於相同的表,也能夠屬於不一樣的表)

Region的拆分

最初, 每張表只有一個region, 當一個region變得太大時, 它就分裂成2個子region. 2個子region, 各佔原始region的一半數據, 仍然被相同的region server管理. Region server向HBase master節點彙報拆分完成.

若是集羣內還有其餘region server, master節點傾向於作負載均衡, 因此master節點有可能調度新的region到其餘region server, 由其餘region管理新的分裂出的region.

負載均衡

最初, 一個Region server上的region一分爲二, 可是考慮到負載均衡, master node會把新region調度到其餘服務器上. 然而, 新region所在的region server在本地data node上沒有數據, 全部操做都是操做遠程HDFS上面的數據. 直到這個Region server運行了major compaction以後, 纔有一份副本落在本地datanode中.

譯註: HFile和WAL都是存儲在HDFS中, 這裏說的把副本存儲在本地是指: 因爲HDFS是一種聰明的FS, 若是他發現要求寫入文件的客戶端剛好也是HDFS的data node, 那麼在分配哪三臺服務器存儲副本時, 會優先在發請求的客戶端存儲數據, 這樣就可讓Region server管理的數據雖然是3份, 可是其中一份就在本地服務器上, 優化了訪問路徑.

具體能夠參考這篇文章http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html, 裏面詳述了HDFS如何實現這種本地化的存儲. 換句話說, 若是region server沒有和HDFS的data node部署在同一臺服務器, 就沒法實現上面說的本地存儲

HDFS的數據複製(1)

全部讀寫都是操做primary node. HDFS自動複製全部WAL和HFile的數據塊到其餘節點. HBase依賴HDFS保證數據安全. 當在HDFS裏面寫入一個文件時, 一份存儲在本地節點, 另兩份存儲到其餘節點

HDFS的數據複製(2)

預寫日誌(WAL) 和 HFile都存在HDFS裏面, 能夠保證數據的可靠性, 可是HBase memstore裏的數據都在內存中, 若是系統崩潰後重啓, Hbase如何恢復Memstore裏面的數據?

譯註: 從上圖看memstore的數據在內存中, 也沒有多副本

HBase的災難恢復

當region server宕機, 崩潰的region server管理的region不能再提供服務, HBase監測到異常後, 啓動恢復程序, 恢復region.

Zookeeper發現region server的heartbeat中止, 判斷region server宕機並通知master節點. Hbase master節點得知該region server停機後, 將崩潰的region server管理的region分配給其餘region server. HBase從預寫文件(WAL)裏恢復memstore裏的數據.

HBase master知道老的region被從新分配到哪些新的region server. Master把已經crash的Region server的預寫日誌(WAL)拆分紅多個. 參與故障恢復的每一個region server重放的預寫日誌(WAL), 從新構建出丟失Memstore.

數據恢復

預寫日誌(WAL)記錄了HBase的每一個操做, 每一個操做表明一個Put或者刪除Delete動做. 全部的操做按照時間順序在預寫日誌(WAL)排列, 文件頭記錄最老的操做, 最新的操做處於文件末尾.

如何恢復在memstore裏, 但尚未寫到HFile的數據? 從新執行預寫日誌(WAL)就能夠. 從前到後依次執行預寫日誌(WAL)裏的操做, 重建memstore數據. 最後, Flush memstore數據到的HFile, 完成恢復.

Apache Hbase架構優勢

強一致模型

- 當寫返回時, 確保全部讀操做讀到相同的值

自動擴展

- 數據增加過大時, 自動分裂region

- 利用HFDS分散數據和備份數據

內建自動回覆

- 預寫日誌(WAL)

集成Hadoop生態

- 在HBase上運行map reduce

Apache HBase存在的問題...

Business continuity reliability:
重放預寫日誌慢
故障恢復既慢又複雜
Major compaction容易引發IO風暴(寫放大)

Guidelines for HBase Schema Design

In this blog post, I’ll discuss how HBase schema is different from traditional relational schema modeling, and I’ll also provide you with some guidelines for proper HBase schema design.

Relational vs. HBase Schemas

There is no one-to-one mapping from relational databases to HBase. In relational design, the focus and effort is around describing the entity and its interaction with other entities; the queries and indexes are designed later.

With HBase, you have a 「query-first」 schema design; all possible queries should be identified first, and the schema model designed accordingly. You should design your HBase schema to take advantage of the strengths of HBase. Think about your access patterns, and design your schema so that the data that is read together is stored together. Remember that HBase is designed for clustering.

Distributed data is stored and accessed together
It is query-centric, so focus on how the data is read
Design for the questions

Normalization

In a relational database, you normalize the schema to eliminate redundancy by putting repeating information into a table of its own. This has the following benefits:

You don’t have to update multiple copies when an update happens, which makes writes faster.
You reduce the storage size by having a single copy instead of multiple copies.

However, this causes joins. Since data has to be retrieved from more tables, queries can take more time to complete.

In this example below, we have an order table which has one-to-many relationship with an order items table. The order items table has a foreign key with the id of the corresponding order.

De-normalization

In a de-normalized datastore, you store in one table what would be multiple indexes in a relational world. De-normalization can be thought of as a replacement for joins. Often with HBase, you de-normalize or duplicate data so that data is accessed and stored together.

Parent-Child Relationship–Nested Entity

Here is an example of denormalization in HBase, if your tables exist in a one-to-many relationship, it’s possible to model it in HBase as a single row. In the example below, the order and related line items are stored together and can be read together with a get on the row key. This makes the reads a lot faster than joining tables together.

The rowkey corresponds to the parent entity id, the OrderId. There is one column family for the order data, and one column family for the order items. The Order Items are nested, the Order Item IDs are put into the column names and any non-identifying attributes are put into the value.

This kind of schema design is appropriate when the only way you get at the child entities is via the parent entity.

Many-to-Many Relationship in an RDBMS

Here is an example of a many-to-many relationship in a relational database. These are the query requirements:

Get name for user x
Get title for book x
Get books and corresponding ratings for userID x
Get all userIDs and corresponding ratings for book y

Many-to-Many Relationship in HBase

The queries that we are interested in are:

Get books and corresponding ratings for userID x
Get all userIDs and corresponding ratings for book y

For an entity table, it is pretty common to have one column family storing all the entity attributes, and column families to store the links to other entities.

The entity tables are as shown below:

Generic Data, Event Data, and Entity-Attribute-Value

Generic data that is schemaless is often expressed as name value or entity attribute value. In a relational database, this is complicated to represent. A conventional relational table consists of attribute columns that are relevant for every row in the table, because every row represents an instance of a similar object. A different set of attributes represents a different type of object, and thus belongs in a different table. The advantage of HBase is that you can define columns on the fly, put attribute names in column qualifiers, and group data by column families.

Here is an example of clinical patient event data. The Row Key is the patient ID plus a time stamp. The variable event type is put in the column qualifier, and the event measurement is put in the column value. OpenTSDB is an example of variable system monitoring data.

Self-Join Relationship – HBase

A self-join is a relationship in which both match fields are defined in the same table.

Consider a schema for twitter relationships, where the queries are: which users does userX follow, and which users follow userX? Here’s a possible solution: The userids are put in a composite row key with the relationship type as a separator. For example, Carol follows Steve Jobs and Carol is followed by BillyBob. This allows for row key scans for everyone carol:follows or carol:followedby

Below is the example Twitter table:

Tree, Graph Data

Here is an example of an adjacency list or graph, using a separate column for each parent and child:

Each row shows a node, and the row key is equal to the node id. There is a column family for parent p, and a column family children c. The column qualifiers are equal to the parent or child node ids, and the value is equal to the type to node. This allows to quickly find the parent or children nodes from the row key.

You can see there are multiple ways to represent trees, the best way depends on your queries.

Inheritance Mapping

In this online store example, the type of product is a prefix in the row key. Some of the columns are different, and may be empty depending on the type of product. This allows to model different product types in the same table and to scan easily by product type.

Data Access Patterns

Use Cases: Large-scale offline ETL analytics and generating derived data

In analytics, data is written multiple orders of magnitude more frequently than it is read. Offline analysis can also be used to provide a snapshot for online viewing. Offline systems don’t have a low-latency requirement; that is, a response isn’t expected immediately. Offline HBase ETL data access patterns, such as Map Reduce or Hive, are characterized by high latency reads and high throughput writes.

Data Access Patterns

Use Cases: Materialized view, pre-calculated summaries

To provide fast reads for online web sites, or an online view of data from data analysis, MapReduce jobs can reorganize the data into different groups for different readers or materialized views. Batch offline analysis could also be used to provide a snapshot for online views. This is going to be high throughput for batch offline writes and high latency for read (when online).

Examples include:

• Generating derived data, duplicating data for reads in HBase schemas, and delayed secondary indexes

Schema Design Exploration:

Raw data from HDFS or HBase
MapReduce for data transformation and ETL from raw data.
Use bulk import from MapReduce to HBase
Serve data for online reads from HBase

Designing for reads means aggressively de-normalizing data so that the data that is read together is stored together.

Data Access Patterns

Lambda Architecture

The Lambda architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer.

MapReduce jobs are used to create artifacts useful to consumers at scale. Incremental updates are handled in real time by processing updates to HBase in a Storm cluster, and are applied to the artifacts produced by MapReduce jobs.

The batch layer precomputes the batch views. In the batch view, you read the results from a precomputed view. The precomputed view is indexed so that it can be accessed quickly with random reads.

The serving layer indexes the batch view and loads it up so it can be efficiently queried to get particular values out of the view. A serving layer database only requires batch updates and random reads. The serving layer updates whenever the batch layer finishes precomputing a batch view.

You can do stream-based processing with Storm and batch processing with Hadoop. The speed layer only produces views on recent data, and is for functions computed on data in the few hours not covered by the batch. In order to achieve the fastest latencies possible, the speed layer doesn’t look at all the new data at once. Instead, it updates the real time view as it receives new data, instead of re-computing them like the batch layer does. In the speed layer, HBase provides the ability for Storm to continuously increment the real-time views.

How does Storm know to process new data in HBase? A needs work flag is set. Processing components scan for notifications and process them as they enter the system.

MapReduce Execution and Data Flow

The flow of data in a MapReduce execution is as follows:

Data is being loaded from the Hadoop file system
Next, the job defines the input format of the data
Data is then split between different map() methods running on all the nodes
Then record readers parse out the data into key-value pairs that serve as input into the map() methods
The map() method produces key-value pairs that are sent to the partitioner
When there are multiple reducers, the mapper creates one partition for each reduce task
The key-value pairs are sorted by key in each partition
The reduce() method takes the intermediate key-value pairs and reduces them to a final list of key-value pairs
The job defines the output format of the data
Data is written to the Hadoop file system

In this blog post, you learned how HBase schema is different from traditional relational schema modeling, and you also got some guidelines for proper HBase schema design. If you have any questions about this blog post, please ask them in the comments section below.

Want to learn more? Take a look at these resources that I used to prepare this blog post:

Here are some additional resources for helping you get started with HBase:

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。