[MapReduce] Google三駕馬車:GFS、MapReduce和Bigtable

  聲明:此文轉載自博客開發團隊的博客,尊重原創工做。該文適合學分佈式系統以前,做爲背景介紹來讀。html

  談到分佈式系統,就不得不提Google的三駕馬車:Google FS[1],MapReduce[2],Bigtable[3]。linux

  雖然Google沒有公佈這三個產品的源碼,可是他發佈了這三個產品的詳細設計論文。並且,Yahoo資助的Hadoop也有按照這三篇論文的開源Java實現:Hadoop對應MapReduce, Hadoop Distributed File System (HDFS)對應Google FS,Hbase對應Bigtable。不過在性能上Hadoop比Google要差不少,參見表1。數據庫

  表1:Hbase和BigTable性能比較(來源於http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation)apache

Experiment編程

HBase20070916緩存

BigTable架構

random reads併發

272app

1212負載均衡

random reads (mem)

Not implemented

10811

random writes

1460

8850

sequential reads

267

4425

sequential writes

1278

8547

Scans

3692

15385

  如下分別介紹這三個產品:

1. Google FS

  GFS是一個可擴展的分佈式文件系統,用於大型的、分佈式的、對大量數據進行訪問的應用。它運行於廉價的普通硬件上,提供容錯功能。

  分佈式系統漫談一 <wbr>—— Google三駕馬車: <wbr>GFS,mapreduce,Bigtable

  圖1 GFS Architecture

  (1)GFS的結構

  1. GFS的結構圖見圖1,由一個master和大量的chunkserver構成,

  2. 不像Amazon Dynamo的沒有主的設計,Google設置一個主來保存目錄和索引信息,這是爲了簡化系統結果,提升性能來考慮的,可是這就會形成主成爲單點故障或者瓶頸。爲了消除主的單點故障Google把每一個chunk設置的很大(64M),這樣,因爲代碼訪問數據的本地性,application端和master的交互會減小,而主要數據流量都是Application和chunkserver之間的訪問。

  3. 另外,master全部信息都存儲在內存裏,啓動時信息從chunkserver中獲取。提升了master的性能和吞吐量,也有利於master當掉後,很容易把後備j機器切換成master。

  4. 客戶端和chunkserver都不對文件數據單獨作緩存,只是用linux文件系統本身的緩存

  「The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas.」

  「Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. However,we must minimize its involvement in reads and writes so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. It caches this information for a limited time and interacts with the chunkservers directly for many subsequent operations.」

  「Neither the client nor the chunkserver caches file data.Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues.(Clients do cache metadata, however.) Chunkservers need not cache file data because chunks are stored as local files and so Linux’s buffer cache already keeps frequently accessed data in memory.」

  (2)GFS的複製

  GFS典型的複製到3臺機器上,參看圖2

分佈式系統漫談一 <wbr>—— Google三駕馬車: <wbr>GFS,mapreduce,Bigtable

  圖2 一次寫操做的控制流和數據流

  (3) 對外的接口

  和文件系統相似,GFS對外提供create, delete,open, close, read, 和 write 操做。另外,GFS還新增了兩個接口snapshot and record append,snapshot。有關snapshot的解釋:

  「Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost.

Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client’s append.」

2. MapReduce

  MapReduce是針對分佈式並行計算的一套編程模型。

  講到並行計算,就不能不談到微軟的Herb Sutter在2005年發表的文章」 The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software」[6],主要意思是經過提升cpu主頻的方式來提升程序的性能很快就要過去了,cpu的設計方向也主要是多核,超線程等併發上。可是之前的程序並不能自動的獲得多核的好處,只有編寫併發程序,才能真正得到多核的好處。分佈式計算也是同樣。

  分佈式系統漫談一 <wbr>—— Google三駕馬車: <wbr>GFS,mapreduce,Bigtable

  圖3 MapReduce Execution overview

  1)MapReduce是由Map和reduce組成,來自於Lisp,Map是影射,把指令分發到多個worker上去,Reduce是規約,把Map的worker計算出來的結果合併。(參見圖3)

  2)Google的MapReduce實現使用GFS存儲數據。

  3)MapReduce可用於Distributed Grep,Count of URL Access Frequency,ReverseWeb-Link Graph,Distributed Sort,Inverted Index

3. Bigtable

  就像文件系統須要數據庫來存儲結構化數據同樣,GFS也須要Bigtable來存儲結構化數據。

  1)BigTable 是創建在 GFS ,Scheduler ,Lock Service 和 MapReduce 之上的。

  2)每一個Table都是一個多維的稀疏圖

  3)爲了管理巨大的Table,把Table根據行分割,這些分割後的數據統稱爲:Tablets。每一個Tablets大概有 100-200 MB,每一個機器存儲100個左右的 Tablets。底層的架構是:GFS。因爲GFS是一種分佈式的文件系統,採用Tablets的機制後,能夠得到很好的負載均衡。好比:能夠把常常響應的表移動到其餘空閒機器上,而後快速重建。

參考文獻

  [1]The Google File System; http://labs.google.com/papers/gfs-sosp2003.pdf

  [2]MapReduce: Simplifed Data Processing on Large Clusters; http://labs.google.com/papers/mapreduce-osdi04.pdf

  [3]Bigtable: A Distributed Storage System for Structured Data;http://labs.google.com/papers/bigtable-osdi06.pdf

  [4]Hadoop ; http://lucene.apache.org/hadoop/

  [5]Hbase: Bigtable-like structured storage for Hadoop HDFS;http://wiki.apache.org/lucene-hadoop/Hbase

  [6]The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software;http://www.gotw.ca/publications/concurrency-ddj.htm

相關文章
相關標籤/搜索