Pretend that you’re working on an open source project for searching the web by crawling websites and indexing them. You have an implementation that works on a small cluster of machines but requires a lot of manual steps. Pretend too that you’re working on this project around the same time Google publishes papers about its data storage and processing frameworks. Clearly, you would jump on these publications and spearhead an open source implementation based on them. Okay, maybe you wouldn’t, and we surely didn’t; but Doug Cutting and Mike Cafarella did.
Built out of Apache Lucene, Nutch was their open source web-search project and the motivation for the first implementation of Hadoop. From there, Hadoop began to receive lots of attention from Yahoo!, which hired Cutting and others to work on it full time. Hadoop was then extracted from Nutch and eventually became an Apache top-level project. With Hadoop well underway and the Bigtable paper published, the groundwork existed to implement an open source Bigtable on top of Hadoop. In 2007, Cafarella released code for an experimental, open source Bigtable. He called it HBase. The startup Powerset decided to dedicate Jim Kellerman and Michael Stack to work on this Bigtable analog as a way of contributing back to the open source community on which it relied.
HBase proved to be a powerful tool, especially in places where Hadoop was already in use. Even in its infancy, it quickly found production deployment and developer support from other companies. Today, HBase is a top-level Apache project with thriving developer and user communities. It has become a core infrastructure component and is being run in production at scale worldwide in companies like StumbleUpon, Trend Micro, Facebook, Twitter, Salesforce, and Adobe.
HBase isn’t a cure-all for data management problems, and you might include another technology in your stack at a later point for a different use case. Let’s look at how HBase is being used today and the types of applications people have built using it. Through this discussion, you’ll gain a feel for the kinds of data problems HBase can solve and has been used to tackle.