7. HBase in Action, Chapter 1: Introducing HBase (1.2.1 The canonical web-search problem: the origin of Bigtable)

Search is the act of locating information you care about: for example, searching for pages in a textbook that contain the topic you want to read about, or for web pages that have the information you're looking for. Searching for documents containing particular terms requires looking up indexes that map terms to the documents that contain them. To enable search, you have to build these indexes. This is precisely what Google and other search engines do. Their document corpus is the entire internet; the search terms are whatever you type in the search box.
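To make the idea of an index that maps terms to documents concrete, here is a minimal Python sketch of an inverted index over a tiny in-memory corpus. The document IDs, sample texts, regex tokenizer, and AND-only query semantics are illustrative assumptions; production search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every term to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the IDs of documents that contain every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample corpus.
docs = {
    "page1": "HBase is modeled after Bigtable",
    "page2": "Bigtable stores a copy of the web",
    "page3": "MapReduce builds the search index",
}
index = build_inverted_index(docs)
print(sorted(search(index, "Bigtable")))      # ['page1', 'page2']
print(sorted(search(index, "bigtable web")))  # ['page2']
```

Building the index is the expensive, batch-style step; answering a query is then just a few dictionary lookups and set intersections.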

Bigtable, and by extension HBase, provides storage for this corpus of documents. Bigtable supports row-level access so crawlers can insert and update documents individually. The search index can be generated efficiently via MapReduce directly against Bigtable. Individual document results can be retrieved directly. Support for all these access patterns was key in influencing the design of Bigtable. Figure 1.1 illustrates the critical role of Bigtable in the web-search application.
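The three access patterns described above (per-row inserts and updates, full-table scans for MapReduce, and direct reads of individual rows) can be sketched with a toy in-memory table. `ToyTable` is a hypothetical class written for illustration, not the HBase or Bigtable client API; the reversed-hostname row keys and the `contents:html` column name are assumptions, and real Bigtable/HBase cells also carry timestamps and multiple versions, which this sketch omits.

```python
from bisect import insort
from typing import Dict, Iterator, List, Optional, Tuple


class ToyTable:
    """In-memory stand-in for a Bigtable/HBase-style table (NOT the HBase API).

    Each row is addressed by a row key and holds a dict of column -> value.
    Row keys are kept sorted so that scan() returns rows in key order.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._keys: List[str] = []  # row keys, kept in sorted order

    def put(self, row_key: str, column: str, value: str) -> None:
        """Insert or overwrite one cell in one row; other rows are untouched."""
        if row_key not in self._rows:
            self._rows[row_key] = {}
            insort(self._keys, row_key)
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        """Read a single row directly by key; returns None if it is absent."""
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield every (row_key, row) pair in sorted key order (full scan)."""
        for key in self._keys:
            yield key, dict(self._rows[key])


# Hypothetical crawler writes: one row per page, reversed-hostname row keys.
table = ToyTable()
table.put("com.example.www/", "contents:html", "<html>v1</html>")
table.put("org.apache.hbase/", "contents:html", "<html>HBase</html>")
# A re-crawl updates only the row for the page that changed.
table.put("com.example.www/", "contents:html", "<html>v2</html>")

print(table.get("com.example.www/"))     # {'contents:html': '<html>v2</html>'}
print([key for key, _ in table.scan()])  # ['com.example.www/', 'org.apache.hbase/']
```

Keeping row keys sorted is what lets a scan walk the whole table in key order for batch jobs, while the per-row storage means a re-crawl of one page rewrites only that row.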

NOTE: In the interest of brevity, this look at Bigtable doesn't do the original authors justice. We highly recommend the three papers on the Google File System, MapReduce, and Bigtable as required reading for anyone curious about these technologies. You won't be disappointed.

Figure 1.1 Providing web-search results using Bigtable, simplified. The crawlers (applications collecting web pages) store their data in Bigtable. A MapReduce process scans the table to produce the search index. Search results are queried from Bigtable to display to the user.

1. Crawlers constantly scour the internet for new pages. Those pages are stored as individual records in Bigtable.

2. A MapReduce job runs over the entire table, generating search indexes for the web-search application.

3. The user initiates a web-search request.

4. The web-search application queries the search indexes and retrieves matching documents directly from Bigtable.

5. Search results are presented to the user.
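The five numbered steps can be traced end to end in a small Python sketch. It assumes a plain `dict` standing in for the Bigtable table (URL row key to page text), hand-written `map_phase`/`reduce_phase` functions that imitate the MapReduce model inside a single process, and single-term queries; none of these names come from Bigtable, HBase, or Hadoop.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Step 1 (assumed sample data): crawled pages, one row per page, keyed by URL.
bigtable: Dict[str, str] = {
    "http://a.example/": "HBase runs on HDFS",
    "http://b.example/": "Bigtable runs on GFS",
    "http://c.example/": "HBase is an open source Bigtable",
}


def map_phase(row_key: str, contents: str) -> Iterator[Tuple[str, str]]:
    """Map: emit (term, row_key) once for each distinct term in a page."""
    for term in set(re.findall(r"[a-z0-9]+", contents.lower())):
        yield term, row_key


def reduce_phase(term: str, row_keys: Iterable[str]) -> Tuple[str, List[str]]:
    """Reduce: collapse all row keys for one term into a sorted posting list."""
    return term, sorted(set(row_keys))


def run_mapreduce(table: Dict[str, str]) -> Dict[str, List[str]]:
    """Step 2: scan every row, group map output by term, then reduce."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row_key, contents in table.items():  # full-table scan
        for term, key in map_phase(row_key, contents):
            grouped[term].append(key)  # in-process stand-in for the shuffle
    return dict(reduce_phase(term, keys) for term, keys in grouped.items())


def web_search(index: Dict[str, List[str]], table: Dict[str, str],
               query: str) -> List[Tuple[str, str]]:
    """Steps 3-5: look up one query term, then fetch each hit from the table."""
    hits = index.get(query.lower(), [])
    return [(key, table[key]) for key in hits]


index = run_mapreduce(bigtable)
for url, page in web_search(index, bigtable, "HBase"):
    print(url, "->", page)
```

Because the index stores row keys rather than page contents, step 4 still goes back to the table to fetch each matching document, mirroring how the figure shows results being read directly from Bigtable.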

With the canonical HBase example covered, let's look at other places where HBase has gained a foothold. The adoption of HBase has grown rapidly over the last couple of years, fueled by the system becoming more reliable and performant, due in large part to the engineering effort invested by the various companies backing and using it. As more commercial vendors provide support, users are increasingly confident in using the system for critical applications. A technology designed to store a continuously updated copy of the internet turns out to be pretty good at other internet-related tasks as well. HBase has found a home filling a variety of roles in and around social-networking companies. From storing communications between individuals to running communication analytics, HBase has become critical infrastructure at Facebook, Twitter, and StumbleUpon, to name a few.

HBase has been used in three major types of use cases, but it isn't limited to those. In the interest of keeping this chapter short and sweet, we'll cover only these major use cases here.
