ZooKeeper Official Documentation Translation: ZooKeeper Overview 3.4.6


ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.

Design Goals

ZooKeeper is simple. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system. The namespace consists of data registers, called znodes in ZooKeeper parlance, and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in memory, which means ZooKeeper can achieve high throughput and low latency.

The ZooKeeper implementation puts a premium on high-performance, highly available, strictly ordered access. The performance aspects of ZooKeeper mean it can be used in large, distributed systems. The reliability aspects keep it from being a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented at the client.

ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble.

Figure: ZooKeeper Service

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.

Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.
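The majority rule can be made concrete with a little arithmetic. The following is a minimal sketch, not ZooKeeper code: the function names are hypothetical, and it only models the statement that the service stays available while a strict majority of the ensemble is up.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers may be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def service_available(ensemble_size, servers_up):
    """True when the servers that are up still form a majority."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(n, "servers: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failures")
```

Under this rule an ensemble of 6 tolerates no more failures than an ensemble of 5, which is one reason odd ensemble sizes are a common choice.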

ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use the order to implement higher-level abstractions, such as synchronization primitives.

ZooKeeper is fast. It is especially fast in "read-dominant" workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

Data model and the hierarchical namespace

The namespace provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's namespace is identified by a path.
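As a rough illustration of "a sequence of path elements separated by a slash", the sketch below splits a path into its elements and derives the parent path. The helper names and the example path `/app1/p_1` are assumptions made for illustration; the validation is deliberately simpler than whatever rules a real server enforces.

```python
def split_path(path):
    """Return the path elements of an absolute, slash-separated path."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    if path == "/":
        return []  # the root has no elements
    elements = path[1:].split("/")
    if any(element == "" for element in elements):
        raise ValueError("path contains an empty element")
    return elements


def parent_path(path):
    """Return the path of the node one level above `path`."""
    elements = split_path(path)
    if not elements:
        raise ValueError("the root has no parent")
    return "/" + "/".join(elements[:-1])


if __name__ == "__main__":
    print(split_path("/app1/p_1"))
    print(parent_path("/app1/p_1"))
```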

   

Figure: ZooKeeper's Hierarchical Namespace

Nodes and ephemeral nodes

Unlike in standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. (ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.

Znodes maintain a stat structure that includes version numbers for data changes and ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.

The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.

ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd].
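The version and ephemeral-node behavior can be modeled in a few lines. This is a toy, single-process sketch under stated assumptions: `Znode`, `ToyStore`, and the session ids are hypothetical names, ACLs and timestamps are left out, and the version check on write is one plausible way to use versions for coordinated updates, not a copy of the server's logic.

```python
class Znode:
    """A data node: whole-value data, a data version, an optional owner."""

    def __init__(self, data, ephemeral_owner=None):
        self.data = data
        self.version = 0                        # bumped on every data change
        self.ephemeral_owner = ephemeral_owner  # session id, or None


class ToyStore:
    def __init__(self):
        self.nodes = {}  # path -> Znode

    def create(self, path, data, session_id=None):
        if path in self.nodes:
            raise KeyError("node already exists: " + path)
        self.nodes[path] = Znode(data, session_id)

    def get(self, path):
        """A read returns all of the data together with its version."""
        node = self.nodes[path]
        return node.data, node.version

    def set(self, path, data, expected_version):
        """Replace all of the data, but only if nobody wrote in between."""
        node = self.nodes[path]
        if node.version != expected_version:
            return False
        node.data = data
        node.version += 1
        return True

    def close_session(self, session_id):
        """Ending a session deletes the ephemeral nodes it created."""
        owned = [p for p, n in self.nodes.items()
                 if n.ephemeral_owner == session_id]
        for path in owned:
            del self.nodes[path]
```

A client that read version 0 and then loses a race sees its `set` rejected, re-reads, and retries with the newer version.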

Conditional updates and watches

ZooKeeper supports the concept of watches. Clients can set a watch on a znode. A watch will be triggered and removed when the znode changes. When a watch is triggered the client receives a packet saying that the znode has changed. If the connection between the client and one of the ZooKeeper servers is broken, the client will receive a local notification. These can be used to [tbd].
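The "triggered and removed" behavior means a watch fires at most once. A minimal in-process sketch, assuming a hypothetical `WatchedNode` class and plain callables as watchers (a real client receives the notification over its TCP connection instead):

```python
class WatchedNode:
    """A value with one-shot watches: each watch fires once, then is gone."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        """Read the data and optionally leave a watch behind."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        """Change the data, then notify and clear every pending watch."""
        self.data = data
        pending, self._watchers = self._watchers, []
        for watcher in pending:
            watcher("changed")


if __name__ == "__main__":
    node = WatchedNode(b"a")
    node.get(watcher=print)
    node.set(b"b")  # notifies the watcher once
    node.set(b"c")  # no notification: the watch was removed
```

Because the watch is removed when it fires, a client that wants continuous notifications has to set a new watch each time it re-reads the node.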

Guarantees

ZooKeeper is very fast and very simple. Since its goal, though, is to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:

    • Sequential Consistency - Updates from a client will be applied in the order that they were sent.
    • Atomicity - Updates either succeed or fail. No partial results.
    • Single System Image - A client will see the same view of the service regardless of the server that it connects to.
    • Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
    • Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound.

For more information on these, and how they can be used, see [tbd]

Simple API

One of the design goals of ZooKeeper is to provide a very simple programming interface. As a result, it supports only these operations:

create

creates a node at a location in the tree

delete

deletes a node

exists

tests if a node exists at a location

get data

reads the data from a node

set data

writes data to a node

get children

retrieves a list of children of a node

sync

waits for data to be propagated

For a more in-depth discussion on these, and how they can be used to implement higher-level operations, please refer to [tbd]
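To show how small this interface is, here is a toy in-memory tree that offers the same operations. It is a sketch under assumptions, not the client library: `ToyTree` and its method names are hypothetical, `sync` is omitted because a single process has nothing to propagate, and the rules that a parent must exist and that only childless nodes can be deleted are modeling choices.

```python
class ToyTree:
    def __init__(self):
        self._data = {"/": b""}  # path -> data; the root always exists

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError("node already exists: " + path)
        if self._parent(path) not in self._data:
            raise KeyError("parent does not exist: " + path)
        self._data[path] = data

    def exists(self, path):
        return path in self._data

    def get_data(self, path):
        return self._data[path]

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError("no such node: " + path)
        self._data[path] = data

    def get_children(self, path):
        if path not in self._data:
            raise KeyError("no such node: " + path)
        prefix = "/" if path == "/" else path + "/"
        names = [p[len(prefix):] for p in self._data
                 if p != "/" and p.startswith(prefix)]
        # keep direct children only, not deeper descendants
        return sorted(name for name in names if "/" not in name)

    def delete(self, path):
        if path == "/":
            raise ValueError("cannot delete the root")
        if self.get_children(path):
            raise ValueError("node has children: " + path)
        del self._data[path]
```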

Implementation

The figure ZooKeeper Components shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each of the components.

Figure: ZooKeeper Components

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.

Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service (write requests) are processed by an agreement protocol.

As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.

ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.
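The last step, turning a write request into a transaction that captures the new state, can be sketched as follows. Everything here is an illustrative assumption: the `Txn`, `Replica`, and `Leader` classes are hypothetical, an in-memory list stands in for the on-disk log, `zxid` is used as the name of the ordering stamp, and the agreement among followers is left out entirely. The point is only that a request ("set if the version is 0") depends on current state, while the resulting transaction ("the node now holds this data at version 1") does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    zxid: int      # position in the total order of updates
    path: str
    data: bytes
    version: int   # version the node has after this write


class Replica:
    def __init__(self):
        self.log = []        # stand-in for the on-disk transaction log
        self.tree = {}       # path -> (data, version), the in-memory copy
        self.last_zxid = 0

    def apply(self, txn):
        if txn.zxid <= self.last_zxid:
            return                   # already applied; redelivery is harmless
        self.log.append(txn)         # log first ...
        self.tree[txn.path] = (txn.data, txn.version)  # ... then apply
        self.last_zxid = txn.zxid


class Leader(Replica):
    def _commit(self, path, data, version):
        txn = Txn(self.last_zxid + 1, path, data, version)
        self.apply(txn)
        return txn

    def handle_create(self, path, data):
        if path in self.tree:
            return None              # request rejected, no transaction
        return self._commit(path, data, 0)

    def handle_set(self, path, data, expected_version):
        current = self.tree.get(path)
        if current is None or current[1] != expected_version:
            return None              # request rejected, no transaction
        return self._commit(path, data, current[1] + 1)
```

A follower that applies the leader's transactions in zxid order ends up with the same tree without re-evaluating any version check.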

Uses

The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher-order operations, such as synchronization primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]

Performance

ZooKeeper is designed to be highly performant. But is it? The results from ZooKeeper's development team at Yahoo! Research indicate that it is. (See ZooKeeper Throughput as the Read-Write Ratio Varies.) It is especially high performance in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)

Figure: ZooKeeper Throughput as the Read-Write Ratio Varies

The figure ZooKeeper Throughput as the Read-Write Ratio Varies is a throughput graph of ZooKeeper release 3.2 running on servers with dual 2GHz Xeon and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device. The snapshots were written to the OS drive. Write requests were 1K writes and the reads were 1K reads. "Servers" indicates the size of the ZooKeeper ensemble, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not allow connections from clients.

Note

In version 3.2, read/write performance improved by ~2x compared to the previous 3.1 release.

Benchmarks also indicate that it is reliable. Reliability in the Presence of Errors shows how a deployment responds to various failures. The events marked in the figure are the following:

    1. Failure and recovery of a follower
    2. Failure and recovery of a different follower
    3. Failure of the leader
    4. Failure and recovery of two followers
    5. Failure of another leader

Reliability

To show the behavior of the system over time as failures are injected, we ran a ZooKeeper service made up of 7 machines. We ran the same saturation benchmark as before, but this time we kept the write percentage at a constant 30%, which is a conservative ratio of our expected workloads.

Figure: Reliability in the Presence of Errors

There are a few important observations from this graph. First, if followers fail and recover quickly, then ZooKeeper is able to sustain a high throughput despite the failure. Second, and maybe more importantly, the leader election algorithm allows the system to recover fast enough to prevent throughput from dropping substantially. In our observations, ZooKeeper takes less than 200ms to elect a new leader. Third, as followers recover, ZooKeeper is able to raise throughput again once they start processing requests.

The ZooKeeper Project

ZooKeeper has been successfully used in many industrial applications. It is used at Yahoo! as the coordination and failure recovery service for Yahoo! Message Broker, which is a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is used by the Fetching Service for Yahoo! crawler, where it also manages failure recovery. A number of Yahoo! advertising systems also use ZooKeeper to implement reliable services.

All users and developers are encouraged to join the community and contribute their expertise. See the ZooKeeper Project on Apache for more information.

 

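*The translator's ability is limited, so the translation will certainly contain inaccurate wording. Please bear with it, and please point out anything that is translated incorrectly or imprecisely so that we can discuss it and improve together. Thank you.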
