Distributed systems theory for the distributed systems engineer

Date: 2019-12-26

Gwen Shapira, who at the time was an engineer at Cloudera and now is spreading the Kafka gospel, asked a question on Twitter that got me thinking.

I need to improve my proficiency in distributed systems theory. Where do I start? Any recommended books?
— Gwen (Chen) Shapira (@gwenshap) August 7, 2014

My response of old might have been “well, here’s the FLP paper, and here’s the Paxos paper, and here’s the Byzantine generals paper…”, and I’d have prescribed a laundry list of primary source material which would have taken at least six months to get through if you rushed. But I’ve come to think that recommending a ton of theoretical papers is often precisely the wrong way to go about learning distributed systems theory (unless you are in a PhD program). Papers are usually deep, usually complex, and require both serious study and, usually, significant experience to glean their important contributions and to place them in context. What good is requiring that level of expertise of engineers?

And yet, unfortunately, there’s a paucity of good ‘bridge’ material that summarises, distills and contextualises the important results and ideas in distributed systems theory; particularly material that does so without condescending. Considering that gap led me to another interesting question:

What distributed systems theory should a distributed systems engineer know?

A little theory is, in this case, not such a dangerous thing. So I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer. Let me know what you think I missed!

First steps

These four readings do a pretty good job of explaining what about building distributed systems is challenging. Collectively they outline a set of abstract but technical difficulties that the distributed systems engineer has to overcome, and set the stage for the more detailed investigation in later sections.

Distributed Systems for Fun and Profit is a short book which tries to cover some of the basic issues in distributed systems including the role of time and different strategies for replication.

Notes on distributed systems for young bloods - not theory, but a good practical counterbalance to keep the rest of your reading grounded.

A Note on Distributed Systems - a classic paper on why you can’t just pretend all remote interactions are like local objects.

The fallacies of distributed computing - 8 fallacies of distributed computing that set the stage for the kinds of things system designers forget.

You should know about _safety and liveness properties_:

Safety properties say that nothing bad will ever happen. For example, the property of never returning an inconsistent value is a safety property, as is never electing two leaders at the same time.
Liveness properties say that something good will eventually happen. For example, saying that a system will eventually return a result to every API call is a liveness property, as is guaranteeing that a write to disk always eventually completes.
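
To make the distinction concrete, here is a minimal sketch in Python over hypothetical event traces. The event shapes, the use of an election "term" to stand in for "at the same time", and the function names `violates_single_leader` and `unanswered_requests` are all assumptions made for illustration. The point it tries to show: a finite trace can prove that a safety property was violated, whereas an unanswered request in a finite trace only shows that the good thing has not happened yet.

```python
from typing import Dict, Iterable, List, Tuple


def violates_single_leader(elections: Iterable[Tuple[int, str]]) -> bool:
    """Safety check: True if two different nodes were elected in the same term.

    Each event is a hypothetical (term, node) pair.
    """
    leader_for_term: Dict[int, str] = {}
    for term, node in elections:
        if leader_for_term.setdefault(term, node) != node:
            return True  # a finite trace is enough to show a safety violation
    return False


def unanswered_requests(events: Iterable[Tuple[str, int]]) -> List[int]:
    """Liveness probe: ids of requests with no response so far, in request order.

    Each event is a hypothetical (kind, request_id) pair, where kind is
    "request" or "response". A non-empty result is not a violation: the
    response may still arrive later.
    """
    pending: List[int] = []
    for kind, request_id in events:
        if kind == "request":
            pending.append(request_id)
        elif kind == "response" and request_id in pending:
            pending.remove(request_id)
    return pending
```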

Failure and Time

Many difficulties that the distributed systems engineer faces can be blamed on two underlying causes:

Processes may fail
There is no good way to tell that they have done so

There is a very deep relationship between what, if anything, processes share about their knowledge of _time_, what failure scenarios are possible to detect, and what algorithms and primitives may be correctly implemented. Most of the time, we assume that two different nodes have absolutely no shared knowledge of what time it is, or how quickly time passes.

You should know:

The (partial) hierarchy of failure modes: crash stop -> omission -> Byzantine. You should understand that what is possible under the most severe failure mode (Byzantine, the top of the hierarchy) must be possible under the milder ones, and what is impossible under a milder mode must be impossible under the more severe ones.
How you decide whether an event happened before another event in the absence of any shared clock. This means Lamport clocks and their generalisation to Vector clocks, but also see the Dynamo paper.
How big an impact the possibility of even a single failure can actually have on our ability to implement correct distributed systems (see my notes on the FLP result below).
Different models of time: synchronous, partially synchronous and asynchronous.
That detecting failures is a fundamental problem, one that trades off accuracy and completeness - yet another safety vs liveness conflict. The paper that really set out failure detection as a theoretical problem is Chandra and Toueg’s ‘Unreliable Failure Detectors for Reliable Distributed Systems’. But there are several shorter summaries around - I quite like this random one from Stanford.
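
As an illustration of ordering events without a shared clock, here is a small vector-clock sketch in Python. It is a sketch under stated assumptions rather than a reference implementation: clocks are plain dictionaries from a node name to a counter, a missing entry counts as zero, and the helper names `tick`, `merge`, `happened_before` and `concurrent` are chosen for this example.

```python
from typing import Dict

Clock = Dict[str, int]  # node name -> number of events seen from that node


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of clock with node's own counter advanced (local event or send)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On receive: take the element-wise maximum, then count the receive as a local event."""
    merged = {n: max(local.get(n, 0), received.get(n, 0))
              for n in set(local) | set(received)}
    return tick(merged, node)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a <= b in every component and a < b in at least one."""
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: Clock, b: Clock) -> bool:
    """True if the clocks differ and neither happened before the other."""
    nodes = set(a) | set(b)
    same = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not same and not happened_before(a, b) and not happened_before(b, a)
```

For example, if node A sends a message stamped `tick({}, "A")` and node B receives it with `merge({}, a1, "B")`, the send happened before the receive, while an unrelated event `tick({}, "C")` on node C is concurrent with both.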

The basic tension of fault tolerance

A system that tolerates some faults without degrading must be able to act as though those faults had not occurred. This usually means that parts of the system must do work redundantly, but doing more work than is absolutely necessary typically carries a cost both in performance and resource consumption. This is the basic tension of adding fault tolerance to a system.

You should know:

The quorum technique for ensuring single-copy serialisability. See Skeen’s original paper, but perhaps better is Wikipedia’s entry.

(Translator’s note: within the quorum one node is the leader and the rest are followers; the order of write requests received by the leader is authoritative [serial], and the leader unilaterally requires the followers to accept its sequence of write requests [followers may not resist or object: they are non-malicious, follow the global rules, and are non-Byzantine].)

About 2-phase-commit, 3-phase-commit and Paxos, and why they have different fault-tolerance properties.
How eventual consistency, and other techniques, seek to avoid this tension at the cost of weaker guarantees about system behaviour. The Dynamo paper is a great place to start, but also Pat Helland’s classic Life Beyond Transactions is a must-read.
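
The overlap argument behind quorums can be sketched in a few lines of Python. This is a single-threaded simulation with no concurrency or real networking, and the class name `QuorumRegister`, the `reachable` parameter and the versioning scheme are assumptions made for the example. It relies on the commonly stated conditions R + W > N (every read quorum intersects every write quorum) and 2W > N (any two write quorums intersect).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    version: int = 0
    value: Optional[str] = None


class QuorumRegister:
    """Single-threaded simulation of a replicated register with read/write quorums."""

    def __init__(self, n: int, w: int, r: int) -> None:
        if r + w <= n or 2 * w <= n:
            raise ValueError("need R + W > N and 2W > N so quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w = w
        self.r = r

    def write(self, value: str, reachable: List[int]) -> int:
        """Write to the reachable replicas and return the new version number."""
        if len(set(reachable)) < self.w:
            raise RuntimeError("write quorum unavailable")
        # Write quorums overlap, so this quorum contains the latest version.
        version = 1 + max(self.replicas[i].version for i in reachable)
        for i in reachable:
            self.replicas[i] = Replica(version, value)
        return version

    def read(self, reachable: List[int]) -> Optional[str]:
        """Return the value with the highest version among the reachable replicas."""
        if len(set(reachable)) < self.r:
            raise RuntimeError("read quorum unavailable")
        newest = max((self.replicas[i] for i in reachable),
                     key=lambda rep: rep.version)
        return newest.value
```

With N = 5 and W = R = 3, a write that reaches replicas 2, 3 and 4 is visible to a read that reaches replicas 0, 1 and 2, because the two quorums share replica 2. The redundant work done on every operation is the cost side of the tension described above.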

Basic primitives

There are few agreed-upon basic building blocks in distributed systems, but more are beginning to emerge. You should know what the following problems are, and where to find a solution for them:

Leader election (e.g. the Bully algorithm)
Consistent snapshotting (e.g. this classic paper from Chandy and Lamport)
Consensus (see 2PC and Paxos above)
Distributed state machine replication (Wikipedia is ok, Lampson’s paper is canonical but dry).
Broadcast - delivering messages to more than one node at once
- Atomic broadcast - can you deliver a message to all nodes in a group, or none?
- Gossip (the classic paper)
- Causal multicast (but also consider the enjoyable back-and-forth between Birman and Cheriton).
Chain replication (a neat way of ensuring consistency and ordering of writes by organizing nodes into a virtual linked list).
- The original paper
- A series of improvements for read-mostly workloads
- An experiential report by @slfritchie
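
The linked-list idea behind chain replication can be sketched as follows in Python. This is an in-process toy under stated assumptions: the class names `ChainNode` and `Chain` are made up for the example, forwarding is a direct method call rather than a network message, and failure handling (removing a node and repairing the chain) is left out entirely. Writes enter at the head and flow to the tail; reads are served by the tail, so a read only ever sees writes that every node has applied.

```python
from typing import Dict, List, Optional, Tuple


class ChainNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}
        self.log: List[Tuple[str, str]] = []  # writes in the order applied
        self.next: Optional["ChainNode"] = None

    def apply(self, key: str, value: str) -> None:
        """Apply a write locally, then forward it to the successor, if any."""
        self.store[key] = value
        self.log.append((key, value))
        if self.next is not None:
            self.next.apply(key, value)


class Chain:
    def __init__(self, names: List[str]) -> None:
        if not names:
            raise ValueError("a chain needs at least one node")
        self.nodes = [ChainNode(name) for name in names]
        for node, successor in zip(self.nodes, self.nodes[1:]):
            node.next = successor

    def write(self, key: str, value: str) -> None:
        """Writes enter at the head; this returns once the tail has applied them."""
        self.nodes[0].apply(key, value)

    def read(self, key: str) -> Optional[str]:
        """Reads are served by the tail, which holds only fully replicated writes."""
        return self.nodes[-1].store.get(key)
```

Because every write passes through the nodes in the same order, all logs end up identical, which is the ordering guarantee the list item above refers to.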

Fundamental Results

Some facts just need to be internalised. There are more than this, naturally, but here’s a flavour:

You can’t implement consistent storage and respond to all requests if you might drop messages between processes. This is the CAP theorem.
Consensus is impossible to implement in such a way that it both a) is always correct and b) always terminates if even one machine might fail in an asynchronous system with crash-stop failures (the FLP result). The first slides - before the proof gets going - of my Papers We Love SF talk do a reasonable job of explaining the result, I hope. _Suggestion: there’s no real need to understand the proof_.

(Translator’s note: in an asynchronous system, assume that nodes stop after crashing rather than crash and then recover. The requirements are that 1. the result is always correct, and 2. every write request returns a result within finite time. These two cannot both be satisfied: this is the FLP result.)

Consensus is impossible to solve in fewer than 2 rounds of messages in general.
Atomic broadcast is exactly as hard as consensus - in a precise sense, if you solve atomic broadcast, you solve consensus, and vice versa. Chandra and Toueg prove this, but you just need to know that it’s true.

Real systems

The most important exercise to repeat is to read descriptions of new, real systems, and to critique their design decisions. Do this over and over again. Some suggestions:

Google:

Not Google:

Postscript

If you tame all the concepts and techniques on this list, I’d like to talk to you about engineering positions working with the menagerie of distributed systems we curate at Cloudera.