Brewer's CAP Theorem
Original article: http://www.julianbrowne.com/article/brewers-cap-theorem
Brewer’s (CAP) Theorem
So what exactly is Brewer’s Theorem, and why does it warrant comparison with a 1976 punk gig in Manchester?
Brewer’s 2000 talk was based on his theoretical work at UC Berkeley and observations from running Inktomi, though Brewer and others were talking about trade-off decisions that need to be made in highly scalable systems years before that (e.g. "Cluster-Based Scalable Network Services" from SOSP in 1997 and "Harvest, yield, and scalable tolerant systems" in 1999), so the contents of the presentation weren’t new and, like many of these ideas, they were the work of many smart people (as I am sure Brewer himself would be quick to point out).
What he said was there are three core systemic requirements that exist in a special relationship when it comes to designing and deploying applications in a distributed environment (he was talking specifically about the web but so many corporate businesses are multi-site/multi-country these days that the effects could equally apply to your data-centre/LAN/WAN arrangement).
The three requirements are: Consistency, Availability and Partition Tolerance, giving Brewer’s Theorem its other name - CAP.
To give these some real-world meaning let’s use a simple example: you want to buy a copy of Tolstoy’s War & Peace to read on a particularly long vacation you’re starting tomorrow. Your favourite web bookstore has one copy left in stock. You do your search, check that it can be delivered before you leave and add it to your basket. You remember that you need a few other things so you browse the site for a bit (have you ever bought just one thing online? Gotta maximise the parcel dollar). While you’re reading the customer reviews of a suntan lotion product, someone, somewhere else in the country, arrives at the site, adds a copy to their basket and goes right to the checkout process (they need an urgent fix for a wobbly table with one leg much shorter than the others).
Consistency
A service that is consistent operates fully or not at all. Gilbert and Lynch use the word "atomic" instead of consistent in their proof, which makes more sense technically because, strictly speaking, consistent is the C in ACID as applied to the ideal properties of database transactions and means that data will never be persisted that breaks certain pre-set constraints. But if you consider it a preset constraint of distributed systems that multiple values for the same piece of data are not allowed then I think the leak in the abstraction is plugged (plus, if Brewer had used the word atomic, it would be called the AAP theorem and we’d all be in hospital every time we tried to pronounce it).
In the book buying example you can add the book to your basket, or fail. Purchase it, or not. You can’t half-add or half-purchase a book. There’s one copy in stock and only one person will get it the next day. If both customers can continue through the order process to the end (i.e. make payment) the lack of consistency between what’s in stock and what’s in the system will cause an issue. Maybe not a huge issue in this case - someone’s either going to be bored on vacation or spilling soup - but scale this up to thousands of inconsistencies and give them a monetary value (e.g. trades on a financial exchange where there’s an inconsistency between what you think you’ve bought or sold and what the exchange record states) and it’s a huge issue.
We might solve consistency by utilising a database. At the correct moment in the book order process the number of War and Peace books-in-stock is decremented by one. When the other customer reaches this point, the cupboard is bare and the order process will alert them to this without continuing to payment. The first operates fully, the second not at all.
Databases are great at this because they focus on ACID properties and give us Consistency by also giving us Isolation, so that when Customer One is reducing books-in-stock by one, and simultaneously increasing books-in-basket by one, any intermediate states are isolated from Customer Two, who has to wait a few milliseconds while the data store is made consistent.
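To make this concrete, here is a minimal sketch of the pattern in Python with SQLite. The schema and the place_order function are illustrative assumptions (nothing from Brewer or Gilbert & Lynch); the point is that the stock check and the decrement happen inside one transaction, so an order operates fully or not at all.

```python
import sqlite3

# Illustrative schema: one row per title, with a stock count.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions managed explicitly
conn.execute("CREATE TABLE books (title TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO books VALUES ('War and Peace', 1)")

def place_order(conn, title):
    """Decrement stock atomically: succeed fully or not at all."""
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # The UPDATE only fires while a copy is in stock, so two
        # concurrent buyers cannot both get the last copy.
        cur.execute(
            "UPDATE books SET stock = stock - 1 "
            "WHERE title = ? AND stock > 0", (title,))
        if cur.rowcount == 0:
            raise RuntimeError("out of stock")
        cur.execute("COMMIT")
        return True
    except Exception:
        cur.execute("ROLLBACK")
        return False

print(place_order(conn, "War and Peace"))  # True:  Customer One gets the book
print(place_order(conn, "War and Peace"))  # False: Customer Two is told before payment
```

The second caller is the other customer finding the cupboard bare: their order operates not at all, and the process can alert them without continuing to payment.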
Availability
Availability means just that - the service is available (to operate fully or not as above). When you buy the book you want to get a response, not some browser message about the web site being uncommunicative. Gilbert & Lynch in their proof of CAP Theorem make the good point that availability most often deserts you when you need it most - sites tend to go down at busy periods precisely because they are busy. A service that’s available but not being accessed is of no benefit to anyone.
Translator's note: the point here is that putting the book in the basket above involves a transaction that either happens entirely or not at all, which locks the stock and makes other customers wait a few milliseconds for a response. During those few milliseconds the service is unavailable, so guaranteeing consistency means giving up some availability.
Partition Tolerance
If your application and database runs on one box then (ignoring scale issues and assuming all your code is perfect) your server acts as a kind of atomic processor in that it either works or doesn’t (i.e. if it has crashed it’s not available, but it won’t cause data inconsistency either).
Once you start to spread data and logic around different nodes then there’s a risk of partitions forming. A partition happens when, say, a network cable gets chopped, and Node A can no longer communicate with Node B. With the kind of distribution capabilities the web provides, temporary partitions are a relatively common occurrence and, as I said earlier, they’re also not that rare inside global corporations with multiple data centres.
Gilbert & Lynch defined partition tolerance as:
"No set of failures less than total network failure is allowed to cause the system to respond incorrectly"
and noted Brewer’s comment that a one-node partition is equivalent to a server crash, because if nothing can connect to it, it may as well not be there.
The Significance of the Theorem
CAP Theorem comes to life as an application scales. At low transactional volumes, the small latencies that allow databases to get consistent have no noticeable effect on either overall performance or the user experience. Any load distribution you do undertake, therefore, is likely to be for systems management reasons.
But as activity increases, these pinch-points in throughput will begin to limit growth and create errors. It’s one thing having to wait for a web page to come back with a response and another experience altogether to enter your credit card details, be met with "HTTP 500 java.lang.schrodinger.purchasingerror" and wonder whether you’ve just paid for something you won’t get, not paid at all, or maybe the error is immaterial to this transaction. Who knows? You are unlikely to continue, more likely to shop elsewhere, and very likely to phone your bank.
Either way this is not good for business. Amazon claim that just an extra one tenth of a second on their response times will cost them 1% in sales. Google said they noticed that just a half a second increase in latency caused traffic to drop by a fifth.
I’ve written a little about scalability before, so won’t repeat all that here except to make two points: the first is that whilst addressing the problems of scale might be an architectural concern, the initial discussions are not. They are business decisions. I get very tired of hearing, from techies, that such-and-such an approach is not warranted because current activity volumes don’t justify it. It’s not that they’re wrong; more often than not they’re quite correct, it’s that to limit scale from the outset is to implicitly make revenue decisions - a factor that should be made explicit during business analysis.
The second point is that once you embark on discussions around how to best scale your application the world falls broadly into two ideological camps: the database crowd and the non-database crowd.
The database crowd, unsurprisingly, like database technology and will tend to address scale by talking of things like optimistic locking and sharding, keeping the database at the heart of things.
The non-database crowd will tend to address scale by managing data outside of the database environment (avoiding the relational world) for as long as possible.
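Since optimistic locking came up above, here is a minimal sketch of the version-column scheme the database crowd typically means by it (the table layout and all names are illustrative assumptions): readers take no locks, and a write succeeds only if the row still carries the version the writer originally read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO books VALUES (1, 10, 0)")
conn.commit()

def read(conn, book_id):
    # No lock taken: just note the version alongside the data.
    return conn.execute(
        "SELECT stock, version FROM books WHERE id = ?", (book_id,)).fetchone()

def optimistic_update(conn, book_id, new_stock, expected_version):
    # The write only lands if nobody else has bumped the version meanwhile.
    cur = conn.execute(
        "UPDATE books SET stock = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_stock, book_id, expected_version))
    conn.commit()
    return cur.rowcount == 1  # False means a concurrent writer won: re-read and retry

stock, version = read(conn, 1)
print(optimistic_update(conn, 1, stock - 1, version))  # True
print(optimistic_update(conn, 1, stock - 1, version))  # False: stale version, retry needed
```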
I think it’s fair to say that the former group haven’t taken to CAP Theorem with quite the same gusto as the latter (though they are talking about it). This is because if you have to drop one of consistency, availability, or partition tolerance, many opt to drop consistency, which is the raison d’être of the database. The logic, no doubt, is that availability and partition-tolerance keep your money-making application alive, whereas inconsistency just feels like one of those things you can work around with clever design.
Like so much else in IT, it’s not as black and white as this. Eric Brewer, on slide 13 of his PODC talk, when comparing ACID and its informal counterpart BASE, even says "I think it’s a spectrum". And if you’re interested in this as a topic (it’s slightly outside of what I want to talk about here) you could do worse than start with a paper called "Design and Evaluation of a Continuous Consistency Model for Replicated Services" by Haifeng Yu and Amin Vahdat. Nobody should interpret CAP as implying the database is dead.
Translator's note: the author's view is that the non-database crowd is keener on CAP Theorem, and sees partition tolerance and availability as matters of user experience, even of a company's survival, while consistency can be dealt with by other means.
It is rather like how many companies operate: land the order first; the work can always be done and the contract terms met, and any small slips along the way can be sorted out later, when there is more time.
Where both sides agree though is that the answer to scale is distributed parallelisation not, as was once thought, supercomputer grunt. Eric Brewer’s influence on the Network of Workstations projects of the mid-nineties led to the architectures that exposed CAP theorem, because as he says in another presentation on Inktomi and the Internet Bubble (flash) the answer has always been processors working in parallel:
"If they’re not working in parallel you have no chance to get the problem done in a reasonable amount of time. This is a lot like anything else. If you have a really big job to do you get lots of people to do it. So if you are building a bridge you have lots of construction workers. That’s parallel processing also. So a lot of this will end up being ‘how do we mix parallel processing and the internet?’"
The Proof in Pictures
Here’s a simplified proof, in pictures because I find it much easier to understand that way. I’ve mostly used the same terms as Gilbert and Lynch so that this ties up with their paper.
The diagram above shows two nodes in a network, N1 and N2. They both share a piece of data V (how many physical copies of War and Peace are in stock), which has a value V0. Running on N1 is an algorithm called A which we can consider to be safe, bug free, predictable and reliable. Running on N2 is a similar algorithm called B. In this experiment, A writes new values of V and B reads values of V.
In a sunny-day scenario this is what happens: (1) First A writes a new value of V, which we’ll call V1. (2) Then a message (M) is passed from N1 to N2 which updates the copy of V there. (3) Now any read by B of V will return V1.
If the network partitions (that is messages from N1 to N2 are not delivered) then N2 contains an inconsistent value of V when step (3) occurs.
Hopefully that seems fairly obvious. Scale this up to even a few hundred transactions and it becomes a major issue. If M is an asynchronous message then N1 has no way of knowing whether N2 gets the message. Even with guaranteed delivery of M, N1 has no way of knowing if a message is delayed by a partition event or something failing in N2. Making M synchronous doesn’t help because that treats the write by A on N1 and the update event from N1 to N2 as an atomic operation, which gives us the same latency issues we have already talked about (or worse). Gilbert and Lynch also prove, using a slight variation on this, that even in a partially-synchronous model (with ordered clocks on each node) atomicity cannot be guaranteed.
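Here is a toy simulation of the scenario in the diagrams (the Node class and the partitioned flag are illustrative stand-ins, not real networking): when the message M is lost, B's read at step (3) returns the stale V0.

```python
# Two nodes sharing the data item V, per the diagrams above.
class Node:
    def __init__(self, name, v):
        self.name, self.v = name, v

partitioned = False

def send_update(src, dst, value):
    """The message M from step (2); it is lost when the network partitions."""
    if not partitioned:
        dst.v = value

# Sunny day: (1) A writes V1 on N1, (2) M updates N2, (3) B reads V1.
n1, n2 = Node("N1", "V0"), Node("N2", "V0")
n1.v = "V1"
send_update(n1, n2, n1.v)
print(n2.v)  # V1

# Partitioned: M never arrives, so B's read at step (3) is stale.
n1, n2 = Node("N1", "V0"), Node("N2", "V0")
partitioned = True
n1.v = "V1"
send_update(n1, n2, n1.v)
print(n2.v)  # V0, inconsistent with N1
```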
So what CAP tells us is that if we want A and B to be highly available (i.e. working with minimal latency) and we want our nodes N1 to Nn (where n could be hundreds or even thousands) to remain tolerant of network partitions (lost messages, undeliverable messages, hardware outages, process failures) then sometimes we are going to get cases where some nodes think that V is V0 (one copy of War and Peace in stock) and other nodes will think that V is V1 (no copies of War and Peace in stock).
We’d really like everything to be structured, consistent and harmonious, like the music of a prog rock band from the early seventies, but what we are faced with is a little bit of punk-style anarchy. And actually, whilst it might scare our grandmothers, it’s OK once you know this, because both can work together quite happily.
Let’s quickly analyse this from a transactional perspective.
If we have a transaction (i.e. unit of work based around the persistent data item V) called α, then α1 could be the write operation from before and α2 could be the read. On a local system this would easily be handled by a database with some simple locking, isolating any attempt to read in α2 until α1 completes safely. In the distributed model though, with nodes N1 and N2 to worry about, the intermediate synchronising message has also to complete. Unless we can control when α2 happens, we can never guarantee it will see the same data values α1 writes. All methods to add control (blocking, isolation, centralised management, etc) will impact either partition tolerance or the availability of α1 (A) and/or α2 (B).
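A minimal sketch of that last point, assuming an Event-based block with a timeout (both are illustrative, not from Gilbert and Lynch's model): if α2 waits until the synchronising message has been applied on N2, consistency is preserved, but under a partition the read simply times out, which is exactly the hit to availability just described.

```python
import threading

sync_applied = threading.Event()  # set once N1's update has reached N2
n2_value = "V0"

def alpha1_write(partitioned):
    """alpha-1: the write on N1 plus the synchronising message to N2."""
    global n2_value
    if not partitioned:
        n2_value = "V1"
        sync_applied.set()

def alpha2_read(timeout=1.0):
    """alpha-2: isolated from intermediate state; blocks until the sync completes."""
    if not sync_applied.wait(timeout):
        raise TimeoutError("N2 unavailable: still waiting for the sync message")
    return n2_value

alpha1_write(partitioned=True)  # the message M is lost
try:
    print(alpha2_read())
except TimeoutError as e:
    print(e)  # blocking bought consistency at the price of availability
```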
Dealing with CAP
You’ve got a few choices when addressing the issues thrown up by CAP. The obvious ones are:
Drop Partition Tolerance
If you want to run without partitions you have to stop them happening. One way to do this is to put everything (related to that transaction) on one machine, or in one atomically-failing unit like a rack. It’s not 100% guaranteed because you can still have partial failures, but you’re less likely to get partition-like side-effects. There are, of course, significant scaling limits to this.
Translator's note: putting everything on one server removes the risk of a network split, and a UPS can cover power failures, but you still cannot avoid shutting servers down (planned shutdowns), or the need for partial, rolling updates in a high-availability cluster. Alternatively you keep users out before updating, the way many games hold scheduled maintenance, but that amounts to giving up availability.
Drop Availability
This is the flip side of the drop-partition-tolerance coin. On encountering a partition event, affected services simply wait until data is consistent and therefore remain unavailable during that time. Controlling this could get fairly complex over many nodes, with re-available nodes needing logic to handle coming back online gracefully.
Translator's note: sacrificing availability can mean services communicating synchronously rather than via asynchronous MQ messages; or communicating asynchronously but freezing and unfreezing the affected account in the business logic, so that every other account stays highly available.
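A minimal sketch of this choice (the replica set, the all-or-nothing write, and the rollback are illustrative assumptions): a write is acknowledged only once every replica has applied it, so during a partition the service refuses the write rather than let nodes diverge.

```python
class Replica:
    def __init__(self, name, reachable=True):
        self.name, self.reachable, self.v = name, reachable, "V0"

    def apply(self, value):
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        self.v = value

def write_all_or_nothing(replicas, value):
    """Synchronous write: every replica must apply it, or none does."""
    staged = []  # (replica, previous value) pairs, kept for rollback
    try:
        for r in replicas:
            prev = r.v
            r.apply(value)
            staged.append((r, prev))
    except ConnectionError:
        for r, prev in staged:  # undo so no replica diverges
            r.v = prev
        return "unavailable: partition detected, write rejected"
    return "ok"

print(write_all_or_nothing([Replica("N1"), Replica("N2")], "V1"))         # ok
print(write_all_or_nothing([Replica("N1"), Replica("N2", False)], "V1"))  # unavailable
```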
Drop Consistency
Or, as Werner Vogels puts it, accept that things will become "Eventually Consistent" (updated Dec 2008). Vogels’ article is well worth a read. He goes into a lot more detail on operational specifics than I do here.
Lots of inconsistencies don’t actually require as much work as you’d think (meaning continuous consistency is probably not something we need anyway). In my book order example if two orders are received for the one book that’s in stock, the second just becomes a back-order. As long as the customer is told of this (and remember this is a rare case) everybody’s probably happy.
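A minimal sketch of that back-order flow (the accept/reconcile split and all names are illustrative assumptions): both orders are accepted without waiting on anything, and a later reconciliation pass turns the loser into a back-order and tells the customer.

```python
stock = 1
orders = []

def accept_order(customer):
    # Always available: no lock, no wait; just record the intent to buy.
    orders.append(customer)
    return f"order accepted for {customer}"

def reconcile():
    """Runs later (eventual consistency): first order ships, the rest back-order."""
    global stock
    results = {}
    for customer in orders:
        if stock > 0:
            stock -= 1
            results[customer] = "shipped"
        else:
            results[customer] = "back-ordered, customer notified"
    return results

print(accept_order("vacationer"))
print(accept_order("wobbly-table owner"))
print(reconcile())
```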
The BASE Jump
The notion of accepting eventual consistency is supported via an architectural approach known as BASE (Basically Available, Soft-state, Eventually consistent). BASE, as its name indicates, is the logical opposite of ACID, though it would be quite wrong to imply that any architecture should (or could) be based wholly on one or the other. This is an important point to remember, given our industry’s habit of "oooh shiny" strategy adoption.
And here I defer to Professor Brewer himself who emailed me some comments on this article, saying:
"The term ‘BASE’ was first presented in the 1997 SOSP article that you cite. I came up with acronym with my students in their office earlier that year. I agree it is contrived a bit, but so is ‘ACID’ – much more than people realize, so we figured it was good enough. Jim Gray and I discussed these acronyms and he readily admitted that ACID was a stretch too – the A and D have high overlap and the C is ill-defined at best. But the pair connotes the idea of a spectrum, which is one of the points of the PODC lecture as you correctly point out."
Dan Pritchett of EBay has a nice presentation on BASE.
Design around it
Guy Pardon, CTO of atomikos, wrote an interesting post which he called "A CAP Solution (Proving Brewer Wrong)", suggesting an architectural approach that would deliver Consistency, Availability and Partition-tolerance, though with some caveats (notably that you don’t get all three guaranteed in the same instant).
It’s worth a read as Guy eloquently represents an opposing view in this area.
Summary
That you can only guarantee two of Consistency, Availability and Partition Tolerance is real and evidenced by the most successful websites on the planet. If it works for them I see no reason why the same trade-offs shouldn’t be considered in everyday design in corporate environments. If the business explicitly doesn’t want to scale then fine, simpler solutions are available, but it’s a conversation worth having. In any case these discussions will be about appropriate designs for specific operations, not the whole shebang. As Brewer said in his email, "the only other thing I would add is that different parts of the same service can choose different points in the spectrum". Sometimes you absolutely need consistency whatever the scaling cost, because the risk of not having it is too great.
These days I’d go so far as to say that Amazon and EBay don’t have a scalability problem. I think they had one and now they have the tools to address it. That’s why they can freely talk about it. Any scaling they do now (given the size they already are) is really more of the same. Once you’ve scaled, your problems shift to those of operational maintenance, monitoring, rolling out software updates etc. - tough to solve, certainly, but nice to have when you’ve got those revenue streams coming in.