How the Actor Model Works

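Let's start with the famous C10K problem. Dan Kegel argued on his site (http://www.kegel.com/c10k.html) that modern hardware should let a single machine serve 10,000 concurrent clients. He then surveyed the techniques for building highly concurrent servers, which boil down to two approaches: one thread per client with blocking I/O, or many clients per thread with nonblocking or asynchronous I/O. Asynchronous I/O is still not well supported on Linux, so nonblocking I/O is the usual choice, and most implementations use edge-triggered epoll() (the traditional select() has serious performance problems). This led to the threads-versus-events debate: the former handles concurrency entirely with threads, the latter with an event-driven design. Real systems are often hybrids, of course, using events for network I/O and threads for processing requests. Because of limitations in today's operating systems (Linux in particular) and programming languages (Java, C, C++ and so on), threads cannot support large numbers of concurrent tasks. On a typical machine the thread count has to stay in the hundreds to preserve performance (on Linux, once the number of threads passes a certain point, system performance degrades exponentially; see the SEDA paper). That is why many high-performance web servers are event-driven, for example nginx, Tornado, and node.js. "Event-driven" has become almost a synonym for high concurrency, and for a while it has been all the rage.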

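In fact, threads versus events, or synchronous versus asynchronous, has been debated in academia for decades. In 1978, to settle the argument, a paper showed that a model built on sequential processes (the thread style) and one built on message passing (the event style) are equivalent, and that with suitable implementations the two should perform equally well. That is the theory, of course. In response to the popularity of events, researchers at UC Berkeley published a paper in 2003 titled "Why Events Are a Bad Idea (for High-Concurrency Servers)". It argued that events offer no functional advantage over threads, yet are much harder to program and far more error-prone. The problems with threads, it said, come down to current implementations. First, threads use too many resources: each one is allocated a stack of several MB at creation, which sharply limits how many threads a typical machine can support. The remedy is a dynamically growing stack that starts small and expands on demand. Second, thread switching is too expensive; on Linux a process and a thread are really the same thing, differing only in whether they share an address space. The remedy is lightweight threads that are scheduled cooperatively on top of shared system threads, which makes switching cheap and also allows smaller stacks. The authors built a prototype using coroutines and nonblocking I/O (poll() plus a thread pool) and showed that its performance was no worse than that of an event-driven system.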

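Does that mean threads are fine as long as they are implemented well? Not entirely. In 2006, again from UC Berkeley, came a paper titled "The Problem with Threads", arguing that threads fall short too. The reasoning goes as follows. Today's programming models are largely based on sequential execution, which is deterministic and therefore easy to get right, and people also tend to think in a single thread. The thread model forcibly adds concurrency and nondeterminism on top of single-threaded, sequential execution, which makes program correctness hard to guarantee. Threads synchronize through shared memory, and it is hard to build a mathematical model of concurrent threads over shared memory; a great deal of nondeterminism remains, and nondeterminism is a great enemy of programming. The author draws on experience from one of his group's projects to argue that guaranteeing the correctness of a multithreaded program is nearly impossible. First, many very simple patterns require attention to extremely subtle details to stay correct under multithreading; otherwise they lead to deadlocks or race conditions. Second, because of the limits of human reasoning, even with the various mechanisms for taming nondeterminism, such as monitors, transactional memory, and promises/futures, it is still hard to cover every case. In the author's project, the team included computer science experts and the brightest graduate students, and followed a full software engineering process: design reviews, code reviews, regression tests, and automated code coverage metrics. They believed they had eliminated most problems, yet a deadlock still appeared after the system had been running for four years. The author notes that many multithreaded programs contain concurrency bugs that rarely surface only because the hardware offers little parallelism. As hardware parallelism grows, many programs that used to run fine are likely to start failing. My own experience is similar: I am not afraid of an NPE or a core dump; what I fear most are race conditions and deadlocks, because they are non-deterministic and often very hard to reproduce.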

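If threads plus shared memory are not the answer, what model can help us with concurrent computation? Researchers have developed several models, and new programming languages are increasingly adopting them. The most important is the Actor model. Its central idea is a set of concurrent entities, called actors, that synchronize by sending messages to one another. As the saying goes: "Don't communicate by sharing memory; share memory by communicating." The Actor model is equivalent to threads with shared memory. In practice it is usually implemented on top of lower-level mechanisms such as threads, locks, and buffers, so it is a higher-level mechanism. The Actor model is a mathematical model with a theoretical foundation. Another similar mathematical model is CSP (Communicating Sequential Processes). The best-known early languages implementing these theories are Erlang and occam. Erlang in particular, the so-called Ericsson Language, was designed for building highly concurrent programs for telecom systems, and it later became a fairly popular language.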

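Go also provides a message-passing mechanism similar to Actor/CSP. Its concurrent entity is the goroutine, which resembles a coroutine but does not have to be scheduled by the programmer. The runtime itself schedules goroutines onto system threads, with many goroutines sharing one thread. When one goroutine is about to block, the runtime automatically moves the other goroutines onto other threads.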


Some terminology. Processes, threads, green threads, protothreads, fibers, coroutines: what's the difference?

  1. Process: OS-managed and (possibly) truly concurrent, at least in the presence of suitable hardware support. Each process exists within its own address space.
  2. Thread: OS-managed, within the same address space as the parent and all its other threads. Possibly truly concurrent, and multi-tasking is pre-emptive.
  3. Green Thread: These are user-space projections of the same concept as threads, but are not OS-managed. Probably not truly concurrent, except in the sense that there may be multiple worker threads or processes giving them CPU time concurrently, so probably best to consider this as interleaved or multiplexed.
  4. Protothreads: I couldn't really tease a definition out of these. I think they are interleaved and program-managed, but don't take my word for it. My sense was that they are essentially an application-specific implementation of the same kind of "green threads" model, with appropriate modification for the application domain.
  5. Fibers: OS-managed. Exactly like threads, except that they multitask cooperatively and hence are not truly concurrent.
  6. Coroutines: Exactly fibers, except not OS-managed. Coroutines are computer program components that generalize subroutines to allow multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing more familiar program components such as cooperative tasks, iterators, infinite lists and pipes.
  7. Continuation: An abstract representation of the control state of a computer program. A continuation reifies the program control state, i.e. the continuation is a data structure that represents the computational process at a given point in the process's execution; the created data structure can be accessed by the programming language, instead of being hidden in the runtime environment. Continuations are useful for encoding other control mechanisms in programming languages such as exceptions, generators, coroutines, and so on. The "current continuation" or "continuation of the computation step" is the continuation that, from the perspective of running code, would be derived from the current point in a program's execution. The term continuations can also be used to refer to first-class continuations, which are constructs that give a programming language the ability to save the execution state at any point and return to that point at a later point in the program (compare the yield keyword in some languages, such as C# or Python).
  8. Goroutines: They claim to be unlike anything else, but they seem to be exactly green threads, as in, process-managed in a single address space and multiplexed onto system threads. Perhaps somebody with more knowledge of Go can cut through the marketing material.