This is my translation of the article Thrift Java Servers Compared; for ease of reading, the original text is reproduced here. In case the original link ever goes dead and its images become unreachable, I have also saved the images to this site's server. I don't know whether GitHub or the original author permits this, but my only purpose in translating this article is to spread knowledge. My sincere thanks to the original author and to GitHub for sharing such a good article.
Thrift Java Servers Compared
This article talks only about Java servers. See this page if you are interested in C++ servers.
Thrift is a cross-language serialization/RPC framework with three major components: protocol, transport, and server. Protocol defines how messages are serialized. Transport defines how messages are communicated between client and server. Server receives serialized messages from the transport, deserializes them according to the protocol, invokes user-defined message handlers, and serializes the responses from the handlers and writes them back to the transport. The modular architecture of Thrift allows it to offer various choices of servers. Here is the list of servers available for Java:
· TSimpleServer
· TNonblockingServer
· THsHaServer
· TThreadedSelectorServer
· TThreadPoolServer
Having choices is great, but which server is right for you? In this article, I'll describe the differences among all those servers and show benchmark results to illustrate their performance characteristics (the details of the benchmark are explained in Appendix B). Let's start with the simplest one: TSimpleServer.
Source: http://www.codelast.com/
TSimpleServer
TSimpleServer accepts a connection, processes requests from the connection until the client closes the connection, and goes back to accept a new connection. Since it is all done in a single thread with blocking I/O, it can only serve one client connection, and all the other clients will have to wait until they get accepted. TSimpleServer is mainly used for testing purposes. Don't use it in production!
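The serving model described above can be sketched with plain sockets. This is an illustrative stand-in, not Thrift's own API: one thread accepts a connection, serves it to completion with blocking I/O, and only then accepts the next.

```java
import java.io.*;
import java.net.*;

public class SimpleServerSketch {
    // Start a TSimpleServer-style echo server on an ephemeral port.
    public static int startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0);
        Thread loop = new Thread(() -> {
            while (!server.isClosed()) {
                // accept() blocks; only one client is served at a time.
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String line;
                    while ((line = in.readLine()) != null) out.println("echo:" + line);
                } catch (IOException ignored) { }
                // Only now does the loop go back to accept the next connection.
            }
        });
        loop.setDaemon(true);
        loop.start();
        return server.getLocalPort();
    }

    // A minimal client: send one line, read one line back.
    public static String call(int port, String msg) throws IOException {
        try (Socket c = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(c.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
            out.println(msg);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = startEchoServer();
        System.out.println(call(port, "ping")); // echo:ping
    }
}
```

A second client connecting while the first is still being served would simply sit in the accept queue — exactly the behavior that makes TSimpleServer unsuitable for production.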
TNonblockingServer vs. THsHaServer
TNonblockingServer solves the problem with TSimpleServer of one client blocking all the other clients by using non-blocking I/O. It uses java.nio.channels.Selector, which allows you to wait on multiple connections instead of a single connection by calling select(). The select() call returns when one or more connections are ready to be accepted/read/written. TNonblockingServer handles those connections either by accepting them, reading data from them, or writing data to them, and calls select() again to wait for the next available connections. This way, multiple clients can be served without one client starving the others.
There is a catch, however. Messages are processed by the same thread that calls select(). Let's say there are 10 clients, and each message takes 100 ms to process. What would the latency and throughput be? While a message is being processed, 9 clients are waiting to be selected, so it takes 1 second for a client to get its response back from the server, and throughput is 10 requests/second. Wouldn't it be great if multiple messages could be processed simultaneously?
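The arithmetic above can be checked with a small simulation: 10 requests that each take 100 ms, handled by a single thread (the select()-loop situation) versus 10 threads.

```java
import java.util.concurrent.*;

public class ThroughputSketch {
    // Run `requests` tasks of ~100 ms each on a pool of `threads` threads
    // and return the total elapsed wall-clock time in milliseconds.
    static long elapsedMillis(int threads, int requests) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                try { Thread.sleep(100); } // simulated 100 ms of processing
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:   ~" + elapsedMillis(1, 10) + " ms");  // roughly 1000 ms
        System.out.println("10 threads: ~" + elapsedMillis(10, 10) + " ms"); // roughly 100 ms
    }
}
```

With one thread the last client waits ~1 second, as in the example; with ten threads everything finishes in ~100 ms.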
This is where THsHaServer (half-sync/half-async server) comes into the picture. It uses a single thread for network I/O and a separate pool of worker threads to handle message processing. This way, messages get processed immediately whenever there is an idle worker thread, and multiple messages can be processed concurrently. Using the example above, now the latency is 100 ms and throughput is 100 requests/sec.
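The half-sync/half-async split can be sketched as a single dispatch thread feeding a worker pool. The names here are illustrative, not Thrift's API; the point is only that the handler runs on a worker, never on the I/O thread.

```java
import java.util.concurrent.*;

public class HsHaSketch {
    // One "network" thread drains ready requests and hands each to the pool,
    // so request processing happens concurrently. Returns the request count.
    public static int serve(java.util.List<String> requests, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(requests.size());
        BlockingQueue<String> ready = new LinkedBlockingQueue<>(requests);
        Thread ioThread = new Thread(() -> {    // stands in for the select() loop
            while (ready.poll() != null) {
                pool.submit(done::countDown);   // the handler runs on a worker thread
            }
        });
        ioThread.start();
        done.await();
        pool.shutdown();
        return requests.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(java.util.List.of("a", "b", "c"), 4) + " requests handled");
    }
}
```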
To demonstrate this, I ran a benchmark with 10 clients and a modified message handler that simply sleeps for 100 ms before returning each response. I used THsHaServer with 10 worker threads.
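The original handler listing isn't reproduced here; a stand-in might look like the following, with a plain interface in place of the Thrift-generated Iface (StubService and ping are hypothetical names).

```java
public class SleepyHandlerSketch {
    // Hypothetical service interface, standing in for a generated Thrift Iface.
    interface StubService {
        String ping(String payload);
    }

    // The benchmark handler: sleep 100 ms, then return.
    static class SleepyHandler implements StubService {
        @Override
        public String ping(String payload) {
            try {
                Thread.sleep(100); // simulate 100 ms of processing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return payload;
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String r = new SleepyHandler().ping("hello");
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(r + " after ~" + ms + " ms"); // roughly 100 ms
    }
}
```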
The results are as expected. THsHaServer is able to process all the requests concurrently, while TNonblockingServer processes requests one at a time.
THsHaServer vs. TThreadedSelectorServer
Thrift 0.8 introduced yet another server, TThreadedSelectorServer. The main difference between TThreadedSelectorServer and THsHaServer is that TThreadedSelectorServer allows you to have multiple threads for network I/O. It maintains 2 thread pools, one for handling network I/O and one for handling request processing. TThreadedSelectorServer performs better than THsHaServer when network I/O is the bottleneck. To show the difference, I ran a benchmark with a handler that returns immediately without doing anything, and measured the average latency and throughput with a varying number of clients. I used 32 worker threads for THsHaServer, and 16 worker threads/16 selector threads for TThreadedSelectorServer.
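The multi-selector layout can be sketched in plain Java: several selector threads share the network work, each feeding a common worker pool. All names are illustrative, and the thread counts (2 selectors/4 workers) are arbitrary stand-ins for the 16/16 configuration above.

```java
import java.util.*;
import java.util.concurrent.*;

public class MultiSelectorSketch {
    // Round-robin `connections` across `selectors` selector threads; each selector
    // forwards request processing to a shared worker pool. Returns the count served.
    public static int serve(int connections, int selectors, int workers) throws InterruptedException {
        ExecutorService workerPool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(connections);
        List<BlockingQueue<Integer>> perSelector = new ArrayList<>();
        for (int s = 0; s < selectors; s++) {
            BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
            perSelector.add(q);
            Thread selector = new Thread(() -> {   // each selector owns a subset of connections
                try {
                    while (true) {
                        Integer conn = q.poll(500, TimeUnit.MILLISECONDS);
                        if (conn == null) return;
                        workerPool.submit(done::countDown); // processing goes to the worker pool
                    }
                } catch (InterruptedException ignored) { }
            });
            selector.setDaemon(true);
            selector.start();
        }
        for (int c = 0; c < connections; c++)      // the accept thread distributes connections
            perSelector.get(c % selectors).put(c);
        done.await();
        workerPool.shutdown();
        return connections;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(8, 2, 4) + " connections served");
    }
}
```

Because network I/O is spread over several threads, a single selector loop is no longer the bottleneck when handlers are cheap.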
The result shows that TThreadedSelectorServer has much higher throughput than THsHaServer while maintaining lower latency.
TThreadedSelectorServer vs. TThreadPoolServer
Finally, there is TThreadPoolServer. TThreadPoolServer is different from the other 3 servers in that:
· There is a dedicated thread for accepting connections.
· Once a connection is accepted, it gets scheduled to be processed by a worker thread in ThreadPoolExecutor.
· The worker thread is tied to the specific client connection until it's closed. Once the connection is closed, the worker thread goes back to the thread pool.
· You can configure both minimum and maximum number of threads in the thread pool. Default values are 5 and Integer.MAX_VALUE, respectively.
This means that if there are 10000 concurrent client connections, you need to run 10000 threads. As such, it is not as resource friendly as other servers. Also, if the number of clients exceeds the maximum number of threads in the thread pool, requests will be blocked until a worker thread becomes available.
Having said that, TThreadPoolServer performs very well; on the box I'm using it's able to support 10000 concurrent clients without any problem. If you know the number of clients that will be connecting to your server in advance and you don't mind running a lot of threads, TThreadPoolServer might be a good choice for you.
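The thread-per-connection model described above can be sketched with plain sockets (illustrative names, not Thrift's API): a dedicated accept thread hands each connection to a pool worker, which owns that connection until it closes.

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

public class ThreadPoolServerSketch {
    // Start a TThreadPoolServer-style echo server on an ephemeral port.
    public static int start() throws IOException {
        ServerSocket server = new ServerSocket(0);
        // Grows on demand, loosely analogous to the default 5..Integer.MAX_VALUE pool.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();  // the accept thread does nothing else
                    pool.submit(() -> {               // one worker per live connection
                        try (BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                            String line;
                            while ((line = in.readLine()) != null) out.println("echo:" + line);
                        } catch (IOException ignored) { }
                        // Connection closed: the worker returns to the pool.
                    });
                } catch (IOException ignored) { }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server.getLocalPort();
    }

    // A minimal client: send one line, read one line back.
    public static String call(int port, String msg) throws IOException {
        try (Socket c = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(c.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
            out.println(msg);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = start();
        System.out.println(call(port, "hello")); // echo:hello
    }
}
```

With 10000 idle connections this design keeps 10000 worker threads alive, which is exactly the resource cost discussed above.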
Conclusion
I hope this article helps you decide which Thrift server is right for you. I think TThreadedSelectorServer would be a safe choice for most of the use cases. You might also want to consider TThreadPoolServer if you can afford to run lots of concurrent threads. Feel free to send me email at mapkeeper-users@googlegroups.com or post your comments here if you have any questions/comments.
Appendix A: Hardware Configuration
Appendix B: Benchmark Details
It's pretty straightforward to run the benchmark yourself. First, clone the MapKeeper repository and compile the stub Java server.
Then, start the server you'd like to benchmark.
Then, clone the YCSB repository and compile it.
Once the compilation finishes, you can run YCSB against the stub server.
For more detailed information about how to use YCSB, check out their wiki page.