So far we have covered two approaches to concurrent network programming: multiprocessing and multithreading. The strengths and weaknesses of both are quite obvious, so let's review them briefly.
A coroutine, also called a micro-thread or fiber (English: coroutine), can be summed up in one sentence: a coroutine is a lightweight, user-space thread.
A coroutine has its own register context and stack. When a coroutine switches out, it saves its register context and stack elsewhere; when it switches back in, it restores the previously saved register context and stack. Therefore:
a coroutine preserves the state of its previous invocation (a particular combination of all its local state), so each re-entry is equivalent to resuming that state; in other words, execution continues from the point in the logical flow where it last left off, as the minimal sketch below shows.
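A minimal sketch (not from the original post) of a generator-based coroutine whose local state survives every switch:

import time  # only to mirror the later examples; not needed here

def counter():
    n = 0                  # local state is preserved across yields
    while True:
        step = yield n     # pause here; resume when the caller sends a value
        n += step

c = counter()
next(c)           # prime the coroutine: run up to the first yield
print(c.send(3))  # 3 -- resumed exactly where it left off
print(c.send(4))  # 7 -- n kept its value between calls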
Advantages of coroutines:
Disadvantages:
import time

def consumer(name):
    print("--->starting eating baozi...")
    while True:
        new_baozi = yield
        print("[%s] is eating baozi %s" % (name, new_baozi))
        time.sleep(1)   # stands in for an IO call; the single thread stalls here

def producer():
    r = con.__next__()
    r = con2.__next__()
    n = 0
    while n < 5000:
        n += 1
        con.send(n)
        con2.send(n)
        print("\033[32;1m[producer]\033[0m is making baozi %s" % n)

if __name__ == '__main__':
    con = consumer("c1")
    con2 = consumer("c2")
    p = producer()
greenlet is a coroutine module implemented in C. Compared with Python's built-in yield, it lets you switch freely between arbitrary functions, without first declaring the function a generator.
from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()
    print(34)
    gr2.switch()

def test2():
    print(56)
    gr1.switch()
    print(78)

gr1 = greenlet(test1)   # roughly: start a coroutine
gr2 = greenlet(test2)
gr1.switch()
gr2.switch()   # by this point test2 has already run to its end, so nothing more executes (a greenlet saves its context automatically)
from greenlet import greenlet
import time

def test1():
    print(12)
    time.sleep(1)   # blocks for a full second; greenlet does NOT switch automatically on IO
    gr2.switch()
    print(34)
    gr2.switch()

def test2():
    print(56)
    gr1.switch()
    print(78)

gr1 = greenlet(test1)   # roughly: start a coroutine
gr2 = greenlet(test2)
gr1.switch()
gr2.switch()   # test2 has already run to its end here, so nothing more executes (a greenlet saves its context automatically)
感受確實用着比generator還簡單了呢,但好像尚未解決一個問題,就是遇到IO操做,自動切換,對不對?web
So next we introduce another module:
Gevent is a third-party library that makes it easy to write concurrent synchronous-style or asynchronous programs. The main pattern used in gevent is the Greenlet, a lightweight coroutine that plugs into Python as a C extension module. Greenlets all run inside the OS process of the main program, but they are scheduled cooperatively.
# demonstrates switching on (simulated) IO
import gevent

def func1():
    print('\033[31;1mLi Chuang is with Haitao\033[0m')
    gevent.sleep(2)
    print('\033[31;1mLi Chuang goes back to Haitao again...\033[0m')

def func2():
    print('\033[32;1mLi Chuang switched over to Hailong...\033[0m')
    gevent.sleep(1)
    print('\033[32;1mLi Chuang finished with Haitao and comes back to Hailong...\033[0m')

def func3():
    print(3333)
    gevent.sleep(1)
    print(44444)

gevent.joinall([gevent.spawn(func1), gevent.spawn(func2), gevent.spawn(func3)])   # spawn creates a greenlet
Output:
Li Chuang is with Haitao
Li Chuang switched over to Hailong...
3333
Li Chuang finished with Haitao and comes back to Hailong...
44444
Li Chuang goes back to Haitao again...
The performance difference between synchronous and asynchronous
import gevent

def task(pid):
    """Some non-deterministic task"""
    gevent.sleep(0.5)
    print('Task %s done' % pid)

def synchronous():
    for i in range(1, 10):
        task(i)

def asynchronous():
    threads = [gevent.spawn(task, i) for i in range(10)]
    # equivalent to:
    # threads = []
    # for i in range(10):
    #     threads.append(gevent.spawn(task, i))
    gevent.joinall(threads)

print('Synchronous:')
synchronous()
print('Asynchronous:')
asynchronous()
The important part of the program above is gevent.spawn, which wraps the task function in a Greenlet. The initialized greenlets are stored in the list threads, which is passed to the gevent.joinall function; joinall blocks the current flow and runs all the given greenlets. Execution only continues past it after all the greenlets have finished.
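One detail worth noting (a standard part of the gevent API): gevent.spawn returns a Greenlet object, and once the greenlet has finished, its return value is available on .value. A tiny sketch:

import gevent

def add(a, b):
    gevent.sleep(0.1)   # yields to other greenlets while "waiting"
    return a + b

g = gevent.spawn(add, 1, 2)
gevent.joinall([g])
print(g.value)   # 3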
import gevent
from gevent import monkey
monkey.patch_all()   # urllib is not async by itself; monkey-patching makes its socket calls cooperative
from urllib.request import urlopen
import time

def pa_web_page(url):
    print("GET url", url)
    req = urlopen(url)
    data = req.read()
    print(data)
    print('%d bytes received from %s.' % (len(data), url))

t_start = time.time()
pa_web_page("http://www.autohome.com.cn/beijing/")
pa_web_page("http://www.xiaohuar.com/")
print("time cost:", time.time() - t_start)

t2_start = time.time()
gevent.joinall([
    # gevent.spawn(pa_web_page, 'https://www.python.org/'),
    gevent.spawn(pa_web_page, 'http://www.autohome.com.cn/beijing/'),
    gevent.spawn(pa_web_page, 'http://www.xiaohuar.com/'),
    # gevent.spawn(pa_web_page, 'https://github.com/'),
])
print("time cost t2:", time.time() - t2_start)
Using gevent to serve many concurrent sockets in a single thread
server side
import gevent
from gevent import socket, monkey
monkey.patch_all()

def server(port):
    s = socket.socket()
    s.bind(('0.0.0.0', port))
    s.listen(500)
    while True:
        cli, addr = s.accept()
        gevent.spawn(handle_request, cli)

def handle_request(conn):
    try:
        while True:
            data = conn.recv(1024)
            print("recv:", data)
            conn.send(data)
            if not data:
                conn.shutdown(socket.SHUT_WR)
    except Exception as ex:
        print(ex)
    finally:
        conn.close()

if __name__ == '__main__':
    server(8001)
client side
import socket
import threading

def sock_conn():
    client = socket.socket()
    client.connect(("localhost", 8001))
    count = 0
    while True:
        # msg = input(">>:").strip()
        # if len(msg) == 0: continue
        client.send(("hello %s" % count).encode("utf-8"))
        data = client.recv(1024)
        print("[%s] recv from server:" % threading.get_ident(), data.decode())   # print the echoed result
        count += 1
    client.close()

for i in range(100):
    t = threading.Thread(target=sock_conn)
    t.start()
In UI programming we often need to respond to mouse clicks. First of all, how do we detect a click?
Option 1: create a thread that loops forever, checking whether the mouse has been clicked. This approach has several drawbacks:
1. Wasted CPU. Clicks may be very infrequent, but the polling thread keeps looping anyway, wasting a lot of CPU. And what if the click-polling interface is blocking?
2. If it is blocking, the next problem appears: if we want to poll not only the mouse but also the keyboard, then while the thread is blocked polling the mouse it may never get around to polling the keyboard;
3. If one loop has to poll a very large number of devices, response time becomes an issue;
So this approach is very poor.
Option 2: the event-driven model.
Most UI programming today is event-driven; for example, many UI platforms provide an onClick() event representing a mouse press. The event-driven model roughly works like this (see the sketch after this list):
1. There is an event (message) queue;
2. When the mouse is pressed, a click event (message) is appended to the queue;
3. A loop keeps taking events off the queue and, depending on the event, dispatches to different functions such as onClick(), onKeyDown(), and so on;
4. Each event (message) usually carries its own handler-function pointer, so every message has an independent handler.
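A minimal sketch of that dispatch loop (illustrative only; the event shapes and handler names are my own, not from any real UI framework):

import queue

events = queue.Queue()

def on_click(evt):
    print("clicked at", evt["pos"])

def on_key_down(evt):
    print("key pressed:", evt["key"])

handlers = {"click": on_click, "keydown": on_key_down}   # event type -> handler

# somewhere, input hardware pushes events onto the queue:
events.put({"type": "click", "pos": (10, 20)})
events.put({"type": "keydown", "key": "q"})

# the event loop: pull events off the queue and dispatch to the right handler
while not events.empty():
    evt = events.get()
    handlers[evt["type"]](evt)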
Event-driven programming is a paradigm in which the program's flow of execution is determined by external events. Its hallmark is an event loop plus a callback mechanism that triggers the appropriate handling when an external event occurs. The other two common paradigms are (single-threaded) synchronous programming and multithreaded programming.
Let's compare and contrast single-threaded, multithreaded, and event-driven programming with an example. The figure below shows, over time, the work a program does under each of the three models. The program has three tasks to complete, and each task blocks itself while waiting on an I/O operation. Time spent blocked on I/O is marked with grey boxes.
In the single-threaded synchronous model, tasks execute in sequence. If a task blocks on I/O, all the other tasks must wait until it finishes before they can run, one after another. This well-defined execution order and serialized behavior is easy to reason about, but if the tasks do not depend on each other and still must wait for one another, the program is slowed down unnecessarily.
In the multithreaded version, the three tasks run in separate threads. The threads are managed by the operating system and may run truly in parallel on a multiprocessor machine, or interleaved on a single processor. This allows other threads to keep running while one thread blocks on some resource. It is more efficient than the equivalent synchronous program, but the programmer must write code to protect shared resources from simultaneous access by multiple threads. Multithreaded programs are harder to reason about, because they have to handle thread safety through synchronization mechanisms such as locks, reentrant functions, thread-local storage, and the like; implemented badly, these produce subtle and excruciating bugs.
In the event-driven version of the program, the three tasks are interleaved yet still run under a single thread of control. When doing I/O or some other expensive operation, a callback is registered with the event loop, and execution continues after the I/O completes. The callback describes how the event should be handled. The event loop polls for events and dispatches them, as they arrive, to the callbacks waiting on them. This way the program gets as much done as possible without needing extra threads. An event-driven program's behavior is easier to reason about than a multithreaded one's, because the programmer need not worry about thread safety.
The event-driven model is usually a good choice when you face the following conditions:
It is also a good choice when the application needs to share mutable data between tasks, since no synchronization is required.
Network applications usually have exactly these properties, which makes them an excellent fit for the event-driven model.
A question to raise here: in the event-driven model above, the moment IO is encountered an event is registered and the main program goes on doing other things; only when the IO finishes does the previously interrupted task resume. How is this actually implemented underneath? Ha, let's lift the mysterious veil together...
What exactly are synchronous IO and asynchronous IO, blocking IO and non-blocking IO, and what are the differences? Different people in different contexts give different answers, so first let's pin down the context of this article.
The discussion here concerns network IO in a Linux environment.
Before the explanation, a few concepts must be introduced: user space and kernel space, process switching, process blocking, file descriptors, and buffered I/O.
Modern operating systems use virtual memory, so for a 32-bit OS the addressable (virtual) space is 4 GB (2 to the 32nd power). The core of the operating system is the kernel; it is separate from ordinary applications, can access the protected memory space, and has full access to the underlying hardware. To prevent user processes from manipulating the kernel directly and to keep the kernel safe, the OS divides the virtual space into two parts: kernel space and user space. On Linux, the highest 1 GB (virtual addresses 0xC0000000 through 0xFFFFFFFF) is reserved for the kernel and is called kernel space, while the lower 3 GB (virtual addresses 0x00000000 through 0xBFFFFFFF) is used by the processes and is called user space.
To control process execution, the kernel must be able to suspend the process currently running on the CPU and resume some previously suspended process. This behavior is called process switching. So it can be said that every process runs with the support of the kernel and is tightly bound to it.
Moving from running one process to running another goes through the following steps:
1. Save the CPU context, including the program counter and the other registers.
2. Update the PCB.
3. Move the process's PCB onto the appropriate queue, e.g. ready, or blocked on some event.
4. Select another process to run and update its PCB.
5. Update the memory-management data structures.
6. Restore the CPU context.
In short, process switching is expensive. For details see this article: process switching.
Note: the Process Control Block (PCB) is a data structure at the core of the operating system that mainly represents process state. Its purpose is to turn a program (with its data) that cannot run on its own in a multiprogramming environment into a basic unit that can run independently, or into a process that runs concurrently with other processes; put differently, the OS controls and manages concurrently executing processes through their PCBs. A PCB is usually a contiguous region of system memory that stores all the information the OS needs to describe the process's situation and control its execution.
When a running process is waiting on events that have not yet occurred, such as a failed request for a system resource, completion of some operation, data that has not arrived, or no new work to do, the system automatically executes the blocking primitive (Block), which moves it from the running state to the blocked state. Blocking is thus an active behavior of the process itself, so only a running process (one that holds the CPU) can be moved into the blocked state. A blocked process occupies no CPU resources.
A file descriptor is a computer-science term: an abstraction used to refer to a file.
In form, a file descriptor is a non-negative integer. In substance, it is an index into a table the kernel maintains for each process, recording the files that process has open. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. In systems programming, low-level code is often written around file descriptors. The concept, however, generally applies only to UNIX-like operating systems such as Linux.
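A quick illustration (assuming a UNIX-like system) that a file descriptor really is just a small non-negative integer handed out by the kernel:

import os

fd = os.open("/tmp/fd_demo.txt", os.O_CREAT | os.O_WRONLY)
print(fd)                  # e.g. 3 -- 0, 1 and 2 are stdin, stdout and stderr
os.write(fd, b"hello\n")
os.close(fd)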
Buffered I/O, also known as standard I/O, is the default I/O mode of most file systems. In Linux's buffered I/O mechanism, the operating system caches I/O data in the file system's page cache; that is, data is first copied into a kernel buffer and only then copied from the kernel buffer into the application's address space.
The drawback of buffered I/O: while being transferred, data is copied multiple times between the application's address space and the kernel, and these copies cost a great deal of CPU and memory.
As just noted, for one IO access (take read as an example) the data is first copied into a kernel buffer and then from the kernel buffer into the application's address space. So when a read happens, it goes through two phases: 1. Waiting for the data to be ready. 2. Copying the data from the kernel to the process.
Precisely because of these two phases, Linux ended up with the following five network IO models: blocking IO, nonblocking IO, IO multiplexing, signal driven IO, and asynchronous IO.
Note: since signal-driven IO is rarely used in practice, only the remaining four IO models are discussed here.
On Linux, all sockets are blocking by default. A typical read flow looks roughly like this:
When the user process calls the recvfrom system call, the kernel starts the first phase of the IO: preparing the data (for network IO, much of the time the data has not arrived yet at the start; for example, a complete UDP packet has not been received, so the kernel has to wait for enough data to arrive). This phase requires waiting: getting the data copied into the kernel's buffer is itself a process. On the user side, the whole process is blocked (by its own choice, of course). When the kernel has waited until the data is ready, it copies the data from the kernel into user memory and returns the result; only then does the user process leave the blocked state and run again.
So the defining trait of blocking IO is that both phases of the IO are blocked.
On Linux a socket can be set to non-blocking. A read against a non-blocking socket flows like this:
When the user process issues a read and the data in the kernel is not ready yet, the kernel does not block the user process; it immediately returns an error instead. From the user process's perspective, after initiating a read it does not need to wait but gets a result right away. When the process sees that the result is an error, it knows the data is not ready yet, so it can issue the read again. Once the data in the kernel is ready and the kernel receives another system call from the user process, it immediately copies the data into user memory and returns.
So the defining trait of nonblocking IO is that the user process must keep actively asking the kernel whether the data is ready.
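A minimal sketch of that polling loop (assumes sock is an already-connected socket; not a complete program):

import socket
import time

def poll_read(sock):
    sock.setblocking(False)          # recv now returns immediately
    while True:
        try:
            return sock.recv(1024)   # phase 2: the kernel copies the data to us
        except BlockingIOError:      # the kernel says the data is not ready yet
            time.sleep(0.1)          # phase 1: we keep asking instead of blocking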
IO multiplexing is what we call select, poll, and epoll; in some places this style of IO is also called event driven IO. The advantage of select/epoll is that a single process can handle the IO of many network connections at once. The basic principle is that the select/poll/epoll function continuously polls all the sockets it is responsible for and notifies the user process as soon as some socket has data.
When the user process calls select, the whole process is blocked; meanwhile, the kernel "watches" all the sockets select is responsible for, and as soon as the data in any one of them is ready, select returns. At that point the user process calls read to copy the data from the kernel into the user process.
So the defining trait of I/O multiplexing is that, through one mechanism, a single process can wait on many file descriptors at once, and as soon as any of those (socket) descriptors becomes read-ready, select() can return.
This picture is actually not very different from the blocking IO one; in fact it is a bit worse, because it requires two system calls (select and recvfrom) where blocking IO needed only one (recvfrom). The advantage of select, though, is that it can handle many connections at the same time.
Hence, if the number of connections handled is not very high, a web server using select/epoll is not necessarily better than one using multi-threading + blocking IO, and its latency may even be higher. The strength of select/epoll is not faster handling of a single connection but the ability to handle more connections.
In the IO multiplexing model, each socket is in practice usually set to non-blocking; however, as the figure above shows, the whole user process is in fact blocked the entire time, except that it is blocked by the select function rather than by socket IO.
Linux's asynchronous IO is actually rarely used. Let's first look at its flow:
After the user process initiates the read, it can immediately go do other things. From the kernel's side, on receiving an asynchronous read it returns at once, so it does not block the user process at all. The kernel then waits for the data preparation to finish and copies the data into user memory; when all of that is done, the kernel sends the user process a signal telling it the read has completed.
Calling blocking IO blocks the process until the operation completes, whereas non-blocking IO returns immediately while the kernel is still preparing the data.
Before explaining the difference between synchronous IO and asynchronous IO, we need their definitions. POSIX defines them like this:
- A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;
- An asynchronous I/O operation does not cause the requesting process to be blocked;
The difference is that synchronous IO blocks the process while performing the "IO operation". By this definition, the blocking IO, non-blocking IO, and IO multiplexing described earlier are all synchronous IO.
Some will say that non-blocking IO is not blocked at all. Here lies a very "sly" detail: the "IO operation" in the definition refers to the real IO operation, which in the example is the recvfrom system call. With non-blocking IO, if the kernel's data is not ready, executing the recvfrom system call does not block the process. But when the kernel's data is ready, recvfrom copies the data from the kernel into user memory, and during that span the process is blocked.
Asynchronous IO is different: once the process initiates the IO operation, it returns immediately and pays no further attention until the kernel sends a signal telling the process the IO is complete. Throughout this whole procedure the process is never blocked.
The comparison of the IO models is shown in the figure:
From the picture above, the difference between non-blocking IO and asynchronous IO is still quite clear. With non-blocking IO, although the process is not blocked most of the time, it still must actively check, and once the data is ready it must again actively call recvfrom to copy the data into user memory. Asynchronous IO is entirely different: it is as if the user process hands the whole IO operation off to someone else (the kernel) and that someone sends a signal when done. In the meantime, the user process neither checks the IO's status nor actively copies the data.
Let's go through a select example to see how select lets a single process serve several non-blocking socket connections at the same time.
socket_server
import socket
import select
import queue

server = socket.socket()
server.bind(('localhost', 9999))
server.listen(5)
server.setblocking(0)   # non-blocking: recv/accept will not hang

inputs = [server]       # ask select to watch the server socket; activity on it means a new connection
msg_queues = {}
outputs = []

while True:
    r_list, w_list, exception_list = select.select(inputs, outputs, inputs)
    # every socket in r_list is ready to read
    for s in r_list:
        if s is server:                       # a new connection arrived
            conn, addr = s.accept()
            print('got new conn', conn, addr)
            inputs.append(conn)
            msg_queues[conn] = queue.Queue()  # set up a per-connection outgoing queue
        else:
            try:
                data = s.recv(1024)
                print('recv data from [%s]:[%s]' % (s.getpeername(), data.decode()))  # getpeername() gives the ip and port
                msg_queues[s].put(data)
                if s not in outputs:
                    outputs.append(s)         # on the next select, w_list will include it so we can reply to the client
            except ConnectionResetError as e:
                # the connection broke; remove this socket object from select's lists
                print('conn closed.', s.getpeername(), e)
                inputs.remove(s)
                if s in outputs:
                    outputs.remove(s)
                del msg_queues[s]
    for s in w_list:
        try:
            data = msg_queues[s].get_nowait()
            s.send(data.upper())
        except queue.Empty as e:
            outputs.remove(s)
socket_client
import socket

HOST = 'localhost'   # The remote host
PORT = 9999          # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
while True:
    msg = bytes(input(">>:"), encoding="utf8")
    s.sendall(msg)
    data = s.recv(1024)
    # print(data)
    print('Received', repr(data))
s.close()
The select() method takes and monitors three lists: the first is for all incoming data, that is, data sent from outside; the second monitors all outgoing data; and the third monitors errors. Next we create two lists holding the input and output channels to pass to select().
# Sockets from which we expect to read
inputs = [server]

# Sockets to which we expect to write
outputs = []
All incoming client connections and data are handled through the lists above by the server's main loop. Our server now waits until a connection becomes writable before replying (rather than sending a reply the moment data is received), because each connection's input and output data is first buffered in a queue and then taken back out by select and sent.
Connections are added to and removed from these lists by the server main loop. Since this version of the server is going to wait for a socket to become writable before sending any data (instead of immediately sending the reply), each output connection needs a queue to act as a buffer for the data to be sent through it.
# Outgoing message queues (socket:Queue)
message_queues = {}
The main portion of the server program loops, calling select() to block and wait for network activity.
Below is the program's main loop; the call to select() blocks and waits until a new connection or new data arrives:
import sys

while inputs:
    # Wait for at least one of the sockets to be ready for processing
    print('\nwaiting for the next event', file=sys.stderr)
    readable, writable, exceptional = select.select(inputs, outputs, inputs)
After you pass inputs, outputs, and exceptional (here sharing the inputs list) to select(), it returns three new lists, which above we assign to readable, writable, and exceptional respectively. Every socket connection in the readable list has data available to recv(); every socket in the writable list is one you can send() to; and when a connection errors out, the error is written into the exceptional list.
select() returns three new lists, containing subsets of the contents of the lists passed in. All of the sockets in the readable list have incoming data buffered and available to be read. All of the sockets in the writable list have free space in their buffer and can be written to. The sockets returned in exceptional have had an error (the actual definition of 「exceptional condition」 depends on the platform).
A socket in the readable list can be in one of three states. The first: if the socket is the main "server" socket, the one listening for client connections, then "readable" means the server is ready to accept a new incoming connection. To let the main server handle multiple connections at once, in the code below the main server socket is set to non-blocking mode.
The 「readable」 sockets represent three possible cases. If the socket is the main 「server」 socket, the one being used to listen for connections, then the 「readable」 condition means it is ready to accept another incoming connection. In addition to adding the new connection to the list of inputs to monitor, this section sets the client socket to not block.
The second case is an already established connection that has sent data: you recv() the data it sent and put it on the queue, so the received data can later be passed back to the client.
The next case is an established connection with a client that has sent data. The data is read with recv(), then placed on the queue so it can be sent through the socket and back to the client.
The third case is a client that has already disconnected, so recv() returns empty data; at that point you can close the connection to that client.
A readable socket without data available is from a client that has disconnected, and the stream is ready to be closed.
Sockets in the writable list likewise have a couple of states: if this client connection's queue has data, take the next message out and send it back to the client; otherwise remove the connection from the outputs list, so that the next time through the loop select() does not report it and it is treated as inactive.
There are fewer cases for the writable connections. If there is data in the queue for a connection, the next message is sent. Otherwise, the connection is removed from the list of output connections so that the next time through the loop select() does not indicate that the socket is ready to send data.
Finally, if an error occurs while communicating over some socket, remove that connection object from inputs, outputs, and message_queues, and close the connection, as sketched below.
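A hedged sketch of that error branch (the list and dict names follow the example above):

for s in exceptional:
    print('handling exceptional condition for', s.getpeername())
    inputs.remove(s)          # stop selecting on this socket
    if s in outputs:
        outputs.remove(s)
    del message_queues[s]     # discard its buffered outgoing messages
    s.close()

The selectors-based echo server below achieves the same multiplexing with a higher-level standard-library API.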
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(sock, mask):
    conn, addr = sock.accept()   # Should be ready
    print('accepted', conn, 'from', addr)
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn, mask):
    data = conn.recv(1000)       # Should be ready
    if data:
        print('echoing', repr(data), 'to', conn)
        conn.send(data)          # Hope it won't block
    else:
        print('closing', conn)
        sel.unregister(conn)
        conn.close()

sock = socket.socket()
sock.bind(('localhost', 10000))
sock.listen(100)
sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ, accept)

while True:
    events = sel.select()
    for key, mask in events:
        callback = key.data
        callback(key.fileobj, mask)
I. Install Erlang first
1. Download the install package from the Erlang website or from my Baidu drive share
2. Unzip it into the / directory: unzip otp-OTP-19.1.zip
3. Install the build dependencies:
yum install -y gcc* ncurses-devel openssl-devel unixODBC-devel (plus a Java compiler, but only if problems come up)
4. Build and install:
cd /otp-OTP-19.1
export ERL_TOP=`pwd`
./configure (my package came from GitHub, so I first had to run ./otp_build autoconf to generate a configure script, then run ./configure)
make
make install
5. Verify the installation:
[root@bogon ~]#erl
Erlang/OTP 19 [erts-8.1] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V8.1 (abort with ^G)
1> EvenN = lists:filter (fun (N) -> N rem 2 == 0 end, lists:seq(1,100)).
Output: [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,
44,46,48,50,52,54,56,58|...]
There are several ways to quit the Erlang shell:
Shortcut 1: Ctrl+C, then choose a. Shortcut 2: Ctrl+G, then press q.
II. Install RabbitMQ
1. Download the install package
From the official site or my shared Baidu drive; I downloaded the rpm package.
2. rpm -ivh --nodeps rabbitmq-server-3.6.5-1.noarch.rpm
3. With the RPM install, you can start the service right away:
cd /usr/lib/rabbitmq/bin
./rabbitmq-server -detached    # start in the background
./rabbitmqctl status           # check status
./rabbitmqctl stop             # stop the service
4. Install the web monitoring UI
Enable the management console plugin:
1. ./rabbitmq-plugins enable rabbitmq_management
2. Then create a user:
Create a user: ./rabbitmqctl add_user <username> <password>
Delete a user: ./rabbitmqctl delete_user <username>
Change a password: ./rabbitmqctl change_password {username} {newpassword}
Set user roles: ./rabbitmqctl set_user_tags {username} {tag ...}
The tag can be administrator, monitoring, or management
3. Restart the service:
cd /usr/lib/rabbitmq/bin
./rabbitmqctl stop             # stop the service
./rabbitmq-server -detached    # start in the background
./rabbitmqctl status           # check status
4. Turn off the firewall, then visit 192.168.0.107:15672 in a browser (the IP is that of the machine running rabbitmq; the port is always 15672) and enter the username and password.
III. Common RabbitMQ commands
./rabbitmq-server start    # start rabbitmq
./rabbitmqctl list_exchanges
./rabbitmqctl list_bindings
./rabbitmqctl list_queues    # these three show the exchanges, bindings, and queues currently in the system
./rabbitmqctl status    # show runtime info
./rabbitmqctl stop      # stop rabbitmq
./rabbitmq-plugins enable rabbitmq_management
Implementing the simplest queue communication
RabbitMQ documentation in Chinese:
http://rabbitmq.mr-ping.com/
Receiving side
import pika

username = 'lwq'
pwd = '111111'
user_pwd = pika.PlainCredentials(username, pwd)
connection = pika.BlockingConnection(
    pika.ConnectionParameters('192.168.1.15', credentials=user_pwd))
channel = connection.channel()

channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print(ch, method, properties)
    print("[x] Received %r" % body)

channel.basic_consume(callback, queue='hello', no_ack=True)
print('[*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
Sending side
import pika

username = 'lwq'
pwd = '111111'
user_pwd = pika.PlainCredentials(username, pwd)
connection = pika.BlockingConnection(
    pika.ConnectionParameters('192.168.1.15', credentials=user_pwd))
channel = connection.channel()   # open a channel

channel.queue_declare(queue='hello')   # declare the queue
channel.basic_publish(exchange='', routing_key='hello', body='Hello World')
print("[X] Sent 'hello world'")
Work Queues
In this mode, RabbitMQ by default hands the messages the producer (p) sends to each consumer (c) in turn, much like load balancing.
import pika
import time

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# declare the queue
channel.queue_declare(queue='task_queue')

# In RabbitMQ a message can never be sent directly to the queue;
# it always needs to go through an exchange.
import sys

message = ' '.join(sys.argv[1:]) or "Hello World! %s" % time.time()
channel.basic_publish(exchange='',
                      routing_key='task_queue',
                      body=message,
                      properties=pika.BasicProperties(
                          delivery_mode=2,  # make message persistent
                      ))
print(" [x] Sent %r" % message)
connection.close()
Consumer code
#_*_coding:utf-8_*_
import pika, time

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    time.sleep(20)
    print(" [x] Done")
    print("method.delivery_tag", method.delivery_tag)
    # with no_ack=True below, an explicit ack would be a channel error;
    # proper acknowledgements are enabled in the next example
    # ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(callback,
                      queue='task_queue',
                      no_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
Now start the producer first, then start three consumers, and have the producer send a few messages; you will see the messages get distributed to the consumers in turn.
Doing a task can take a few seconds. You may wonder what happens if one of the consumers starts a long task and dies with it only partly done. With our current code, once RabbitMQ delivers a message to the consumer it immediately removes it from memory. In this case, if you kill a worker we will lose the message it was just processing. We'll also lose all the messages that were dispatched to this particular worker but were not yet handled.
But we don't want to lose any tasks. If a worker dies, we'd like the task to be delivered to another worker.
In order to make sure a message is never lost, RabbitMQ supports message acknowledgments. An ack(nowledgement) is sent back from the consumer to tell RabbitMQ that a particular message had been received, processed and that RabbitMQ is free to delete it.
If a consumer dies (its channel is closed, connection is closed, or TCP connection is lost) without sending an ack, RabbitMQ will understand that a message wasn't processed fully and will re-queue it. If there are other consumers online at the same time, it will then quickly redeliver it to another consumer. That way you can be sure that no message is lost, even if the workers occasionally die.
There aren't any message timeouts; RabbitMQ will redeliver the message when the consumer dies. It's fine even if processing a message takes a very, very long time.
Message acknowledgments are turned on by default. In previous examples we explicitly turned them off via the no_ack=True flag. It's time to remove this flag and send a proper acknowledgment from the worker, once we're done with a task.
def callback(ch, method, properties, body):
    print(" [x] Received %r" % (body,))
    time.sleep(body.count(b'.'))
    print(" [x] Done")
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(callback,
                      queue='hello')
Using this code we can be sure that even if you kill a worker using CTRL+C while it was processing a message, nothing will be lost. Soon after the worker dies all unacknowledged messages will be redelivered
We have learned how to make sure that even if the consumer dies, the task isn't lost (this is the default; to disable it, use no_ack=True). But our tasks will still be lost if the RabbitMQ server stops.
When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to. Two things are required to make sure that messages aren't lost: we need to mark both the queue and messages as durable.
First, we need to make sure that RabbitMQ will never lose our queue. In order to do so, we need to declare it as durable:
channel.queue_declare(queue='hello', durable=True)
Although this command is correct by itself, it won't work in our setup. That's because we've already defined a queue called hello which is not durable. RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that. But there is a quick workaround - let's declare a queue with a different name, for example task_queue:
channel.queue_declare(queue='task_queue', durable=True)
This queue_declare change needs to be applied to both the producer and consumer code.
At that point we're sure that the task_queue queue won't be lost even if RabbitMQ restarts. Now we need to mark our messages as persistent - by supplying a delivery_mode property with a value 2.
channel.basic_publish(exchange='',
                      routing_key="task_queue",
                      body=message,
                      properties=pika.BasicProperties(
                          delivery_mode=2,  # make message persistent
                      ))
If RabbitMQ simply dispatched messages to the consumers in order, without regard to consumer load, a consumer on a modest machine could easily pile up more messages than it can process while a well-provisioned consumer stays mostly idle. To solve this, set prefetch=1 on each consumer, which tells RabbitMQ: don't send me a new message until I have finished processing the current one.
channel.basic_qos(prefetch_count=1)
Complete code with message persistence + fair dispatch
Producer side
import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)

message = ' '.join(sys.argv[1:]) or "Hello World!"
channel.basic_publish(exchange='',
                      routing_key='task_queue',
                      body=message,
                      properties=pika.BasicProperties(
                          delivery_mode=2,  # make message persistent
                      ))
print(" [x] Sent %r" % message)
connection.close()
Consumer side
import pika
import time

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)
print(' [*] Waiting for messages. To exit press CTRL+C')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    time.sleep(body.count(b'.'))
    print(" [x] Done")
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='task_queue')
channel.start_consuming()
The examples so far have basically all been one-to-one sends and receives, where a message can only go to the specified queue. Sometimes, however, you want a message to be received by every queue, broadcast-style; that's where exchanges come in.
An exchange is a very simple thing. On one side it receives messages from producers and the other side it pushes them to queues. The exchange must know exactly what to do with a message it receives. Should it be appended to a particular queue? Should it be appended to many queues? Or should it get discarded. The rules for that are defined by the exchange type.
An exchange is declared with a type, which determines exactly which queues qualify to receive the message:
fanout: every queue bound to this exchange receives the message
direct: only the single queue determined by the routingKey and the exchange receives the message
topic: every queue bound with a binding key matching the routingKey (which can now be a pattern) receives the message (see the matching sketch after this list)
Pattern symbols: # stands for zero or more words, * stands for exactly one word
E.g. #.a matches a.a, aa.a, aaa.a, etc.
*.a matches a.a, b.a, c.a, etc.
Note: using a RoutingKey of # with Exchange Type topic is equivalent to fanout
headers: the message headers determine which queues the message goes to
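To pin down the topic wildcard rules above, here is an illustrative helper (my own sketch, not part of pika or RabbitMQ) that matches binding keys against routing keys word by word:

def topic_matches(binding, key):
    def match(b, k):
        if not b:
            return not k
        if b[0] == '#':                    # '#' consumes zero or more words
            return match(b[1:], k) or (bool(k) and match(b, k[1:]))
        if not k:
            return False
        if b[0] == '*' or b[0] == k[0]:    # '*' consumes exactly one word
            return match(b[1:], k[1:])
        return False
    return match(binding.split('.'), key.split('.'))

print(topic_matches('*.a', 'b.a'))     # True
print(topic_matches('*.a', 'b.c.a'))   # False -- '*' is one word only
print(topic_matches('#.a', 'b.c.a'))   # True  -- '#' spans any number of words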
Message publisher
import pika
import sys

credentials = pika.PlainCredentials('alex', 'alex3714')
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='192.168.184.128', credentials=credentials))
channel = connection.channel()

channel.exchange_declare(exchange='log', type='fanout')

message = ' '.join(sys.argv[1:]) or 'info:Hello World!'
channel.basic_publish(exchange='log', routing_key='', body=message)
print('[x] Sent %r' % message)
connection.close()
Message subscriber
import pika

credentials = pika.PlainCredentials('alex', 'alex3714')
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='192.168.184.128', credentials=credentials))
channel = connection.channel()

channel.exchange_declare(exchange='log', type='fanout')

# no queue name given: rabbit picks a random one;
# exclusive=True deletes the queue once we are done with it
result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue
channel.queue_bind(exchange='log', queue=queue_name)

print('[*] Waiting for logs. To exit press CTRL+C')

def callback(ch, method, properties, body):
    print('[x] %r' % body)

channel.basic_consume(callback, queue=queue_name)
channel.start_consuming()
RabbitMQ also supports routing by keyword: queues are bound with a keyword, the sender publishes the data to the exchange together with a keyword, and the exchange uses that keyword to decide which queue the data should be sent to.
import pika
import sys

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='192.168.184.128'))
channel = connection.channel()

channel.exchange_declare(exchange='direct_logs', type='direct')

severity = sys.argv[1] if len(sys.argv) > 1 else 'info'
message = ' '.join(sys.argv[2:]) or 'Hello World!'
channel.basic_publish(exchange='direct_logs',
                      routing_key=severity,
                      body=message)
print(" [x] Sent %r:%r" % (severity, message))
connection.close()
# to send a message, run this script with a severity (e.g. error) plus the text
subscriber
import pika
import sys

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='192.168.184.128'))
channel = connection.channel()

channel.exchange_declare(exchange='direct_logs', type='direct')

result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue

severities = sys.argv[1:]
if not severities:
    sys.stderr.write("Usage: %s [info] [warning] [error]\n" % sys.argv[0])
    sys.exit(1)

for severity in severities:
    channel.queue_bind(exchange='direct_logs',
                       queue=queue_name,
                       routing_key=severity)

print(' [*] Waiting for logs. To exit press CTRL+C')

def callback(ch, method, properties, body):
    print(" [x] %r:%r" % (method.routing_key, body))

channel.basic_consume(callback,
                      queue=queue_name,
                      no_ack=True)
channel.start_consuming()
# run this script with the severities to listen for, e.g. error
Although using the direct exchange improved our system, it still has limitations - it can't do routing based on multiple criteria.
In our logging system we might want to subscribe to not only logs based on severity, but also based on the source which emitted the log. You might know this concept from the syslog unix tool, which routes logs based on both severity (info/warn/crit...) and facility (auth/cron/kern...).
That would give us a lot of flexibility - we may want to listen to just critical errors coming from 'cron' but also all logs from 'kern'.
publisher
import pika
import sys

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='topic_logs', type='topic')

routing_key = sys.argv[1] if len(sys.argv) > 1 else 'anonymous.info'
message = ' '.join(sys.argv[2:]) or 'Hello World!'
channel.basic_publish(exchange='topic_logs',
                      routing_key=routing_key,
                      body=message)
print(" [x] Sent %r:%r" % (routing_key, message))
connection.close()
subscriber
import pika
import sys

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='topic_logs', type='topic')

result = channel.queue_declare(exclusive=True)
queue_name = result.method.queue

binding_keys = sys.argv[1:]
if not binding_keys:
    sys.stderr.write("Usage: %s [binding_key]...\n" % sys.argv[0])
    sys.exit(1)

for binding_key in binding_keys:
    channel.queue_bind(exchange='topic_logs',
                       queue=queue_name,
                       routing_key=binding_key)

print(' [*] Waiting for logs. To exit press CTRL+C')

def callback(ch, method, properties, body):
    print(" [x] %r:%r" % (method.routing_key, body))

channel.basic_consume(callback,
                      queue=queue_name,
                      no_ack=True)
channel.start_consuming()
To illustrate how an RPC service could be used we're going to create a simple client class. It's going to expose a method named call which sends an RPC request and blocks until the answer is received:
fibonacci_rpc = FibonacciRpcClient()
result = fibonacci_rpc.call(4)
print("fib(4) is %r" % result)
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='rpc_queue')

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def on_request(ch, method, props, body):
    n = int(body)
    print(" [.] fib(%s)" % n)
    response = fib(n)
    ch.basic_publish(exchange='',
                     routing_key=props.reply_to,
                     properties=pika.BasicProperties(correlation_id=props.correlation_id),
                     body=str(response))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(on_request, queue='rpc_queue')

print(" [x] Awaiting RPC requests")
channel.start_consuming()
client
import pika
import uuid

class FibonacciRpcClient(object):
    def __init__(self):
        self.connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='localhost'))
        self.channel = self.connection.channel()

        result = self.channel.queue_declare(exclusive=True)
        self.callback_queue = result.method.queue

        self.channel.basic_consume(self.on_response,
                                   no_ack=True,
                                   queue=self.callback_queue)

    def on_response(self, ch, method, props, body):
        if self.corr_id == props.correlation_id:
            self.response = body

    def call(self, n):
        self.response = None
        self.corr_id = str(uuid.uuid4())
        self.channel.basic_publish(exchange='',
                                   routing_key='rpc_queue',
                                   properties=pika.BasicProperties(
                                       reply_to=self.callback_queue,
                                       correlation_id=self.corr_id,
                                   ),
                                   body=str(n))
        while self.response is None:
            self.connection.process_data_events()
        return int(self.response)

fibonacci_rpc = FibonacciRpcClient()
print(" [x] Requesting fib(30)")
response = fibonacci_rpc.call(30)
print(" [.] Got %r" % response)