Learning Multithreading and Implementing It in Python

Date: 2019-11-18


I have been studying processes and threads, and this post records what I learned.

Contents:

1. Processes and threads: concepts, relationship, and differences

2. Multithreaded programming

3. Multithreading in Python

4. A Python example

5. References

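1. Processes and threads: concepts, relationship, and differences

A process can be described as a program in execution. Each process has its own complete data space and code space. The address space of every process is independent, so processes cannot share data directly.

A thread is part of a process and can be thought of as a "mini process". Threads in the same process share one address space, but each thread has its own stack and local variables. So everything except the data on each thread's stack can be shared.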

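To make this more concrete, here is an analogy borrowed from a friend:

Take a company with many departments. Each department is in a different city, and each department has many employees.

The company is like a CPU, and the departments are like processes: each one does its own work in its own space. If they want to share data, they need email or fax. The employees within one department are like threads: they sit in the same department and can share everything. If one person is using the printer (the data has gone onto the stack, so to speak), nobody else can use it until that person is done; everyone else has to wait.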

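Relationship:

A process contains threads, one or more of them.

Differences:

1. Each process has an independent address space, so a multi-process program is fairly robust: a failure in one process does not affect the others. Threads in the same process share an address space, so a multithreaded program is less robust than a multi-process one: a problem in one thread can seriously affect the other threads.

2. Processes that need to share data must use inter-process communication; threads in the same process do not need it.

3. A process is the smallest unit of resource allocation, while a thread is the smallest unit of execution. In other words, what actually runs is a thread.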

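2. Multithreading

My study focused mainly on multithreading. The notes follow.

Multiple threads run inside the same process and can share data. Every thread has three parts: a beginning, sequential execution, and an end. Within one thread, code executes in order. Each thread has its own instruction pointer that records how far it has run. While running, a thread may be preempted or temporarily suspended.

There also has to be a main thread. It needs to know what each thread is supposed to do, what data and arguments each thread needs, and what results the threads produce when they finish. The main thread can then combine the individual results into a meaningful whole.

Threads in the same process can share data and communicate with each other, but this sharing is also dangerous. If several threads access the same data at the same time, the results can easily become inconsistent. This is called a race condition. For this reason most thread libraries come with a set of synchronization primitives that control thread execution and data access.

Used carelessly, these primitives can cause deadlock. For example, two threads x and y both need resources A and B. Thread x acquires A first and then tries to get B; thread y acquires B first and then tries to get A. Both are waiting for a resource, neither gives up the one it already holds, and so they wait for each other forever.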

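The conditions for deadlock:

1. Mutual exclusion: several threads cannot use the same resource at the same time.
2. Hold and wait: a thread needs N resources to finish its task, and keeps holding the ones it has already acquired while it waits for the rest.
3. No preemption: a resource that a thread has acquired cannot be taken away by force by other threads.
4. Circular wait: several threads form a head-to-tail cycle, each waiting for a resource held by the next.

Deadlocks are generally hard to detect.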
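
One common way to avoid the x/y deadlock described above is to break the circular-wait condition by making every thread acquire the locks in the same order. The following is a minimal sketch in Python 3; the names lock_a, lock_b, worker and finished are made up for this illustration.

```python
import threading

lock_a = threading.Lock()  # stands for resource A
lock_b = threading.Lock()  # stands for resource B
finished = []              # names of the threads that completed


def worker(name):
    # Every thread takes A first and B second. No thread ever holds B
    # while waiting for A, so a circular wait cannot form.
    with lock_a:
        with lock_b:
            finished.append(name)


threads = [threading.Thread(target=worker, args=(n,)) for n in ("x", "y")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)  # the timeout keeps a buggy variant from hanging forever

print(sorted(finished))
```

If thread y took lock_b before lock_a instead, the two threads could end up in exactly the situation described above.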

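The following explanation of why synchronization is needed is quoted from http://buaawhl.iteye.com/blog/164905

The Chinese term for synchronization was translated from the English word synchronize (to make things happen at the same time). I do not understand why such an easily misunderstood word was chosen. Since everyone uses it, we have to make do with it.
The real meaning of thread synchronization is exactly the opposite of its literal meaning. Thread synchronization really means "queuing": the threads line up and operate on the shared resource one at a time, rather than all at once.

So the first thing to remember about thread synchronization is this: thread synchronization means threads queuing up. Synchronization is queuing. Its purpose is to keep threads from executing "in sync". What a silly tongue twister.
The second thing to remember is the word "shared". Only reads and writes of shared resources need synchronization. If a resource is not shared, there is no need to synchronize at all.
The third thing to remember is that only "variables" need synchronized access. If the shared resource never changes, it is effectively a "constant", and threads reading a constant at the same time need no synchronization. Threads need to synchronize only when at least one of them modifies the shared resource.
The fourth thing to remember is that the code in which several threads access a shared resource may be the same code or different code. Either way, as long as the threads access the same mutable shared resource, they need to synchronize.

To make this clearer, here are a few examples.
There are two buyers whose jobs are identical. Both follow these steps:
(1) Go to the market, then find and buy promising samples.
(2) Return to the company and write a report.
Although their jobs are the same, and both need to buy samples and may even buy the same kind of sample, they will never buy the very same item. They share no resources, so each can do their work without interfering with the other.
The two buyers correspond to two threads. The fact that they follow the same steps corresponds to the two threads executing the same code.

Now add one step to the buyers' job. Each buyer plans their work according to the information posted on the company's bulletin board.
The two buyers may walk up to the bulletin board at the same time and read it at the same time. That is no problem at all, because the bulletin board is read-only: neither buyer will change what is written on it.

Now add another role. An office administrator also walks up to the bulletin board, intending to change what is on it.
If the administrator gets there first and is in the middle of changing the board when the two buyers arrive, the buyers must wait until the administrator has finished before they can read the updated information.
If the two buyers are already reading the board when the administrator arrives, the administrator must wait until both buyers have noted down the current information before writing anything new.
In both cases, the administrator's and the buyers' access to the bulletin board must be synchronized, because one of the threads (the administrator) modifies the shared resource (the bulletin board). We can also see that the administrator's workflow and the buyers' workflow (the code they execute) are completely different, but because they access the same mutable shared resource (the bulletin board), they need to synchronize.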
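
The bulletin board scenario can be sketched in Python 3 as follows. This is only an illustration under simplifying assumptions: the names board, board_lock, buyer and clerk are invented here, and a plain threading.Lock makes the buyers queue up with each other as well. Letting several readers read at once would need a reader-writer lock, which the standard library does not provide.

```python
import threading

board_lock = threading.Lock()
board = {"notice": "v1"}   # the shared, mutable bulletin board
seen = []                  # what each buyer read


def buyer():
    # Reading also takes the lock, because a writer may be active.
    with board_lock:
        seen.append(board["notice"])


def clerk():
    # The administrator replaces the notice while holding the lock.
    with board_lock:
        board["notice"] = "v2"


threads = [threading.Thread(target=buyer) for _ in range(2)]
threads.append(threading.Thread(target=clerk))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(board["notice"], seen)
```

Each buyer sees either the old notice or the new one, depending on who reached the board first, but never a half-written one.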

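3. Multithreading in Python

When a Python program runs, execution proceeds from the top of the main module downward. Loops repeat parts of the code, and functions and methods temporarily hand control to another part of the program. With threads, your program can work on several tasks at once. Each thread has its own flow of control, so you can read data from a file in one thread and write to the screen in another. To make it safe for two threads to access the same internal data at the same time, Python uses the global interpreter lock (GIL). Only one thread can execute Python code at any moment. Python automatically switches to the next thread after a short interval, or when a thread waits on a slow operation (for example waiting for data from a socket or reading data from a file).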

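How the Python virtual machine executes threads:

1. Set the GIL

2. Switch to a thread to run

3. Run until either a specified number of bytecode instructions has executed (since Python 3.2 the switch is based on a time interval instead) or the thread voluntarily yields control. A thread can yield by calling time.sleep(): it goes to sleep and the interpreter switches to another thread. If the sleeping thread holds a Lock, other threads can still run, but any thread that needs that Lock stays blocked until release() is called.

4. Put the thread to sleep

5. Unlock the GIL

When external code is called (for example a function in a C extension), the GIL stays locked until the call completes, because no Python bytecode is being executed to count toward the switch. Extension authors can, however, release the GIL explicitly.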
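
A small Python 3 sketch can make the effect of the GIL on waiting threads visible. It assumes that time.sleep() is a fair stand-in for a blocking I/O call; the function name wait_for_io and the delay values are invented for this example.

```python
import threading
import time


def wait_for_io(delay):
    # time.sleep() stands in for a blocking call such as a socket read.
    # The GIL is released while the thread sleeps, so others can run.
    time.sleep(delay)


started = time.perf_counter()
threads = [threading.Thread(target=wait_for_io, args=(0.2,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - started

# The three waits overlap, so the total is close to 0.2 s, not 0.6 s.
print("elapsed: %.2f s" % elapsed)
```

Pure Python computation would not overlap in this way, because only one thread can execute bytecode at a time.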

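2. Python's multithreading modules: thread, threading, Queue, and multiprocessing (the multi-process module)

The thread and threading modules let the programmer create and manage threads. The thread module provides basic thread and lock support, while threading provides higher-level thread management. (In Python 3 the thread module is named _thread.)

With the thread module, when the main thread exits, the other threads are forcibly terminated, possibly before they have cleaned up. The threading module makes sure all non-daemon threads have exited before the process exits. The thread module does not support daemon threads: as soon as the main thread finishes, the program exits, regardless of whether other threads are still running.

With the thread module, a thread created by start_new_thread starts running immediately, which makes synchronization hard to control. With the threading module, a thread object does not run until its start() method is called.

The Queue module (named queue in Python 3) lets the user create a queue data structure for sharing data among multiple threads.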

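The threading module provides:

['activeCount', 'active_count', 'Condition', 'currentThread', 'current_thread', 'enumerate', 'Event', 'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Thread', 'Timer', 'setprofile', 'settrace', 'local', 'stack_size']

Daemon threads: a daemon thread is typically a server waiting for client requests. If there are no requests it just keeps waiting, like the server in socket programming.

If the main thread does not need to wait for certain child threads when it exits, you can set the daemon attribute of those threads. That is, before the thread starts, call setDaemon(True) to mark it as unimportant.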

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)

This constructor should always be called with keyword arguments. Arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.

name is the thread name. By default, a unique name is constructed of the form "Thread-N" where N is a small decimal number.

args is the argument tuple for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If not None, daemon explicitly sets whether the thread is daemonic. If None (the default), the daemonic property is inherited from the current thread.

If the subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.

Changed in version 3.3: Added the daemon argument.

start()

Start the thread’s activity.

It must be called at most once per thread object. It arranges for the object’s run() method to be invoked in a separate thread of control.

This method will raise a RuntimeError if called more than once on the same thread object.

run()

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.

join(timeout=None)

Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception –, or until the optional timeout occurs.

When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call is_alive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.

When the timeout argument is not present or None, the operation will block until the thread terminates.

A thread can be join()ed many times.

join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raise the same exception.
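
The constructor, start(), join() and is_alive() described above fit together roughly as in this Python 3 sketch. The function square, the thread names and the delays are made up for the illustration.

```python
import threading
import time

results = {}  # filled in by the worker threads


def square(n, delay=0.0):
    time.sleep(delay)
    results[n] = n * n


# target, name, args and kwargs are passed as keyword arguments.
fast = threading.Thread(target=square, name="fast", args=(3,))
slow = threading.Thread(target=square, name="slow", args=(4,),
                        kwargs={"delay": 0.5})

fast.start()
slow.start()

fast.join()                  # blocks until fast has finished
slow.join(timeout=0.05)      # returns None whether or not it timed out
timed_out = slow.is_alive()  # True means the join above timed out

slow.join()                  # a thread may be joined more than once
print(results, timed_out)
```

Because join() always returns None, checking is_alive() afterwards is the only way to tell a timeout from a normal finish.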

name: A string used for identification purposes only. It has no semantics. Multiple threads may be given the same name. The initial name is set by the constructor.

getName()
setName(): Old getter/setter API for name; use it directly as a property instead.

ident: The ‘thread identifier’ of this thread or None if the thread has not been started. This is a nonzero integer. See the _thread.get_ident() function. Thread identifiers may be recycled when a thread exits and another thread is created. The identifier is available even after the thread has exited.

is_alive()

Return whether the thread is alive.

This method returns True just before the run() method starts until just after the run() method terminates. The module function enumerate() returns a list of all alive threads.

daemon

A boolean value indicating whether this thread is a daemon thread (True) or not (False). This must be set before start() is called, otherwise RuntimeError is raised. Its initial value is inherited from the creating thread; the main thread is not a daemon thread and therefore all threads created in the main thread default to daemon = False.

The entire Python program exits when no alive non-daemon threads are left.

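isDaemon(): can be used to check whether a thread is a daemon thread.
setDaemon(): Old getter/setter API for daemon; use it directly as a property instead.

setDaemon() sets whether a thread is a daemon thread. True makes it a daemon thread, which means the thread is unimportant and the program may exit as soon as the main thread ends. With False it is not a daemon thread, and the main thread waits for it to finish before the program exits.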
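
The daemon flag can be tried out with a short Python 3 sketch like the one below. The function background and the stop event are invented for this example, and the daemon keyword argument assumes Python 3.3 or later.

```python
import threading

stop = threading.Event()


def background():
    # Keeps polling until the main thread asks it to stop.
    while not stop.wait(0.01):
        pass


# Two equivalent ways to mark a thread as a daemon before it starts.
worker = threading.Thread(target=background, daemon=True)
helper = threading.Thread(target=background)
helper.daemon = True

worker.start()

error = None
try:
    worker.daemon = False   # too late: the thread has already started
except RuntimeError as exc:
    error = exc

stop.set()
worker.join()
print(worker.daemon, helper.daemon, type(error).__name__)
```

If the script ended without stop.set(), the interpreter would still exit, because only daemon threads would be left running.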

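Note:

1. Shared variables

If threads share a global variable and access to it is not coordinated, errors occur, sometimes with unexpected consequences.

For example:

Suppose a global variable a = 3 is defined, and the thread's run() method contains the statements global a and a = a + 10.

There are two threads. Thread A reads a first and gets 3. Before it executes the next step, thread B also reads a and also gets 3. We hoped for a final result of 23, but the output is 13.

Writing it as a += 10 does not remove the problem. The statement still compiles to separate load, add, and store steps, so it is not atomic, and a thread switch can happen in between.

Local variables, by contrast, are private to each thread and are not shared.

To solve the problems with shared global variables, use a lock.

The class is Lock(). The example below subclasses Thread.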

import threading
import time

lock = threading.Lock()  # shared by all MyThread instances


class MyThread(threading.Thread):

    def __init__(self, secs):
        super(MyThread, self).__init__()
        self.secs = secs

    def run(self):
        # Only one thread at a time can hold the lock, so the sleeps
        # run one after another instead of overlapping.
        with lock:
            time.sleep(self.secs)
            print('Done')


def main():
    started = time.time()
    ths = [MyThread(secs) for secs in [2, 3]]
    for t in ths:
        print(t.name)
        t.start()
    for t in ths:
        t.join()
    # time.clock() was removed in Python 3.8, so measure wall-clock time.
    print('all done, ok! it costs: %.1f s' % (time.time() - started))


main()
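
The shared-variable problem from the note above can be reproduced and fixed with a lock, roughly as in this Python 3 sketch. The names counter, counter_lock and add_many are invented for the illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # The read, the addition and the write happen as one unit,
        # because no other thread can hold the lock in the meantime.
        with counter_lock:
            counter += 10


threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 2 threads * 1000 additions * 10 = 20000
```

Without the lock the final value may come out lower, depending on when the interpreter happens to switch threads.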

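3.3. RLock()

RLock (reentrant lock) is a synchronization primitive that the same thread can acquire multiple times. RLock uses the concepts of an "owning thread" and a "recursion level". While locked, an RLock is owned by one thread. The owning thread can call acquire() again, and must call release() the same number of times to free the lock.

You can think of an RLock as containing a lock pool and a counter that starts at 0. Each successful call to acquire()/release() adds/subtracts 1, and the lock is unlocked when the counter is 0.

Constructor:
RLock()

Instance methods:
acquire(blocking=True, timeout=-1)/release(): much the same as for Lock. The timeout parameter exists only in Python 3.2 and later.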
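
A minimal Python 3 sketch of the reentrant behaviour follows. The functions outer and inner are hypothetical names chosen for this example.

```python
import threading

rlock = threading.RLock()
visited = []


def inner():
    with rlock:              # same thread again: counter 1 -> 2, no deadlock
        visited.append("inner")


def outer():
    with rlock:              # counter 0 -> 1
        visited.append("outer")
        inner()
    # both with blocks have exited here, so the counter is back to 0


t = threading.Thread(target=outer)
t.start()
t.join()

# The lock is free again, so another thread can take it without blocking.
free_again = rlock.acquire(blocking=False)
if free_again:
    rlock.release()

print(visited, free_again)
```

With a plain Lock instead of an RLock, the nested acquire in inner() would block forever.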

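3.4. Condition

A Condition (condition variable) is usually associated with a lock. To share one lock among several Condition objects, pass a Lock/RLock instance to the constructor; otherwise the Condition creates its own RLock instance.

You can think of a Condition as having, in addition to the lock pool that a Lock has, a waiting pool. Threads in the waiting pool are in the waiting-blocked state until another thread calls notify()/notifyAll(). Once notified, a thread moves to the lock pool and waits to acquire the lock.

Constructor:
Condition([lock/rlock])

Instance methods:
acquire(*args)/release(): call the corresponding method of the associated lock.
wait([timeout]): puts the thread into the Condition's waiting pool to wait for a notification, and releases the lock. The thread must hold the lock before calling this; otherwise an exception is raised.
notify(): picks one thread from the waiting pool and notifies it. The notified thread automatically calls acquire() to try to get the lock (it enters the lock pool); the other threads stay in the waiting pool. Calling this method does not release the lock. The thread must hold the lock before calling this; otherwise an exception is raised.
notifyAll() (spelled notify_all() in newer code): notifies all threads in the waiting pool; they all enter the lock pool and try to get the lock. Calling this method does not release the lock. The thread must hold the lock before calling this; otherwise an exception is raised.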

A very common use of Condition is the producer/consumer pattern.
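
A producer/consumer exchange built on Condition might look like the following Python 3 sketch. It is not the example from the original source; the names items, DONE, producer and consumer are invented here, and a sentinel object is assumed as the way to tell the consumer to stop.

```python
import threading
from collections import deque

items = deque()                 # the shared buffer
cond = threading.Condition()    # creates its own RLock
consumed = []
DONE = object()                 # sentinel that tells the consumer to stop


def producer():
    for n in range(5):
        with cond:              # notify() requires holding the lock
            items.append(n)
            cond.notify()
    with cond:
        items.append(DONE)
        cond.notify()


def consumer():
    while True:
        with cond:
            while not items:    # re-check the condition after every wake-up
                cond.wait()     # releases the lock while waiting
            item = items.popleft()
        if item is DONE:
            break
        consumed.append(item)


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(consumed)
```

The inner while loop matters: a thread that wakes from wait() should check again that there really is something to consume.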

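3.5. Semaphore/BoundedSemaphore

The Semaphore is one of the oldest synchronization primitives in the history of computer science. A Semaphore manages an internal counter that is decremented by each acquire() call and incremented by each release() call. The counter can never go below 0. When the counter is 0, acquire() blocks the thread until another thread calls release().

Because of this, a Semaphore is often used to guard objects with a "visitor limit", such as a connection pool.

The only difference between BoundedSemaphore and Semaphore is that the former checks on release() whether the counter would exceed its initial value, and raises an exception if it would.

Constructor:
Semaphore(value=1): value is the initial value of the counter.

Instance methods:
acquire(blocking=True, timeout=None): requests the Semaphore. If the counter is 0, the thread blocks; otherwise the counter is decremented and the call returns immediately. The timeout parameter exists only in Python 3.2 and later.
release(): releases the Semaphore and increments the counter. With a BoundedSemaphore, the number of releases is also checked. release() does not check whether the calling thread has acquired the Semaphore.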
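
The "visitor limit" idea can be sketched in Python 3 like this. The pool size of 2 and the names pool, use_connection, active and peak are hypothetical; no real connections are involved.

```python
import threading
import time

MAX_CONNECTIONS = 2                        # hypothetical pool size
pool = threading.BoundedSemaphore(MAX_CONNECTIONS)
state_lock = threading.Lock()              # protects active and peak
active = 0                                 # threads currently "connected"
peak = 0                                   # highest value active reached


def use_connection():
    global active, peak
    with pool:                             # acquire(): counter -1, blocks at 0
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)                   # pretend to use the connection
        with state_lock:
            active -= 1
    # leaving the with block calls release(): counter +1


threads = [threading.Thread(target=use_connection) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

over_release_error = None
try:
    pool.release()                         # one release too many
except ValueError as exc:
    over_release_error = exc

print(peak, active, type(over_release_error).__name__)
```

A plain Semaphore would accept the extra release() silently and end up with a counter larger than its initial value.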

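3.6. Event

Event is one of the simplest mechanisms for communication between threads: one thread signals an event and other threads wait for it. An Event has an internal flag that is initially False. It is set to True by set() and reset to False by clear(). wait() blocks the thread until the flag is True.

An Event is really a simplified Condition. An Event has no lock of its own for callers to acquire, so it cannot put a thread into the lock-blocked state.

Constructor:
Event()

Instance methods:
isSet() (spelled is_set() in newer code): returns True when the internal flag is True.
set(): sets the flag to True and wakes up all threads that are waiting on it.
clear(): sets the flag to False.
wait([timeout]): returns immediately if the flag is True; otherwise blocks the thread until another thread calls set().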
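
A minimal Python 3 sketch of one thread waiting for another's signal follows. The names ready, waiter and log are made up for the illustration.

```python
import threading

ready = threading.Event()
log = []


def waiter():
    ready.wait()            # blocks until some thread calls ready.set()
    log.append("go")


t = threading.Thread(target=waiter)
t.start()

before = ready.is_set()     # the flag starts out False
ready.set()                 # wakes up every thread waiting on the event
t.join()

# wait() with a timeout returns False if the flag was never set.
timed_out = not threading.Event().wait(0.01)

print(before, ready.is_set(), log, timed_out)
```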

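3.7. Timer

Timer is a subclass of Thread that calls a function after a specified delay.

Constructor:
Timer(interval, function, args=[], kwargs={})
interval: the delay in seconds
function: the function to call
args/kwargs: the arguments for the function

Instance methods:
Timer derives from Thread. Besides the inherited methods it adds cancel(), which stops the timer if it has not fired yet.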
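
Here is a short Python 3 sketch of a Timer that fires and one that is cancelled with its cancel() method before it fires. The function remind and the delays are invented for this example.

```python
import threading

fired = threading.Event()
messages = []


def remind(text):
    messages.append(text)
    fired.set()


# Call remind("time is up") about 0.1 seconds after start().
timer = threading.Timer(0.1, remind, args=("time is up",))
timer.start()
fired.wait(timeout=5)       # wait for the callback, at most 5 seconds

# A timer that has not fired yet can be stopped with cancel().
cancelled = threading.Timer(60, remind, args=("never",))
cancelled.start()
cancelled.cancel()
cancelled.join()            # returns promptly because the wait was cancelled

print(messages)
```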

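3.8. local

local is a class whose name starts with a lowercase letter. It manages thread-local data. For the same local object, a thread cannot access attributes set by other threads, and an attribute set by one thread is not overwritten when another thread sets an attribute of the same name.

You can think of local as a dictionary of "thread to attribute dictionary" mappings. local hides the details of first looking up the attribute dictionary using the current thread as the key, and then looking up the attribute value using the attribute name as the key.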
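
The isolation between threads can be seen in a Python 3 sketch like this one. The attribute name user and the helper worker are hypothetical.

```python
import threading

data = threading.local()
data.user = "main"          # visible only in the main thread
seen = {}


def worker(name):
    # The attribute set by the main thread does not exist in this thread.
    seen[name + "_before"] = hasattr(data, "user")
    data.user = name        # does not overwrite other threads' values
    seen[name] = data.user


threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen, data.user)
```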

Mastering Thread, Lock, and Condition is enough for the vast majority of situations that call for threads. In some cases local is also very useful.

13. The Queue module

It lets you create a queue data structure that can be used to share data among multiple threads.

Queue Objects

Queue objects (Queue, LifoQueue, or PriorityQueue) provide the public methods described below.

Queue.qsize(): Return the approximate size of the queue. Note, qsize() > 0 doesn’t guarantee that a subsequent get() will not block, nor will qsize() < maxsize guarantee that put() will not block.

Queue.empty(): Return True if the queue is empty, False otherwise. If empty() returns True it doesn’t guarantee that a subsequent call to put() will not block. Similarly, if empty() returns False it doesn’t guarantee that a subsequent call to get() will not block.

Queue.full(): Return True if the queue is full, False otherwise. If full() returns True it doesn’t guarantee that a subsequent call to get() will not block. Similarly, if full() returns False it doesn’t guarantee that a subsequent call to put() will not block.

Queue.put(item, block=True, timeout=None): Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Queue.put_nowait(item): Equivalent to put(item, False).

Queue.get(block=True, timeout=None): Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Queue.get_nowait(): Equivalent to get(False).

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Queue.task_done()

Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.

If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).

Raises a ValueError if called more times than there were items placed in the queue.

Queue.join()

Blocks until all items in the queue have been gotten and processed.

The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.

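Semaphore and Queue serve similar purposes, but Queue is more convenient.

Queue.join() blocks the calling thread until every task in the queue has been completed. Only then does the caller move on.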
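
The interplay of put(), get(), task_done() and join() can be sketched as follows. The sketch assumes Python 3, where the module is named queue; the worker function and the squaring task are invented for the illustration.

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()   # protects results


def worker():
    while True:
        n = tasks.get()           # blocks until a task is available
        try:
            with results_lock:
                results.append(n * n)
        finally:
            tasks.task_done()     # exactly one task_done() per get()


# Daemon workers do not keep the program alive once the work is done.
for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

for n in range(10):
    tasks.put(n)

tasks.join()                      # returns once every task is marked done
print(sorted(results))
```

Calling task_done() in a finally block keeps join() from hanging if processing a task raises an exception.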

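4. A Python example

The program uses several threads to search one file together and count how many times a given byte occurs.

Environment: Cygwin

Interpreter: Python 2.7.6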

import os
import sys
import threading
import timeit

NUM_THREADS = 3
TARGET = b'i'            # the byte to count

lock = threading.Lock()  # protects cur and total
cur = 0                  # file offset where the next chunk starts
total = 0                # number of occurrences found so far


def print_in_color(color, msg):
    # ANSI escape sequence; color is a code such as 31 (red) or 33 (yellow).
    print('\033[0;%dm%s\033[0m' % (color, msg))


class MyThread(threading.Thread):

    def __init__(self, filename):
        super(MyThread, self).__init__()
        self.filename = filename

    def run(self):
        global cur, total
        size = os.path.getsize(self.filename)
        # Round up so that the chunks together cover the whole file.
        chunk = (size + NUM_THREADS - 1) // NUM_THREADS

        # Claim the next unread chunk [start, end) while holding the lock.
        with lock:
            start = cur
            cur = end = min(start + chunk, size)
            print_in_color(31, self.name)
            print_in_color(33, '%d-%d' % (start, end))

        if start >= end:
            return  # nothing left for this thread to read

        # Binary mode keeps seek() offsets and byte counts exact.
        with open(self.filename, 'rb') as f:
            f.seek(start)
            found = f.read(end - start).count(TARGET)

        # Update the shared counter under the lock to avoid a race.
        with lock:
            total += found


def main():
    filename = sys.argv[1]
    thds = [MyThread(filename) for _ in range(NUM_THREADS)]
    for t in thds:
        t.start()
    for t in thds:
        t.join()
    print('%r occurs %d times' % (TARGET, total))


if __name__ == '__main__':
    t = timeit.Timer('main()', 'from __main__ import main')
    print(t.timeit(1))

The next step is to implement a multithreaded socket server. Come on!

5. References

1. Core Python Programming

2. www.docs.python.org

3. Foundations of Python Network Programming