All else being equal, the speed of a Python program is tied directly to the "speed" of the interpreter: no matter how much you optimize your own code, its execution speed still depends on how efficiently the interpreter runs it.
For now, multithreading remains the most common way to exploit multi-core systems. Although multithreaded programming is a big step up from "sequential" programming, even careful programmers cannot get concurrency perfectly right in code.
In any Python program, no matter how many processors are available, only one thread is ever executing at any given time.
In fact, the natural follow-up question ("why bother with threads at all, then?") is asked so often that Python experts have crafted a standard answer: "Don't use threads, use processes." But that answer is even more confusing than the question.
The GIL protects access to things like the current thread state and the heap-allocated objects used for garbage collection. Nothing about the Python language itself, however, requires a GIL; it is an artifact of the implementation. There are other Python interpreters (and compilers) that do not use a GIL, although in CPython one has existed in some form since nearly the beginning.
However one feels about Python's GIL, it remains the most difficult technical challenge of the Python language. Understanding its implementation requires a thorough grasp of operating system design, multithreaded programming, C, interpreter design, and the CPython interpreter itself. Those prerequisites alone deter many developers from studying the GIL more deeply.
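To see the effect concretely, here is a minimal sketch in the spirit of the well-known CPU-bound countdown benchmark; the `countdown` function and the loop count are arbitrary choices for illustration. It times the same amount of pure-Python work done in one thread and then split across two:

```python
import threading
import time

def countdown(n):
    # pure CPU-bound work: nothing here releases the GIL
    while n > 0:
        n -= 1

COUNT = 10000000

# sequential: one thread does all the work
start = time.time()
countdown(COUNT)
print "sequential:  %.2fs" % (time.time() - start)

# "parallel": two threads each do half the work, but because of the
# GIL only one of them executes Python bytecode at any instant
t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
t2 = threading.Thread(target=countdown, args=(COUNT // 2,))
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print "two threads: %.2fs" % (time.time() - start)
```

On a multi-core machine the two-thread version typically runs no faster, and often slower, because the threads merely take turns holding the GIL.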
The `threading` module provides a higher-level interface built on top of the `thread` module; if `threading` is unavailable because `thread` is missing, `dummy_threading` can be used instead.
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
Example:
```python
import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile

    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print 'Finished background zip of: ', self.infile

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print 'The main program continues to run in foreground.'

background.join()    # Wait for the background task to finish
print 'Main program waited until background was done.'
```
```python
import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (self.getName(), now)

for i in range(2):
    t = ThreadClass()
    t.start()
```
```python
import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            # grabs host from queue
            host = self.queue.get()

            # grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            # place chunk into out queue
            self.out_queue.put(chunk)

            # signals to queue job is done
            self.queue.task_done()

class DatamineThread(threading.Thread):
    """Threaded HTML parse"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # grabs chunk from out queue
            chunk = self.out_queue.get()

            # parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            # signals to queue job is done
            self.out_queue.task_done()

start = time.time()

def main():
    # spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    # populate queue with data
    for host in hosts:
        queue.put(host)

    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()

    # wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
```
The `dummy_threading` module provides an exact replica of the `threading` module's interface; when `thread` is unavailable, it can be used as a drop-in replacement.
Usage:
```python
try:
    import threading as _threading
except ImportError:
    import dummy_threading as _threading
```
The low-level `thread` module is called `_thread` in Python 3; the higher-level `threading` module should be used instead whenever possible.
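For reference, here is a minimal sketch of what the older low-level API looks like; the `worker` function and the sleep-based wait are illustrative stand-ins, and the fact that `thread` offers no `join` is one reason to prefer `threading`:

```python
import thread
import time

def worker(ident):
    # the low-level API takes a bare function, not a Thread subclass
    print 'worker %d running' % ident

for i in range(3):
    thread.start_new_thread(worker, (i,))

# thread has no join(); sleep so the main thread does not exit
# before the workers have had a chance to run
time.sleep(1)
```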
The `dummy_thread` module provides an exact replica of the `thread` module's interface; when `thread` is unavailable, it can be used as a replacement.
In Python 3 it is called `_dummy_thread`. Usage:
```python
try:
    import thread as _thread
except ImportError:
    import dummy_thread as _thread
```
Even so, it is best to use `dummy_threading` instead.
The `multiprocessing` module works around the problems caused by the GIL by creating subprocesses instead of threads.
Example:
```python
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
```
Individual processes are created with the `Process` class:
```python
from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
```
Exchanging data with a `Queue`:
```python
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()
```
Exchanging data with a `Pipe`:
```python
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print parent_conn.recv()    # prints "[42, None, 'hello']"
    p.join()
```
Adding a lock:
```python
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print 'hello world', i
    l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
```
Shared state should be avoided as much as possible. When it cannot be, one option is shared memory:
```python
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)        # 'd' typecode: double
    arr = Array('i', range(10))  # 'i' typecode: signed int

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print num.value
    print arr[:]
```
The other option is a server process, created with `Manager`:
```python
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    l = manager.list(range(10))

    p = Process(target=f, args=(d, l))
    p.start()
    p.join()

    print d
    print l
```
The server-process approach supports more data types: list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value, and Array.
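As an illustrative sketch of two of those extra types (the `Namespace` attribute `x` and the process count are made up), a manager can hand the same `Namespace` and `Lock` to several processes:

```python
from multiprocessing import Process, Manager

def f(ns, lock):
    # a manager-backed Lock coordinates across processes,
    # and a manager-backed Namespace holds shared attributes
    lock.acquire()
    ns.x += 1
    lock.release()

if __name__ == '__main__':
    manager = Manager()
    ns = manager.Namespace()
    ns.x = 0
    lock = manager.Lock()

    workers = [Process(target=f, args=(ns, lock)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print ns.x    # prints 4
```

Because every access goes through the manager's server process, these proxied objects are slower than shared memory but far more flexible.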
A pool of worker processes can be created with the `Pool` class:
```python
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print result.get(timeout=1)         # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))        # prints "[0, 1, 4,..., 81]"
```
The official documentation says only one thing about `multiprocessing.dummy`:
multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
`multiprocessing.dummy` is a complete clone of the `multiprocessing` API; the only difference is that `multiprocessing` works with processes while the dummy module works with threads. Choose `multiprocessing.dummy` for I/O-bound tasks and `multiprocessing` for CPU-bound tasks.
Example:
```python
import urllib2
from multiprocessing.dummy import Pool as ThreadPool

urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
    'http://planet.python.org/',
    'https://wiki.python.org/moin/LocalUserGroups',
    'http://www.python.org/psf/',
    'http://docs.python.org/devguide/',
    'http://www.python.org/community/awards/'
    # etc..
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the urls in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)

# close the pool and wait for the work to finish
pool.close()
pool.join()

# the equivalent sequential version, for comparison
results = []
for url in urls:
    result = urllib2.urlopen(url)
    results.append(result)
```
- If you choose multithreading, prefer the `threading` module, and keep the GIL's impact in mind.
- If threads are not strictly necessary, use the multiprocess module `multiprocessing`; it also supports multithreading via `multiprocessing.dummy`.
- Analyze whether the concrete task is I/O-bound or CPU-bound (see the sketch after this list).
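To make that last point concrete, the following sketch times the same pools on both kinds of work; `cpu_task`, `io_task`, and all the sizes are made-up stand-ins. Processes should win clearly on the CPU-bound half, while threads keep pace on the I/O-bound half:

```python
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def cpu_task(n):
    # CPU-bound: pure bytecode execution, serialized by the GIL in threads
    while n > 0:
        n -= 1

def io_task(_):
    # stand-in for I/O-bound work: sleeping releases the GIL
    time.sleep(0.5)

if __name__ == '__main__':
    for name, pool in [('processes', Pool(4)), ('threads', ThreadPool(4))]:
        start = time.time()
        pool.map(cpu_task, [5000000] * 4)
        cpu = time.time() - start

        start = time.time()
        pool.map(io_task, range(4))
        io = time.time() - start

        pool.close()
        pool.join()
        print '%s: cpu-bound %.2fs, io-bound %.2fs' % (name, cpu, io)
```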
- https://docs.python.org/2/library/threading.html
- https://docs.python.org/2/library/thread.html#module-thread
- http://segmentfault.com/a/1190000000414339
- http://www.oschina.net/translate/pythons-hardest-problem
- http://www.w3cschool.cc/python/python-multithreading.html
- Python threads: communication and stopping
- Python - parallelizing CPU-bound tasks with multiprocessing
- Python Multithreading Tutorial: Concurrency and Parallelism
- An introduction to parallel programming–using Python's multiprocessing module
- multiprocessing Basics
- An introduction to Python's multiprocessing module
- Multiprocessing vs Threading Python
- Parallelism in one line–A Better Model for Day to Day Threading Tasks
- Parallelism in One Line – A Better Model for Day to Day Threading Tasks (Chinese translation of the preceding article)
- Threaded programming with Python