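urllib and urllib2 are client-side implementations of the HTTP protocol. urllib2 is built on top of the httplib and socket libraries, and its main functions include urlopen, build_opener, and install_opener. In Python 2.7, calling urlopen from urllib2 can leak memory, and the gc module can be used to observe the leak.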
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import urllib2
    import socket
    import gc


    # check memory on memory leaks
    def get_unreachable_memory_len():
        # With DEBUG_SAVEALL set, every unreachable object is appended to
        # gc.garbage instead of being destroyed, so it can be inspected.
        # Use this for testing only.
        gc.set_debug(gc.DEBUG_SAVEALL)
        gc.collect()
        unreachableL = []
        for it in gc.garbage:
            unreachableL.append(it)
            # print(str(it))
        print str(unreachableL)


    def task():
        try:
            req = urllib2.urlopen('http://www.baidu.com/', timeout=3)
            text = req.read()
            # req.fp._sock.recv = None
            req.close()
        except urllib2.HTTPError, e:
            print e.code
        except urllib2.URLError, e:
            print e.reason
        else:
            print("urlopen success")


    if __name__ == '__main__':
        get_unreachable_memory_len()
        print("-------------------------")
        task()
        print("-------------------------")
        get_unreachable_memory_len()
Running the program confirms that urlopen leaks memory.
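The effect of gc.DEBUG_SAVEALL can be reproduced without any network access. The following is a minimal sketch, not the author's code: the Node class is a hypothetical stand-in that refers to itself, so reference counting alone can never free it. It uses only syntax common to Python 2.7 and Python 3.

```python
import gc


class Node(object):
    """Hypothetical stand-in object; not part of urllib2."""

    def __init__(self):
        self.me = self  # self-reference: a one-object reference cycle


def count_saved_nodes():
    """Run a full collection and count Node instances kept in gc.garbage."""
    gc.collect()
    return sum(1 for obj in gc.garbage if isinstance(obj, Node))


# With DEBUG_SAVEALL, unreachable objects are saved in gc.garbage
# instead of being freed.
gc.set_debug(gc.DEBUG_SAVEALL)
before = count_saved_nodes()

Node()  # created and dropped at once; only the cycle keeps it alive
after = count_saved_nodes()

# Restore normal behaviour and release the saved objects.
gc.set_debug(0)
del gc.garbage[:]

print(after - before)
```

An object that shows up in gc.garbage under DEBUG_SAVEALL was unreachable but part of a cycle, which is the same signal the urlopen test above relies on.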
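Python's garbage collection is based primarily on reference counting, and objects caught in a reference cycle are not freed by reference counting alone, so the first step is to find the code that creates the cycle. The objgraph module can print which object types have grown in number. Sample code: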
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import urllib2
    import socket
    import gc
    import objgraph


    # check memory on memory leaks
    def get_unreachable_memory_len():
        # With DEBUG_SAVEALL set, every unreachable object is appended to
        # gc.garbage instead of being destroyed, so it can be inspected.
        # Use this for testing only.
        gc.set_debug(gc.DEBUG_SAVEALL)
        gc.collect()
        unreachableL = []
        for it in gc.garbage:
            unreachableL.append(it)
            # print(str(it))
        print str(unreachableL)


    def task():
        try:
            req = urllib2.urlopen('http://www.baidu.com/', timeout=3)
            text = req.read()
            # req.fp._sock.recv = None
            req.close()
        except urllib2.HTTPError, e:
            print e.code
        except urllib2.URLError, e:
            print e.reason
        else:
            print("urlopen success")


    # class HTTPResponse(object):
    #     pass

    if __name__ == '__main__':
        gc.set_debug(gc.DEBUG_SAVEALL)
        objgraph.show_growth()
        print("-------------------------")
        for i in range(5):
            task()
        print("-------------------------")
        objgraph.show_growth()
Three types show a count increase of 5, one per call to task(), and in the output of the previous run the first object listed is httplib.HTTPResponse.
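The idea behind the growth report can be approximated with the standard library alone. This is a rough sketch under stated assumptions, not objgraph's implementation: type_counts and growth are hypothetical helpers, and FakeResponse is a hypothetical stand-in for httplib.HTTPResponse that repeats the recv = read assignment five times.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, gc-tracked objects by type name after a full collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return {type name: increase} for every type whose count went up."""
    return dict((name, after[name] - before[name])
                for name in after if after[name] > before[name])


class FakeResponse(object):
    """Hypothetical stand-in for httplib.HTTPResponse."""

    def read(self):
        return b""


# Keep collected cycles in gc.garbage so they still count as live objects.
gc.set_debug(gc.DEBUG_SAVEALL)
before = type_counts()

for _ in range(5):
    r = FakeResponse()
    r.recv = r.read  # the instance now refers to a method bound to itself
    del r

after = type_counts()
gc.set_debug(0)
del gc.garbage[:]

increase = growth(before, after)
print(increase.get("FakeResponse"))
```

A type whose count rises by exactly the number of iterations is a good first suspect, which is how the three types with an increase of 5 stand out in the real run.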
Use objgraph.show_backrefs to analyze httplib.HTTPResponse:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import urllib2
    import socket
    import gc
    import objgraph


    # check memory on memory leaks
    def get_unreachable_memory_len():
        # With DEBUG_SAVEALL set, every unreachable object is appended to
        # gc.garbage instead of being destroyed, so it can be inspected.
        # Use this for testing only.
        gc.set_debug(gc.DEBUG_SAVEALL)
        gc.collect()
        unreachableL = []
        for it in gc.garbage:
            unreachableL.append(it)
            # print(str(it))
        print str(unreachableL)


    def task():
        try:
            req = urllib2.urlopen('http://www.baidu.com/', timeout=3)
            text = req.read()
            # req.fp._sock.recv = None
            req.close()
        except urllib2.HTTPError, e:
            print e.code
        except urllib2.URLError, e:
            print e.reason
        else:
            print("urlopen success")


    # class HTTPResponse(object):
    #     pass

    if __name__ == '__main__':
        gc.set_debug(gc.DEBUG_SAVEALL)
        print("-------------------------")
        for i in range(5):
            task()
        print("-------------------------")
        objgraph.show_backrefs(objgraph.by_type('HTTPResponse')[0],
                               max_depth=10, filename='obj.dot')
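Convert the generated obj.dot to obj.png with the command dot obj.dot -Tpng -o obj.png. In the resulting graph, note the recv reference and the read method, which together form the reference cycle.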
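The same back-reference can be located in plain text with gc.get_referrers, which lists the objects that refer directly to a given object. This is a minimal sketch, assuming a hypothetical FakeResponse class in place of httplib.HTTPResponse; it does not produce a graph the way objgraph does.

```python
import gc


class FakeResponse(object):
    """Hypothetical stand-in for httplib.HTTPResponse."""

    def read(self):
        return b""


r = FakeResponse()
r.recv = r.read  # store a method bound to r on r itself

# Among the direct referrers of r, pick out bound methods whose
# __self__ is r: these point back at the object that owns them.
referrers = gc.get_referrers(r)
bound_methods = [ref for ref in referrers
                 if getattr(ref, "__self__", None) is r]

for method in bound_methods:
    print(method.__func__.__name__)
```

Here the only bound method found is read, stored under the attribute name recv, which matches the recv reference and read method seen in the graph.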
PyCharm can generate a UML class diagram of urllib2 automatically. What matters here, though, is the call flow of urllib2.urlopen, which the pycallgraph module can trace. Sample code:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import urllib2
    import socket
    import gc
    from pycallgraph import PyCallGraph
    from pycallgraph.output import GraphvizOutput


    def task():
        graphviz = GraphvizOutput()
        graphviz.output_file = 'urlopen.png'
        # Every call made inside the with block is recorded in the graph.
        with PyCallGraph(output=graphviz):
            try:
                req = urllib2.urlopen('http://www.baidu.com/', timeout=3)
                # text = req.read()
                # req.fp._sock.recv = None
                # req.close()
            except urllib2.HTTPError, e:
                print e.code
            except urllib2.URLError, e:
                print e.reason
            else:
                print("urlopen success")


    if __name__ == '__main__':
        task()
[Figure: part of the generated call graph]
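When Graphviz or pycallgraph is not available, a rough caller-to-callee listing can be collected with sys.setprofile from the standard library. The sketch below is an assumption-laden illustration, not how pycallgraph works internally: record_calls is a hypothetical helper, and the three small functions only mimic the shape of an urlopen call chain without touching the network.

```python
import sys


def record_calls(func):
    """Run func() and return the set of (caller, callee) function-name pairs."""
    edges = set()

    def profiler(frame, event, arg):
        # 'call' fires when a Python function is entered; frame.f_back
        # is the frame of the caller.
        if event == "call" and frame.f_back is not None:
            edges.add((frame.f_back.f_code.co_name, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return edges


# Hypothetical functions that only mimic a nested call chain.
def do_open():
    return "response"


def http_open():
    return do_open()


def urlopen():
    return http_open()


edges = record_calls(urlopen)
for caller, callee in sorted(edges):
    print(caller + " -> " + callee)
```

Passing a function that wraps the real urllib2.urlopen call would yield a much longer list, so filtering the pairs by name is usually needed before the path to do_open becomes readable.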
The do_open method of the HTTPHandler class contains this line of code:

    r.recv = r.read
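Here r is an HTTPResponse instance, which has a read method but no recv method. Because r.read is a bound method that holds a reference to r, storing it as an attribute of r creates a reference cycle, and that reference is not released after the urlopen call returns. Fixing the memory leak means eliminating this reference. There are three options: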
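Why this single assignment matters can be shown in isolation with weak references. The following is a minimal sketch, assuming a hypothetical FakeResponse class rather than the real httplib.HTTPResponse; automatic collection is switched off so that only reference counting is at work until gc.collect() is called explicitly.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for httplib.HTTPResponse: read() but no recv()."""

    def read(self):
        return b""


gc.disable()  # rule out an automatic collection during the demo

# Without the extra attribute, dropping the last name frees the object.
plain = FakeResponse()
plain_ref = weakref.ref(plain)
del plain
plain_freed = plain_ref() is None

# With recv = read, the instance holds a bound method that holds the instance.
cyclic = FakeResponse()
cyclic.recv = cyclic.read
cyclic_ref = weakref.ref(cyclic)
del cyclic
alive_after_del = cyclic_ref() is not None

# Only the cycle collector can reclaim it.
gc.collect()
freed_after_collect = cyclic_ref() is None

gc.enable()
print(plain_freed)
print(alive_after_del)
print(freed_after_collect)
```

The object with the recv attribute survives del and disappears only after the explicit collection, which is the behaviour observed for the urlopen response.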
1) In the example above, call gc.collect() after task() to reclaim the memory manually.
2) Manually break the r.recv reference before closing the HTTP connection.
    req = urllib2.urlopen('http://www.baidu.com/', timeout=3)
    text = req.read()
    # When urlopen returns normally, manually undo the r.recv = r.read reference
    req.fp._sock.recv = None
    req.close()
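Note: this has no effect when an error status code raises urllib2.HTTPError; in that case the urllib2.py source itself has to be modified.
3) Switch to the lower-level socket or httplib libraries.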
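Option 2 can be wrapped in a small helper so the reference is always cleared, even if reading fails. This is a sketch under stated assumptions: FakeResponse and read_and_release are hypothetical names, and the stand-in keeps recv directly on itself, whereas with the real urllib2 the attribute lives on req.fp._sock as shown above.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for the object behind a urlopen result."""

    def __init__(self):
        self.recv = self.read  # mimics r.recv = r.read
        self.closed = False

    def read(self):
        return b"body"

    def close(self):
        self.closed = True


def read_and_release(resp):
    """Read the body, then break the recv cycle before closing."""
    try:
        return resp.read()
    finally:
        resp.recv = None  # drop the bound method that points back at resp
        resp.close()


gc.disable()  # show that no cycle-collector pass is needed
resp = FakeResponse()
resp_ref = weakref.ref(resp)
body = read_and_release(resp)
del resp
freed_without_gc = resp_ref() is None
gc.enable()

print(freed_without_gc)
```

Once recv is set to None, reference counting alone frees the object, so neither gc.collect() nor DEBUG_SAVEALL reports it afterwards.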
References:
1)http://python.jobbole.com/88827/
2)https://bugs.python.org/issue1208304
3)https://stackoverflow.com/questions/4214224/how-to-solve-python-memory-leak-when-using-urrlib2#