Python urllib2 and urllib

Date: 2019-11-10

Tags: python, urllib2, urllib. Category: Python


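1. urllib2 accepts a Request object, which lets you set the headers sent with a URL request; urllib accepts only a URL.

2. urllib provides urlencode(), which is used to build GET query strings; urllib2 has no such function. This is why the two modules are often used together.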
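For comparison, Python 3 folded urllib2 into urllib.request and moved urlencode() to urllib.parse. The sketch below uses those Python 3 modules (not the Python 2 ones this article covers) to show the two roles side by side: urlencode() builds the GET query string, and a Request object carries the headers. The example.com URL and the User-Agent value are placeholders, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# urlencode() (urllib.urlencode in Python 2) turns a mapping into a
# query string; spaces become '+' by default.
query = urlencode({"q": "python urllib", "page": 2})
url = "http://www.example.com/search?" + query  # placeholder host

# Request (urllib2.Request in Python 2) attaches headers to the URL.
req = Request(url, headers={"User-Agent": "my-client/1.0"})

print(query)         # q=python+urllib&page=2
print(req.full_url)  # http://www.example.com/search?q=python+urllib&page=2
# Request stores header names via str.capitalize(), hence 'User-agent'.
print(req.get_header("User-agent"))  # my-client/1.0
```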

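1) urllib2.urlopen(url[, data][, timeout])

3. urlopen is the most commonly used, and simplest, function in urllib2. It opens the given URL; the url argument can be either a URL string or a Request object.

4. Passing a Request object lets you state explicitly which URL to fetch, together with any data and headers to send with it.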
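A minimal sketch of both calling styles, again in Python 3's urllib.request. To stay runnable offline it assumes a data: URL (RFC 2397), which Python 3.4 and later resolve locally; with an http:// URL the calls would look the same.

```python
from urllib.request import Request, urlopen

# A data: URL is answered by urllib's built-in DataHandler, no network needed.
url = "data:text/plain,hello%20world"

# 1) Pass the URL as a plain string.
with urlopen(url, timeout=5) as resp:
    from_string = resp.read()

# 2) Pass a Request object that names the same URL explicitly.
with urlopen(Request(url), timeout=5) as resp:
    from_request = resp.read()

print(from_string)                  # b'hello world'
print(from_string == from_request)  # True
```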

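2) class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

The Request class is an abstraction of a URL request. Its five parameters are:

url: a string containing a valid URL.

data: a string specifying additional data to send to the server, or None if no data needs to be sent. The data must be encoded in the standard application/x-www-form-urlencoded format and then passed to the Request object as the data argument; supplying data makes the HTTP request a POST instead of a GET. The encoding is done with a function from urllib (urllib.urlencode()), not from urllib2.

headers: a dictionary. Headers can be passed directly when the Request is constructed, or each key and value can be added by calling add_header(). The standard headers (Content-Length, Content-Type and Host) are added only when the Request object is passed to urlopen() or OpenerDirector.open().

origin_req_host: the request-host of the origin transaction, as defined by RFC 2965. It defaults to cookielib.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.

unverifiable: indicates whether the request is unverifiable, as also defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document and the user had no option to approve the automatic fetching of the image, unverifiable should be True.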
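The following Python 3 sketch, assuming hypothetical page and image URLs, shows how these parameters surface on a urllib.request.Request object: the derived default for origin_req_host, an image request marked unverifiable, headers passed both as a dict and via add_header(), and the fact that Host is not present until the request is actually opened.

```python
from urllib.request import Request

# Hypothetical URLs: an HTML page and an image embedded in it.
page_url = "http://WWW.Example.com:8080/gallery.html"
image_url = "http://images.example.net/cat.png"

# Defaults: origin_req_host comes from the URL (lower-cased, port dropped)
# and unverifiable is False.
page_req = Request(page_url)
print(page_req.origin_req_host)  # www.example.com
print(page_req.unverifiable)     # False

# The image is fetched automatically for the page, so its origin host is the
# page's host and the user never approved its URL: mark it unverifiable.
image_req = Request(image_url,
                    headers={"Referer": page_url},  # headers passed as a dict...
                    origin_req_host=page_req.origin_req_host,
                    unverifiable=True)
image_req.add_header("Accept", "image/png")         # ...or added one by one

print(image_req.get_header("Referer"))  # http://WWW.Example.com:8080/gallery.html
print(image_req.get_header("Accept"))   # image/png
# Host (like Content-Type and Content-Length) is only added when the
# request is opened, so it is absent at this point.
print(image_req.has_header("Host"))     # False
```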

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

data = urllib.urlencode(values)            # encoding is done by urllib
req = urllib2.Request(url, data, headers)  # data is given, so this is a POST
response = urllib2.urlopen(req)
the_page = response.read()
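In Python 3 the same request needs one extra step: data must be bytes, so the urlencoded string is encoded first. The sketch below is a port of the example above under that assumption; send() is a hypothetical helper that is defined but not called, since www.someserver.com is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same placeholder endpoint and form values as the Python 2 example.
url = "http://www.someserver.com/cgi-bin/register.cgi"
user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
values = {"name": "Michael Foord",
          "location": "Northampton",
          "language": "Python"}
headers = {"User-Agent": user_agent}

# Python 3 requires a bytes body, so encode the urlencoded string.
data = urlencode(values).encode("ascii")
req = Request(url, data, headers)

def send(request):
    """Hypothetical helper: needs a reachable server, so it is not called here."""
    with urlopen(request, timeout=10) as response:
        return response.read()

print(req.get_method())  # POST, because data is not None
print(data)              # b'name=Michael+Foord&location=Northampton&language=Python'
```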

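5. urlopen returns a response object for the requested URL. The response behaves like a file object, so you can call .read() on it.

Commonly used methods of the response object:

geturl(): returns the URL of the resource actually retrieved. It is commonly used to check whether a redirect was followed.

info(): returns the page's meta-information, such as its headers, as a dictionary-like mimetools.Message instance (see a reference on HTTP headers).

getcode(): returns the HTTP status code of the response.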
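To see geturl() detect a redirect without relying on an outside site, the Python 3 sketch below starts a throwaway HTTP server on 127.0.0.1 (the /old and /new paths and the DemoHandler class are made up for the demo). It opens the URL through build_opener(ProxyHandler({})) instead of urlopen only so that proxy settings in the environment are ignored for this local server; the response methods are the same. Note that in Python 3, info() returns an http.client.HTTPMessage rather than a mimetools.Message.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old redirects to /new, which returns text."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"you were redirected"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

# Port 0 lets the OS pick a free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# ProxyHandler({}) disables environment proxies for this local request.
opener = build_opener(ProxyHandler({}))
try:
    with opener.open(base + "/old", timeout=5) as resp:
        final_url = resp.geturl()   # URL actually retrieved, after the redirect
        status = resp.getcode()     # status code of the final response
        content_type = resp.info().get_content_type()  # read from the headers
        body = resp.read()
finally:
    server.shutdown()
    server.server_close()

print(final_url == base + "/new")  # True: the redirect was followed
print(status, content_type, body)
```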

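When it cannot handle a response, urlopen raises URLError (though, as usual with Python APIs, built-in exceptions such as ValueError and TypeError may also be raised). HTTPError is the subclass of URLError that is raised in the specific case of HTTP URLs.

URLError: raised by the handlers when a problem occurs, typically because there is no network connection (no route to the specified server) or the specified server does not exist. It is a subclass of IOError. The exception has a reason attribute, which typically holds an error code and a text description of the error.

HTTPError: a subclass of URLError. Every HTTP response from the server carries a numeric status code. Sometimes the status code indicates that the server cannot fulfill the request. The default handlers deal with some of these responses for you: for example, when the response is a redirect (the server asks for the document to be fetched from a different URL), urllib2 follows it automatically. For responses it cannot handle, urlopen raises HTTPError. Typical errors include 404 (page not found), 403 (request forbidden) and 401 (authentication required). HTTPError has two important attributes: code and reason.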
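A small Python 3 check of the hierarchy and attributes described above. The exceptions are constructed by hand (with a placeholder URL) purely to inspect them; in real code urlopen raises them for you. In Python 3 these classes live in urllib.error, and IOError is an alias of OSError.

```python
from urllib.error import HTTPError, URLError

# HTTPError is a URLError, and URLError is an OSError (IOError in Python 2).
print(issubclass(HTTPError, URLError), issubclass(URLError, OSError))  # True True

# Built by hand only to inspect attributes; urllib normally supplies hdrs and fp.
not_found = HTTPError("http://www.example.com/missing", 404, "Not Found",
                      hdrs=None, fp=None)
no_route = URLError("no route to host")

print(not_found.code, not_found.reason)  # 404 Not Found
print(no_route.reason)                   # no route to host
print(hasattr(no_route, "code"))         # False: only HTTPError has code
```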

Because HTTPError is a subclass of URLError, if you want to handle both, the except clause for HTTPError must come before the one for URLError; otherwise the URLError clause would also catch HTTPError. For example:

import urllib2

req = urllib2.Request('http://www.python.org/fish.html')
try:
    response = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # Must come first: HTTPError is a subclass of URLError
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
    print 'Error reason: ', e.reason
except urllib2.URLError as e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    # everything is fine
    the_page = response.read()
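The same ordering in a Python 3 sketch. fetch() takes a hypothetical opener parameter so that the two error paths can be exercised with stand-in functions instead of a live network; the success path uses a data: URL.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def fetch(url, opener=urlopen):
    """Return the page text, or a short description of the failure.

    `opener` is a hypothetical injection point (default: urlopen) so the
    error paths can be exercised without a network.
    """
    try:
        response = opener(url)
    except HTTPError as e:  # must come before URLError, its base class
        return "server error %d: %s" % (e.code, e.reason)
    except URLError as e:
        return "unreachable: %s" % (e.reason,)
    else:
        # everything is fine
        with response:
            return response.read().decode("utf-8")

# Hypothetical stand-ins that fail the way urlopen can.
def raise_404(url):
    raise HTTPError(url, 404, "Not Found", hdrs=None, fp=None)

def raise_no_route(url):
    raise URLError("no route to host")

print(fetch("data:,everything%20is%20fine"))                # everything is fine
print(fetch("http://www.python.org/fish.html", raise_404))  # server error 404: Not Found
print(fetch("http://www.python.org/", raise_no_route))      # unreachable: no route to host
```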

An improved version that handles both exceptions in a single except clause:

import urllib2

req = urllib2.Request('http://www.python.org/fish.html')
try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    if hasattr(e, 'code'):
        # Only HTTPError has a code attribute, so test for it first.
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):
        # Both URLError and (since Python 2.7) HTTPError have reason.
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
else:
    # everything is fine
    the_page = response.read()
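A Python 3 sketch of the single-except variant. describe() is a hypothetical helper; the point it illustrates is the order of the hasattr() checks, since both exception types have reason but only HTTPError has code.

```python
from urllib.error import HTTPError, URLError

def describe(err):
    """Hypothetical helper: classify a URLError caught by one except clause."""
    # Only HTTPError has `code`; both classes have `reason`, so a
    # reason-first test would never reach the HTTPError branch.
    if hasattr(err, "code"):
        return "The server couldn't fulfill the request. Error code: %d" % err.code
    if hasattr(err, "reason"):
        return "We failed to reach a server. Reason: %s" % (err.reason,)
    return "Unexpected error: %r" % (err,)

http_err = HTTPError("http://www.python.org/fish.html", 404, "Not Found",
                     hdrs=None, fp=None)
net_err = URLError("no route to host")
print(describe(http_err))  # The server couldn't fulfill the request. Error code: 404
print(describe(net_err))   # We failed to reach a server. Reason: no route to host
```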

For reference, BaseHTTPServer.BaseHTTPRequestHandler.responses in the Python 2 standard library maps each HTTP status code to a short and a long message. Because the default handlers deal with redirects (3xx), and codes in the 100-299 range indicate success, the errors you usually see are in the 400-599 range:

# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),

    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted',
          'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),

    300: ('Multiple Choices',
          'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified',
          'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this '
          'resource.'),
    307: ('Temporary Redirect',
          'Object moved temporarily -- see URI list'),

    400: ('Bad Request',
          'Bad request syntax or unsupported method'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment Required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
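In Python 3, BaseHTTPServer became http.server, and the table is still available as BaseHTTPRequestHandler.responses; in recent versions it is built from the http.HTTPStatus enum. A short lookup sketch follows; status_class() is a hypothetical helper that buckets codes by their first digit.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Python 3 counterpart of BaseHTTPServer.BaseHTTPRequestHandler.responses.
short_msg, long_msg = BaseHTTPRequestHandler.responses[404]
print(short_msg)               # Not Found
print(HTTPStatus(503).phrase)  # Service Unavailable

def status_class(code):
    """Hypothetical helper: name the class of a status code by its first digit."""
    return {1: "informational", 2: "successful", 3: "redirection",
            4: "client error", 5: "server error"}[code // 100]

print(status_class(302))  # redirection
```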
