1. urlopen takes a Request object and returns a response object. Calling read() on the response returns its content, so print(the_page) outputs the HTML of the page.
    import urllib2

    req = urllib2.Request('http://www.voidspace.org.uk')
    response = urllib2.urlopen(req)
    the_page = response.read()

    print(the_page)
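The Request, urlopen, read() flow can also be tried without a network connection. The following is only a sketch under two assumptions that are not in the notes: it targets Python 3, where the urllib2 API lives in urllib.request, and it uses a `data:` URL so that the content comes from the URL itself instead of a server.

```python
# Python 3 sketch (assumption: urllib2's API is available as urllib.request).
from urllib.request import Request, urlopen

# A data: URL carries its own content, so no server is contacted.
req = Request('data:text/plain,hello%20world')
response = urlopen(req)
the_page = response.read()  # read() returns bytes in Python 3

print(the_page.decode('ascii'))
```

With a real http:// URL the three calls are the same; only the source of the bytes changes.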
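2. A Request object can send data to the server, and it can also carry extra information (metadata) about the request, such as HTTP headers.
3. As we know, a request can send data to the server with POST. The data has to be encoded in a standard way before it is sent; here the urlencode function does the encoding.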
    import urllib2
    import urllib

    url = 'http://www.someserver.com/cgi-bin/register.cgi'

    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'
              }

    data = urllib.urlencode(values)   # encode the form fields
    req = urllib2.Request(url, data)  # passing data makes this a POST
    response = urllib2.urlopen(req)

    the_page = response.read()
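To see what urlencode actually produces, here is a small offline sketch. It assumes Python 3, where the function is urllib.parse.urlencode; the `post_body` and `decoded` names are only illustrative.

```python
# Python 3 sketch (assumption: urllib.urlencode is urllib.parse.urlencode here).
from urllib.parse import urlencode, parse_qs

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# Key/value pairs are joined with '&'; the space in the name becomes '+'.
data = urlencode(values)
print(data)

# In Python 3 a POST body must be bytes, so the string is encoded first.
post_body = data.encode('ascii')

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(data)
print(decoded['name'])
```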
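Data can also be sent with GET, which is the default when no data argument is given. POST sends the encoded data in the request body, whereas GET appends the encoded data to the end of the URL.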
    import urllib2
    import urllib

    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'
              }

    data = urllib.urlencode(values)
    print(data)  # encoded data

    url = 'http://www.example.com/example.cgi'
    full_url = url + '?' + data  # use '?' to append the data to the URL
    req = urllib2.Request(full_url)
    response = urllib2.urlopen(req)

    the_page = response.read()
    print(the_page)
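Which method a Request will use can be checked without sending anything. A minimal sketch, assuming Python 3 (urllib.request.Request, with bytes for the POST body); the names `get_req` and `post_req` are illustrative.

```python
# Python 3 sketch (assumption: urllib2.Request is urllib.request.Request here).
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://www.example.com/example.cgi'
data = urlencode({'name': 'Michael Foord', 'language': 'Python'})

# No data argument: the encoded values travel in the URL and the method is GET.
get_req = Request(url + '?' + data)

# With a data argument: the values travel in the body and the method is POST.
post_req = Request(url, data.encode('ascii'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Nothing is sent until urlopen is called, so this is a cheap way to confirm how a request is set up.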
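4. Headers
Some servers only serve browsers, while the code above identifies itself by default as Python-urllib/2.7, so the request has to "disguise" itself by sending a browser's User-Agent string.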
    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}

    headers = {'User-Agent': user_agent}
    data = urllib.urlencode(values)

    # the third argument is a dict of extra headers to send
    req = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(req)
    the_page = response.read()
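The headers attached to a Request can be inspected before it is sent. This is a sketch assuming Python 3 (urllib.request.Request); `sent_agent` is an illustrative name. One detail to watch is that Request stores header names capitalized, so the lookup key is 'User-agent'.

```python
# Python 3 sketch (assumption: urllib2.Request is urllib.request.Request here).
from urllib.request import Request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

req = Request(url, headers={'User-Agent': user_agent})

# Header names are stored via str.capitalize(), so 'User-Agent'
# is kept under the key 'User-agent'.
print(req.header_items())
sent_agent = req.get_header('User-agent')
print(sent_agent)
```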
5. URLError with a "reason" attribute
    import urllib2
    from urllib2 import URLError

    # this host does not exist, so the request never reaches a server
    req = urllib2.Request('http://www.pretend_server.org')

    try:
        urllib2.urlopen(req)
    except URLError as e:
        # reason describes why the server could not be reached
        print e.reason
6. HTTPError with a "code" attribute. Codes in the 100-299 range indicate success and the default handlers deal with redirects (codes in the 300 range), so you will usually only see error codes in the 400-599 range.
    import urllib2

    # the server is reachable, but this page does not exist
    req = urllib2.Request('http://www.python.org/fish.html')

    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        print e.code    # numeric HTTP status, such as 404
        print e.read()  # body of the error page returned by the server
7. Two basic approaches
    # Approach 1: catch HTTPError first, because it is a subclass of URLError
    from urllib2 import Request, urlopen, URLError, HTTPError

    req = Request(someurl)  # someurl: placeholder for the URL to fetch

    try:
        response = urlopen(req)
    except HTTPError as e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    except URLError as e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        print('everything is fine')

    # Approach 2: catch URLError only and inspect its attributes
    from urllib2 import Request, urlopen, URLError

    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError as e:
        # check 'code' first: an HTTPError can carry a 'reason' as well
        if hasattr(e, 'code'):
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
        elif hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
    else:
        pass  # everything is fine
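Why the order of the checks matters can be shown without a network, using hand-built exceptions. This is a sketch assuming Python 3, where the two classes live in urllib.error; the `describe` helper and the exception values are hypothetical and only serve the illustration.

```python
# Python 3 sketch (assumption: urllib2's exceptions live in urllib.error here).
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe(exc):
    """Return a short description, testing the more specific class first."""
    if isinstance(exc, HTTPError):
        return 'HTTP error %d' % exc.code
    if isinstance(exc, URLError):
        return 'unreachable: %s' % exc.reason
    return 'unexpected: %r' % (exc,)


# Hand-built exceptions with made-up values; no request is sent.
not_found = HTTPError('http://www.python.org/fish.html', 404, 'Not Found',
                      Message(), io.BytesIO(b''))
no_server = URLError('no route to host')

print(issubclass(HTTPError, URLError))  # HTTPError is a kind of URLError
print(hasattr(not_found, 'reason'))     # so it exposes 'reason' as well
print(describe(not_found))
print(describe(no_server))
```

If the URLError test came first, the 404 would be reported as a connection problem, which is the same trap as putting `except URLError` before `except HTTPError`.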
8. Basic Authentication
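When authentication is required, the server sends a header requesting it (along with a 401 status), such as WWW-Authenticate: Basic realm="cPanel Users". The client can then retry the request with the username and password added as a header.
If you don't need to care about the realm, you can use HTTPPasswordMgrWithDefaultRealm to set the username and password for a given URL.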
    import urllib2

    # create a password manager
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

    username = 'Prime'
    password = 'Bee'

    # None as the realm means the credentials are used for any realm
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib2.HTTPBasicAuthHandler(password_mgr)

    # create an opener that uses the handler and fetch a URL with it
    opener = urllib2.build_opener(handler)
    opener.open(someurl)  # someurl: placeholder for a URL under top_level_url

    # Optional: install the opener so that urllib2.urlopen uses it too
    urllib2.install_opener(opener)
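How the password manager decides which URLs get the credentials can be checked offline. A sketch assuming Python 3 (urllib.request.HTTPPasswordMgrWithDefaultRealm); the two test URLs and the names `inside`, `outside` and `auth_header` are illustrative. The last lines build the value that Basic authentication sends in the Authorization header, which is the base64 encoding of "username:password".

```python
# Python 3 sketch (assumption: the urllib2 class lives in urllib.request here).
import base64
from urllib.request import HTTPPasswordMgrWithDefaultRealm

password_mgr = HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/foo/', 'Prime', 'Bee')

# A URL below the registered prefix matches, whatever realm the server names.
inside = password_mgr.find_user_password('cPanel Users',
                                         'http://example.com/foo/bar.html')
# A URL outside the prefix gets no credentials.
outside = password_mgr.find_user_password('cPanel Users',
                                          'http://example.com/other/')
print(inside)
print(outside)

# Basic auth sends base64("username:password") after the word 'Basic'.
user, password = inside
token = base64.b64encode(('%s:%s' % (user, password)).encode('ascii'))
auth_header = 'Basic ' + token.decode('ascii')
print(auth_header)
```

Base64 is an encoding, not encryption, so Basic authentication only protects the password when the connection itself is encrypted.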
9. Setting the default socket timeout
    import socket

    timeout = 10  # seconds
    socket.setdefaulttimeout(timeout)
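The effect of the default timeout can be observed on a freshly created socket, without connecting anywhere. A minimal sketch; the variable names are illustrative, and the previous value is restored at the end because the setting is process-wide.

```python
import socket

# None means sockets block indefinitely unless configured otherwise.
previous = socket.getdefaulttimeout()

socket.setdefaulttimeout(10)
current = socket.getdefaulttimeout()

# Sockets created after the call inherit the default timeout.
new_sock = socket.socket()
inherited = new_sock.gettimeout()
new_sock.close()
print(current, inherited)

# Restore the earlier value so other code is not affected.
socket.setdefaulttimeout(previous)
```

When only one request needs a limit, urlopen also accepts a timeout argument, which avoids changing the global default.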