Some notes on urllib2 (Python 2.7)

1. urlopen takes a Request object and returns a response object. Calling read() on the response returns its body, so print(the_page) outputs the HTML of the page.

import urllib2

req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()

print(the_page)

 

2. A Request object can carry data to the server, and it can also carry extra information (metadata) about the request, such as HTTP "headers".

 

3. As we know, a request can send data to the server with POST. The data has to be encoded in a standard way before it is sent; here the urlencode function does the encoding.

import urllib2
import urllib

url = 'http://www.someserver.com/cgi-bin/register.cgi'

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'
          }

# Encode the dict, then pass it as the data argument to make a POST request
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)

the_page = response.read()
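
To see what urlencode actually produces, here is a small offline sketch. It uses a list of pairs instead of a dict so that the output order is fixed, and a try/except import so that it also runs on Python 3 (an assumption beyond these notes, which target Python 2.7). The 'note' field is made up for illustration.

```python
try:
    from urllib import urlencode        # Python 2.7
except ImportError:
    from urllib.parse import urlencode  # Python 3 (assumed equivalent)

# A list of (key, value) pairs keeps the output order predictable.
values = [('name', 'Michael Foord'),
          ('location', 'Northampton'),
          ('note', 'a&b=c')]            # hypothetical field with reserved characters

data = urlencode(values)

# Spaces become '+', and '&' or '=' inside a value are percent-encoded.
print(data)
```

On Python 3 the encoded string would also have to be converted to bytes, for example with data.encode('ascii'), before being passed to Request.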

 

 

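  Data can of course also be sent with GET, which is the default when no data argument is given. POST sends the encoded data in the body of the request, while GET appends the encoded data to the end of the URL.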

import urllib2
import urllib


values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'
          }

data = urllib.urlencode(values)
print(data)  # the encoded data

url = 'http://www.example.com/example.cgi'
full_url = url + '?' + data  # use '?' to append the data to the URL
req = urllib2.Request(full_url)
response = urllib2.urlopen(req)

the_page = response.read()
print(the_page)
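
The difference between the two modes can be checked without contacting a server, because a Request object reports which method it would use. This is a minimal sketch; the try/except import for Python 3 and the shortened query string are assumptions added for the example.

```python
try:
    from urllib2 import Request          # Python 2.7
except ImportError:
    from urllib.request import Request   # Python 3 (assumed equivalent)

url = 'http://www.example.com/example.cgi'
encoded = 'name=Michael+Foord&language=Python'

# Data appended to the URL: there is no body, so the request is a GET.
get_req = Request(url + '?' + encoded)

# Data passed as the second argument: it goes in the body, so it is a POST.
# Python 3 expects bytes here; on Python 2.7 the encode() call is harmless.
post_req = Request(url, encoded.encode('ascii'))

print(get_req.get_method())
print(post_req.get_method())
print(get_req.get_full_url())
```

Nothing is sent here; urlopen is never called, so the sketch only inspects how the requests were built.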

 

4. Headers

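  Some servers only respond to browsers, whereas the requests above identify themselves by default as Python-urllib/2.7. To reach such servers, the request has to "disguise" itself by sending a browser's name in the User-Agent header.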

 

import urllib
import urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# Headers are passed as a dict in the third argument of Request
headers = {'User-Agent': user_agent}
data = urllib.urlencode(values)

req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
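
The default identity and the overridden one can both be inspected offline. The sketch below is illustrative only; the try/except import for Python 3 is an assumption, and no request is actually sent.

```python
try:
    import urllib2 as request_mod          # Python 2.7
except ImportError:
    import urllib.request as request_mod   # Python 3 (assumed equivalent)

# The User-Agent an opener adds when the request does not set one itself.
default_agent = dict(request_mod.build_opener().addheaders)['User-agent']
print(default_agent)  # begins with 'Python-urllib/'

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
req = request_mod.Request('http://www.someserver.com/cgi-bin/register.cgi',
                          headers={'User-Agent': user_agent})

# Request stores header names capitalized, i.e. as 'User-agent'.
print(req.get_header('User-agent'))
```

A header set on the request takes precedence over the opener's default, which is why passing headers to Request is enough to change the reported name.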

 

 

5. URLError with a "reason" attribute

import urllib2
from urllib2 import URLError

req = urllib2.Request('http://www.pretend_server.org')

try:
    urllib2.urlopen(req)
except URLError as e:
    # reason describes why the server could not be reached
    print e.reason
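
One way to see a URLError without any network access is to request a URL whose scheme no handler supports. This is only a sketch: the 'foo' scheme is made up, and the try/except import for Python 3 is an assumption. For real connection failures, reason is usually an exception object rather than a plain string.

```python
try:
    import urllib2 as request_mod          # Python 2.7
except ImportError:
    import urllib.request as request_mod   # Python 3 (assumed equivalent)

reason = None
try:
    # 'foo' is a made-up scheme that no handler supports, so this fails
    # before any network connection is attempted.
    request_mod.urlopen('foo://pretend_server/')
except request_mod.URLError as e:
    reason = e.reason
    print(reason)
```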

 

 

6. HTTPError with a "code" attribute. The default handlers follow redirects (codes in the 300 range) and codes in the 100-299 range indicate success, so you will usually only see error codes in the 400-599 range.


import urllib2

req = urllib2.Request('http://www.python.org/fish.html')

try:
    urllib2.urlopen(req)
except urllib2.HTTPError as e:
    print e.code    # the HTTP status code, e.g. 404
    print e.read()  # the error page returned by the server
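
An HTTPError behaves like a response as well as an exception, which is why e.read() works above. The sketch below builds one by hand so that it runs offline; the body text is invented, and the try/except import for Python 3 is an assumption.

```python
import io

try:
    from urllib2 import HTTPError        # Python 2.7
except ImportError:
    from urllib.error import HTTPError   # Python 3 (assumed equivalent)

# Hand-built error standing in for what urlopen would raise on a missing page.
# The body text is invented for this example.
body = io.BytesIO(b'<html>No fish here</html>')
err = HTTPError('http://www.python.org/fish.html', 404, 'Not Found', {}, body)

try:
    raise err
except HTTPError as e:
    code = e.code     # the status code passed in above
    page = e.read()   # the body, read like a normal response
    print(code)
    print(page)
```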

 

 

7. Two basic approaches

# Approach 1: catch HTTPError before URLError
from urllib2 import Request, urlopen, URLError, HTTPError

someurl = 'http://www.example.com/'  # placeholder URL

req = Request(someurl)

try:
    response = urlopen(req)
except HTTPError as e:
    # HTTPError is a subclass of URLError, so it has to be caught first
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except URLError as e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    print('everything is fine')

# Approach 2: catch URLError only and inspect its attributes
from urllib2 import Request, urlopen, URLError

req = Request(someurl)
try:
    response = urlopen(req)
except URLError as e:
    # Check 'code' first, since an HTTPError may also expose 'reason'
    if hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
else:
    print('everything is fine')
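
The order matters in both approaches because HTTPError is a subclass of URLError. The following sketch makes that explicit with a small helper; describe is a hypothetical name, the errors are built by hand so nothing is fetched, and the try/except import for Python 3 is an assumption.

```python
import io

try:
    from urllib2 import HTTPError, URLError       # Python 2.7
except ImportError:
    from urllib.error import HTTPError, URLError  # Python 3 (assumed equivalent)


def describe(error):
    """Return a short message for an error raised by urlopen (hypothetical helper)."""
    # HTTPError must be tested first because it is a subclass of URLError.
    if isinstance(error, HTTPError):
        return 'server error %d' % error.code
    if isinstance(error, URLError):
        return 'unreachable: %s' % error.reason
    return 'unexpected: %r' % (error,)


# Hand-built errors, so nothing is fetched over the network.
http_error = HTTPError('http://www.example.com/', 404, 'Not Found', {},
                       io.BytesIO(b''))
url_error = URLError('connection refused')

print(issubclass(HTTPError, URLError))
print(describe(http_error))
print(describe(url_error))
```

If the two isinstance checks were swapped, the HTTPError would be reported as a generic URLError and its status code would be lost.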

 

8. Basic Authentication
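  When authentication is required, the server sends a header asking for it, such as WWW-Authenticate: Basic realm="cPanel Users". The client can then repeat the request with the username and password included in a header.
If the realm does not matter, HTTPPasswordMgrWithDefaultRealm can be used directly to set the username and password for a given URL.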


import urllib2

# Create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

username = 'Prime'
password = 'Bee'

# Passing None as the realm makes these the default credentials for the URL
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# Build an opener that uses the handler and open a URL with it
opener = urllib2.build_opener(handler)
someurl = top_level_url  # placeholder: any URL under top_level_url
opener.open(someurl)

# Optional: install the opener so that urllib2.urlopen uses it as well
urllib2.install_opener(opener)
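
Which URLs the stored credentials apply to can be checked offline by querying the password manager directly. This is a rough sketch; the two lookup URLs are made up, and the try/except import for Python 3 is an assumption.

```python
try:
    import urllib2 as request_mod          # Python 2.7
except ImportError:
    import urllib.request as request_mod   # Python 3 (assumed equivalent)

password_mgr = request_mod.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/foo/', 'Prime', 'Bee')

# URLs below the registered one match, whatever realm the server announces.
inside = password_mgr.find_user_password('cPanel Users',
                                         'http://example.com/foo/bar.html')

# URLs outside it do not match, so no credentials would be sent.
outside = password_mgr.find_user_password('cPanel Users',
                                          'http://example.com/other/')

print(inside)
print(outside)
```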

 

 

9. Setting the default socket timeout

import socket

timeout = 10  # in seconds
socket.setdefaulttimeout(timeout)
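
The default timeout is global, so it affects every socket created afterwards. The sketch below shows the effect and then restores the earlier value; restoring it is a precaution added here, not something the notes above require.

```python
import socket

previous = socket.getdefaulttimeout()   # None means "wait indefinitely"

socket.setdefaulttimeout(10)
current = socket.getdefaulttimeout()
print(current)

# Sockets created from now on inherit the default timeout.
sock = socket.socket()
sock_timeout = sock.gettimeout()
sock.close()
print(sock_timeout)

# Restore the earlier setting so that other code is unaffected.
socket.setdefaulttimeout(previous)
```

For a single request, urlopen also accepts a timeout argument, as in urllib2.urlopen(req, timeout=10), which avoids changing the global default.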