Making HTTP requests in Python 2 with urllib/urllib2

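GET and POST are the two most common kinds of HTTP request. The implementations below rely on urllib and urllib2.
urllib and urllib2 are two built-in Python 2 modules. For HTTP work, urllib2 does most of the job and urllib plays a supporting role. urllib2 provides a complete basic function, urllib2.urlopen(url), which sends a request to the given URL and returns the data.

1. Implementing a GET request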

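    import urllib2
    # The URL must include the scheme (http://), otherwise urlopen cannot tell which protocol to use
    response = urllib2.urlopen("http://127.0.0.1:8800")
    content = response.read()
    print content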

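The implementation above can be split into two steps, the request and the response, as follows:

    import urllib2
    # Build a request
    request = urllib2.Request("http://127.0.0.1:8800")
    # Send the request and read the response
    response = urllib2.urlopen(request)
    content = response.read()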
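To make the two-step split more concrete, here is a minimal sketch showing that a Request object is only a description of the request: it can be built and inspected without any server running. The try/except import is an assumption added so that the sketch also runs on Python 3, where urllib2 was merged into urllib.request; on Python 2 only the first branch is used.

```python
# Compatibility import (an assumption of this sketch): urllib2 exists only in
# Python 2; in Python 3 the same classes live in urllib.request.
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2

# Step 1: describe the request. No connection is opened here.
request = urllib2.Request("http://127.0.0.1:8800")

# The object can be inspected before anything is sent.
print(request.get_full_url())  # the URL the request will go to
print(request.get_method())    # GET, because no data was attached

# Step 2 (not run here, it needs a server listening on 127.0.0.1:8800):
# response = urllib2.urlopen(request)
# content = response.read()
```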

2. Implementing a POST request

    import urllib
    import urllib2
    url = "http://127.0.0.1:8800"
    # Request data
    postdata = {
        'username': 'lxn',
        'password': '888888888'
    }
    # URL-encode the data
    data = urllib.urlencode(postdata)
    # Build a request that carries the encoded data; passing data makes it a POST
    req = urllib2.Request(url, data)
    # Send the request and read the response
    response = urllib2.urlopen(req)
    content = response.read()
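It may help to see what the encoding step produces and why the request becomes a POST. The sketch below builds the same request without sending it. The compatibility import and the `.encode('ascii')` call are assumptions added so that the code also runs on Python 3, which requires the request body to be bytes; on Python 2 the encoded string can be passed as is.

```python
# Compatibility imports (an assumption of this sketch, see the lead-in).
try:
    import urllib2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2
    from urllib.parse import urlencode

url = "http://127.0.0.1:8800"
postdata = {'username': 'lxn', 'password': '888888888'}

# urlencode turns the dict into a key=value&key=value form string.
encoded = urlencode(postdata)
print(encoded)

# Python 3 wants bytes as the body; on Python 2 this call is a no-op.
body = encoded.encode('ascii')

# Attaching data is what switches the request from GET to POST.
post_req = urllib2.Request(url, body)
print(post_req.get_method())
```

The order of the key=value pairs in the encoded string follows the dict's iteration order, which is not guaranteed on Python 2.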

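The code above is a simple POST request. Sometimes, though, the server still refuses access even when the POST data is correct. Why? The problem lies in the request headers: the server inspects them to decide whether the request comes from a browser, as anti-crawler applications commonly do. We can get around this by adding header information to the request: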

    import urllib
    import urllib2
    url = "http://127.0.0.1:8800"
    headers = {
        'User-Agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
        'Referer': 'http://127.0.0.1:8800'
    }
    # Request data
    postdata = {
        'username': 'lxn',
        'password': '888888888'
    }
    # URL-encode the data
    data = urllib.urlencode(postdata)
    # Build a request and attach the header information
    req = urllib2.Request(url, data, headers)
    # Send the request and read the response
    response = urllib2.urlopen(req)
    content = response.read()

We can also add the headers with the add_header method:

    import urllib
    import urllib2
    url = 'http://127.0.0.1:8800/login'
    postdata = {'username': 'lxn',
                'password': '88888888'}
    data = urllib.urlencode(postdata)
    req = urllib2.Request(url)
    # Write User-Agent and Referer into the request headers
    req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
    req.add_header('Referer', 'http://www.xxxxxx.com/')
    # Attach the encoded POST data (add_data exists in Python 2 only)
    req.add_data(data)
    response = urllib2.urlopen(req)
    content = response.read()
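Both ways of attaching headers can be checked before the request is sent. The sketch below is only an illustration: it builds a request, adds the two headers, and reads them back. One detail worth knowing is that Request stores header names in capitalized form, so 'User-Agent' is kept as 'User-agent'. The compatibility import is an assumption so that the sketch also runs on Python 3.

```python
# Compatibility import (an assumption of this sketch).
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2

url = "http://127.0.0.1:8800/login"
user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"

# Way 1: pass a headers dict to the constructor.
req = urllib2.Request(url, headers={'User-Agent': user_agent})

# Way 2: add a header afterwards with add_header.
req.add_header('Referer', 'http://127.0.0.1:8800')

# Header names are stored as name.capitalize(), so look them up that way.
print(req.get_header('User-agent'))
print(req.get_header('Referer'))
print(req.has_header('User-Agent'))  # False: the stored key is 'User-agent'
```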

3. Setting a timeout

Before Python 2.6, the urllib2 API did not expose a timeout parameter. The only way to set a timeout was to change the global socket timeout, for example:

    import urllib2
    import socket
    socket.setdefaulttimeout(10)  # time out after 10 seconds
    urllib2.socket.setdefaulttimeout(10)  # an alternative way to do the same thing

In Python 2.6 and later, urlopen accepts a timeout argument, for example:

    import urllib2
    request = urllib2.Request('http://127.0.0.1:8800/login')
    response = urllib2.urlopen(request, timeout=2)  # time out after 2 seconds
    content = response.read()
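When the timeout expires, urlopen raises an exception rather than returning, so the call usually needs a try/except around it. The following is a rough sketch of one way to handle that. The helper name `fetch` and its return-None convention are hypothetical, chosen for illustration. Depending on where the timeout happens, the error can surface either as urllib2.URLError or as socket.timeout, so both are caught. The compatibility import is an assumption so that the sketch also runs on Python 3.

```python
import socket

# Compatibility import (an assumption of this sketch).
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2


def fetch(url, timeout=2):
    """Hypothetical helper: return the response body, or None on failure."""
    try:
        response = urllib2.urlopen(url, timeout=timeout)
        return response.read()
    except (urllib2.URLError, socket.timeout) as e:
        # Connection problems and timeouts both end up here.
        print("request failed: %s" % e)
        return None
```

A call such as `fetch('http://127.0.0.1:8800/login', timeout=2)` then returns either the body or None, without letting the timeout propagate to the caller.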

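4. Getting the HTTP response code

For 200 OK, calling the getcode() method of the response object returned by urlopen gives the HTTP status code. This only works for successful requests. For error status codes such as 404 or 500, urlopen raises an exception instead, and the status code has to be read from the code attribute of the exception object, for example: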

    import urllib2
    try:
        response = urllib2.urlopen('http://127.0.0.1:8800')
        # Successful request: read the status code from the response
        print response.getcode()
    except urllib2.HTTPError as e:
        if hasattr(e, 'code'):
            print 'Error code:', e.code
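The two code paths can be folded into one small helper that always returns a status code. This is only a sketch: the name `get_status` is hypothetical, and returning None when no HTTP response arrives at all (for example, connection refused) is a choice made here, not something the original prescribes. HTTPError is a subclass of URLError, so it has to be caught first. The compatibility import is an assumption so that the sketch also runs on Python 3.

```python
# Compatibility import (an assumption of this sketch).
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2


def get_status(url, timeout=2):
    """Hypothetical helper: return the HTTP status code, or None if the
    server could not be reached at all."""
    try:
        response = urllib2.urlopen(url, timeout=timeout)
        return response.getcode()   # success path
    except urllib2.HTTPError as e:
        return e.code               # error status sent by the server
    except urllib2.URLError:
        return None                 # no HTTP response was received
```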

Reference book: 《Python爬蟲開發與項目實戰》, by 範傳輝