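Python's standard module for network access
urllib2 is not an upgraded version of urllib. The two modules complement each other: for example, urllib2 accepts a Request object so you can set headers, while urlencode lives in urllib. For details, search Google for "difference between urllib and urllib2".
Official urllib2 documentation: https://docs.python.org/2.7/library/urllib2.html#module-urllib2
The simplest usage: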
urllib2.urlopen(url,data,timeout)
data: the payload to send when the URL is submitted as a POST request. If data is omitted, the request is a GET.
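As a minimal sketch of how data switches the request method, the snippet below builds two Request objects and inspects them without touching the network. The URL and the form field are placeholders (assumptions, not from the original post), and the try/except import is only there so the sketch also runs on Python 3, where urllib2 was folded into urllib.request.

```python
try:
    import urllib2                      # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent
    from urllib.parse import urlencode

url = "http://example.com/form"         # placeholder URL (assumption)

# Form fields must be URL-encoded before they are passed as data
data = urlencode({"q": "hello"}).encode("utf-8")

get_request = urllib2.Request(url)          # no data, so this is a GET
post_request = urllib2.Request(url, data)   # data present, so this is a POST

print(get_request.get_method())    # GET
print(post_request.get_method())   # POST

# urllib2.urlopen(post_request, timeout=10) would actually send the POST
```

Nothing is sent until urlopen is called, so this is a safe way to check what a request will look like.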
urllib2.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)
headers: the identity presented to the server. By default urllib2 identifies itself as Python-urllib/x.y, where x.y is the Python version.
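To see that default identity, one option is to look at the headers a freshly built opener would attach. This is a small sketch rather than anything from the original post; it makes no network request, and the try/except import is only there so it also runs on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent

# A default opener carries the headers added to every request it sends
opener = urllib2.build_opener()
default_headers = dict(opener.addheaders)

# The key is stored as "User-agent"; the value looks like Python-urllib/x.y
print(default_headers["User-agent"])
```

The exact x.y printed depends on the interpreter that runs the snippet.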
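A website identifies the browser from the User-Agent value the browser sends. So create a request object with urllib2 and give it a dictionary of header data to make the site believe the request comes from a browser.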
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# __author__ = 'kong'
import urllib2


class Urllib2ModifyHeader(object):
    """Fetch one URL with different User-Agent headers and save each response."""

    def __init__(self):
        # One desktop browser identity and one mobile browser identity
        PIUA = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}
        MUUA = {"User-Agent": "NOKIA5700/ UCWEB7.0.2.37/28/999"}
        self.url = "http://fanyi.youdao.com"
        self.useUserAgent(PIUA, 1)
        self.useUserAgent(MUUA, 2)

    def useUserAgent(self, userAgent, name):
        # Pass the header dictionary when building the request
        request = urllib2.Request(self.url, headers=userAgent)
        # Alternative: add the header after the request is created
        # request.add_header("User-Agent", userAgent["User-Agent"])
        response = urllib2.urlopen(request)
        fileName = str(name) + '.html'
        # Append the header that was used, then the page body
        with open(fileName, 'a') as fp:
            fp.write("%s\n\n" % userAgent)
            fp.write(response.read())


if __name__ == '__main__':
    umh = Urllib2ModifyHeader()
Explanation: