Python study notes: the urllib and urllib2 modules

Date: 2019-11-25



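Original work. Reposting is permitted, provided the original source, the author information and this notice are credited with a hyperlink; otherwise legal liability will be pursued. http://john88wang.blog.51cto.com/2165294/1441495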

I. The urllib module

The urllib module can open any URL.

1. urlopen() opens a URL and returns a file-like object that supports the usual file-style operations.

In [308]: import urllib 
In [309]: file=urllib.urlopen(' 
In [310]: file.readline()
Out[310]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8

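The returned object supports read(), readlines(), fileno() and close(). It also provides info() (the response headers), getcode() (the HTTP status code) and geturl() (the URL actually retrieved, after any redirects):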

In [337]: file.info()
Out[337]: <httplib.HTTPMessage instance at 0x2394a70>
 
In [338]: file.getcode()
Out[338]: 200
 
In [339]: file.geturl()
Out[339]: 'http://www.baidu.com/'
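For comparison, in Python 3 urllib.urlopen() became urllib.request.urlopen(), and reads return bytes rather than str. The minimal sketch below assumes Python 3 and uses a data: URL purely so it runs without network access; with an http:// URL like the one above the same methods apply, and getcode() then returns the HTTP status.

```python
from urllib.request import urlopen

# A data: URL keeps this sketch offline; an http:// URL behaves the same way.
url = "data:text/plain;charset=utf-8,first%20line%0Asecond%20line%0A"

with urlopen(url) as resp:
    first = resp.readline()                        # bytes, not str, in Python 3
    rest = resp.readlines()                        # remaining lines as a list of bytes
    content_type = resp.info().get_content_type()  # headers object, like file.info()
    final_url = resp.geturl()                      # the URL that was actually opened

print(first, rest, content_type, final_url)
```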

2. urlretrieve() saves the page at a URL to a local file and returns a (filename, headers) tuple:

In [404]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
 
In [405]: type (filename)
Out[405]: <type 'tuple'>
 
In [406]: filename[0]
Out[406]: '/tmp/baidu.html'
 
In [407]: filename
Out[407]: ('/tmp/baidu.html', <httplib.HTTPMessage instance at 0x23ba878>)
 
In [408]: filename[1]
Out[408]: <httplib.HTTPMessage instance at 0x23ba878>

3. urlcleanup() clears the cache built up by previous urlretrieve() calls:

In [454]: filename=urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu.html')
 
In [455]: urllib.urlcleanup()
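In Python 3 both functions still exist under urllib.request, where the documentation lists them as a legacy interface. A sketch of items 2 and 3 together, assuming Python 3; the data: URL is an offline stand-in for http://www.baidu.com/ and the target path is a throwaway temporary file.

```python
import os
import tempfile
from urllib.request import urlcleanup, urlretrieve

# Offline stand-in for a real page (assumption made for this sketch).
url = "data:text/html;charset=utf-8,%3Chtml%3Ehello%3C%2Fhtml%3E"
target = os.path.join(tempfile.mkdtemp(), "baidu.html")

# Same shape as Python 2: a (local_filename, headers) tuple comes back.
filename, headers = urlretrieve(url, filename=target)

with open(filename, "rb") as f:
    saved = f.read()

# Remove any temporary files left behind by earlier urlretrieve() calls.
urlcleanup()

print(filename, headers.get_content_type(), saved)
```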

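4. urllib.quote() and urllib.quote_plus() percent-encode a URL string. quote() leaves "/" unescaped by default (safe='/'), while quote_plus() escapes "/" as well and turns spaces into "+":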

In [483]: urllib.quote('http://www.baidu.com')
Out[483]: 'http%3A//www.baidu.com'
 
In [484]: urllib.quote_plus('http://www.baidu.com')
Out[484]: 'http%3A%2F%2Fwww.baidu.com'
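In Python 3 these functions live in urllib.parse. A small sketch, assuming Python 3, that reproduces the two results above and shows the other difference between them, the handling of spaces and literal "+" signs:

```python
from urllib.parse import quote, quote_plus

url = "http://www.baidu.com"

# quote() keeps "/" by default (safe="/"); quote_plus() uses safe="".
a = quote(url)           # 'http%3A//www.baidu.com'
b = quote_plus(url)      # 'http%3A%2F%2Fwww.baidu.com'

# Spaces become %20 with quote() but "+" with quote_plus(); a literal "+"
# is escaped as %2B by both, so it is not mistaken for a space later.
c = quote("a b+c")       # 'a%20b%2Bc'
d = quote_plus("a b+c")  # 'a+b%2Bc'
```

quote_plus() is the one to use for HTML form values, since form encoding represents spaces as "+".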

5. urllib.unquote() and urllib.unquote_plus() decode an encoded URL; unquote_plus() also turns "+" back into a space:

In [514]: urllib.unquote('http%3A//www.baidu.com')
Out[514]: 'http://www.baidu.com'
 
In [515]: urllib.unquote_plus('http%3A%2F%2Fwww.baidu.com')
Out[515]: 'http://www.baidu.com'
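The Python 3 counterparts are also in urllib.parse. This sketch, assuming Python 3, shows why the quote/unquote pairs should match: decoding quote_plus() output with plain unquote() leaves the "+" signs in place.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

x = unquote("http%3A//www.baidu.com")          # 'http://www.baidu.com'
y = unquote_plus("http%3A%2F%2Fwww.baidu.com")  # 'http://www.baidu.com'

# unquote() leaves "+" alone; unquote_plus() turns it back into a space.
encoded = quote_plus("a b+c")       # 'a+b%2Bc'
wrong = unquote(encoded)            # 'a+b+c'  (the space is lost)
round_trip = unquote_plus(encoded)  # 'a b+c'  (lossless)
```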

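6. urllib.urlencode() converts a dict of key-value pairs into a query string with the pairs joined by &. Combined with urlopen(), it can be used for both GET (append the string to the URL after ?) and POST (pass it as the data argument):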

In [560]: import urllib 
In [561]: params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
In [562]: f=urllib.urlopen("http://python.org/query?%s" %params) 
In [563]: f.readline()
Out[563]: '<!doctype html>\n' 
In [564]: f.readlines()
Out[564]: ['<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->\n',
 '<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->\n', 
 '<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->\n', 
 '<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->\n',
  '\n',
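In Python 3 urlencode() is in urllib.parse. A sketch, assuming Python 3.7+ (where dicts keep insertion order, so the pair order below is deterministic), that builds both the GET URL and the POST body without sending anything:

```python
from urllib.parse import urlencode

params = urlencode({"spam": 1, "eggs": 2, "bacon": 0})  # 'spam=1&eggs=2&bacon=0'

# GET: append the encoded string to the URL after "?".
get_url = "http://python.org/query?%s" % params

# POST: send the same string as the request body; Python 3 requires bytes.
post_body = params.encode("ascii")

# Values are escaped with quote_plus() by default, so spaces become "+".
q = urlencode({"q": "hello world", "lang": "zh"})  # 'q=hello+world&lang=zh'
```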

II. The urllib2 module

urllib2 offers more than urllib, for example support for basic authentication, redirects and cookies.

https://docs.python.org/2/library/urllib2.html

https://docs.python.org/2/howto/urllib2.html
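In Python 3, urllib2 was merged into urllib.request (and urllib.error), and Python 2's cookielib became http.cookiejar. As a sketch of the cookie feature, assuming Python 3, this builds an opener that stores cookies in a CookieJar and sends them back on later requests; redirects are followed by the default handlers. No request is sent here.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, HTTPRedirectHandler, build_opener

jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))

# build_opener() installs the default handlers (HTTP, redirects, errors, ...)
# alongside the cookie processor; opener.open(url) would then use all of them.
handler_types = {type(h) for h in opener.handlers}
print(sorted(t.__name__ for t in handler_types))
```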

In [566]: import urllib2
 
In [567]: f=urllib2.urlopen('http://www.python.org/')
 
In [568]: print f.read(100)
<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->

This opens the Python homepage and prints the first 100 bytes of the response.

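HTTP is based on requests and responses: the client sends a request and the server answers it. urllib2 represents the outgoing request with a Request object, and calling urlopen() on a Request object returns a response object. The response is a file-like object and can be read like a file.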

In [630]: import urllib2
 
In [631]: req=urllib2.Request('http://www.baidu.com')
 
In [632]: response=urllib2.urlopen(req)
 
In [633]: the_page=response.read()
 
In [634]: the_page
Out[634]: '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.
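The same request/response split in Python 3, as a sketch assuming urllib.request. The Request can be inspected before anything is sent. The second request uses a data: URL only so the example runs offline.

```python
from urllib.request import Request, urlopen

# Build the request object first; nothing goes over the network yet.
req = Request("http://www.baidu.com")
method = req.get_method()  # 'GET', because no data is attached
host = req.host            # 'www.baidu.com'

# urlopen() accepts a Request as well as a plain URL string, and the
# response it returns is read like a file.
with urlopen(Request("data:text/html,%3Ch1%3Ehi%3C%2Fh1%3E")) as response:
    the_page = response.read()
```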

Often you need to send data to a URL with a POST request:

In [763]: import urllib
 
In [764]: import urllib2
 
In [765]: url='http://xxxxxx/login.php'
 
In [766]: values={'ver' : '1.7.1', 'email' : 'xxxxx', 'password' : 'xxxx', 'mac' : '111111111111'}
 
In [767]: data=urllib.urlencode(values)
 
In [768]: req=urllib2.Request(url,data)
 
In [769]: response=urllib2.urlopen(req)
 
In [770]: the_page=response.read()
 
In [771]: the_page
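A Python 3 sketch of the same POST request. The endpoint and form values are hypothetical placeholders for the elided ones above, and the request is only built, not sent. The key change from Python 2 is that the body must be bytes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and fields, standing in for the elided ones above.
url = "http://example.com/login.php"
values = {"ver": "1.7.1", "email": "user@example.com", "password": "secret"}

data = urlencode(values).encode("ascii")  # Python 3 wants the body as bytes
req = Request(url, data)

# Attaching data makes this a POST; when it is sent, urllib adds
# Content-Type: application/x-www-form-urlencoded unless one is set.
method = req.get_method()
```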

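If you do not pass the data argument to urllib2.Request(), urllib2 sends a GET request. The practical difference is that POST requests often have side effects: they change the state of the system in some way. Data can also be sent with a GET request by encoding it into the URL's query string: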

In [55]: import urllib2
 
In [56]: import urllib
 
In [57]: url='http://xxx/login.php'
 
In [58]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxx'}
 
In [59]: data=urllib.urlencode(values)
 
In [60]: full_url=url + '?' + data
 
In [61]: the_page=urllib2.urlopen(full_url)
 
In [63]: the_page.read()
Out[63]: '{"result":0,"data":0}'
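The GET variant in Python 3, as a sketch with a hypothetical endpoint and values and no request sent. It also shows that the server receives the fields in the query string, which parse_qs() can decode again.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

url = "http://example.com/login.php"  # hypothetical endpoint
values = {"ver": "1.7.1", "email": "user@example.com"}

full_url = url + "?" + urlencode(values)
req = Request(full_url)               # no data argument, so GET

method = req.get_method()
# What the server sees, decoded back into a dict of lists.
query = parse_qs(urlsplit(full_url).query)
```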

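By default, urllib2 identifies itself with a User-Agent of the form Python-urllib/x.y, where x.y is the Python version (Python-urllib/2.6 in the author's environment). You can override it by adding a User-Agent HTTP header: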

In [107]: import urllib
 
In [108]: import urllib2
 
In [109]: url='http://xxx/login.php'
 
In [110]: user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
 
In [111]: values={'ver' : 'xxx', 'email' : 'xxx', 'password' : 'xxx', 'mac' : 'xxxx'}
 
In [112]: headers={'User-Agent' : user_agent}
 
In [114]: data=urllib.urlencode(values)
 
In [115]: req=urllib2.Request(url,data,headers)
 
In [116]: response=urllib2.urlopen(req)
 
In [117]: the_page=response.read()
 
In [118]: the_page
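A Python 3 sketch of setting the header, again with a hypothetical endpoint and nothing sent. Request stores header names through str.capitalize(), so the key is read back as "User-agent". The opener's addheaders attribute holds the default Python-urllib value that is used when no header is given.

```python
from urllib.parse import urlencode
from urllib.request import Request, build_opener

url = "http://example.com/login.php"  # hypothetical endpoint
user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
headers = {"User-Agent": user_agent}
data = urlencode({"ver": "1.7.1"}).encode("ascii")

req = Request(url, data, headers)

# Header names are normalized with str.capitalize(): "User-agent".
ua = req.get_header("User-agent")

# Without an explicit header, the opener's default identifies Python.
default_ua = dict(build_opener().addheaders)["User-agent"]
```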

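When the server at the given URL cannot be reached, urlopen() raises URLError. When the server is reached but cannot serve the requested content (it answers with an HTTP error status such as 404), urlopen() raises HTTPError: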

#!/usr/bin/python
 
from urllib2 import Request,urlopen,URLError,HTTPError
req=Request('http://10.10.41.42/index.html')
try:
   response=urlopen(req)
except HTTPError as e:
   print 'The server couldn\'t fulfill the request.'
   print 'Error code:',e.code
 
except URLError as e:
   print 'We failed to reach a server.'
   print 'Reason:',e.reason
else:
   print "Everything is fine"

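Note that when writing the exception handlers, HTTPError must come before URLError. HTTPError is a subclass of URLError, so an earlier except URLError clause would also catch every HTTPError.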
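The same handler ordering in Python 3, where the exceptions live in urllib.error. This is a sketch around a hypothetical fetch() helper that returns a (status, detail) pair instead of printing. An unknown URL scheme is used to trigger URLError offline, and a data: URL stands in for a successful fetch.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def fetch(url):
    """Hypothetical helper: return (status, detail) instead of raising."""
    try:
        with urlopen(url, timeout=10) as response:
            body = response.read()
    except HTTPError as e:  # must precede URLError, since it is a subclass
        return "http-error", e.code
    except URLError as e:   # server unreachable, or URL not supported
        return "url-error", e.reason
    return "ok", body
```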

#!/usr/bin/python
 
from urllib2 import Request,urlopen,URLError,HTTPError
req=Request('http://10.10.41.42')
try:
   response=urlopen(req)
 
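except URLError as e:
   # HTTPError is a subclass of URLError and, since Python 2.7, also has a
   # 'reason' attribute, so check for 'code' first
   if hasattr(e,'code'):
      print 'The server couldn\'t fulfill the request.'
      print 'Error code:',e.code
   elif hasattr(e,'reason'):
      print 'We failed to reach a server.'
      print 'Reason:',e.reason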
else:
   print "Everything is fine"

hasattr() checks whether an object has an attribute with the given name.
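A Python 3 sketch of the single-handler style. describe_error() is a hypothetical helper that classifies an exception raised by urlopen(). It tests for "code" before "reason" because HTTPError also carries a reason attribute in Python 3 (and in 2.7). Testing "reason" first would file HTTP errors under "could not reach a server". The exceptions are constructed directly so the check runs offline.

```python
from urllib.error import HTTPError, URLError

def describe_error(e):
    """Hypothetical helper: classify a URLError raised by urlopen()."""
    # Only HTTPError has .code; both HTTPError and URLError have .reason.
    if hasattr(e, "code"):
        return "server couldn't fulfill the request", e.code
    if hasattr(e, "reason"):
        return "failed to reach a server", e.reason
    return "unknown", None
```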

Logging in to a page that requires basic authentication with urllib2

The RabbitMQ management interface requires username and password authentication, for example:

In [63]: url='http://172.30.25.179:15672/api/aliveness-test/%2f'
 
In [64]: username='guest'
 
In [65]: password='guest'
 
In [66]: mgr=urllib2.HTTPPasswordMgrWithDefaultRealm()
 
In [67]: s=mgr.add_password(None,url,username,password)
 
In [68]: handler=urllib2.HTTPBasicAuthHandler(mgr)
 
In [69]: opener=urllib2.build_opener(handler)
 
In [70]: opener.open(url).read()
Out[70]: '{"status":"ok"}'
 
The response body is JSON, so it can be parsed with the json module: after import json, call json.loads(opener.open(url).read()).
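The same opener in Python 3, as a sketch that assumes urllib.request and sends no request. It checks that the password manager returns the credentials for any realm the server names. The realm string below is a hypothetical value. The sketch also shows the Basic credential token that ends up in the Authorization header, and parses the sample body shown above.

```python
import base64
import json
from urllib.request import (HTTPBasicAuthHandler,
                            HTTPPasswordMgrWithDefaultRealm, build_opener)

url = "http://172.30.25.179:15672/api/aliveness-test/%2f"
username, password = "guest", "guest"

mgr = HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, url, username, password)  # None = use for any realm
opener = build_opener(HTTPBasicAuthHandler(mgr))
# opener.open(url).read() would now answer the server's 401 challenge.

# Hypothetical realm name: the manager falls back to the None realm.
creds = mgr.find_user_password("some-realm", url)

# The header sent is "Authorization: Basic " + base64("user:password").
token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")

# Parsing the sample body from above; json.loads() accepts bytes (3.6+).
status = json.loads(b'{"status":"ok"}')["status"]
```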

References:

https://docs.python.org/2/library/urllib2.html#module-urllib2

This article is from the "Linux SA John" blog; please keep this source attribution: http://john88wang.blog.51cto.com/2165294/1441495