Python URLObject: a URL-handling module

1. Motivation

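Given a URL string such as https://github.com/zacharyvoase/urlobject?spam=eggs#foo, we often want to extract one part of it: the scheme (https), the host name, the username and password, the path, the query, and so on. The main approaches I can think of are:
   (1) regular expressions
   (2) plain string-manipulation functions
   (3) the urlobject module
   (4) the standard library's urllib.parse module (urlparse in Python 2)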
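For comparison with approach (1), RFC 3986 (Appendix B) gives a regular expression that splits any URI reference into its five main components. The following is a minimal sketch assuming Python 3. The helper name `split_with_regex` is hypothetical.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Group 2 = scheme, 4 = authority (netloc), 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_with_regex(url):
    """Hypothetical helper: split a URL into components with the RFC 3986 regex."""
    m = URI_RE.match(url)  # every group is optional, so this pattern matches any string
    return {
        "scheme": m.group(2),
        "netloc": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_with_regex("https://github.com/zacharyvoase/urlobject?spam=eggs#foo")
print(parts["scheme"], parts["netloc"], parts["path"], parts["query"], parts["fragment"])
# https github.com /zacharyvoase/urlobject spam=eggs foo
```

The regex only splits the string. It does not decode percent-escapes, and it does not separate the user, password and port inside the netloc. Those gaps are why a dedicated module is usually more convenient.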
 
  This was my first time using urlobject, so below is a summary of how to use it.

2. Installing urlobject

    pip install urlobject
    

3. Basic usage of urlobject

   (1) A URL is represented by creating a URLObject, which is a plain subclass of unicode (str in Python 3). The following attributes give you the individual parts:
 
>>> from urlobject import URLObject
>>> url = URLObject("https://github.com/zacharyvoase/urlobject?spam=eggs#foo")
>>> print(url)
https://github.com/zacharyvoase/urlobject?spam=eggs#foo
>>> print(url.scheme)    # scheme (protocol)
https
>>> print(url.netloc)    # full network location, including username, password and port if present
github.com
>>> print(url.hostname)  # host name only
github.com
>>> (url.username, url.password)  # username and password
(None, None)
>>> print(url.port)      # explicit port number
None
>>> url.default_port     # default port for the scheme
443
>>> print(url.path)      # path
/zacharyvoase/urlobject
>>> print(url.query)     # query string
spam=eggs
>>> print(url.fragment)  # fragment
foo
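For comparison with approach (4), the standard library's `urllib.parse.urlsplit()` exposes the same components as attributes of a named tuple. This sketch assumes Python 3. The URL with credentials and port 8443 is made up, to show how `netloc` differs from `hostname`.

```python
from urllib.parse import urlsplit

url = "https://user:secret@github.com:8443/zacharyvoase/urlobject?spam=eggs#foo"
parts = urlsplit(url)

print(parts.scheme)                    # https
print(parts.netloc)                    # user:secret@github.com:8443 (full network location)
print(parts.hostname)                  # github.com (lower-cased host only)
print(parts.username, parts.password)  # user secret
print(parts.port)                      # 8443 (an int; None when no port is given)
print(parts.path)                      # /zacharyvoase/urlobject
print(parts.query)                     # spam=eggs
print(parts.fragment)                  # foo
```

Unlike URLObject, `urlsplit()` has no `default_port`. If you need the scheme-to-port mapping (443 for https), you have to supply it yourself.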

 

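   (2) Any part can be replaced with the with_*() methods. Because unicode strings are immutable, URLObject is immutable too: the methods below leave the original URLObject untouched and return a new URLObject: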

 

>>> print(url.with_scheme('http'))  
http://github.com/zacharyvoase/urlobject?spam=eggs#foo  
>>> print(url.with_netloc('example.com'))  
https://example.com/zacharyvoase/urlobject?spam=eggs#foo  
>>> print(url.with_auth('alice', '1234'))  
https://alice:1234@github.com/zacharyvoase/urlobject?spam=eggs#foo  
>>> print(url.with_path('/some_page'))  
https://github.com/some_page?spam=eggs#foo  
>>> print(url.with_query('funtimes=yay'))  
https://github.com/zacharyvoase/urlobject?funtimes=yay#foo  
>>> print(url.with_fragment('example'))  
https://github.com/zacharyvoase/urlobject?spam=eggs#example 
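The standard library offers a similar "replace one part, get a new value" style through the named-tuple method `_replace()`. The following is a sketch assuming Python 3. Note that `urllib.parse` has no `with_auth()` counterpart, so the credentials are spliced into the netloc by hand. That part assumes the original netloc has no credentials yet.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"
parts = urlsplit(url)

# _replace() returns a NEW SplitResult; `parts` itself is unchanged,
# much like URLObject's with_*() methods return new objects.
with_scheme = urlunsplit(parts._replace(scheme="http"))
with_path = urlunsplit(parts._replace(path="/some_page"))
with_query = urlunsplit(parts._replace(query="funtimes=yay"))

# No with_auth() equivalent: prepend "user:password@" to the netloc manually
# (assumes the netloc does not already contain credentials).
with_auth = urlunsplit(parts._replace(netloc="alice:1234@" + parts.netloc))

print(with_scheme)  # http://github.com/zacharyvoase/urlobject?spam=eggs#foo
print(with_auth)    # https://alice:1234@github.com/zacharyvoase/urlobject?spam=eggs#foo
```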

(3) To remove the query or the fragment, use the without_*() methods:

>>> print(url.without_query())
https://github.com/zacharyvoase/urlobject#foo
>>> print(url.without_fragment())
https://github.com/zacharyvoase/urlobject?spam=eggs
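With only the standard library, the fragment can be split off with `urldefrag()`. The query can be dropped by replacing it with an empty string. This is a sketch assuming Python 3.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit

url = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"

# urldefrag() returns (url_without_fragment, fragment).
no_fragment, fragment = urldefrag(url)

# An empty query makes urlunsplit() omit the "?" entirely.
no_query = urlunsplit(urlsplit(url)._replace(query=""))

print(no_fragment)  # https://github.com/zacharyvoase/urlobject?spam=eggs
print(no_query)     # https://github.com/zacharyvoase/urlobject#foo
```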

 

 

4. Resolving relative URLs with relative()

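   Take a URL such as https://github.com/zacharyvoase/urlobject?spam=eggs#foo. Suppose you only want to replace urlobject?spam=eggs#foo, or zacharyvoase/urlobject?spam=eggs#foo. The relative() method handles this by resolving a relative reference against the current URL: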
 
>>> print(url.relative('another-project'))  
https://github.com/zacharyvoase/another-project  
>>> print(url.relative('?different-query-string'))  
https://github.com/zacharyvoase/urlobject?different-query-string  
>>> print(url.relative('#frag'))  
https://github.com/zacharyvoase/urlobject?spam=eggs#frag  
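The standard library's `urljoin()` performs the same kind of reference resolution. This sketch assumes Python 3.5 or later, where `urljoin()` follows RFC 3986. It reproduces the three results above:

```python
from urllib.parse import urljoin

base = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"

# A relative path replaces the last path segment and drops query and fragment.
print(urljoin(base, "another-project"))
# https://github.com/zacharyvoase/another-project

# A query-only reference keeps the path but drops the old fragment.
print(urljoin(base, "?different-query-string"))
# https://github.com/zacharyvoase/urlobject?different-query-string

# A fragment-only reference keeps everything else.
print(urljoin(base, "#frag"))
# https://github.com/zacharyvoase/urlobject?spam=eggs#frag
```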

 If the argument to relative() is a complete URL, an entirely new URL is returned:

>>> print(url.relative('http://example.com/foo'))  
http://example.com/foo  

 

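You can start the relative reference at whichever level you need. A reference beginning with // keeps only the scheme, and one beginning with / keeps the scheme and host: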

>>> print(url.relative('//example.com/foo'))  
https://example.com/foo  
>>> print(url.relative('/dvxhouse/intessa'))  
https://github.com/dvxhouse/intessa  
>>> print(url.relative('/dvxhouse/intessa?foo=bar'))  
https://github.com/dvxhouse/intessa?foo=bar  
>>> print(url.relative('/dvxhouse/intessa?foo=bar#baz'))  
https://github.com/dvxhouse/intessa?foo=bar#baz  
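Which part of the base URL survives depends only on how the reference starts. The sketch below classifies a reference by its leading characters, loosely following RFC 3986's reference forms, and resolves each one with `urljoin()`. It assumes Python 3.5+. The helper `reference_kind` and its labels are hypothetical.

```python
import re
from urllib.parse import urljoin

# A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def reference_kind(ref):
    """Hypothetical helper: label a reference by where it starts replacing the base."""
    if SCHEME_RE.match(ref):
        return "absolute-uri"   # own scheme: replaces everything
    if ref.startswith("//"):
        return "network-path"   # keeps only the base scheme
    if ref.startswith("/"):
        return "absolute-path"  # keeps scheme and host
    if ref.startswith("?"):
        return "query-only"     # keeps scheme, host and path
    if ref.startswith("#"):
        return "fragment-only"  # keeps everything except the fragment
    return "relative-path"      # merged with the base path's directory


base = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"
for ref in ["http://example.com/foo", "//example.com/foo",
            "/dvxhouse/intessa?foo=bar#baz", "another-project"]:
    print(reference_kind(ref), "->", urljoin(base, ref))
# absolute-uri -> http://example.com/foo
# network-path -> https://example.com/foo
# absolute-path -> https://github.com/dvxhouse/intessa?foo=bar#baz
# relative-path -> https://github.com/zacharyvoase/another-project
```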

 

5. Path

 The path attribute of a URL is actually a URLPath object, which provides the following methods and attributes for working with the path:
>>> print(url.path)  
/zacharyvoase/urlobject  
>>> print(url.path.parent)  
/zacharyvoase/  
>>> print(url.path.segments)  
('zacharyvoase', 'urlobject')  
>>> print(url.path.add_segment('subnode'))  
/zacharyvoase/urlobject/subnode  
>>> print(url.path.root)
/
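To show roughly what these path operations do, here is a sketch on a plain path string using only the standard library. The three helper functions are hypothetical imitations of `segments`, `parent` and `add_segment`, not urlobject's implementation. The root of any absolute path is simply `/`.

```python
from urllib.parse import quote, unquote


def path_segments(path):
    """Hypothetical: split a path into decoded segments, '/a/b' -> ('a', 'b')."""
    if path.strip("/") == "":
        return ()
    return tuple(unquote(s) for s in path.lstrip("/").split("/"))


def path_parent(path):
    """Hypothetical: drop the last segment but keep the trailing slash, '/a/b' -> '/a/'."""
    cut = path.rstrip("/").rfind("/")
    return path[: cut + 1] or "/"


def path_add_segment(path, segment):
    """Hypothetical: append one segment, percent-encoding any '/' inside it."""
    base = path if path.endswith("/") else path + "/"
    return base + quote(segment, safe="")


p = "/zacharyvoase/urlobject"
print(path_parent(p))                  # /zacharyvoase/
print(path_segments(p))                # ('zacharyvoase', 'urlobject')
print(path_add_segment(p, "subnode"))  # /zacharyvoase/urlobject/subnode
```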

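   The same operations are also available on URLObject itself (note the method names add_path_segment and add_path). Again, they return a new URLObject instead of modifying the original URL: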

 
>>> print(url.parent)  
https://github.com/zacharyvoase/?spam=eggs#foo  
>>> print(url.add_path_segment('subnode'))  
https://github.com/zacharyvoase/urlobject/subnode?spam=eggs#foo  
>>> print(url.add_path('tree/urlobject2'))  
https://github.com/zacharyvoase/urlobject/tree/urlobject2?spam=eggs#foo  
>>> print(url.root)  
https://github.com/?spam=eggs#foo  
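At the URL level, the key point is that only the path changes while the query and fragment are carried over. The sketch below does this with the standard library. The helper `url_with_path` is hypothetical: it splits the URL, transforms the path, and reassembles the result.

```python
from urllib.parse import urlsplit, urlunsplit


def url_with_path(url, transform):
    """Hypothetical: apply `transform` to the path only, keeping query and fragment."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=transform(parts.path)))


url = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"

parent = url_with_path(url, lambda p: p[: p.rstrip("/").rfind("/") + 1] or "/")
child = url_with_path(url, lambda p: p.rstrip("/") + "/subnode")
deeper = url_with_path(url, lambda p: p.rstrip("/") + "/tree/urlobject2")
root = url_with_path(url, lambda p: "/")

print(parent)  # https://github.com/zacharyvoase/?spam=eggs#foo
print(child)   # https://github.com/zacharyvoase/urlobject/subnode?spam=eggs#foo
print(deeper)  # https://github.com/zacharyvoase/urlobject/tree/urlobject2?spam=eggs#foo
print(root)    # https://github.com/?spam=eggs#foo
```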

6. Working with the query string

The query attribute of a URLObject is a QueryString object, whose methods can be used to process the query:
 
>>> print(url.query)  
spam=eggs  
>>> url.query.list  # aliased as url.query_list  
[('spam', 'eggs')]  
>>> url.query.dict  # aliased as url.query_dict  
{'spam': 'eggs'}  
>>> url.query.multi_dict  # aliased as url.query_multi_dict  
{'spam': ['eggs']}  
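The three views above have rough standard-library counterparts in `parse_qsl()` and `parse_qs()`. The example below uses a made-up query with a repeated key, to show why the views differ. It makes no claim about which value urlobject's `query.dict` keeps for a repeated key. Here, `dict()` simply keeps the last one.

```python
from urllib.parse import parse_qs, parse_qsl

query = "spam=eggs&spam=ham&foo=bar"

pairs = parse_qsl(query)  # every (key, value) pair, in order (like query.list)
multi = parse_qs(query)   # key -> list of all values (like query.multi_dict)
single = dict(pairs)      # one value per key; with dict(), the LAST value wins

print(pairs)   # [('spam', 'eggs'), ('spam', 'ham'), ('foo', 'bar')]
print(multi)   # {'spam': ['eggs', 'ham'], 'foo': ['bar']}
print(single)  # {'spam': 'ham', 'foo': 'bar'}
```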

Modifying the query is just as easy: you can add or set query parameters. The add_* methods allow a key to have several values:

>>> print(url.query.add_param('spam', 'ham'))  
spam=eggs&spam=ham  

whereas the set_* methods leave a key with exactly one value:

>>> print(url.query.set_param('spam', 'ham'))  
spam=ham  
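The add/set distinction comes down to whether existing pairs for the key are kept. Below is a minimal sketch on a plain query string with the standard library. The functions `add_param` and `set_param` are hypothetical stand-ins, not urlobject's code. This version of `set_param` moves the key to the end, and urlobject's own ordering may differ.

```python
from urllib.parse import parse_qsl, urlencode


def add_param(query, key, value):
    """Hypothetical: append key=value, keeping any existing values for key."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode(pairs + [(key, value)])


def set_param(query, key, value):
    """Hypothetical: drop every existing value for key, then append key=value."""
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k != key]
    return urlencode(kept + [(key, value)])


print(add_param("spam=eggs", "spam", "ham"))  # spam=eggs&spam=ham
print(set_param("spam=eggs", "spam", "ham"))  # spam=ham
```

`keep_blank_values=True` is passed so that parameters such as `a=` are not silently dropped when the query is re-encoded.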

 


The parameters can also be passed as a dictionary:

>>> print(url.query.add_params({'spam': 'ham', 'foo': 'bar'}))  
spam=eggs&foo=bar&spam=ham  
>>> print(url.query.set_params({'spam': 'ham', 'foo': 'bar'}))  
foo=bar&spam=ham  
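The dictionary variants can be sketched the same way. The functions `add_params` and `set_params` are hypothetical. This sketch appends pairs in the dict's insertion order (Python 3.7+), so its ordering differs from the urlobject output shown above, although the resulting set of pairs is the same.

```python
from urllib.parse import parse_qsl, urlencode


def add_params(query, params):
    """Hypothetical: append every key=value from the dict, keeping existing pairs."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode(pairs + list(params.items()))


def set_params(query, params):
    """Hypothetical: drop all existing values for the dict's keys, then append them."""
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k not in params]
    return urlencode(kept + list(params.items()))


print(add_params("spam=eggs", {"spam": "ham", "foo": "bar"}))  # spam=eggs&spam=ham&foo=bar
print(set_params("spam=eggs", {"spam": "ham", "foo": "bar"}))  # spam=ham&foo=bar
```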

Use del_param() or del_params() to remove parameters from the query:

>>> print(url.query.del_param('spam')) # Result is empty  
  
>>> print(url.query.add_params({'foo': 'bar', 'baz': 'blah'}).del_params(['spam', 'foo']))  
baz=blah  
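Deleting is just filtering the pairs by key. The `del_params` function below is a hypothetical sketch, and deleting a single key is the same call with a one-element list. Removing the last pair leaves an empty query string.

```python
from urllib.parse import parse_qsl, urlencode


def del_params(query, keys):
    """Hypothetical: remove every pair whose key is in `keys`."""
    drop = set(keys)
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode([(k, v) for k, v in pairs if k not in drop])


print(repr(del_params("spam=eggs", ["spam"])))                   # '' (empty query)
print(del_params("spam=eggs&foo=bar&baz=blah", ["spam", "foo"]))  # baz=blah
```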

 

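The query methods above also exist as aliases on URLObject itself. You can call them directly on the URLObject (both run the same underlying code), and the result is a full URL: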

>>> print(url.add_query_param('spam', 'ham'))  
https://github.com/zacharyvoase/urlobject?spam=eggs&spam=ham#foo  
>>> print(url.set_query_param('spam', 'ham'))  
https://github.com/zacharyvoase/urlobject?spam=ham#foo  
>>> print(url.del_query_param('spam'))  
https://github.com/zacharyvoase/urlobject#foo  
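Working at the URL level instead of on the bare query only adds a split-and-reassemble step. The sketch below uses the standard library. All four function names are hypothetical and only mirror the urlobject method names above. Because `urlunsplit()` omits the `?` when the query is empty, deleting the only parameter yields a URL with no query at all.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _edit_query(url, edit):
    """Hypothetical: rewrite only the query of `url` via edit(pairs) -> pairs."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(edit(pairs))))


def add_query_param(url, key, value):
    return _edit_query(url, lambda pairs: pairs + [(key, value)])


def set_query_param(url, key, value):
    return _edit_query(url, lambda pairs: [p for p in pairs if p[0] != key] + [(key, value)])


def del_query_param(url, key):
    return _edit_query(url, lambda pairs: [p for p in pairs if p[0] != key])


url = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"
print(add_query_param(url, "spam", "ham"))
# https://github.com/zacharyvoase/urlobject?spam=eggs&spam=ham#foo
print(set_query_param(url, "spam", "ham"))
# https://github.com/zacharyvoase/urlobject?spam=ham#foo
print(del_query_param(url, "spam"))
# https://github.com/zacharyvoase/urlobject#foo
```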

 

7. Summary

 
For the full URLObject API, see: https://urlobject.readthedocs.io/en/latest/api.html