What the approach below does:
It parses a URL into its components, extracts the query part of the URL, and builds a dict from that query.
The implementation:

    >>> import urlparse
    >>> url = "http://www.example.org/default.html?ct=32&op=92&item=98"
    >>> urlparse.urlsplit(url)
    SplitResult(scheme='http', netloc='www.example.org', path='/default.html', query='ct=32&op=92&item=98', fragment='')
    >>> urlparse.parse_qs(urlparse.urlsplit(url).query)
    {'item': ['98'], 'op': ['92'], 'ct': ['32']}
    >>> dict(urlparse.parse_qsl(urlparse.urlsplit(url).query))
    {'item': '98', 'op': '92', 'ct': '32'}

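Notes:
- In Python 3, urlparse has been moved to urllib.parse.
- urlparse provides two functions, urlparse.parse_qs() and urlparse.parse_qsl(). Both parse the query field of a URL. If the same key maps to several values in the query, urlparse.parse_qs() collects all the values for that key into a list.
- Still to be tested when time permits: how the server side should receive a query in which the same key maps to several values.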

    >>> import urlparse
    >>> url = urlparse.urlparse('http://www.baidu.com/index.php?username=guol')
    >>> print url
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.php', params='', query='username=guol', fragment='')
    >>> print url.netloc
    www.baidu.com

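
For reference, a rough Python 3 equivalent of the session above, assuming only the standard library. The result is a named tuple, so each component can be read as an attribute; `hostname` and `port` are extra convenience attributes not used in the original session.

```python
from urllib.parse import urlparse

parts = urlparse("http://www.baidu.com/index.php?username=guol")

# Each component of the URL is available as an attribute.
print(parts.scheme)    # http
print(parts.netloc)    # www.baidu.com
print(parts.path)      # /index.php
print(parts.query)     # username=guol

# hostname is the lower-cased host; port is None when the URL has no explicit port.
print(parts.hostname)  # www.baidu.com
print(parts.port)      # None
```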
URLs are sometimes percent-encoded; Chinese search keywords, for example, are encoded this way. They can be decoded as follows:

    >>> import urlparse
    >>> from urlparse import unquote
    >>> url = "http://www.google.com/support/contact/bin/request.py?entity=%7B%22author%22:%22AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4%22,%22groups%22:%5B%22general%22,%2254296%7C700726330%22%5D,%22trustedMerchantId%22:%22MID_54316%22%7D&client=242&contact_type=anno&hl=en_US"
    >>> a = urlparse.urlparse(url).query
    >>> b = unquote(a)
    >>> b
    'entity={"author":"AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4","groups":["general","54296|700726330"],"trustedMerchantId":"MID_54316"}&client=242&contact_type=anno&hl=en_US'
    >>> import HTMLParser
    >>> html_parser = HTMLParser.HTMLParser()
    >>> txt = html_parser.unescape(b)
    >>> txt
    u'entity={"author":"AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4","groups":["general","54296|700726330"],"trustedMerchantId":"MID_54316"}&client=242&contact_type=anno&hl=en_US'
    >>> c = urlparse.parse_qsl(txt, True)
    >>> c  # c is a list
    [(u'entity', u'{"author":"AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4","groups":["general","54296|700726330"],"trustedMerchantId":"MID_54316"}'), (u'client', u'242'), (u'contact_type', u'anno'), (u'hl', u'en_US')]
    >>> import json
    >>> c = dict(c)
    >>> d = json.loads(c['entity'])
    >>> d
    {u'trustedMerchantId': u'MID_54316', u'groups': [u'general', u'54296|700726330'], u'author': u'AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4'}
    >>> print d['groups'][-1]
    54296|700726330

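Notes:
- Use urlparse.unquote to decode the percent-encoded URL.
- Use HTMLParser to unescape the special characters (HTML entities) in the URL.
- Convert the list of tuples into a dict: the first element of each tuple becomes the key and the second element becomes the value.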
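
The same steps can be sketched for Python 3, where `unquote` and `parse_qsl` come from urllib.parse and `html.unescape` takes the place of `HTMLParser.unescape`. The shortened URL below is a hypothetical stand-in for the long one in the session, chosen only to keep the example readable.

```python
import html
import json
from urllib.parse import urlsplit, unquote, parse_qsl

# Hypothetical, shortened URL: the "entity" parameter carries percent-encoded JSON.
url = ("http://www.example.org/request.py"
       "?entity=%7B%22groups%22:%5B%22general%22,%2254296%7C700726330%22%5D%7D"
       "&client=242&hl=en_US")

query = urlsplit(url).query

# Step 1: decode the percent-encoding.
decoded = unquote(query)

# Step 2: unescape HTML entities (a no-op here, since the query contains none).
text = html.unescape(decoded)

# Step 3: turn the list of (key, value) tuples into a dict.
params = dict(parse_qsl(text, keep_blank_values=True))

# Step 4: the "entity" value is JSON, so parse it.
entity = json.loads(params["entity"])
print(entity["groups"][-1])   # 54296|700726330

# parse_qsl already percent-decodes keys and values, so for this input
# parsing the raw query directly gives the same dict.
direct = dict(parse_qsl(query, keep_blank_values=True))
print(direct == params)       # True
```

Parsing the raw query directly is usually the more robust order, because a value containing an encoded `&` or `=` would be split incorrectly if the whole query were unquoted first.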