[Python] Web Crawler Practice

Date: 2019-12-14

Tags: python, web crawler practice. Category: Python

References

http://www.javashuo.com/article/p-zyvlfxyh-d.html
A detailed explanation of Python 3 urllib
https://www.jianshu.com/p/2e190438bd9c

Required packages

requests

Official documentation:
https://docs.python.org/3/library/urllib.html

urllib.request for opening and reading URLs
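- Function signature: urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
- data: the payload to send with the request.
  - The form fields, held here in a dictionary named params, must be converted to a byte stream.
  - Use urllib.parse.urlencode() to convert the dictionary into a percent-encoded string.
  - Then use bytes() to convert that string to bytes. Finally, call urlopen() to send the request, which simulates submitting form data via POST.
  - data = bytes(urllib.parse.urlencode(params), encoding='utf8')
  - response = urllib.request.urlopen(url, data=data)
  - When data is supplied, the request is sent as a POST form submission, and the body uses the standard application/x-www-form-urlencoded format.
- timeout: the request timeout, in seconds.
- cafile and capath: a CA certificate file and a directory of CA certificates, respectively. They apply only to HTTPS requests. Both have been deprecated since Python 3.6 in favor of context and were removed in Python 3.13.
- context: must be an ssl.SSLContext instance; it specifies the SSL settings.
- cadefault: deprecated and can be ignored.
- The function also accepts a urllib.request.Request object in place of a URL string.
- For HTTP and HTTPS URLs, the function returns an http.client.HTTPResponse object.
- Function signature: urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)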
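
The steps in the list above (encode a dictionary, convert it to bytes, wrap it in a Request, then open it with a timeout and an SSL context) can be sketched roughly as follows. The endpoint URL, the form fields, the User-Agent string, and the fetch() helper are all assumptions made up for illustration; they do not come from the original notes.

```python
import ssl
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, chosen only for illustration.
url = "https://httpbin.org/post"
params = {"name": "alice", "lang": "python"}

# Step 1: dictionary -> percent-encoded string.
encoded = urllib.parse.urlencode(params)

# Step 2: string -> bytes; urlopen() rejects a str payload.
data = encoded.encode("utf-8")

# A Request object bundles the URL, body, and headers. Because data is
# not None, the method defaults to POST.
req = urllib.request.Request(
    url,
    data=data,
    headers={"User-Agent": "example-crawler/0.1"},  # hypothetical UA string
)


def fetch(request, timeout=10):
    """Send the request and return (status code, decoded body)."""
    context = ssl.create_default_context()  # default certificate checks
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Nothing is sent until fetch(req) is called, so the encoding steps can be inspected offline first. Passing method="GET" (or another verb) to Request overrides the POST default when that is needed.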
urllib.error containing the exceptions raised by urllib.request
urllib.parse for parsing URLs
urllib.robotparser for parsing robots.txt files
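
As a rough illustration of the other three submodules, the sketch below parses a URL, checks robots.txt rules that are fed in directly instead of being downloaded, and catches the two main exception types. The URLs, the robots.txt rules, the user-agent name, and the safe_fetch() helper are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser

# urllib.parse: split a URL into components and decode its query string.
parts = urllib.parse.urlparse("https://example.com/search?q=python&page=2")
query = urllib.parse.parse_qs(parts.query)

# urllib.robotparser: rules are passed to parse() as lines here, so no
# network access is needed. A crawler would normally use set_url() and read().
rules = urllib.robotparser.RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])
allowed = rules.can_fetch("example-crawler", "https://example.com/index.html")
blocked = rules.can_fetch("example-crawler", "https://example.com/private/data")


def safe_fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it is caught first.
        print(f"HTTP error {err.code}: {err.reason}")
    except urllib.error.URLError as err:
        # Covers failures such as DNS errors and refused connections.
        print(f"URL error: {err.reason}")
    return None
```

Checking can_fetch() before calling safe_fetch() is one simple way for a crawler to respect a site's robots.txt.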
