The Python requests module

Date: 2019-11-06


requests is the module we use for web scraping in Python. It can cover close to 80% of the scraping needs you are likely to run into.

Installation

pip install requests

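Usage

The workflow for writing code with the requests module:

- Specify the URL
- Send the request
- Get the data from the response object
- Persist the data to storage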

-------------Example-----------------------
import requests

# Specify the URL
url = "https://www.sogou.com/"
# Send the request
response = requests.get(url)
# Get the data from the response object
page_text = response.text
# Persist the page to disk
with open('./sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
-------------------------------------------
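The same four steps can be wrapped in a small function that also sets a timeout and refuses to save error pages. This is only a sketch: the `save_page` name and its `fetch` parameter are hypothetical, introduced here so the function can be exercised without touching the network.

```python
import requests


def save_page(url, path, fetch=requests.get, timeout=10):
    """Fetch url and write the decoded body to path.

    Returns the number of characters written. The fetch argument is a
    hypothetical hook (defaulting to requests.get) that makes testing easier.
    """
    response = fetch(url, timeout=timeout)
    # Turn HTTP error statuses (4xx/5xx) into exceptions instead of saving an error page.
    response.raise_for_status()
    with open(path, "w", encoding="utf-8") as fp:
        return fp.write(response.text)
```

A real call would look like `save_page("https://www.sogou.com/", "./sogou.html")`.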

Parameters

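# POST: data is sent form-encoded in the request body
response = requests.post(url=url, data=data, headers=headers)

# GET: query-string parameters are passed through params
response = requests.get(url=url, params=params, headers=headers)

# Response body as raw bytes
response.content

# Response body decoded to a string
response.text

# Response body parsed from JSON into Python objects
response.json()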
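To make the difference between `params` and `data` concrete, the sketch below builds a GET and a POST request and prepares them without sending anything, so the encoding can be inspected offline. The URLs and the `query` and `kw` values are placeholders assumed for illustration.

```python
import requests

# Placeholder endpoints and values, assumed for illustration only.
get_request = requests.Request(
    "GET",
    "https://example.com/search",
    params={"query": "python"},
)
post_request = requests.Request(
    "POST",
    "https://example.com/translate",
    data={"kw": "dog"},
)

# prepare() encodes the URL, headers and body without any network I/O.
prepared_get = get_request.prepare()
prepared_post = post_request.prepare()

print(prepared_get.url)                       # params end up in the query string
print(prepared_post.body)                     # data ends up form-encoded in the body
print(prepared_post.headers["Content-Type"])  # set automatically for form data
```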

Other things to know

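1. Before scraping with this module, you need to find the exact URL that serves the data you want. The browser's built-in network capture tools can do this.

# Capturing Ajax requests in the browser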
F12 --> Network --> XHR --> Name --> Response

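2. The headers parameter above is used for User-Agent (UA) spoofing, which gets around anti-scraping checks.

Anti-scraping mechanism: UA detection --> UA spoofing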
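A minimal sketch of UA spoofing, assuming a `requests.Session` is used so the header applies to every request. The User-Agent string is an example browser string chosen for illustration; in practice you would copy the one your own browser sends.

```python
import requests

# Example browser User-Agent string (an assumption; copy your own from the Network tab).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

session = requests.Session()

# By default requests identifies itself as "python-requests/<version>",
# which is what a UA check on the server side can detect.
default_ua = session.headers["User-Agent"]

# Replace the default so every request from this session carries the browser UA.
session.headers.update({"User-Agent": BROWSER_UA})

# Prepare (but do not send) a request to inspect the headers that would go out.
prepared = session.prepare_request(requests.Request("GET", "https://www.sogou.com/"))

print(default_ua)
print(prepared.headers["User-Agent"])
```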

3. Below are the HTTP request headers we most often see when capturing traffic:

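- Accept: tells the server which content types the browser can handle

- Accept-Charset: tells the server which character sets the browser supports

- Accept-Encoding: tells the server which compression formats the browser supports

- Accept-Language: tells the server the browser's preferred languages

- Host: tells the server which host the browser wants to reach

- If-Modified-Since: tells the server the timestamp of the browser's cached copy

- Referer: tells the server which page the client came from; used for hotlink protection

- Connection: tells the server whether to close the connection or keep it alive after the request completes

- X-Requested-With: the value XMLHttpRequest indicates the request was made via Ajax

- User-Agent: identifies the client that is making the request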
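As a rough illustration of how several of these headers fit together, here is a sketch of a headers dict that imitates a browser Ajax call. The endpoint, the Referer, and the `page` form field are hypothetical; the real values come from the request you captured in the Network tab.

```python
import requests

# All values below are hypothetical placeholders.
ajax_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Referer": "https://example.com/list",
    "X-Requested-With": "XMLHttpRequest",
}

request = requests.Request(
    "POST",
    "https://example.com/api/items",
    data={"page": "1"},
    headers=ajax_headers,
)

# Prepare the request without sending it, so the result can be inspected offline.
prepared = request.prepare()

# Header lookups on a prepared request are case-insensitive.
print(prepared.headers["x-requested-with"])
print(prepared.headers["referer"])
print(prepared.body)
```

To actually send it, the same dict can be passed as `headers=ajax_headers` to `requests.post`.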