Garbled text when scraping web pages with requests

Date: 2019-12-02


1: res.apparent_encoding

2: res.encoding = 'utf-8'

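r.encoding	The response encoding guessed from the HTTP header
r.apparent_encoding	The response encoding inferred from the content (the fallback encoding)

r.encoding: if the Content-Type header of a text response carries no charset, the encoding is assumed to be ISO-8859-1
r.apparent_encoding: the encoding inferred by analyzing the page content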
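Why the ISO-8859-1 fallback produces garbled text can be shown without any network access. The following is a minimal sketch using only Python's built-in codecs; the sample string and variable names are assumptions chosen for illustration, not taken from the original post.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text (sample string is an assumption).
raw = "中文新聞".encode("utf-8")

# Decoding with ISO-8859-1 never raises, because every byte maps to some
# character, but each multi-byte UTF-8 character turns into several wrong ones.
garbled = raw.decode("iso-8859-1")

# Decoding with the encoding the bytes really use recovers the text.
correct = raw.decode("utf-8")

# A wrongly decoded string can be repaired by reversing the wrong decode.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(correct)
print(repaired == correct)
```

This is the same situation as res.text being built from res.content with the wrong res.encoding: the bytes are fine, only the decoding step is wrong.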

res.encoding = res.apparent_encoding

https://www.jianshu.com/p/d78982126318
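The assignment above can be wrapped in a small helper that only overrides the encoding when the header-derived value is missing or is the ISO-8859-1 fallback. This is a sketch under assumptions: fix_encoding is a hypothetical name, and the SimpleNamespace object is a stand-in for a requests response so the example runs offline.

```python
from types import SimpleNamespace


def fix_encoding(res):
    """Use the content-detected encoding when the header gave none or the ISO-8859-1 fallback.

    Hypothetical helper: works on any object exposing .encoding and
    .apparent_encoding, such as a requests response.
    """
    header_encoding = (res.encoding or "").lower()
    if header_encoding in ("", "iso-8859-1") and res.apparent_encoding:
        res.encoding = res.apparent_encoding
    return res


# Stand-in for a response whose header carried no charset (assumption for the demo).
fake = SimpleNamespace(encoding="ISO-8859-1", apparent_encoding="utf-8")
fix_encoding(fake)
print(fake.encoding)

# An explicitly declared charset is left untouched.
declared = SimpleNamespace(encoding="gbk", apparent_encoding="GB2312")
fix_encoding(declared)
print(declared.encoding)
```

With a real response, the helper would be called right after requests.get(url) and before reading res.text. One limitation of this sketch: a page that genuinely declares ISO-8859-1 is also re-detected, so the result depends on how well the content detection works for that page.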

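import requests

url = 'http://www.chinanews.com/gn/2018/05-24/8521399.shtml'
res = requests.get(url)
print(res.encoding)           # encoding guessed from the HTTP header; the original reports ISO-8859-1 for this page
print(res.headers)            # response headers, including Content-Type
print(res.apparent_encoding)  # encoding detected from the page content
# html = res.text             # the body decoded to a Unicode string using res.encoding
print(res.content)            # raw, undecoded bytes (an experiment by an expert found online)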
