Python Crawler Series 4: The error Module and User-Agent Explained

Date: 2020-01-21

1、The error module

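1. Causes of URLError: (1) there is no network connection; (2) the connection to the server fails; (3) the specified server cannot be found. Note also that URLError is a subclass of OSError.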

from urllib import request, error

if __name__ == "__main__":
    # Malformed host name, expected to trigger URLError
    url = "http://www.baidu.comfdsfdfsf"
    try:
        req = request.Request(url)
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.URLError as e:
        # e.reason holds the underlying cause of the failure
        print("URLError:{0}".format(e.reason))
        print("URLError:{0}".format(e))
    except Exception as e:
        print(e)

2. HTTPError is a subclass of URLError.

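3. Difference between the two: HTTPError corresponds to an error status code returned for an HTTP request; if the returned status code is 400 or above, HTTPError is raised. URLError usually corresponds to a network problem, including a problem with the URL itself. Inheritance relationship: OSError -> URLError -> HTTPError, so every HTTPError is also a URLError, and every URLError is also an OSError.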
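Because of this inheritance chain, the order of the except clauses matters: HTTPError has to be caught before URLError, or the URLError branch will catch it first. The following is a minimal sketch of that point, not part of the original source file. The helper name classify and the hand-built exceptions (URL, status code, messages) are hypothetical and chosen so that no network access is needed.

```python
import io
from email.message import Message
from urllib import error

# The inheritance chain described above, checked directly.
assert issubclass(error.HTTPError, error.URLError)
assert issubclass(error.URLError, OSError)


def classify(exc):
    """Raise exc and report which except clause caught it (hypothetical helper)."""
    try:
        raise exc
    except error.HTTPError as e:
        # Must come before URLError, otherwise this branch is never reached.
        return "HTTPError {0}".format(e.code)
    except error.URLError as e:
        return "URLError {0}".format(e.reason)
    except Exception as e:
        return "Exception {0}".format(e)


if __name__ == "__main__":
    # Hand-built exceptions with made-up values; nothing is sent over the network.
    http_exc = error.HTTPError(
        "http://example.com/missing", 404, "Not Found", Message(), io.BytesIO()
    )
    url_exc = error.URLError("no route to host")
    print(classify(http_exc))  # HTTPError 404
    print(classify(url_exc))   # URLError no route to host
```

If the two clauses were swapped, the 404 case would be reported by the URLError branch, since an HTTPError instance is also a URLError.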

2、User-Agent

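1. User-Agent: the user agent, UA for short, is part of the request headers. The server uses the UA to identify who is visiting. Common UA values can be copied and pasted directly, or captured from the requests your own browser sends. A list of common values is available at the following link: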

https://blog.csdn.net/wangqing84411433/article/details/89600335
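To see why a crawler usually sets its own UA, it can help to look at what urllib sends by default. The sketch below is an illustration added here, not part of the original source file. It reads the default headers from an opener without sending any request; the variable names are arbitrary.

```python
from urllib import request

# build_opener() returns an OpenerDirector. Its addheaders list holds the
# headers that are added to every request that does not set them itself.
opener = request.build_opener()
default_headers = dict(opener.addheaders)

# The value has the form "Python-urllib/<version>", which a server can
# easily recognize as a script rather than a browser.
print(default_headers["User-agent"])
```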

2. The UA can be set in two ways: through the headers argument of Request, or through the add_header method.

 

    # This fragment continues inside the if __name__ == "__main__": block above
    url2 = "http://www.baiu.com"
    try:
        # Disguise the UA by passing a headers dict to Request
        headers = {}
        headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
        req2 = request.Request(url2,headers=headers)
        rsp2 = request.urlopen(req2)
        html2 = rsp2.read().decode()
        print(html2)
    except error.HTTPError as e:
        # Catch HTTPError first, because it is a subclass of URLError
        print("HTTPError:{0}".format(e.reason))
        print("HTTPError:{0}".format(e))
    except error.URLError as e:
        print("URLError:{0}".format(e.reason))
        print("URLError:{0}".format(e))
    except Exception as e:
        print(e)

Alternatively, the line

req2 = request.Request(url2,headers=headers)

can be replaced with the following form, which has the same effect:

 

        req2 = request.Request(url2)

        req2.add_header("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko")

3、Source code

Reptile4_ErrrorAndUserAgent.py

https://github.com/ruigege66/PythonReptile/blob/master/Reptile4_ErrrorAndUserAgent.py

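2. CSDN: https://blog.csdn.net/weixin_44630050

3. cnblogs (博客園): https://www.cnblogs.com/ruigege0000/

4. You are welcome to follow my WeChat official account, 傅里葉變換 (Fourier Transform). It is my personal account and is used only for study and discussion. Reply 「禮包」 to the account to receive big data learning materials.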