1. Web Crawlers: Sending Requests with requests

Date: 2019-12-05


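A requests call uses Python's requests module to simulate a browser request and return the page's HTML source.

There are two kinds of simulated browser requests: those that need no user login or verification, and those that do.

1. Requests that need no login or verification

This case is simple: send one request with the requests module and you get the HTML source back.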

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # module for simulating browser requests

http = requests.get(url="http://www.iqiyi.com/")    # send the HTTP GET request
http.encoding = "utf-8"                             # encoding used to decode the response body
neir = http.text                                    # response body as a string
print(neir)

The HTML source returned (sample output from dig.chouti.com, the site used in the rest of this article):

<!DOCTYPE html>
<html>
<head>
<title>抽屜新熱榜-聚合每日熱門、搞笑、有趣資訊</title>
        <meta charset="utf-8" />
        <meta name="keywords" content="抽屜新熱榜,資訊,段子,圖片,公衆場合不宜,科技,新聞,節操,搞笑" />

        <meta name="description" content="
            抽屜新熱榜，匯聚每日搞笑段子、熱門圖片、有趣新聞。它將微博、門戶、社區、bbs、社交網站等海量內容聚合在一塊兒，經過用戶推薦生成最熱榜單。看抽屜新熱榜，每日熱門、有趣資訊一覽無餘。
            " />

        <meta name="robots" content="index,follow" />
        <meta name="GOOGLEBOT" content="index,follow" />
        <meta name="Author" content="搞笑" />
        <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE8">
        <link type="image/x-icon" href="/images/chouti.ico" rel="icon"/>
        <link type="image/x-icon" href="/images/chouti.ico" rel="Shortcut Icon"/>
        <link type="image/x-icon" href="/images/chouti.ico" rel="bookmark"/>
    <link type="application/opensearchdescription+xml"
          href="opensearch.xml" title="抽屜新熱榜" rel="search" />
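
The simple request above can be wrapped in a small helper that also fails loudly on HTTP errors. This is a minimal sketch, not the article's own code: the name fetch_html, the User-Agent value, and the 10 second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Fetch a page and return its HTML source as text.

    Hypothetical helper; the name, header value, and timeout are assumptions.
    """
    response = requests.get(
        url,
        headers={"User-Agent": "Mozilla/5.0"},  # assumption: look like a browser
        timeout=timeout,                        # do not wait forever on a dead server
    )
    response.raise_for_status()   # raises requests.HTTPError on 4xx/5xx responses
    response.encoding = "utf-8"   # decode the body as UTF-8, as in the article
    return response.text
```

Calling fetch_html("http://www.iqiyi.com/") would return the same kind of string that the article prints, or raise an exception if the server answers with an error status.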

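2. Requests that need login or verification

To fetch this kind of page, we first need to understand the login flow. Typically, when a user visits the site for the first time, the server sets a cookie in the browser. When the user submits login details, the browser sends that cookie along with them; if the details are correct, the server authorizes the cookie. After that, any page that requires login can be accessed by sending the authorized cookie.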
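
The flow described above can also be sketched with requests.Session, which stores the cookies a server sets and sends them back on later requests. This swaps the article's manual cookie passing for a session object, so treat it as an alternative sketch rather than the author's method. The function name fetch_protected_page and its parameters are hypothetical; the caller supplies every URL and form field.

```python
import requests


def fetch_protected_page(home_url, login_url, credentials, protected_url,
                         timeout=10.0):
    """Visit the home page, log in, then fetch a login-protected page.

    Hypothetical helper: URLs and form fields are supplied by the caller.
    """
    with requests.Session() as session:
        # First visit: the session keeps any cookies the server sets.
        session.get(home_url, timeout=timeout)

        # Login: the stored cookies are sent along automatically.
        login = session.post(login_url, data=credentials, timeout=timeout)
        login.raise_for_status()

        # Protected page: requested with the same, now authorized, cookies.
        page = session.get(protected_url, timeout=timeout)
        page.raise_for_status()
        page.encoding = "utf-8"
        return page.text
```

With a session there is no need to call cookies.get_dict() and pass cookies= by hand on every request, which keeps the three steps shorter once more pages are involved.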

Step 1: Visit the home page first and check whether a cookie is generated automatically

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # module for simulating browser requests

### Step 1: visit the home page before logging in to obtain a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # encoding used to decode the response body
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

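A cookie was generated. If the login details are correct, the backend authorizes this cookie, and later requests to login-protected pages only need to carry it.

Step 2: Have the program log in automatically and authorize the cookie

First open the login page in a browser and submit any made-up account and password, so that you can capture the login URL and the fields the login request requires.

Log in with the cookie attached so that it gets authorized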

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # module for simulating browser requests

### Step 1: visit the home page before logging in to obtain a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer':'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # encoding used to decode the response body
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### Step 2: log in carrying the cookie from step 1; the backend authorizes the random token in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login form fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer':'http://dig.chouti.com/'},
    cookies=i1_cookie                               # send the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # server response to the login
# Output: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)
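
Instead of reading the printed login response by eye, the program can check it. The sketch below assumes, based only on the sample response shown in this article, that a "code" of "9999" inside "result" means the login succeeded. The function name login_succeeded is hypothetical, and other sites will use different response formats.

```python
import json


def login_succeeded(response_text, success_code="9999"):
    """Return True if the login response reports success.

    Assumption: success is signalled by result.code == "9999", as in the
    sample response from the article. Hypothetical helper.
    """
    try:
        payload = json.loads(response_text)
    except ValueError:          # body was not valid JSON
        return False
    if not isinstance(payload, dict):
        return False
    result = payload.get("result")
    return isinstance(result, dict) and result.get("code") == success_code
```

In the script above this could be called as login_succeeded(i2.text) before moving on to the pages that require login.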

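Step 3: A successful login means the backend has authorized the cookie, so we can carry that cookie when requesting pages that require login, for example the personal center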

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # module for simulating browser requests

### Step 1: visit the home page before logging in to obtain a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer':'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # encoding used to decode the response body
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### Step 2: log in carrying the cookie from step 1; the backend authorizes the random token in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login form fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer':'http://dig.chouti.com/'},
    cookies=i1_cookie                               # send the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # server response to the login
# Output: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)

### Step 3: request a login-protected page, carrying the authorized cookie
shouquan_cookie = i1_cookie
i3 = requests.get(
    url="http://dig.chouti.com/user/link/saved/1",
    headers={'Referer':'http://dig.chouti.com/'},
    cookies=shouquan_cookie                        # send the authorized cookie
)
i3.encoding = "utf-8"
print(i3.text)                                     # print the login-protected page

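The HTML source of the login-protected page was fetched successfully.

Full code

get() method: sends a GET request
encoding attribute: sets the encoding used to decode the response
cookies.get_dict(): returns the response cookies as a dict
post(): sends a POST request
text: returns the server's response body as a string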

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # module for simulating browser requests

### Step 1: visit the home page before logging in to obtain a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer':'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # encoding used to decode the response body
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### Step 2: log in carrying the cookie from step 1; the backend authorizes the random token in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login form fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer':'http://dig.chouti.com/'},
    cookies=i1_cookie                               # send the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # server response to the login
# Output: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)

### Step 3: request a login-protected page, carrying the authorized cookie
shouquan_cookie = i1_cookie
i3 = requests.get(
    url="http://dig.chouti.com/user/link/saved/1",
    headers={'Referer':'http://dig.chouti.com/'},
    cookies=shouquan_cookie                        # send the authorized cookie
)
i3.encoding = "utf-8"
print(i3.text)                                     # print the login-protected page

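Note: if the login requires a CAPTCHA, you need image processing to recognize the CAPTCHA text from its image and then add that text to the login fields.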
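
The CAPTCHA case can be sketched as one extra step before the login POST. Everything specific here is an assumption: the function name login_with_captcha, the captcha_url, and the form field name "captcha" are hypothetical, and the recognition itself is left to a solve_captcha callable that you supply (for example an OCR routine or manual input).

```python
def login_with_captcha(session, login_url, captcha_url, form,
                       solve_captcha, captcha_field="captcha"):
    """Download the CAPTCHA image, solve it, and log in with the result.

    Hypothetical sketch. `session` is expected to behave like a
    requests.Session; `solve_captcha` takes the image bytes and returns
    the recognized text; `captcha_field` is an assumed form field name.
    """
    # Fetch the image with the same session so it matches the login cookie.
    image_bytes = session.get(captcha_url, timeout=10).content

    # Copy the form so the caller's dict is not modified.
    data = dict(form)
    data[captcha_field] = solve_captcha(image_bytes)

    return session.post(login_url, data=data, timeout=10)
```

Fetching the image and posting the login through the same session matters because sites usually tie the expected CAPTCHA answer to the visitor's cookie.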
