Captcha handling, simulated login, and using cookies

Date: 2019-11-24


Recognizing captchas with the Yundama platform

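On many portal sites, once a user has made more than three (or, on some sites, five) consecutive login attempts, the login page starts generating a captcha dynamically. The captcha is used to throttle traffic and to block crawlers.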

How the Yundama platform is used to handle a captcha:

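- 1. Scrape the page that contains the captcha.
- 2. Parse the captcha out of the page data and download the captcha image locally.
- 3. Submit the captcha image to a third-party platform for recognition; it returns the text shown in the image.
    - Yundama platform:
        - 1. Register on the official site (both a regular user account and a developer account).
        - 2. Log in with the developer account:
            - 1. Download the sample code (Developer docs -> Call examples and latest DLL -> PythonHTTP example download).
            - 2. Register a new app: My Software -> Add new software.
        - 3. Adapt the source file from the sample code so that it recognizes the text in the captcha image.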
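Steps 1 and 2 (find the captcha in the page, then work out where its image lives) can be sketched with the standard library alone. This is a minimal sketch, assuming the captcha is an `<img>` whose `id` is known in advance (`verifyPic_login` is the id used for renren.com later in this article; the class and function names here are hypothetical). It also resolves a relative `src` against the page URL with `urllib.parse.urljoin`, because a relative path cannot be downloaded on its own.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CaptchaImgFinder(HTMLParser):
    """Records the src of the first <img> whose id equals img_id."""

    def __init__(self, img_id: str):
        super().__init__()
        self.img_id = img_id
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and self.src is None and attr_map.get("id") == self.img_id:
            self.src = attr_map.get("src")


def find_captcha_url(page_html: str, page_url: str, img_id: str) -> Optional[str]:
    """Return the absolute URL of the captcha image, or None if it is absent."""
    finder = CaptchaImgFinder(img_id)
    finder.feed(page_html)
    finder.close()
    if finder.src is None:
        return None  # no captcha on this page (e.g. too few failed logins so far)
    # A relative src such as "/icode?rnd=42" is resolved against the page URL
    return urljoin(page_url, finder.src)


if __name__ == "__main__":
    html = '<form><img id="verifyPic_login" src="/icode?rnd=42"></form>'
    print(find_captcha_url(html, "http://www.example.com/login", "verifyPic_login"))
```

The returned URL is what step 2 would then download, ideally with the same session that fetched the page.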

Code:

import json
import time

import requests



class YDMHttp:

    apiurl = 'http://api.yundama.com/api.php'
    username = ''
    password = ''
    appid = ''
    appkey = ''

    def __init__(self, username, password, appid, appkey):
        self.username = username  
        self.password = password
        self.appid = str(appid)
        self.appkey = appkey

    def request(self, fields, files=[]):
        response = self.post_url(self.apiurl, fields, files)
        response = json.loads(response)
        return response
    
    def balance(self):
        data = {'method': 'balance', 'username': self.username, 'password': self.password, 'appid': self.appid, 'appkey': self.appkey}
        response = self.request(data)
        if (response):
            if (response['ret'] and response['ret'] < 0):
                return response['ret']
            else:
                return response['balance']
        else:
            return -9001
    
    def login(self):
        data = {'method': 'login', 'username': self.username, 'password': self.password, 'appid': self.appid, 'appkey': self.appkey}
        response = self.request(data)
        if (response):
            if (response['ret'] and response['ret'] < 0):
                return response['ret']
            else:
                return response['uid']
        else:
            return -9001

    def upload(self, filename, codetype, timeout):
        data = {'method': 'upload', 'username': self.username, 'password': self.password, 'appid': self.appid, 'appkey': self.appkey, 'codetype': str(codetype), 'timeout': str(timeout)}
        file = {'file': filename}
        response = self.request(data, file)
        if (response):
            if (response['ret'] and response['ret'] < 0):
                return response['ret']
            else:
                return response['cid']
        else:
            return -9001

    def result(self, cid):
        data = {'method': 'result', 'username': self.username, 'password': self.password, 'appid': self.appid, 'appkey': self.appkey, 'cid': str(cid)}
        response = self.request(data)
        return response and response['text'] or ''

    def decode(self, filename, codetype, timeout):
        cid = self.upload(filename, codetype, timeout)
        if (cid > 0):
            for i in range(0, timeout):
                result = self.result(cid)
                if (result != ''):
                    return cid, result
                else:
                    time.sleep(1)
            return -3003, ''
        else:
            return cid, ''

    def report(self, cid):
        data = {'method': 'report', 'username': self.username, 'password': self.password, 'appid': self.appid, 'appkey': self.appkey, 'cid': str(cid), 'flag': '0'}
        response = self.request(data)
        if (response):
            return response['ret']
        else:
            return -9001

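    def post_url(self, url, fields, files=None):
        # Open each file to upload, send the multipart POST, and always close the handles
        opened = {key: open(path, 'rb') for key, path in (files or {}).items()}
        try:
            res = requests.post(url, files=opened, data=fields)
        finally:
            for fp in opened.values():
                fp.close()
        return res.text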
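`decode()` above uploads the image and then asks for the result once per second, returning `-3003` after `timeout` attempts. Since every `result()` call is itself an HTTP round trip, counting loop iterations lets the real wait run longer than `timeout` seconds. Below is a minimal sketch of the same polling idea driven by a deadline instead. `fetch` stands for any zero-argument callable such as `lambda: yundama.result(cid)`; the `clock` and `sleep` parameters are hypothetical additions that only exist so the loop can be tested without really waiting.

```python
import time
from typing import Callable, Optional


def poll_result(fetch: Callable[[], str], timeout: float, interval: float = 1.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch() until it returns non-empty text; give up after timeout seconds.

    Returns the recognized text, or None on timeout (where decode() returns -3003).
    """
    deadline = clock() + timeout
    while True:
        text = fetch()
        if text:
            return text
        # Stop if waiting another interval would overshoot the deadline
        if clock() + interval > deadline:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical stand-in for yundama.result(cid): empty twice, then an answer
    answers = iter(["", "", "ab12"])
    print(poll_result(lambda: next(answers), timeout=5, interval=0.01))
```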


import requests
from lxml import etree
from urllib import request


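# Wrapper function that recognizes a captcha image
def getCodeText(codeType, filePath):
    result = None
    # Regular user name
    username    = 'jeremy0820'

    # Password
    password    = '0820_ab'

    # Software ID, required for the developer revenue share. Found under "My Software" in the developer console
    appid       = 6003

    # Software key, required for the developer revenue share. Found under "My Software" in the developer console
    appkey      = '1f4b564483ae5c907a1d34f8e2f2776c'

    # Image file
    filename    = filePath

    # Captcha type, e.g. 1004 means 4 letters/digits. Each type is priced differently; set it accurately or recognition accuracy suffers. All types are listed at http://www.yundama.com/price.html
    codetype    = codeType

    # Timeout in seconds
    timeout     = 30

    # Check that the placeholder credentials have been replaced
    if (username == 'username'):
        print('Please set the parameters above before testing')
    else:
        # Initialize the client
        yundama = YDMHttp(username, password, appid, appkey)

        # Log in to Yundama
        uid = yundama.login()
        print('uid: %s' % uid)

        # Check the account balance
        balance = yundama.balance()
        print('balance: %s' % balance)

        # Start recognition: image path, captcha type ID, timeout (seconds); returns the result
        cid, result = yundama.decode(filename, codetype, timeout)
        print('cid: %s, result: %s' % (cid, result))
    return result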
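Recognition is not always right. The `YDMHttp` class above also has a `report(cid)` method; judging by its name and its `flag: '0'` field, it tells the platform that a result was wrong (an assumption about the Yundama API, not something this article confirms). A minimal sketch of a retry loop built on that idea follows. `solve`, `try_login` and `report` are hypothetical callables: `solve` would fetch a fresh captcha image and call `yundama.decode(...)` directly (since `getCodeText` discards the `cid`), `try_login` would send the login request, and `report` would call `yundama.report(cid)`.

```python
from typing import Callable, Optional, Tuple


def login_with_retries(solve: Callable[[], Tuple[int, str]],
                       try_login: Callable[[str], bool],
                       report: Callable[[int], object],
                       max_attempts: int = 3) -> Optional[str]:
    """Return the captcha text that led to a successful login, or None."""
    for _ in range(max_attempts):
        # solve() is expected to download a NEW captcha image each time
        cid, text = solve()
        if cid <= 0 or not text:
            continue  # negative error code or timeout: try another image
        if try_login(text):
            return text
        report(cid)  # assumed: flag the wrong recognition with the platform
    return None
```

For example, `login_with_retries(lambda: (7, "ab12"), lambda t: t == "ab12", print)` returns `"ab12"` on the first attempt without reporting anything.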

Simulated login

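url = 'http://www.renren.com/'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
proxies = {'http': '121.8.98.196:80'}

# Create the session first so the login page, the captcha image and the login
# request all share one set of cookies (the captcha is usually bound to them)
session = requests.Session()

page_text = session.get(url=url, headers=headers, proxies=proxies).text
# Parse out the URL of the captcha image
tree = etree.HTML(page_text)

code_img_url = tree.xpath('//*[@id="verifyPic_login"]/@src')[0]
# Save the image; download it through the session rather than with urlretrieve,
# which would not send the session's cookies
img_data = session.get(url=code_img_url, headers=headers, proxies=proxies).content
with open('./code.jpg', 'wb') as fp:
    fp.write(img_data)

# Recognize the captcha with the Yundama platform
code_text = getCodeText(2004, './code.jpg')
print(code_text)

# Simulated login
# If this request fails, the uniqueTimestamp copied from the browser has most likely expired; capture a fresh one
login_url = 'http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2019402136904'

data = {
    "email": "www.zhangbowudi@qq.com",
    "icode": code_text,
    "origURL": "http://www.renren.com/home",
    "domain": "renren.com",
    "key_id": "1",
    "captcha_type": "web_login",
    "password": "b5b7cc084ec2c8b2fa9ec88ebb55dddb07ce2809f14e98db78ddcfa7159b8ae2",
    "rkey": "449e2cdaaefe6364b26d5b62baab86f5",
    "f": "http%3A%2F%2Fwww.renren.com%2F970683046",
}

# Logging in through the session stores the cookies returned by the server
response = session.post(url=login_url, headers=headers, data=data, proxies=proxies)
# print(response.status_code)

# page_text = response.text
# print(page_text)

# This request must carry the login cookie, so it is also sent through the session
detail_url = 'http://www.renren.com/970683046/profile'
detail_page_text = session.get(url=detail_url, headers=headers).text
print(detail_page_text)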

Using and handling cookies

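- cookie: lets the server keep track of the client's state. The server sets it with a Set-Cookie response header; the client stores it and sends it back in the Cookie header of later requests.
- Ways to handle cookies:
    - Manually (copying the Cookie value captured in the browser into the request headers): not recommended.
    - Automatically: use a session object, requests.Session. It sends requests (get, post) just like the requests module does, and every request sent through the session automatically carries and updates the cookies.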
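The difference between the two can be checked without touching a real site. This sketch starts a throwaway local HTTP server (a hypothetical stand-in for a real site) whose `/login` sets a cookie via `Set-Cookie` and whose `/profile` answers 403 unless that cookie comes back. `requests.get()` builds a new `Session` for every call, so the cookie from `/login` is lost; one shared `requests.Session` stores it and sends it with the next request. Setting `trust_env = False` only keeps proxy environment variables from interfering with the local demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical site: /login sets a cookie, /profile requires it."""

    def do_GET(self):
        extra_headers = []
        if self.path == "/login":
            status, body = 200, b"logged in"
            extra_headers.append(("Set-Cookie", "sid=abc123; Path=/"))
        elif self.path == "/profile":
            has_cookie = "sid=abc123" in self.headers.get("Cookie", "")
            status, body = (200, b"profile") if has_cookie else (403, b"please log in")
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def new_session() -> requests.Session:
    session = requests.Session()
    session.trust_env = False  # ignore HTTP(S)_PROXY variables for this local demo
    return session


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        # What requests.get() does: a fresh session per call, so the cookie is dropped
        with new_session() as s:
            s.get(base + "/login")
        with new_session() as s:
            without_session = s.get(base + "/profile").status_code

        # One shared session: the cookie set by /login is sent to /profile
        with new_session() as s:
            s.get(base + "/login")
            with_session = s.get(base + "/profile").status_code
    finally:
        server.shutdown()
        server.server_close()
    return without_session, with_session


if __name__ == "__main__":
    print(run_demo())  # (403, 200)
```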

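# Cookie-based case study: https://xueqiu.com/
# 1. Get the detail-page URLs from the home page
#    Finding: the news items on the home page are loaded dynamically (ajax); in the JSON data, the value of "target" is the detail-page URL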
import requests
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
# Visit the home page first; the cookies the server sets are stored in the session automatically
session = requests.Session()
session.get('https://xueqiu.com/',headers=headers)

# URL of the ajax request, captured with the browser's developer tools
url = 'https://xueqiu.com/v4/statuses/public_timeline_by_category.json?since_id=-1&max_id=-1&count=10&category=-1'
# This request is sent with the cookies stored in the session
dic_json = session.get(url=url,headers=headers).json()
print(dic_json)
# Get the detail-page URLs from the response data
# for dic in dic_json['list']:
#     # print(dic)
#     d = dic['data']
#     detail_url = 'https://xueqiu.com' + d['target']
#     print(detail_url)
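The commented-out loop can be turned into a small function. This is a sketch under assumptions: the response has a `list` of items whose `data` field contains `target`, exactly as the loop assumes. Because some APIs deliver `data` as a JSON-encoded string rather than an object, the sketch accepts both; that fallback is an assumption, not verified xueqiu behavior. The sample payload is made up for illustration and is not real xueqiu output.

```python
import json
from typing import List
from urllib.parse import urljoin


def detail_urls(dic_json: dict, base: str = "https://xueqiu.com") -> List[str]:
    """Build absolute detail-page URLs from the timeline JSON."""
    urls = []
    for item in dic_json.get("list", []):
        data = item.get("data")
        if isinstance(data, str):  # assumed: data may arrive JSON-encoded
            data = json.loads(data)
        target = (data or {}).get("target")
        if target:
            # Same result as 'https://xueqiu.com' + target for paths starting with "/"
            urls.append(urljoin(base, target))
    return urls


if __name__ == "__main__":
    # Made-up sample shaped the way the loop above expects
    sample = {"list": [{"data": {"target": "/1234/5678"}},
                       {"data": json.dumps({"target": "/4321/8765"})}]}
    print(detail_urls(sample))
```

In the script above it would be called as `detail_urls(dic_json)`.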
