Web Scraping: Recognizing the GEETEST Slider CAPTCHA

Date: 2019-12-07

Tags: web scraping, geetest, slider CAPTCHA recognition; Category: web crawlers

1. Preparation

　　This walkthrough uses Selenium with Chrome as the browser, so ChromeDriver needs to be installed and configured.

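2. Analysis

　　1. Simulate clicking the verification button: this can be done directly with Selenium.

　　2. Locate the gap for the slider: examine where the gap sits in the image and what its edges look like, then detect its position by comparing the image against the original.

　　　　Capture both the original image and the image with the gap, set a comparison threshold, then walk through both images and find the pixels at the same coordinates whose RGB difference exceeds the threshold. Those pixels mark the gap.

　　3. Simulate dragging the slider: GEETEST adds mechanical-trajectory recognition and speed detection, so verification passes only when the drag closely imitates a human. A human drag generally accelerates first and then decelerates.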
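
　　The threshold comparison described in step 2 can be tried out without a browser. The snippet below is a minimal sketch, assuming images are represented as nested lists of (R, G, B) tuples; the helper name `differing_pixels` and the sample data are hypothetical and only serve as illustration.

```python
def differing_pixels(img_a, img_b, threshold=60):
    """Return (x, y) coordinates where any RGB channel differs by at least `threshold`."""
    coords = []
    for y, (row_a, row_b) in enumerate(zip(img_a, img_b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if any(abs(ca - cb) >= threshold for ca, cb in zip(pa, pb)):
                coords.append((x, y))
    return coords


# Two tiny 3x2 "images" (rows of RGB tuples); only the pixel at x=2, y=1 differs strongly.
original = [
    [(200, 200, 200), (200, 200, 200), (200, 200, 200)],
    [(200, 200, 200), (200, 200, 200), (200, 200, 200)],
]
with_gap = [
    [(200, 200, 200), (205, 198, 201), (200, 200, 200)],
    [(200, 200, 200), (200, 200, 200), (90, 90, 90)],
]

print(differing_pixels(original, with_gap))  # [(2, 1)]
```

　　The small variation at x=1, y=0 stays below the threshold and is ignored, which is the reason for using a threshold rather than an exact comparison.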

3. Implementation

　　1. Initialization

　　　　The Meizu registration page, https://i.flyme.cn/register?, is used for testing. Some configuration is initialized first.

import time
from io import BytesIO
from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


phone = '18888888888'


class GeetestSpider():

    def __init__(self):
        self.url = "https://i.flyme.cn/register?"
        # Note: executable_path is the Selenium 3 style; Selenium 4 takes a Service object instead.
        self.browser = webdriver.Chrome(executable_path=r'D:\Google\Chrome\Application\chromedriver')
        self.wait = WebDriverWait(self.browser, 20)
        self.phone = phone

　　2. Simulating the click

    def get_button(self):
        """
        Get the initial verification button so a click can be simulated.
        :return: the button element
        """
        button = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'geetest_radar_tip')))
        return button

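　　3. Locating the gap

　　Capture two images, before and after the gap appears; the place where they differ is the gap. Use Selenium to find the image element and read its position, width, and height, then take a screenshot of the whole page and crop the image out of it.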

    def get_screen_image(self):
        """
        Take a screenshot of the page.
        :return: the screenshot as a PIL Image object
        """
        screen_img = self.browser.get_screenshot_as_png()
        screen_img = Image.open(BytesIO(screen_img))
        return screen_img

    def get_position(self):
        """
        Get the position of the CAPTCHA image.
        :return: tuple (top, bottom, left, right)
        """
        img = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'geetest_canvas_img')))
        time.sleep(2)
        location = img.location
        size = img.size
        top, bottom, left, right = location['y'], location['y'] + size['height'], location['x'], location['x'] + size[
            'width']
        return top, bottom, left, right

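    def get_geetest_img(self, name='captcha.png'):
        """
        Capture the CAPTCHA image.
        :param name: file name to save the image under
        :return: the cropped image as a PIL Image object
        """
        top, bottom, left, right = self.get_position()
        print('CAPTCHA position:', top, bottom, left, right)
        screen_img = self.get_screen_image()
        # Crop the CAPTCHA out of the full-page screenshot
        captcha = screen_img.crop((left, top, right, bottom))
        captcha.save(name)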
        return captcha

    def get_slider(self):
        """
        Get the slider.
        :return: the slider element
        """
        slider = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'geetest_slider_button')))
        return slider
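
　　One thing worth checking when cropping from a full-page screenshot: the element's location and size are reported in CSS pixels, while on high-DPI displays the screenshot may be larger by the device pixel ratio. The sketch below illustrates the box calculation under that assumption; `crop_box` and its `scale` parameter are hypothetical names, and the code above effectively uses a scale of 1.

```python
def crop_box(location, size, scale=1.0):
    """Build a (left, top, right, bottom) box in the order PIL's Image.crop expects."""
    left = location['x'] * scale
    top = location['y'] * scale
    right = (location['x'] + size['width']) * scale
    bottom = (location['y'] + size['height']) * scale
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Sample values shaped like Selenium's element.location and element.size dictionaries.
location = {'x': 100, 'y': 50}
size = {'width': 260, 'height': 160}

print(crop_box(location, size))           # (100, 50, 360, 210)
print(crop_box(location, size, scale=2))  # (200, 100, 720, 420)
```

　　If the saved captcha.png shows the wrong region of the page, reading `window.devicePixelRatio` through `browser.execute_script` and passing it as the scale is one thing to try.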

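　　After obtaining the slider, calling its click() method triggers the click and the image with the gap appears. Calling get_geetest_img() again then captures the second image.

　　Assign the two captured images to img1 and img2 and compare them: iterate over every coordinate and read the RGB values of the corresponding pixels in both images. If the RGB values differ by less than a set amount, the two pixels are treated as the same and the comparison moves on to the next point. If the difference exceeds that amount, the pixels are different, which marks the position of the gap.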

    def equal_rgb(self, img1, img2, x, y):
        """
        Check whether two pixels are the same.
        :param img1: image 1
        :param img2: image 2
        :param x: x coordinate
        :param y: y coordinate
        :return: True if the pixels match within the threshold
        """
        # Read the pixel at (x, y) from each image
        pixel1 = img1.load()[x, y]
        pixel2 = img2.load()[x, y]
        threshold = 60
        # The pixels match only if every RGB channel differs by less than the threshold
        return all(abs(pixel1[k] - pixel2[k]) < threshold for k in range(3))

    def get_gap(self, img1, img2):
        """
        Get the gap offset.
        :param img1: image without the gap
        :param img2: image with the gap
        :return: x offset of the gap
        """
        # Start scanning to the right of the slider piece
        left = 60
        for i in range(left, img1.size[0]):
            for j in range(img1.size[1]):
                if not self.equal_rgb(img1, img2, i, j):
                    left = i
                    return left
        return left
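
　　get_gap returns at the first differing pixel, so a single noisy pixel can produce a wrong offset. A possible variation, which is not the method used above, is to require several differing pixels in the same column before accepting it. The sketch assumes images as nested lists of RGB tuples; `pixels_differ`, `find_gap_column`, `min_hits`, and the sample data are all hypothetical.

```python
def pixels_differ(p1, p2, threshold=60):
    """True when any RGB channel differs by at least `threshold`."""
    return any(abs(a - b) >= threshold for a, b in zip(p1, p2))


def find_gap_column(img1, img2, start=0, min_hits=3, threshold=60):
    """Return the first column >= start with at least `min_hits` differing pixels, or None."""
    height = len(img1)
    width = len(img1[0])
    for x in range(start, width):
        hits = sum(
            1 for y in range(height)
            if pixels_differ(img1[y][x], img2[y][x], threshold)
        )
        if hits >= min_hits:
            return x
    return None


# Build a 10x6 grey image and a copy with one noisy pixel plus a dark gap
# that is 3 columns wide and 4 rows tall.
grey, dark = (180, 180, 180), (60, 60, 60)
img1 = [[grey for _ in range(10)] for _ in range(6)]
img2 = [row[:] for row in img1]
img2[0][2] = dark                      # isolated noise at column 2
for y in range(1, 5):
    for x in range(6, 9):
        img2[y][x] = dark              # gap spanning columns 6..8

print(find_gap_column(img1, img2))              # 6
print(find_gap_column(img1, img2, min_hits=1))  # 2
```

　　With min_hits=1 the function behaves like the first-difference scan and stops at the noisy pixel; a higher value skips it and finds the real gap edge.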

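　　4. Simulating the drag

　　To imitate a human, the slider should not move at constant speed, nor should its speed merely jitter around a fixed value. It should accelerate at the start and slow down as it approaches the gap, much as when dragging the slider by hand.

　　Let a be the slider's acceleration, v the current velocity, v0 the initial velocity, x the distance moved, and t the elapsed time. They satisfy the following relations: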

　　x = v0 * t + a * t * t / 2

　　v = v0 + a * t　　
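
　　To make the two formulas concrete, the sketch below applies them over five fixed time steps, with the step length t = 0.2 and acceleration a = 2 that get_track uses. The helper name `step` is hypothetical.

```python
def step(v0, a, t):
    """Advance one time step: return (distance moved, new velocity)."""
    move = v0 * t + a * t * t / 2
    v = v0 + a * t
    return move, v


v = 0.0
moves = []
for _ in range(5):
    move, v = step(v, a=2, t=0.2)
    moves.append(round(move, 2))

print(moves)        # [0.04, 0.12, 0.2, 0.28, 0.36]
print(round(v, 2))  # 2.0
```

　　The five steps cover a total time of 1.0 and add up to a distance of 1.0, which agrees with x = a * t * t / 2 applied to the whole interval.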

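    def get_track(self, d):
        """
        Build the movement trajectory for a given offset.
        :param d: offset to cover
        :return: trajectory as a list of per-step distances
        """
        # Trajectory
        track = []
        # Distance covered so far
        current = 0
        # Point at which deceleration starts: two thirds of the offset
        deviation = d * 2 / 3
        # Time interval per step
        t = 0.2
        # Starting velocity
        v = 0

        while current < d:
            if current < deviation:
                # Acceleration phase
                a = 2
            else:
                # Deceleration phase
                a = -2

            # Velocity at the start of this step
            v0 = v
            # Velocity at the end of this step
            v = v0 + a * t
            # Distance moved during this step
            move = v0 * t + a * t * t / 2
            # Update the distance covered
            current += move
            # Append to the trajectory, rounded to an integer
            track.append(round(move))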
        return track

    def move_slider(self, slider, track):
        """
        Drag the slider to the gap along the trajectory.
        :param slider: the slider element
        :param track: the trajectory
        :return: None
        """
        # Press and hold the mouse button on the slider
        ActionChains(self.browser).click_and_hold(slider).perform()
        for i in track:
            # Move by each step in the trajectory
            ActionChains(self.browser).move_by_offset(xoffset=i, yoffset=0).perform()
        time.sleep(0.5)
        # Release the mouse button once the move is complete
        ActionChains(self.browser).release().perform()
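
　　Because get_track rounds each step independently, the rounded steps do not necessarily add up to the requested offset. One possible adjustment, offered here as an assumption rather than as part of the original code, is to carry the rounding remainder forward and clamp the final step. `get_track_exact` and its parameters are hypothetical names.

```python
def get_track_exact(d, t=0.2, accel=2, decel=-2):
    """Generate integer steps that accelerate, then decelerate, and sum exactly to d."""
    track = []
    current = 0.0       # exact (unrounded) distance covered so far
    emitted = 0         # sum of the integer steps emitted so far
    v = 0.0
    switch = d * 2 / 3  # start decelerating after two thirds of the distance

    while current < d:
        a = accel if current < switch else decel
        v0 = v
        v = v0 + a * t
        # Advance by the kinematic distance, but never overshoot the target
        current = min(current + v0 * t + a * t * t / 2, d)
        # Emit the difference between the rounded position and what was already emitted,
        # so earlier rounding remainders are carried into later steps
        step = round(current) - emitted
        emitted += step
        track.append(step)
    return track


track = get_track_exact(100)
print(sum(track))  # 100
```

　　The list can be passed to move_slider in place of the output of get_track. The early steps are often 0 because the slider has barely moved yet.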

　　With the steps above, verification succeeds.

　　5. Complete code

# -*- coding: utf-8 -*-


import time
from io import BytesIO
from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


# Initialization
class GeetestSpider():
    def __init__(self):
        self.url = "https://i.flyme.cn/register"
        self.browser = webdriver.Chrome(executable_path=r'D:\Google\Chrome\Application\chromedriver')
        self.wait = WebDriverWait(self.browser, 20)

    def get_button(self):
        """
        Get the initial verification button so a click can be simulated.
        :return: the button element
        """
        button = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'geetest_radar_tip_content')))
        return button

    def get_screen_image(self):
        """
        Take a screenshot of the page.
        :return: the screenshot as a PIL Image object
        """
        screen_img = self.browser.get_screenshot_as_png()
        screen_img = Image.open(BytesIO(screen_img))
        return screen_img

    def get_position(self):
        """
        Get the position of the CAPTCHA image.
        :return: tuple (top, bottom, left, right)
        """
        img = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'geetest_canvas_img')))
        time.sleep(2)
        location = img.location
        size = img.size
        top, bottom, left, right = location['y'], location['y'] + size['height'], location['x'], location['x'] + size[
            'width']
        return top, bottom, left, right

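    def get_geetest_img(self, name='captcha.png'):
        """
        Capture the CAPTCHA image.
        :param name: file name to save the image under
        :return: the cropped image as a PIL Image object
        """
        top, bottom, left, right = self.get_position()
        print('CAPTCHA position:', top, bottom, left, right)
        screen_img = self.get_screen_image()
        # Crop the CAPTCHA out of the full-page screenshot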
        captcha = screen_img.crop((left, top, right, bottom))
        captcha.save(name)
        return captcha

    def get_slider(self):
        """
        Get the slider.
        :return: the slider element
        """
        slider = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'geetest_slider_button')))
        return slider

    def equal_rgb(self, img1, img2, x, y):
        """
        Check whether two pixels are the same.
        :param img1: image 1
        :param img2: image 2
        :param x: x coordinate
        :param y: y coordinate
        :return: True if the pixels match within the threshold
        """
        # Read the pixel at (x, y) from each image
        pixel1 = img1.load()[x, y]
        pixel2 = img2.load()[x, y]
        threshold = 60
        # The pixels match only if every RGB channel differs by less than the threshold
        return all(abs(pixel1[k] - pixel2[k]) < threshold for k in range(3))

    def get_gap(self, img1, img2):
        """
        Get the gap offset.
        :param img1: image without the gap
        :param img2: image with the gap
        :return: x offset of the gap
        """
        # Start scanning to the right of the slider piece
        left = 60
        for i in range(left, img1.size[0]):
            for j in range(img1.size[1]):
                if not self.equal_rgb(img1, img2, i, j):
                    left = i
                    return left
        return left

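    def get_track(self, d):
        """
        Build the movement trajectory for a given offset.
        :param d: offset to cover
        :return: trajectory as a list of per-step distances
        """
        # Trajectory
        track = []
        # Distance covered so far
        current = 0
        # Point at which deceleration starts: two thirds of the offset
        deviation = d * 2 / 3
        # Time interval per step
        t = 0.2
        # Starting velocity
        v = 0

        while current < d:
            if current < deviation:
                # Acceleration phase
                a = 2
            else:
                # Deceleration phase
                a = -2

            # Velocity at the start of this step
            v0 = v
            # Velocity at the end of this step
            v = v0 + a * t
            # Distance moved during this step
            move = v0 * t + a * t * t / 2
            # Update the distance covered
            current += move
            # Append to the trajectory, rounded to an integer
            track.append(round(move))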
        return track

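    def move_slider(self, slider, track):
        """
        Drag the slider to the gap along the trajectory.
        :param slider: the slider element
        :param track: the trajectory
        :return: None
        """
        # Press and hold the mouse button on the slider
        ActionChains(self.browser).click_and_hold(slider).perform()
        for i in track:
            # Move by each step in the trajectory
            ActionChains(self.browser).move_by_offset(xoffset=i, yoffset=0).perform()
        time.sleep(0.5)
        # Release the mouse button once the move is complete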
        ActionChains(self.browser).release().perform()

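    def crack(self):
        """
        Run the full verification flow.
        :return: None
        """
        # Imported here so this method is self-contained
        from selenium.common.exceptions import TimeoutException

        # Open the site and wait for the phone number input to load
        self.browser.get(self.url)
        self.wait.until(EC.presence_of_element_located((By.ID, 'phone')))
        # Click the verification button
        button = self.get_button()
        button.click()
        # Capture the CAPTCHA image without the gap
        img1 = self.get_geetest_img('captcha1.png')
        # Get the slider and click it so the gap appears
        slider = self.get_slider()
        slider.click()
        # Capture the image with the gap
        img2 = self.get_geetest_img('captcha2.png')
        # Find the gap position
        gap = self.get_gap(img1, img2)
        print("Gap position:", gap)
        # Subtract the gap offset
        gap -= 6
        # Build the trajectory
        track = self.get_track(gap)
        print("Track:", track)
        # Drag the slider
        self.move_slider(slider, track)

        # wait.until raises TimeoutException instead of returning a falsy value,
        # so a failed verification is detected by catching it.
        try:
            # The expected text must match what the page displays on success.
            # Note: the other locators use By.CLASS_NAME; if this lookup always
            # times out, this element may need to be located the same way.
            self.wait.until(
                EC.text_to_be_present_in_element((By.ID, 'geetest_success_radar_tip_content'), '驗證成功'))
        except TimeoutException:
            # Retry on failure
            self.crack()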


if __name__ == '__main__':
    crack = GeetestSpider()
    crack.crack()