Advanced Web Scraping Tutorial: Cracking the GEETEST CAPTCHA

Original link and original author: Advanced Web Scraping Tutorial: Cracking the GEETEST CAPTCHA | Jack Cui

[Image]

1. Preface

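What is one of a crawler's biggest enemies? That's right: CAPTCHAs! As a specialist provider of CAPTCHA services, Geetest holds a fairly high market share. So how do you get past the slider CAPTCHA that Geetest provides?

One approach is to analyze its JavaScript encryption: capture and analyze a large amount of traffic to work out the parameters it returns, then generate the required parameters directly. This takes more engineering effort, and whenever the official JS script is upgraded the analysis has to be redone, which costs time and energy.

The approach introduced today is to simulate a user's slide-to-unlock with Selenium. Its advantages are that it is simple and easy to update. Its drawbacks are just as obvious: it is slow, and it cannot be packaged as an API endpoint.

Teaching someone to fish is better than handing them a fish, so here comes the main content of this tutorial. Before reading on, though, make sure you already know the basics of web scraping. If you don't, please study my CSDN column first and then come back to this article. My column address: click to view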

2. Sneak Preview

[Image]

The left side shows the automatic recognition process; the right side shows some printed output.

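3. Hands-on Analysis

We take the National Enterprise Credit Information Publicity System as an example. It is a website for looking up company information, and every query requires passing a CAPTCHA. The CAPTCHA it uses is GEETEST. Its URL: click to view

The site looks like this:

[Image]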

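(1) Process Analysis

How many steps does it take to put an elephant in a refrigerator?

  1. Open the refrigerator door
  2. Stuff the elephant into the refrigerator
  3. Close the refrigerator door

Now consider this question: how many steps does it take to simulate a user's slide-to-unlock with Selenium? Pause here and think for five minutes before reading on!

Here is a rough answer first:

  1. Open the page with Selenium.
  2. Locate the input box, enter the query, and click the query button.
  3. Read the CAPTCHA images and detect the gap.
  4. Compute the sliding distance from the gap position.
  5. Drag the slider by that distance to the matching position.

Broken down like this, each step is not hard to implement one at a time. Now for the main part.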

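(2) Getting Started

Step 1: Open the page with Selenium, enter the query, and click the query button.

This part is simple. I won't go over the Selenium basics again; if anything is unclear, see the Selenium material in my column.

The code is as follows: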

# -*- coding: utf-8 -*-
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium import webdriver


class Crack():
    def __init__(self, keyword):
        self.url = 'http://bj.gsxt.gov.cn/sydq/loginSydqAction!sydq.dhtml'
        # Selenium 3 style call; Selenium 4 expects the driver path to be passed through a Service object
        self.browser = webdriver.Chrome('D:\\chromedriver.exe')
        self.wait = WebDriverWait(self.browser, 100)
        self.keyword = keyword

    def open(self):
        """
        Open the browser and enter the query keyword
        """
        self.browser.get(self.url)
        keyword = self.wait.until(EC.presence_of_element_located((By.ID, 'keyword_qycx')))
        button = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'btn')))
        keyword.send_keys(self.keyword)
        button.click()

    def crack(self):
        # Open the browser and submit the query
        self.open()


if __name__ == '__main__':
    print('Starting verification')
    crack = Crack(u'中國移動')
    crack.crack()

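The result looks like this:

[Image]

Step 2: Save the CAPTCHA images

We inspect the element to find the image address. The result is as follows:

[Image]

As you can see, the picture is assembled from many small slices, so simply saving the image at that address is not enough. The slices are composed with background-position, and the downloaded image itself is scrambled. What do we do about that? Simple: grab the image links, download the images, and reassemble the gap-free picture from the slice positions. The picture with the gap is obtained the same way. Both are composited by ourselves.

The code is as follows: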

# -*- coding: utf-8 -*-
import re
import time
import random
from io import BytesIO
from urllib.request import urlretrieve

import PIL.Image as image
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class Crack():
    def __init__(self, keyword):
        self.url = 'http://bj.gsxt.gov.cn/sydq/loginSydqAction!sydq.dhtml'
        # Selenium 3 style call; Selenium 4 expects the driver path to be passed through a Service object
        self.browser = webdriver.Chrome('D:\\chromedriver.exe')
        self.wait = WebDriverWait(self.browser, 100)
        self.keyword = keyword
        self.BORDER = 6

    def __del__(self):
        time.sleep(2)
        self.browser.close()

    def get_screenshot(self):
        """
        Take a screenshot of the page
        :return: screenshot as a PIL image
        """
        screenshot = self.browser.get_screenshot_as_png()
        screenshot = image.open(BytesIO(screenshot))
        return screenshot

    def get_position(self):
        """
        Get the position of the CAPTCHA
        :return: tuple (top, bottom, left, right)
        """
        img = self.browser.find_element(By.CLASS_NAME, "gt_box")
        time.sleep(2)
        location = img.location
        size = img.size
        top, bottom, left, right = location['y'], location['y'] + size['height'], location['x'], location['x'] + size['width']
        return (top, bottom, left, right)

    def get_image(self, name='captcha.png'):
        """
        Crop the CAPTCHA out of a page screenshot
        :return: cropped PIL image
        """
        top, bottom, left, right = self.get_position()
        print('CAPTCHA position', top, bottom, left, right)
        screenshot = self.get_screenshot()
        captcha = screenshot.crop((left, top, right, bottom))
        captcha.save(name)
        return captcha

    def get_images(self, bg_filename='bg.jpg', fullbg_filename='fullbg.jpg'):
        """
        Download the two scrambled CAPTCHA images
        :return: slice position lists for the gap image and the full image
        """
        bg = []
        fullgb = []
        # Poll the page source until both sets of slice <div>s are present
        while bg == [] or fullgb == []:
            bf = BeautifulSoup(self.browser.page_source, 'lxml')
            bg = bf.find_all('div', class_='gt_cut_bg_slice')
            fullgb = bf.find_all('div', class_='gt_cut_fullbg_slice')
        bg_url = re.findall(r'url\("(.*)"\);', bg[0].get('style'))[0].replace('webp', 'jpg')
        fullgb_url = re.findall(r'url\("(.*)"\);', fullgb[0].get('style'))[0].replace('webp', 'jpg')
        bg_location_list = []
        fullbg_location_list = []
        for each_bg in bg:
            # Each slice stores its offset as "background-position: Xpx Ypx;"
            x, y = re.findall('background-position: (.*)px (.*)px;', each_bg.get('style'))[0]
            bg_location_list.append({'x': int(x), 'y': int(y)})
        for each_fullgb in fullgb:
            x, y = re.findall('background-position: (.*)px (.*)px;', each_fullgb.get('style'))[0]
            fullbg_location_list.append({'x': int(x), 'y': int(y)})
        urlretrieve(url=bg_url, filename=bg_filename)
        print('Gap image downloaded')
        urlretrieve(url=fullgb_url, filename=fullbg_filename)
        print('Full background image downloaded')
        return bg_location_list, fullbg_location_list

    def get_merge_image(self, filename, location_list):
        """
        Reassemble the scrambled image from its slice positions
        :filename: image file
        :location_list: slice positions
        """
        im = image.open(filename)
        im_list_upper = []
        im_list_down = []
        for location in location_list:
            # y == -58: the slice is cut from rows 58-115 of the scrambled image and goes into the upper row
            if location['y'] == -58:
                im_list_upper.append(im.crop((abs(location['x']), 58, abs(location['x']) + 10, 116)))
            # y == 0: the slice is cut from rows 0-57 and goes into the lower row
            if location['y'] == 0:
                im_list_down.append(im.crop((abs(location['x']), 0, abs(location['x']) + 10, 58)))
        new_im = image.new('RGB', (260, 116))
        x_offset = 0
        for piece in im_list_upper:
            new_im.paste(piece, (x_offset, 0))
            x_offset += piece.size[0]
        x_offset = 0
        for piece in im_list_down:
            new_im.paste(piece, (x_offset, 58))
            x_offset += piece.size[0]
        new_im.save(filename)
        return new_im

    def open(self):
        """
        Open the browser and enter the query keyword
        """
        self.browser.get(self.url)
        keyword = self.wait.until(EC.presence_of_element_located((By.ID, 'keyword_qycx')))
        button = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'btn')))
        keyword.send_keys(self.keyword)
        button.click()

    def get_slider(self):
        """
        Get the slider
        :return: slider element
        """
        # Retry every 0.5 s until the slider knob is present
        while True:
            try:
                slider = self.browser.find_element(By.XPATH, "//div[@class='gt_slider_knob gt_show']")
                break
            except Exception:
                time.sleep(0.5)
        return slider

    def get_gap(self, img1, img2):
        """
        Get the gap offset
        :param img1: image without the gap
        :param img2: image with the gap
        :return: x coordinate of the first column that differs
        """
        # Scan column by column, starting from a fixed offset of 43 px
        left = 43
        for i in range(left, img1.size[0]):
            for j in range(img1.size[1]):
                if not self.is_pixel_equal(img1, img2, i, j):
                    left = i
                    return left
        return left

    def is_pixel_equal(self, img1, img2, x, y):
        """
        Check whether two pixels are the same
        :param img1: image 1
        :param img2: image 2
        :param x: x coordinate
        :param y: y coordinate
        :return: whether the pixels match
        """
        # Read the pixel at (x, y) from both images
        pix1 = img1.load()[x, y]
        pix2 = img2.load()[x, y]
        threshold = 60
        # Equal when every RGB channel differs by less than the threshold
        return (abs(pix1[0] - pix2[0]) < threshold
                and abs(pix1[1] - pix2[1]) < threshold
                and abs(pix1[2] - pix2[2]) < threshold)

    def get_track(self, distance):
        """
        Build the movement track for a given offset
        :param distance: offset
        :return: movement track
        """
        # Movement track
        track = []
        # Current displacement
        current = 0
        # Deceleration threshold
        mid = distance * 4 / 5
        # Time step
        t = 0.2
        # Initial velocity
        v = 0

        while current < distance:
            if current < mid:
                # Acceleration is +2
                a = 2
            else:
                # Acceleration is -3
                a = -3
            # Velocity at the start of this step
            v0 = v
            # Current velocity: v = v0 + a * t
            v = v0 + a * t
            # Distance moved: x = v0 * t + 1/2 * a * t^2
            move = v0 * t + 1 / 2 * a * t * t
            # Update the displacement
            current += move
            # Append the rounded step to the track
            track.append(round(move))
        return track

    def move_to_gap(self, slider, track):
        """
        Drag the slider to the gap
        :param slider: slider element
        :param track: movement track
        :return:
        """
        ActionChains(self.browser).click_and_hold(slider).perform()
        # Steps are picked in random order, so only the total distance of the track is preserved
        while track:
            x = random.choice(track)
            ActionChains(self.browser).move_by_offset(xoffset=x, yoffset=0).perform()
            track.remove(x)
        time.sleep(0.5)
        ActionChains(self.browser).release().perform()

    def crack(self):
        # Open the browser and submit the query
        self.open()

        # File names for the saved images
        bg_filename = 'bg.jpg'
        fullbg_filename = 'fullbg.jpg'
        # Download the images
        bg_location_list, fullbg_location_list = self.get_images(bg_filename, fullbg_filename)
        # Reassemble the images from the slice positions
        bg_img = self.get_merge_image(bg_filename, bg_location_list)
        fullbg_img = self.get_merge_image(fullbg_filename, fullbg_location_list)

        # Locate the slider knob
        slider = self.get_slider()

        # Find the gap position
        gap = self.get_gap(fullbg_img, bg_img)
        print('Gap position', gap)
        track = self.get_track(gap - self.BORDER)
        print('Sliding the slider')
        print(track)
        self.move_to_gap(slider, track)


if __name__ == '__main__':
    print('Starting verification')
    crack = Crack(u'中國移動')
    crack.crack()
    print('Verification finished')

The result looks like this:

[Image]

As you can see, after running the code we have successfully generated two images: one with the gap and one without.
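
To make the background-position reassembly in this step concrete, here is a minimal sketch that does not touch the site at all. It parses a style string of the kind found on the slice elements and restores a scrambled one-dimensional strip. The sample style string, the helper names `parse_position` and `unscramble`, and the use of a plain string in place of image pixels are all assumptions made for illustration.

```python
import re

# Hypothetical style attribute, shaped like the ones on the slice <div>s
SAMPLE_STYLE = ('background-image: url("https://example.com/bg.webp"); '
                'background-position: -157px -58px;')


def parse_position(style):
    """Return the (x, y) offsets from a 'background-position: Xpx Ypx;' rule."""
    x, y = re.findall(r'background-position: (-?\d+)px (-?\d+)px;', style)[0]
    return int(x), int(y)


def unscramble(strip, positions, tile_width):
    """Rebuild a 1-D strip: slot i shows the tile that starts at abs(x_i)."""
    return ''.join(strip[abs(x):abs(x) + tile_width] for x, _ in positions)


x, y = parse_position(SAMPLE_STYLE)
# Slot 0 shows the tile at offset 2, slot 1 the tile at offset 4, slot 2 the tile at offset 0
restored = unscramble('CCAABB', [(-2, 0), (-4, 0), (0, 0)], tile_width=2)
print(x, y, restored)
```

The real code does the same thing in two dimensions: each slice's negative offset says where in the scrambled image to crop, and the slices are pasted side by side in document order.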

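Step 3: Compute the gap distance

Using the image with the gap and the image without it, we compare the pixel values of the two images to find where the gap is.

The code is as follows: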

# -*- coding: utf-8 -*-
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from urllib.request import urlretrieve
from selenium import webdriver
from bs4 import BeautifulSoup
import PIL.Image as image
import re


class Crack():
    def __init__(self, keyword):
        self.url = 'http://bj.gsxt.gov.cn/sydq/loginSydqAction!sydq.dhtml'
        # Selenium 3 style call; Selenium 4 expects the driver path to be passed through a Service object
        self.browser = webdriver.Chrome('D:\\chromedriver.exe')
        self.wait = WebDriverWait(self.browser, 100)
        self.keyword = keyword

    def open(self):
        """
        Open the browser and enter the query keyword
        """
        self.browser.get(self.url)
        keyword = self.wait.until(EC.presence_of_element_located((By.ID, 'keyword_qycx')))
        button = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'btn')))
        keyword.send_keys(self.keyword)
        button.click()
 
    def get_images(self, bg_filename='bg.jpg', fullbg_filename='fullbg.jpg'):
        """
        Download the two scrambled CAPTCHA images
        :return: slice position lists for the gap image and the full image
        """
        bg = []
        fullgb = []
        # Poll the page source until both sets of slice <div>s are present
        while bg == [] or fullgb == []:
            bf = BeautifulSoup(self.browser.page_source, 'lxml')
            bg = bf.find_all('div', class_='gt_cut_bg_slice')
            fullgb = bf.find_all('div', class_='gt_cut_fullbg_slice')
        bg_url = re.findall(r'url\("(.*)"\);', bg[0].get('style'))[0].replace('webp', 'jpg')
        fullgb_url = re.findall(r'url\("(.*)"\);', fullgb[0].get('style'))[0].replace('webp', 'jpg')
        bg_location_list = []
        fullbg_location_list = []
        for each_bg in bg:
            # Each slice stores its offset as "background-position: Xpx Ypx;"
            x, y = re.findall('background-position: (.*)px (.*)px;', each_bg.get('style'))[0]
            bg_location_list.append({'x': int(x), 'y': int(y)})
        for each_fullgb in fullgb:
            x, y = re.findall('background-position: (.*)px (.*)px;', each_fullgb.get('style'))[0]
            fullbg_location_list.append({'x': int(x), 'y': int(y)})

        urlretrieve(url=bg_url, filename=bg_filename)
        print('Gap image downloaded')
        urlretrieve(url=fullgb_url, filename=fullbg_filename)
        print('Full background image downloaded')
        return bg_location_list, fullbg_location_list

    def get_merge_image(self, filename, location_list):
        """
        Reassemble the scrambled image from its slice positions
        :filename: image file
        :location_list: slice positions
        """
        im = image.open(filename)
        im_list_upper = []
        im_list_down = []

        for location in location_list:
            # y == -58: the slice is cut from rows 58-115 of the scrambled image and goes into the upper row
            if location['y'] == -58:
                im_list_upper.append(im.crop((abs(location['x']), 58, abs(location['x']) + 10, 116)))
            # y == 0: the slice is cut from rows 0-57 and goes into the lower row
            if location['y'] == 0:
                im_list_down.append(im.crop((abs(location['x']), 0, abs(location['x']) + 10, 58)))

        new_im = image.new('RGB', (260, 116))

        x_offset = 0
        for piece in im_list_upper:
            new_im.paste(piece, (x_offset, 0))
            x_offset += piece.size[0]

        x_offset = 0
        for piece in im_list_down:
            new_im.paste(piece, (x_offset, 58))
            x_offset += piece.size[0]

        new_im.save(filename)

        return new_im
 
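    def is_pixel_equal(self, img1, img2, x, y):
        """
        Check whether two pixels are the same
        :param img1: image 1
        :param img2: image 2
        :param x: x coordinate
        :param y: y coordinate
        :return: whether the pixels match
        """
        pix1 = img1.load()[x, y]
        pix2 = img2.load()[x, y]
        threshold = 60
        # Equal when every RGB channel differs by less than the threshold
        return (abs(pix1[0] - pix2[0]) < threshold
                and abs(pix1[1] - pix2[1]) < threshold
                and abs(pix1[2] - pix2[2]) < threshold)

    def get_gap(self, img1, img2):
        """
        Get the gap offset
        :param img1: image without the gap
        :param img2: image with the gap
        :return: x coordinate of the first column that differs
        """
        # Scan column by column, starting from a fixed offset of 43 px
        left = 43
        for i in range(left, img1.size[0]):
            for j in range(img1.size[1]):
                if not self.is_pixel_equal(img1, img2, i, j):
                    left = i
                    return left
        return left

    def crack(self):
        # Open the browser and submit the query
        self.open()

        # File names for the saved images
        bg_filename = 'bg.jpg'
        fullbg_filename = 'fullbg.jpg'

        # Download the images
        bg_location_list, fullbg_location_list = self.get_images(bg_filename, fullbg_filename)

        # Reassemble the images from the slice positions
        bg_img = self.get_merge_image(bg_filename, bg_location_list)
        fullbg_img = self.get_merge_image(fullbg_filename, fullbg_location_list)

        # Find the gap position
        gap = self.get_gap(fullbg_img, bg_img)
        print('Gap position', gap)


if __name__ == '__main__':
    print('Starting verification')
    crack = Crack(u'中國移動')
    crack.crack()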

The output is as follows:

[Image]

With that, we have computed the gap position. Next, we slide the slider to the corresponding position based on it.
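
The pixel comparison in this step can be tried out on synthetic data. The sketch below is a simplified stand-in, not the author's code: images are nested lists of RGB tuples rather than Pillow images, and the names `pixels_close` and `find_gap_left` are hypothetical.

```python
def pixels_close(p1, p2, threshold=60):
    """True when every RGB channel differs by less than the threshold."""
    return all(abs(a - b) < threshold for a, b in zip(p1, p2))


def find_gap_left(full, gapped, start=0):
    """Scan columns left to right; return the first x where the two images differ."""
    height, width = len(full), len(full[0])
    for x in range(start, width):
        for y in range(height):
            if not pixels_close(full[y][x], gapped[y][x]):
                return x
    return None


# Two tiny 3x6 synthetic "images", identical except for a dark patch in columns 3-4
grey = (200, 200, 200)
dark = (40, 40, 40)
full = [[grey] * 6 for _ in range(3)]
gapped = [row[:] for row in full]
gapped[1][3] = dark
gapped[1][4] = dark

print(find_gap_left(full, gapped))
```

Note where the closing parenthesis of `abs` sits: writing `abs(a - b < t)` would take the absolute value of a boolean instead of comparing the absolute difference with the threshold.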

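Step 4: Compute the sliding track

We could teleport, moving straight to the target position within 1 s. The result: the attempt gets "eaten" (rejected).

[Image]

Uniform linear motion, then. Uniform linear motion is the way! Sure enough, it still gets "eaten". Keep trying.

[Image]

Next, imitate someone with a tremor: shaky and hesitant, as if walking on thin ice. The geetest server probably thinks my grandmother is at the controls.

[Image]

Although this method occasionally succeeds, its success rate is extremely low. So what is the best approach?

Simulate how a person moves! Think about it: when a person drags the slider, the speed is high at first, but it drops as the slider nears the gap, because the piece has to be lined up with it. How do we implement this? With the physics we learned in junior high.

The formula for the current velocity is:

v = v0 + a * t

Here v is the current velocity, v0 is the initial velocity, a is the acceleration, and t is time. We start with a large acceleration and reduce it once the slider has passed the switch-over point (the code uses four fifths of the distance and then a negative acceleration). This motion profile is used to move the slider to the gap.

The code is as follows: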

# -*- coding: utf-8 -*-
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from urllib.request import urlretrieve
from selenium import webdriver
from bs4 import BeautifulSoup
import PIL.Image as image
import re


class Crack():
    def __init__(self, keyword):
        self.url = 'http://bj.gsxt.gov.cn/sydq/loginSydqAction!sydq.dhtml'
        # Selenium 3 style call; Selenium 4 expects the driver path to be passed through a Service object
        self.browser = webdriver.Chrome('D:\\chromedriver.exe')
        self.wait = WebDriverWait(self.browser, 100)
        self.keyword = keyword
        self.BORDER = 6

    def open(self):
        """
        Open the browser and enter the query keyword
        """
        self.browser.get(self.url)
        keyword = self.wait.until(EC.presence_of_element_located((By.ID, 'keyword_qycx')))
        button = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'btn')))
        keyword.send_keys(self.keyword)
        button.click()
 
    def get_images(self, bg_filename='bg.jpg', fullbg_filename='fullbg.jpg'):
        """
        Download the two scrambled CAPTCHA images
        :return: slice position lists for the gap image and the full image
        """
        bg = []
        fullgb = []
        # Poll the page source until both sets of slice <div>s are present
        while bg == [] or fullgb == []:
            bf = BeautifulSoup(self.browser.page_source, 'lxml')
            bg = bf.find_all('div', class_='gt_cut_bg_slice')
            fullgb = bf.find_all('div', class_='gt_cut_fullbg_slice')
        bg_url = re.findall(r'url\("(.*)"\);', bg[0].get('style'))[0].replace('webp', 'jpg')
        fullgb_url = re.findall(r'url\("(.*)"\);', fullgb[0].get('style'))[0].replace('webp', 'jpg')
        bg_location_list = []
        fullbg_location_list = []
        for each_bg in bg:
            # Each slice stores its offset as "background-position: Xpx Ypx;"
            x, y = re.findall('background-position: (.*)px (.*)px;', each_bg.get('style'))[0]
            bg_location_list.append({'x': int(x), 'y': int(y)})
        for each_fullgb in fullgb:
            x, y = re.findall('background-position: (.*)px (.*)px;', each_fullgb.get('style'))[0]
            fullbg_location_list.append({'x': int(x), 'y': int(y)})

        urlretrieve(url=bg_url, filename=bg_filename)
        print('Gap image downloaded')
        urlretrieve(url=fullgb_url, filename=fullbg_filename)
        print('Full background image downloaded')
        return bg_location_list, fullbg_location_list

    def get_merge_image(self, filename, location_list):
        """
        Reassemble the scrambled image from its slice positions
        :filename: image file
        :location_list: slice positions
        """
        im = image.open(filename)
        im_list_upper = []
        im_list_down = []

        for location in location_list:
            # y == -58: the slice is cut from rows 58-115 of the scrambled image and goes into the upper row
            if location['y'] == -58:
                im_list_upper.append(im.crop((abs(location['x']), 58, abs(location['x']) + 10, 116)))
            # y == 0: the slice is cut from rows 0-57 and goes into the lower row
            if location['y'] == 0:
                im_list_down.append(im.crop((abs(location['x']), 0, abs(location['x']) + 10, 58)))

        new_im = image.new('RGB', (260, 116))

        x_offset = 0
        for piece in im_list_upper:
            new_im.paste(piece, (x_offset, 0))
            x_offset += piece.size[0]

        x_offset = 0
        for piece in im_list_down:
            new_im.paste(piece, (x_offset, 58))
            x_offset += piece.size[0]

        new_im.save(filename)

        return new_im
 
    def is_pixel_equal(self, img1, img2, x, y):
        """
        Check whether two pixels are the same
        :param img1: image 1
        :param img2: image 2
        :param x: x coordinate
        :param y: y coordinate
        :return: whether the pixels match
        """
        # Read the pixel at (x, y) from both images
        pix1 = img1.load()[x, y]
        pix2 = img2.load()[x, y]
        threshold = 60
        # Equal when every RGB channel differs by less than the threshold
        return (abs(pix1[0] - pix2[0]) < threshold
                and abs(pix1[1] - pix2[1]) < threshold
                and abs(pix1[2] - pix2[2]) < threshold)

    def get_gap(self, img1, img2):
        """
        Get the gap offset
        :param img1: image without the gap
        :param img2: image with the gap
        :return: x coordinate of the first column that differs
        """
        # Scan column by column, starting from a fixed offset of 43 px
        left = 43
        for i in range(left, img1.size[0]):
            for j in range(img1.size[1]):
                if not self.is_pixel_equal(img1, img2, i, j):
                    left = i
                    return left
        return left
 
    def get_track(self, distance):
        """
        Build the movement track for a given offset
        :param distance: offset
        :return: movement track
        """
        # Movement track
        track = []
        # Current displacement
        current = 0
        # Deceleration threshold
        mid = distance * 4 / 5
        # Time step
        t = 0.2
        # Initial velocity
        v = 0

        while current < distance:
            if current < mid:
                # Acceleration is +2
                a = 2
            else:
                # Acceleration is -3
                a = -3
            # Velocity at the start of this step
            v0 = v
            # Current velocity: v = v0 + a * t
            v = v0 + a * t
            # Distance moved: x = v0 * t + 1/2 * a * t^2
            move = v0 * t + 1 / 2 * a * t * t
            # Update the displacement
            current += move
            # Append the rounded step to the track
            track.append(round(move))
        return track

    def crack(self):
        # Open the browser and submit the query
        self.open()

        # File names for the saved images
        bg_filename = 'bg.jpg'
        fullbg_filename = 'fullbg.jpg'

        # Download the images
        bg_location_list, fullbg_location_list = self.get_images(bg_filename, fullbg_filename)

        # Reassemble the images from the slice positions
        bg_img = self.get_merge_image(bg_filename, bg_location_list)
        fullbg_img = self.get_merge_image(fullbg_filename, fullbg_location_list)

        # Find the gap position
        gap = self.get_gap(fullbg_img, bg_img)
        print('Gap position', gap)

        track = self.get_track(gap - self.BORDER)
        print('Sliding the slider')
        print(track)


if __name__ == '__main__':
    print('Starting verification')
    crack = Crack(u'中國移動')
    crack.crack()

The result looks like this:

[Image]
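
The track computation in this step uses only the two constant-acceleration formulas, so it can be run on its own. The sketch below is a standalone restatement under the same constants (time step 0.2, accelerations +2 and -3, switch-over at four fifths of the distance). The function name `build_track` and the extra return value are assumptions added for illustration.

```python
def build_track(distance, t=0.2, accel=2, decel=-3):
    """Return (steps, exact_total): rounded per-step moves and the unrounded displacement."""
    steps = []
    current = 0.0
    v = 0.0
    switch_point = distance * 4 / 5  # start braking after four fifths of the way
    while current < distance:
        a = accel if current < switch_point else decel
        v0 = v
        v = v0 + a * t                   # v = v0 + a * t
        move = v0 * t + 0.5 * a * t * t  # x = v0 * t + 1/2 * a * t^2
        current += move
        steps.append(round(move))
    return steps, current


steps, exact_total = build_track(100)
print(len(steps), sum(steps), round(exact_total, 2))
```

Because every step is rounded independently, `sum(steps)` can differ from the requested distance by a few pixels, and the first few steps round down to 0 while the velocity is still small.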

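Step 5: Move the slider

Using the per-step distances returned by the previous step, we move the slider to the gap position.

The code is as follows: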

# -*- coding: utf-8 -*-
import re
import time
import random
from urllib.request import urlretrieve

import PIL.Image as image
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class Crack():
    def __init__(self, keyword):
        self.url = 'http://bj.gsxt.gov.cn/sydq/loginSydqAction!sydq.dhtml'
        # Selenium 3 style call; Selenium 4 expects the driver path to be passed through a Service object
        self.browser = webdriver.Chrome('D:\\chromedriver.exe')
        self.wait = WebDriverWait(self.browser, 100)
        self.keyword = keyword
        self.BORDER = 6

    def open(self):
        """
        Open the browser and enter the query keyword
        """
        self.browser.get(self.url)
        keyword = self.wait.until(EC.presence_of_element_located((By.ID, 'keyword_qycx')))
        button = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'btn')))
        keyword.send_keys(self.keyword)
        button.click()
 
    def get_images(self, bg_filename='bg.jpg', fullbg_filename='fullbg.jpg'):
        """
        Download the two scrambled CAPTCHA images
        :return: slice position lists for the gap image and the full image
        """
        bg = []
        fullgb = []
        # Poll the page source until both sets of slice <div>s are present
        while bg == [] or fullgb == []:
            bf = BeautifulSoup(self.browser.page_source, 'lxml')
            bg = bf.find_all('div', class_='gt_cut_bg_slice')
            fullgb = bf.find_all('div', class_='gt_cut_fullbg_slice')
        bg_url = re.findall(r'url\("(.*)"\);', bg[0].get('style'))[0].replace('webp', 'jpg')
        fullgb_url = re.findall(r'url\("(.*)"\);', fullgb[0].get('style'))[0].replace('webp', 'jpg')
        bg_location_list = []
        fullbg_location_list = []
        for each_bg in bg:
            # Each slice stores its offset as "background-position: Xpx Ypx;"
            x, y = re.findall('background-position: (.*)px (.*)px;', each_bg.get('style'))[0]
            bg_location_list.append({'x': int(x), 'y': int(y)})
        for each_fullgb in fullgb:
            x, y = re.findall('background-position: (.*)px (.*)px;', each_fullgb.get('style'))[0]
            fullbg_location_list.append({'x': int(x), 'y': int(y)})

        urlretrieve(url=bg_url, filename=bg_filename)
        print('Gap image downloaded')
        urlretrieve(url=fullgb_url, filename=fullbg_filename)
        print('Full background image downloaded')
        return bg_location_list, fullbg_location_list

    def get_merge_image(self, filename, location_list):
        """
        Reassemble the scrambled image from its slice positions
        :filename: image file
        :location_list: slice positions
        """
        im = image.open(filename)
        im_list_upper = []
        im_list_down = []

        for location in location_list:
            # y == -58: the slice is cut from rows 58-115 of the scrambled image and goes into the upper row
            if location['y'] == -58:
                im_list_upper.append(im.crop((abs(location['x']), 58, abs(location['x']) + 10, 116)))
            # y == 0: the slice is cut from rows 0-57 and goes into the lower row
            if location['y'] == 0:
                im_list_down.append(im.crop((abs(location['x']), 0, abs(location['x']) + 10, 58)))

        new_im = image.new('RGB', (260, 116))

        x_offset = 0
        for piece in im_list_upper:
            new_im.paste(piece, (x_offset, 0))
            x_offset += piece.size[0]

        x_offset = 0
        for piece in im_list_down:
            new_im.paste(piece, (x_offset, 58))
            x_offset += piece.size[0]

        new_im.save(filename)

        return new_im
 
    def is_pixel_equal(self, img1, img2, x, y):
        """
        Check whether two pixels are the same
        :param img1: image 1
        :param img2: image 2
        :param x: x coordinate
        :param y: y coordinate
        :return: whether the pixels match
        """
        # Read the pixel at (x, y) from both images
        pix1 = img1.load()[x, y]
        pix2 = img2.load()[x, y]
        threshold = 60
        # Equal when every RGB channel differs by less than the threshold
        return (abs(pix1[0] - pix2[0]) < threshold
                and abs(pix1[1] - pix2[1]) < threshold
                and abs(pix1[2] - pix2[2]) < threshold)

    def get_gap(self, img1, img2):
        """
        Get the gap offset
        :param img1: image without the gap
        :param img2: image with the gap
        :return: x coordinate of the first column that differs
        """
        # Scan column by column, starting from a fixed offset of 43 px
        left = 43
        for i in range(left, img1.size[0]):
            for j in range(img1.size[1]):
                if not self.is_pixel_equal(img1, img2, i, j):
                    left = i
                    return left
        return left
 
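    def get_track(self, distance):
        """
        Build the movement track for a given offset
        :param distance: offset
        :return: movement track
        """
        # Movement track
        track = []
        # Current displacement
        current = 0
        # Deceleration threshold
        mid = distance * 4 / 5
        # Time step
        t = 0.2
        # Initial velocity
        v = 0

        while current < distance:
            if current < mid:
                # Acceleration is +2
                a = 2
            else:
                # Acceleration is -3
                a = -3
            # Velocity at the start of this step
            v0 = v
            # Current velocity: v = v0 + a * t
            v = v0 + a * t
            # Distance moved: x = v0 * t + 1/2 * a * t^2
            move = v0 * t + 1 / 2 * a * t * t
            # Update the displacement
            current += move
            # Append the rounded step to the track
            track.append(round(move))
        return track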
 
    def get_slider(self):
        """
        Get the slider
        :return: slider element
        """
        # Retry every 0.5 s until the slider knob is present
        while True:
            try:
                slider = self.browser.find_element(By.XPATH, "//div[@class='gt_slider_knob gt_show']")
                break
            except Exception:
                time.sleep(0.5)
        return slider

    def move_to_gap(self, slider, track):
        """
        Drag the slider to the gap
        :param slider: slider element
        :param track: movement track
        :return:
        """
        ActionChains(self.browser).click_and_hold(slider).perform()
        # Steps are picked in random order, so only the total distance of the track is preserved
        while track:
            x = random.choice(track)
            ActionChains(self.browser).move_by_offset(xoffset=x, yoffset=0).perform()
            track.remove(x)
        time.sleep(0.5)
        ActionChains(self.browser).release().perform()

    def crack(self):
        # Open the browser and submit the query
        self.open()

        # File names for the saved images
        bg_filename = 'bg.jpg'
        fullbg_filename = 'fullbg.jpg'

        # Download the images
        bg_location_list, fullbg_location_list = self.get_images(bg_filename, fullbg_filename)

        # Reassemble the images from the slice positions
        bg_img = self.get_merge_image(bg_filename, bg_location_list)
        fullbg_img = self.get_merge_image(fullbg_filename, fullbg_location_list)

        # Find the gap position
        gap = self.get_gap(fullbg_img, bg_img)
        print('Gap position', gap)

        track = self.get_track(gap - self.BORDER)
        print('Sliding the slider')
        print(track)

        # Locate the slider knob
        slider = self.get_slider()
        # Drag the slider to the gap
        self.move_to_gap(slider, track)


if __name__ == '__main__':
    print('Starting verification')
    crack = Crack(u'中國移動')
    crack.crack()
    print('Verification finished')

Running the code above completes the slider CAPTCHA. Let's take one more look at that nice moment.

[Image]
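
One detail of the drag loop is easy to miss: it picks each step with `random.choice` and then removes it, so the steps are replayed in random order rather than in the order in which they were computed. The sketch below reproduces just that loop without Selenium; the function name `consume_randomly`, the sample track, and the seeded `random.Random` instance are assumptions for illustration.

```python
import random


def consume_randomly(track, rng):
    """Replay the steps the way the drag loop does: pick one at random, then remove it."""
    remaining = list(track)
    order = []
    while remaining:
        x = rng.choice(remaining)
        order.append(x)
        remaining.remove(x)
    return order


planned = [0, 1, 1, 2, 3, 4, 3, 2]
replayed = consume_randomly(planned, random.Random(0))
print(replayed, sum(replayed) == sum(planned))
```

The total distance is unchanged, since the same steps are used, but the accelerate-then-brake ordering of the planned track is not kept.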

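4. Summary

  • This article leaves out many implementation details and skips the line-by-line code walkthrough. Since this is an advanced tutorial, I see no need to go over the basics in detail; readers who have followed my beginner course should already be able to analyze the code themselves.
  • The cracking method in this article is for learning and discussion only; do not use it for any illegal purpose.
  • All the code that appears in this article can be downloaded from my GitHub. Follows and stars are welcome: https://github.com/Jack-Cheri...
  • If you have questions, feel free to leave a comment!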

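Contact the Author

[Image]

Sharing technology, enjoying life: every Friday the Jack Cui official account publishes a news-style article in the "程序員歡樂送" series. You are welcome to follow!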
