Capturing a Website's CAPTCHA with Selenium WebDriver

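When working on a crawler project, you sometimes run into CAPTCHAs. Some sites generate the CAPTCHA dynamically, so even the same link can return a different CAPTCHA each time it is visited.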

 

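My first idea was to open the CAPTCHA's link, save the image locally with a GET request from Java code, run it through a CAPTCHA-solving tool, and type the result into the verification box automatically. The problem is that every request to the same address produces a different CAPTCHA image, so the downloaded image is not the one shown on the page and this approach does not work. The only option is to capture the image from the rendered page with Selenium WebDriver, save it locally, run it through the solving tool, and fill in the field automatically. That solves the CAPTCHA problem nicely.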
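To make the failure of the plain GET approach concrete, here is a minimal sketch that uses only the JDK (Java 11+). The local server, its `/captcha` path, and the `code-N` bodies are hypothetical stand-ins for a real CAPTCHA endpoint, not anything taken from the target site. It only shows that two requests to the same URL can return different content.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class CaptchaPerRequestDemo {

    /** Starts a stand-in (hypothetical) CAPTCHA endpoint that returns a new code on every request. */
    public static HttpServer startFakeCaptchaServer() throws IOException {
        AtomicInteger counter = new AtomicInteger();
        // Port 0 asks the OS for any free port
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/captcha", exchange -> {
            byte[] body = ("code-" + counter.incrementAndGet()).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Performs one GET and returns the response body as text. */
    public static String fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Returns true when two GETs of the same URL yield the same body. */
    public static boolean sameOnRefetch() throws IOException, InterruptedException {
        HttpServer server = startFakeCaptchaServer();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/captcha");
            HttpClient client = HttpClient.newHttpClient();
            String shownInBrowser = fetch(client, uri);  // what the page would display
            String downloadedCopy = fetch(client, uri);  // what a separate GET would save
            return shownInBrowser.equals(downloadedCopy);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("same image on refetch? " + sameOnRefetch());  // prints false
    }
}
```

Because the second request gets a fresh code, solving the downloaded copy says nothing about the image in the browser. That is why the image has to be taken from the rendered page.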

 

package com.entrym.main;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import com.entrym.crawler.util.verifyCode.Captcha;
import com.entrym.crawler.util.verifyCode.DamaUtil;
import com.entrym.util.ConfigUtil;

public class WebTest {
	
	private static final String GET_TITLE="/titles/getxiaoshuo";
	private static final String PATH=new File("config/config.properties").getAbsolutePath();
	private static final String CHROME_HOME=new File("config/chromedriver.exe").getAbsolutePath();
	private static final String CHROME_HOME_LINUX=new File("config/chromedriver").getAbsolutePath();
	private static final String BASEURL=ConfigUtil.reads(PATH, "baseurl");
	
	public static void main(String[] args) throws IOException {

			System.out.println(PATH);

			// Point Selenium at the chromedriver binary that matches the OS
			String osname=System.getProperty("os.name").toLowerCase();
			if(osname.contains("linux")){
				System.setProperty("webdriver.chrome.driver", CHROME_HOME_LINUX);
			}else{
				System.setProperty("webdriver.chrome.driver", CHROME_HOME);
			}

			WebDriver driver=new ChromeDriver();
			driver.get("http://weixin.sogou.com/antispider/?from=%2fweixin%3Ftype%3d2%26query%3dz+%26ie%3dutf8%26s_from%3dinput%26_sug_%3dy%26_sug_type_%3d");
			WebElement ele = driver.findElement(By.id("seccodeImage"));

			// Get entire page screenshot
			File screenshot = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
			BufferedImage  fullImg = ImageIO.read(screenshot);

			// Get the location of element on the page
			Point point = ele.getLocation();

			// Get width and height of the element
			int eleWidth = ele.getSize().getWidth();
			int eleHeight = ele.getSize().getHeight();

			// Crop the entire page screenshot to get only element screenshot
			BufferedImage eleScreenshot= fullImg.getSubimage(point.getX(), point.getY(),
			    eleWidth, eleHeight);
			ImageIO.write(eleScreenshot, "png", screenshot);

			// Copy the element screenshot to disk
			File screenshotLocation = new File("D:/captcha/test.png");
			FileUtils.copyFile(screenshot, screenshotLocation);
			// Read the notice text to confirm this is the anti-spider page
			WebElement classelement = driver.findElement(By.className("p2"));
			String errorText=classelement.getText();
			System.out.println("Page text: "+errorText);
			// The page is served in Simplified Chinese, so match its own wording
			// ("your visits are too frequent")
			if(errorText.contains("访问过于频繁")){
				Captcha captcha=new Captcha();
				// Assumes DamaUtil resolves this name to the screenshot saved above (D:/captcha/test.png)
				captcha.setFilePath("test.png");
				String code = DamaUtil.getCaptchaResult(captcha);   // recognized CAPTCHA text
				System.out.println("CAPTCHA returned by the solving service: "+code);
				WebElement elementsumbit = driver.findElement(By.id("seccodeInput"));
				// Type the CAPTCHA into the input box
				elementsumbit.sendKeys(code);
				try {
					Thread.sleep(1000);
				} catch (InterruptedException e) {
					// Restore the interrupt flag instead of swallowing it
					Thread.currentThread().interrupt();
				}
				// Submit the form that contains the input
				elementsumbit.submit();
				System.out.println("Success");
			}
				
		}
}

  

 

That is the complete code. The key part came from Stack Overflow; I have to say, Google really is powerful.

 

 

 


 

driver.get("http://www.google.com");
WebElement ele = driver.findElement(By.id("hplogo"));

// Get entire page screenshot
File screenshot = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
BufferedImage  fullImg = ImageIO.read(screenshot);

// Get the location of element on the page
Point point = ele.getLocation();

// Get width and height of the element
int eleWidth = ele.getSize().getWidth();
int eleHeight = ele.getSize().getHeight();

// Crop the entire page screenshot to get only element screenshot
BufferedImage eleScreenshot= fullImg.getSubimage(point.getX(), point.getY(),
    eleWidth, eleHeight);
ImageIO.write(eleScreenshot, "png", screenshot);

// Copy the element screenshot to disk
File screenshotLocation = new File("C:\\images\\GoogleLogo_screenshot.png");
FileUtils.copyFile(screenshot, screenshotLocation);
The above is the key screenshot code. The original Stack Overflow question is at http://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin for anyone who wants to dig into it.
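One caveat with the crop step: `getSubimage` throws if the rectangle extends past the screenshot, and on high-DPI displays the screenshot may contain more pixels than the element's reported CSS size. Below is a minimal sketch of a more defensive crop. It uses only the JDK. The class name `ElementCropper` and the `scale` parameter are hypothetical additions for illustration, not part of the Stack Overflow answer. In practice `scale` might be taken from the ratio of screenshot width to viewport width, which is an assumption about your setup.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

public class ElementCropper {

    /**
     * Crops an element's rectangle out of a full screenshot.
     * x, y, width and height are the values reported by WebElement.getLocation()
     * and getSize(); scale is screenshot pixels per reported pixel (1.0 on a
     * standard display). The rectangle is clipped to the screenshot bounds.
     */
    public static BufferedImage crop(BufferedImage fullImg, int x, int y,
                                     int width, int height, double scale) {
        Rectangle wanted = new Rectangle(
                (int) Math.round(x * scale), (int) Math.round(y * scale),
                (int) Math.round(width * scale), (int) Math.round(height * scale));
        Rectangle bounds = new Rectangle(0, 0, fullImg.getWidth(), fullImg.getHeight());

        // Keep only the part of the element that is actually inside the screenshot
        Rectangle clipped = wanted.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Element lies outside the screenshot: " + wanted);
        }
        return fullImg.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }

    public static void main(String[] args) {
        // An 800x600 stand-in for the page screenshot
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        // The element overflows the right and bottom edges, so the crop is clipped
        BufferedImage captcha = crop(page, 700, 550, 150, 80, 1.0);
        System.out.println(captcha.getWidth() + "x" + captcha.getHeight());  // prints 100x50
    }
}
```

With Selenium, the call would look like `ElementCropper.crop(fullImg, point.getX(), point.getY(), eleWidth, eleHeight, 1.0)`. A clipped result usually means the element was not fully inside the captured area, so it is worth checking the saved image before sending it to the solving tool.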