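When working on a crawler project, you sometimes run into captchas. Some sites generate their captchas dynamically, so even the same URL can return a different captcha each time it is requested.

My first idea was to open the captcha URL, save the captcha image locally with a GET request from Java code, run the image through a captcha-solving tool, and type the result into the verification box automatically. The catch is that every request to the same address produces a different image, so the downloaded captcha never matches the one shown in the browser, and this approach does not work. Instead, the image has to be captured from the rendered page with selenium WebDriver and saved locally. The captcha-solving tool can then read it and the result can be filled in automatically, which solves the captcha problem nicely.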
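To see why the download approach fails, the behaviour can be reproduced with a small stand-in. This is only a sketch under stated assumptions: the class `DynamicCaptchaDemo`, the local `/captcha` endpoint, and the `code-N` text bodies are all hypothetical and merely imitate a site that generates a fresh captcha on every request. It uses only the JDK's built-in HTTP server, not the real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

class DynamicCaptchaDemo {

    /** Fetches the body of a URL with a plain GET request. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Starts a local stand-in server (hypothetical) that returns a new code on every hit. */
    static HttpServer startServer() throws IOException {
        AtomicInteger counter = new AtomicInteger();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/captcha", exchange -> {
            byte[] body = ("code-" + counter.incrementAndGet()).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/captcha";
            String shownInBrowser = fetch(url);   // stands in for what the page displays
            String downloadedAgain = fetch(url);  // stands in for what a second GET would save
            System.out.println(shownInBrowser + " vs " + downloadedAgain); // prints "code-1 vs code-2"
        } finally {
            server.stop(0);
        }
    }
}
```

Because the second request yields a different code, an image saved this way would never match the one on screen. Taking a screenshot of the page that is already rendered avoids the second request altogether.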
package com.entrym.main;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import com.entrym.crawler.util.verifyCode.Captcha;
import com.entrym.crawler.util.verifyCode.DamaUtil;
import com.entrym.util.ConfigUtil;

public class WebTest {

    private static final String GET_TITLE = "/titles/getxiaoshuo";
    private static final String PATH = new File("config/config.properties").getAbsolutePath();
    private static final String CHROME_HOME = new File("config/chromedriver.exe").getAbsolutePath();
    private static final String CHROME_HOME_LINUX = new File("config/chromedriver").getAbsolutePath();
    private static final String BASEURL = ConfigUtil.reads(PATH, "baseurl");

    public static void main(String[] args) throws IOException {
        System.out.println(PATH);

        // Pick the chromedriver binary that matches the operating system
        String osname = System.getProperty("os.name").toLowerCase();
        if (osname.indexOf("linux") >= 0) {
            System.setProperty("webdriver.chrome.driver", CHROME_HOME_LINUX);
        } else {
            System.setProperty("webdriver.chrome.driver", CHROME_HOME);
        }

        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://weixin.sogou.com/antispider/?from=%2fweixin%3Ftype%3d2%26query%3dz+%26ie%3dutf8%26s_from%3dinput%26_sug_%3dy%26_sug_type_%3d");
            WebElement ele = driver.findElement(By.id("seccodeImage"));

            // Get entire page screenshot
            File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            BufferedImage fullImg = ImageIO.read(screenshot);

            // Get the location of element on the page
            Point point = ele.getLocation();

            // Get width and height of the element
            int eleWidth = ele.getSize().getWidth();
            int eleHeight = ele.getSize().getHeight();

            // Crop the entire page screenshot to get only element screenshot
            BufferedImage eleScreenshot = fullImg.getSubimage(point.getX(), point.getY(), eleWidth, eleHeight);
            ImageIO.write(eleScreenshot, "png", screenshot);

            // Copy the element screenshot to disk
            File screenshotLocation = new File("D:/captcha/test.png");
            FileUtils.copyFile(screenshot, screenshotLocation);

            // Read the notice shown on the anti-spider page
            WebElement classelement = driver.findElement(By.className("p2"));
            String errorText = classelement.getText();
            System.out.println("Page text: " + errorText);

            // This literal must match the text on the page, so it stays in Chinese
            if (errorText.indexOf("用户您好,您的访问过于频繁,为确认本次访问为正常用户行为") >= 0) {
                // Hand the saved captcha image to the captcha-solving service.
                // Only the file name is passed here; DamaUtil is presumably set up
                // to look for it in the directory the screenshot was saved to.
                Captcha captcha = new Captcha();
                captcha.setFilePath("test.png");
                String code = DamaUtil.getCaptchaResult(captcha);
                System.out.println("Captcha returned by the solving service: " + code);

                // Type the captcha into the input box
                WebElement elementsumbit = driver.findElement(By.id("seccodeInput"));
                elementsumbit.sendKeys(code);
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }

                // Submit the form that contains the input
                elementsumbit.submit();
                System.out.println("Success");
            }
        } finally {
            // Close the browser and stop the chromedriver process
            driver.quit();
        }
    }
}
That is the full code. The key part came from Stack Overflow; I have to say Google really is powerful.
driver.get("http://www.google.com");
WebElement ele = driver.findElement(By.id("hplogo"));

// Get entire page screenshot
File screenshot = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
BufferedImage fullImg = ImageIO.read(screenshot);

// Get the location of element on the page
Point point = ele.getLocation();

// Get width and height of the element
int eleWidth = ele.getSize().getWidth();
int eleHeight = ele.getSize().getHeight();

// Crop the entire page screenshot to get only element screenshot
BufferedImage eleScreenshot= fullImg.getSubimage(point.getX(), point.getY(),
    eleWidth, eleHeight);
ImageIO.write(eleScreenshot, "png", screenshot);

// Copy the element screenshot to disk
File screenshotLocation = new File("C:\\images\\GoogleLogo_screenshot.png");
FileUtils.copyFile(screenshot, screenshotLocation);
That is the key screenshot code. The original Stack Overflow question is at http://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin for anyone who wants to dig deeper.
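One thing to watch in the crop step: `BufferedImage.getSubimage` throws a `RasterFormatException` when the requested region extends past the image. That can happen if the element sits partly outside the captured area, or if the screenshot is not at the same scale as the element coordinates. Below is a minimal sketch of a more defensive crop. The class `ElementCrop`, the method `cropElement`, and the `scale` parameter are hypothetical names introduced for illustration, and the idea that a screenshot may be scaled relative to the element coordinates (for example on a HiDPI display) is an assumption to check in your own environment.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class ElementCrop {

    /**
     * Crops the region occupied by an element out of a screenshot.
     * x, y, width and height are the element's position and size as reported
     * by the driver; scale is the assumed ratio of screenshot pixels to those
     * units (1.0 when they match).
     */
    static BufferedImage cropElement(BufferedImage fullImg, int x, int y,
                                     int width, int height, double scale) {
        Rectangle wanted = new Rectangle(
                (int) Math.round(x * scale), (int) Math.round(y * scale),
                (int) Math.round(width * scale), (int) Math.round(height * scale));
        Rectangle bounds = new Rectangle(0, 0, fullImg.getWidth(), fullImg.getHeight());

        // Keep only the part of the element that is actually inside the screenshot
        Rectangle clipped = wanted.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Element lies outside the screenshot: " + wanted);
        }
        return fullImg.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }

    public static void main(String[] args) {
        // A blank 800x600 image stands in for the page screenshot
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);

        BufferedImage inside = cropElement(page, 100, 50, 120, 40, 1.0);
        System.out.println(inside.getWidth() + "x" + inside.getHeight());   // prints "120x40"

        // The element overlaps the bottom-right corner, so the crop is clipped
        BufferedImage clipped = cropElement(page, 750, 580, 120, 40, 1.0);
        System.out.println(clipped.getWidth() + "x" + clipped.getHeight()); // prints "50x20"
    }
}
```

In the code above, a call such as `cropElement(fullImg, point.getX(), point.getY(), eleWidth, eleHeight, 1.0)` could stand in for the direct `getSubimage` call. A clipped captcha is unlikely to be solved correctly, so it may be better to scroll the element into view before taking the screenshot.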