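Background:
I have been practicing CAPTCHA recognition lately and noticed that this site's CAPTCHA is fairly simple, so I used it to gain some experience. No offense is intended.
Page containing the CAPTCHA: https://www.w3cschool.cn/checkmphone?type=findpwd
CAPTCHA image URL: https://www.w3cschool.cn/scode
Open https://www.w3cschool.cn/scode and keep pressing F5 to refresh. The characters and their positions change every time, but the font style never does. For a CAPTCHA with a fixed font, recognition accuracy can reach 100% as long as the denoising is done well.
Next, look at the noise. Download one image and open it in the Paint program that ships with Windows:
The noise is mostly isolated pixels, and for isolated pixels an 8-neighborhood check is enough. The images I inspected seemed to contain only such pixels, but I was not sure whether larger noise blobs ever appear, and I have written 8-neighborhood filters far too many times, so here I remove the noise with connected components instead. (If you see no noise blobs, try the 8-neighborhood approach; it is simple, so I will not go into it here. While writing this paragraph I did wonder why I passed over the simple 8-neighborhood filter and insisted on connected components...)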
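For readers who want to try the simpler route mentioned above, here is a minimal sketch of an 8-neighborhood filter. It is not the code used in the rest of this post; the class name `EightNeighborDenoise`, the `minNeighbors` threshold, and passing the fill color in as a parameter are all assumptions made for illustration.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: removes isolated pixels by counting same-colored neighbors. */
final class EightNeighborDenoise {

    /** Counts how many of the (up to) 8 neighbors of (x, y) have the same RGB value as (x, y). */
    static int sameColorNeighbors(BufferedImage img, int x, int y) {
        int color = img.getRGB(x, y) & 0xFFFFFF;
        int count = 0;
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                if (dx == 0 && dy == 0) {
                    continue; // skip the pixel itself
                }
                int nx = x + dx;
                int ny = y + dy;
                if (nx < 0 || nx >= img.getWidth() || ny < 0 || ny >= img.getHeight()) {
                    continue; // neighbor lies outside the image
                }
                if ((img.getRGB(nx, ny) & 0xFFFFFF) == color) {
                    count++;
                }
            }
        }
        return count;
    }

    /**
     * Returns a copy of img in which every pixel that has fewer than
     * minNeighbors same-colored neighbors is painted with fillColor.
     */
    static BufferedImage clean(BufferedImage img, int minNeighbors, int fillColor) {
        BufferedImage out = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < img.getWidth(); x++) {
            for (int y = 0; y < img.getHeight(); y++) {
                boolean isolated = sameColorNeighbors(img, x, y) < minNeighbors;
                out.setRGB(x, y, isolated ? fillColor : img.getRGB(x, y));
            }
        }
        return out;
    }
}
```

A call such as `EightNeighborDenoise.clean(image, 2, 0xFFFFFF)` would paint every pixel with fewer than two same-colored neighbors white. Unlike the connected-component approach, this sketch does not detect the background color, so it only suits images whose background is already known.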
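The background color also changes from image to image, so it cannot be hard-coded; the program has to detect it automatically. That is fairly easy: while computing the connected components, treat the largest one as the background.
Summary:
1. The font style never changes, so the features are extremely stable and the recognition rate is high.
2. There is noise, which can be filtered out using connected components.
3. The background color is random, so it must be detected and normalized to white; the largest connected component is treated as the background.
Tip: CAPTCHA image URLs usually have no User-Agent check or request-rate limit, so you can open the image URL directly and refresh quickly to look for patterns.
First things first: download some samples to local disk and study them at leisure: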
/**
 * CAPTCHA download URL
 */
public static final String CAPTCHA_URL = "https://www.w3cschool.cn/scode?rand=";

public static void download(String saveDirectory, int howMany) {
    Random random = new Random();
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    while (howMany-- > 0) {
        executorService.submit(() -> {
            Response response = null;
            try {
                long currentMillis = System.currentTimeMillis();
                Request request = Request.Get(CAPTCHA_URL + currentMillis);
                response = request.connectTimeout(2000).socketTimeout(2000).execute();
                // saveDirectory must end with a path separator; the file name is a random long
                response.saveContent(new File(saveDirectory + random.nextLong() + ".png"));
                System.out.println("download...");
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (response != null) {
                    response.discardContent();
                }
            }
        });
    }
    try {
        // Stop accepting new tasks, then wait for the queued downloads to finish
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
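Here I downloaded 5,000 images:
I downloaded this many because the character dictionary will be generated automatically from these images, and with too few samples I was afraid some characters would be missed.
Next, process the downloaded images to remove the noise: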
/**
 * Removes noise, judging by connected-component size
 *
 * @param originalCaptcha the original CAPTCHA image
 * @param areaSizeFilter  connected components of this size or smaller are filtered out
 * @return the denoised image on a white background
 */
public static BufferedImage noiseClean(BufferedImage originalCaptcha, int areaSizeFilter) {

    // There are some interfering border lines, so crop the edges away
    int edgeDropWidth = 15;
    BufferedImage captcha = originalCaptcha.getSubimage(edgeDropWidth / 2, edgeDropWidth / 2,
            originalCaptcha.getWidth() - edgeDropWidth, originalCaptcha.getHeight() - edgeDropWidth);

    int w = captcha.getWidth();
    int h = captcha.getHeight();
    // book[x][y] holds the component label of each pixel; 0 means not visited yet
    int[][] book = new int[w][h];

    // The largest connected component is taken to be the background,
    // which is how the background color is detected automatically
    Map<Integer, Integer> flagAreaSizeMap = new HashMap<>();
    int currentFlag = 1;
    int maxAreaSizeFlag = currentFlag;
    int maxAreaSizeColor = 0XFFFFFFFF;

    // Label the components
    for (int i = 0; i < w; i++) {
        for (int j = 0; j < h; j++) {
            if (book[i][j] != 0) {
                continue;
            }

            book[i][j] = currentFlag;
            int currentColor = captcha.getRGB(i, j);
            int areaSize = waterFlow(captcha, book, i, j, currentColor, currentFlag);

            if (areaSize > flagAreaSizeMap.getOrDefault(maxAreaSizeFlag, 0)) {
                maxAreaSizeFlag = currentFlag;
                maxAreaSizeColor = currentColor;
            }

            flagAreaSizeMap.put(currentFlag, areaSize);
            currentFlag++;
        }
    }

    // Copy: background, background-colored pixels and small components become white
    BufferedImage resultImage = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
    for (int i = 0; i < w; i++) {
        for (int j = 0; j < h; j++) {
            int currentColor = captcha.getRGB(i, j);
            if (book[i][j] == maxAreaSizeFlag
                    || (currentColor & 0XFFFFFF) == (maxAreaSizeColor & 0XFFFFFF)
                    || flagAreaSizeMap.get(book[i][j]) <= areaSizeFilter) {
                resultImage.setRGB(i, j, 0XFFFFFFFF);
            } else {
                resultImage.setRGB(i, j, currentColor);
            }
        }
    }
    return resultImage;
}

/**
 * Treats the image as a color matrix and flood-fills the 8-connected
 * region of identical color that contains (x, y)
 *
 * @param img   the image
 * @param book  label matrix, 0 means not visited
 * @param x     column of the current pixel
 * @param y     row of the current pixel
 * @param color color of the region being filled
 * @param flag  label to write into book
 * @return the number of pixels in the region
 */
private static int waterFlow(BufferedImage img, int[][] book, int x, int y, int color, int flag) {

    if (x < 0 || x >= img.getWidth() || y < 0 || y >= img.getHeight()) {
        return 0;
    }

    // This 1 counts the current pixel
    int areaSize = 1;
    for (int i = -1; i <= 1; i++) {
        for (int j = -1; j <= 1; j++) {
            int nextX = x + i;
            int nextY = y + j;

            if (nextX < 0 || nextX >= img.getWidth() || nextY < 0 || nextY >= img.getHeight()) {
                continue;
            }

            // If this pixel has not been visited and has the same color
            // if (book[nextX][nextY] == 0 && isSimilar(img.getRGB(nextX, nextY), color, 0)) {
            if (book[nextX][nextY] == 0 && (img.getRGB(nextX, nextY) & 0XFFFFFF) == (color & 0XFFFFFF)) {
                book[nextX][nextY] = flag;
                areaSize += waterFlow(img, book, nextX, nextY, color, flag);
            }
        }
    }

    return areaSize;
}
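The recursive fill above goes one call deeper for every pixel it labels, so on larger images the call stack could become a concern. The following is a sketch of the same 8-connected labeling with an explicit stack. It is an alternative, not the code this post uses; the class name `IterativeFloodFill` and the choice of a plain `int[][]` color matrix instead of a `BufferedImage` are assumptions that keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: 8-connected flood fill without recursion. */
final class IterativeFloodFill {

    /**
     * Labels the 8-connected region of equal values that contains (startX, startY).
     *
     * @param pixels color matrix indexed as pixels[x][y]
     * @param labels label matrix of the same shape, 0 means not visited
     * @param label  non-zero label to write for this region
     * @return the number of pixels in the region
     */
    static int fill(int[][] pixels, int[][] labels, int startX, int startY, int label) {
        int w = pixels.length;
        int h = pixels[0].length;
        int color = pixels[startX][startY];

        Deque<int[]> stack = new ArrayDeque<>();
        labels[startX][startY] = label;
        stack.push(new int[]{startX, startY});

        int size = 0;
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            size++; // every popped pixel was labeled exactly once
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    int nx = p[0] + dx;
                    int ny = p[1] + dy;
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h) {
                        continue;
                    }
                    // Label before pushing so that no pixel is pushed twice
                    if (labels[nx][ny] == 0 && pixels[nx][ny] == color) {
                        labels[nx][ny] = label;
                        stack.push(new int[]{nx, ny});
                    }
                }
            }
        }
        return size;
    }
}
```

The returned size plays the same role as `areaSize` in `noiseClean`: the largest region would be treated as background and regions at or below the threshold as noise.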
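This is the image from earlier after denoising. There was not much noise to begin with, so the result is decent:
The next step is to cut the cleaned image into individual characters. Cutting all the samples yields a great many character images, though, and picking out the dictionary entries one by one by hand felt a bit silly, so I decided to let the program derive the dictionary automatically: after cutting out the characters and before saving them, deduplicate the character images. For convenience, each small image is compressed into a single integer: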
/**
 * Cuts the image into characters
 *
 * @param img the denoised image on a white background
 * @return one image per character, from left to right
 */
public static List<BufferedImage> mattingCharacter(BufferedImage img) {
    List<BufferedImage> list = new ArrayList<>();
    int w = img.getWidth();
    int h = img.getHeight();

    // Note: despite the "IsBlack" names, true means the column is entirely white, i.e. blank
    boolean lastColumnIsBlack = true;
    int beginColumn = -1;

    for (int i = 0; i < w; i++) {

        boolean currentColumnIsBlack = true;
        for (int j = 0; j < h; j++) {
            if ((img.getRGB(i, j) & 0XFFFFFF) != 0XFFFFFF) {
                currentColumnIsBlack = false;
            }
        }

        // Entering a character region
        if (lastColumnIsBlack && !currentColumnIsBlack) {
            beginColumn = i;
        } else if (!lastColumnIsBlack && currentColumnIsBlack) {
            // Leaving a character region
            BufferedImage charImage = img.getSubimage(beginColumn, 0, i - beginColumn, h);
            BufferedImage trimCharImage = trimUpAndDown(charImage);
            list.add(trimCharImage);
        }

        lastColumnIsBlack = currentColumnIsBlack;
    }

    return list;
}

private static BufferedImage trimUpAndDown(BufferedImage img) {
    int w = img.getWidth();
    int h = img.getHeight();

    // Find the first non-blank row from the top
    int upBeginLine = -1;
    for (int i = 0; i < h; i++) {
        boolean currentColumnIsBlack = true;
        for (int j = 0; j < w; j++) {
            if ((img.getRGB(j, i) & 0XFFFFFF) != 0XFFFFFF) {
                currentColumnIsBlack = false;
            }
        }
        if (!currentColumnIsBlack) {
            upBeginLine = i;
            break;
        }
    }

    // Find the first non-blank row from the bottom
    int downBeginLine = -1;
    for (int i = h - 1; i >= 0; i--) {
        boolean currentColumnIsBlack = true;
        for (int j = 0; j < w; j++) {
            if ((img.getRGB(j, i) & 0XFFFFFF) != 0XFFFFFF) {
                currentColumnIsBlack = false;
            }
        }
        if (!currentColumnIsBlack) {
            downBeginLine = i;
            break;
        }
    }

    return img.getSubimage(0, upBeginLine, w, downBeginLine - upBeginLine + 1);
}

/**
 * Computes a hash of the image, i.e. compresses the image content into one integer
 * <p>
 * NOTE: suitable for small images only
 *
 * @param img the character image
 * @return the hash code of the pixel coordinates and colors
 */
public static int imgHashCode(BufferedImage img) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < img.getWidth(); i++) {
        for (int j = 0; j < img.getHeight(); j++) {
            sb.append(i).append("|").append(j).append("|").append(img.getRGB(i, j) & 0XFFFFFF).append("|");
        }
    }
    return sb.toString().hashCode();
}
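The column scan in `mattingCharacter` is a vertical projection: a character starts at the first column that contains ink and ends at the next fully blank column. The sketch below isolates just that idea on a boolean matrix so it can be tried without any image files. The class name `ColumnProjection` and the `ink[x][y]` representation are assumptions for illustration, and the sketch additionally closes a segment that touches the right edge, which the original does not need because the cropped image has a blank margin.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds character column ranges by vertical projection. */
final class ColumnProjection {

    /**
     * @param ink ink[x][y] is true where the pixel is not background
     * @return a list of {startColumn, endColumnExclusive} pairs, left to right
     */
    static List<int[]> segments(boolean[][] ink) {
        List<int[]> result = new ArrayList<>();
        int begin = -1; // -1 means we are currently between characters

        for (int x = 0; x < ink.length; x++) {
            boolean blank = true;
            for (boolean cell : ink[x]) {
                if (cell) {
                    blank = false;
                    break;
                }
            }

            if (!blank && begin < 0) {
                begin = x; // entering a character
            } else if (blank && begin >= 0) {
                result.add(new int[]{begin, x}); // leaving a character
                begin = -1;
            }
        }

        // A character that touches the right edge never sees a blank column
        if (begin >= 0) {
            result.add(new int[]{begin, ink.length});
        }
        return result;
    }
}
```

This kind of split only works when adjacent characters are separated by at least one blank column, which appears to hold for this CAPTCHA; characters that touch or overlap would come out as one segment.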
Below is the code that deduplicates while saving:
/**
 * Builds the character dictionary
 *
 * @param srcDirectory  directory of denoised CAPTCHA images
 * @param destDirectory directory that receives one image per distinct character (must end with a path separator)
 */
public static void splitCharacter(String srcDirectory, String destDirectory) {
    File file = new File(srcDirectory);
    File[] imgFileArray = file.listFiles();
    // Keyed by image hash, so identical character images are stored only once
    Map<Integer, BufferedImage> charDictionary = new HashMap<>();
    for (File imgFile : imgFileArray) {
        BufferedImage image = null;
        try {
            image = ImageIO.read(imgFile);
        } catch (IOException e) {
            e.printStackTrace();
            // Skip unreadable files instead of passing null to mattingCharacter
            continue;
        }
        List<BufferedImage> charList = W3cSchoolCaptchaUtil.mattingCharacter(image);
        charList.forEach(x -> {
            int hashcode = W3cSchoolCaptchaUtil.imgHashCode(x);
            System.out.println(hashcode);
            charDictionary.put(hashcode, x);
        });
        System.out.println("split...");
    }

    charDictionary.forEach((k, v) -> {
        try {
            ImageIO.write(v, "png", new File(destDirectory + k + ".png"));
            System.out.println("write...");
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}
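These are the automatically derived characters. The file names do not yet correspond to the character content; they need to be labeled by hand next:
Then label them manually by renaming each file to the character its image shows. The result after renaming:
Uppercase letters plus digits should give 36 characters, but there are only 34 here, because the easily confused 0 and O were left out when the CAPTCHAs were generated. So they did think about user experience after all...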
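The 36 versus 34 count can be checked mechanically once the files are labeled. The following small sketch lists which characters of the expected alphabet have no labeled image yet; the class name `MissingGlyphs` and the idea of passing the labels in as a set are assumptions for illustration.

```java
import java.util.Set;

/** Hypothetical helper: reports which expected characters are not in the dictionary. */
final class MissingGlyphs {

    /** Digits and uppercase letters, 36 characters in total. */
    static final String ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Returns the characters of ALPHABET that are absent from labelled, in alphabet order. */
    static String missing(Set<Character> labelled) {
        StringBuilder sb = new StringBuilder();
        for (char c : ALPHABET.toCharArray()) {
            if (!labelled.contains(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With the 34 labels from this post the result would be `0O`. Any other missing character would suggest that the sample set was too small or that a file was mislabeled.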
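Then read every file in this directory and hash each image's content, so that the resulting integer maps to the character given by the file name:
/**
 * Generates the character dictionary from the labeled character images
 *
 * @param charDirectory directory whose file names are the characters shown in the images
 */
public static void genDictionary(String charDirectory) {
    File[] charImgs = new File(charDirectory).listFiles();
    for (File charImgFile : charImgs) {
        try {
            BufferedImage charBufferedImage = ImageIO.read(charImgFile);
            int charHashCode = W3cSchoolCaptchaUtil.imgHashCode(charBufferedImage);
            // Print one line of Map initialization code per character
            System.out.printf("charMapping.put(%d, '%c');\n", charHashCode, charImgFile.getName().split("\\.")[0].charAt(0));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}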
The printed output is the code that initializes the Map; paste it in directly:
private static Map<Integer, Character> charMapping = new HashMap<>();

static {
    charMapping.put(1844796036, '1');
    charMapping.put(1594429278, '2');
    charMapping.put(-222305694, '3');
    charMapping.put(452270032, '4');
    charMapping.put(-1898118878, '5');
    charMapping.put(999670338, '6');
    charMapping.put(-965770966, '7');
    charMapping.put(-337170896, '8');
    charMapping.put(585835558, '9');
    charMapping.put(-724014232, 'A');
    charMapping.put(-428164778, 'B');
    charMapping.put(-886387444, 'C');
    charMapping.put(1946490946, 'D');
    charMapping.put(416715843, 'E');
    charMapping.put(-917974862, 'F');
    charMapping.put(-764688176, 'G');
    charMapping.put(28434468, 'H');
    charMapping.put(10891004, 'I');
    charMapping.put(-2084516900, 'J');
    charMapping.put(259070252, 'K');
    charMapping.put(1209338035, 'L');
    charMapping.put(486706942, 'M');
    charMapping.put(983181712, 'N');
    charMapping.put(1065112842, 'P');
    charMapping.put(183746070, 'Q');
    charMapping.put(782513722, 'R');
    charMapping.put(-984311436, 'S');
    charMapping.put(-1276745734, 'T');
    charMapping.put(-796848932, 'U');
    charMapping.put(-967446486, 'V');
    charMapping.put(331594374, 'W');
    charMapping.put(1503060590, 'X');
    charMapping.put(-507424510, 'Y');
    charMapping.put(468466871, 'Z');
}
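Then, building on the code written earlier, write the method that parses a CAPTCHA image:
/**
 * Parses the given CAPTCHA
 *
 * @param captcha the original CAPTCHA image
 * @return the recognized text
 */
public static String ocr(BufferedImage captcha) {
    BufferedImage noiseCleaned = noiseClean(captcha, 20);
    List<BufferedImage> charImageList = mattingCharacter(noiseCleaned);
    // Note: a character image whose hash is not in charMapping makes get() return null,
    // so toString() throws a NullPointerException
    return charImageList.stream().map(x -> charMapping.get(imgHashCode(x)).toString()).collect(joining());
}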
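If a character image ever hashes to a value that is not in the dictionary (for example because some noise survived), the lookup in `ocr` returns null. One possible way to make that case visible rather than fatal is sketched below. The class name `SafeLookup`, the `'?'` placeholder, and passing the hashes in as a list are assumptions for illustration, not part of the original code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: dictionary lookup that marks unknown glyphs instead of failing. */
final class SafeLookup {

    /**
     * @param glyphHashes hashes of the character images, left to right
     * @param dictionary  hash to character mapping, such as charMapping
     * @return the decoded text, with '?' for every hash not in the dictionary
     */
    static String decode(List<Integer> glyphHashes, Map<Integer, Character> dictionary) {
        return glyphHashes.stream()
                .map(hash -> String.valueOf(dictionary.getOrDefault(hash, '?')))
                .collect(Collectors.joining());
    }
}
```

A result that contains `?` could then be treated as a failed recognition and the CAPTCHA simply requested again.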
Now write some code to verify that the recognition algorithm is correct:
package bar.ocr.w3cschool;

import org.apache.http.client.fluent.Request;
import org.apache.http.client.fluent.Response;
import org.apache.http.message.BasicNameValuePair;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;

/**
 * Verifies the correctness of the code written so far
 *
 * @author CC11001100
 */
public class VerifyAccuracy {

    /**
     * Runs one verification and returns whether it succeeded. The result is only
     * used to check whether the CAPTCHA was recognized correctly
     *
     * @return true if the server did not report a CAPTCHA error
     */
    private static boolean once() {
        Request request = Request.Get(DownloadCaptcha.CAPTCHA_URL + System.currentTimeMillis());
        Response response = null;
        String captchaString = "";
        try {
            response = request.connectTimeout(2000).socketTimeout(2000).execute();
            BufferedImage captchaImg = ImageIO.read(response.returnContent().asStream());
            captchaString = W3cSchoolCaptchaUtil.ocr(captchaImg);
            System.out.printf("captcha is: %s\n", captchaString);
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        } finally {
            if (response != null) {
                response.discardContent();
            }
        }

        Request postSms = Request.Post("https://www.w3cschool.cn/sendsmscode");
        // Use an invalid phone number: the backend validates it, so no SMS is actually sent
        postSms.bodyForm(new BasicNameValuePair("mphone", "123456789"),
                new BasicNameValuePair("type", "findpwd"),
                new BasicNameValuePair("scode", captchaString));
        try {
            response = postSms.socketTimeout(2000).connectTimeout(2000).execute();
            String json = response.returnContent().asString();
            System.out.printf("response is: %s\n", json);
            // The literal has to match the server's "wrong CAPTCHA" message exactly
            return !json.contains("驗證碼錯誤");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (response != null) {
                response.discardContent();
            }
        }
        return false;
    }

    public static void main(String[] args) {

        int totalTimes = 100;
        int successCount = 0;
        for (int i = 0; i < totalTimes; i++) {
            System.out.printf("%d :\n", i + 1);
            if (once()) {
                successCount++;
                System.out.println("ocr success");
            } else {
                System.out.println("ocr failed");
            }
            System.out.println();
        }
        System.out.printf("success times %d, accuracy is %g%%\n", successCount, 1.0 * successCount / totalTimes * 100);

    }

}
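Run it and see the result:
Because the font does not vary at all, direct comparison can achieve 100% accuracy.
Conclusion: when the font style and similar features do not vary, there is no need to show off with model training; direct comparison already achieves 100% accuracy, provided of course that the denoising is done well.
The complete code follows: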
DownloadCaptcha.java:
package bar.ocr.w3cschool;

import org.apache.http.client.fluent.Request;
import org.apache.http.client.fluent.Response;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * @author CC11001100
 */
public class DownloadCaptcha {

    /**
     * CAPTCHA download URL
     */
    public static final String CAPTCHA_URL = "https://www.w3cschool.cn/scode?rand=";

    public static void download(String saveDirectory, int howMany) {
        Random random = new Random();
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        while (howMany-- > 0) {
            executorService.submit(() -> {
                Response response = null;
                try {
                    long currentMillis = System.currentTimeMillis();
                    Request request = Request.Get(CAPTCHA_URL + currentMillis);
                    response = request.connectTimeout(2000).socketTimeout(2000).execute();
                    // saveDirectory must end with a path separator; the file name is a random long
                    response.saveContent(new File(saveDirectory + random.nextLong() + ".png"));
                    System.out.println("download...");
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    if (response != null) {
                        response.discardContent();
                    }
                }
            });
        }
        try {
            // Stop accepting new tasks, then wait for the queued downloads to finish
            executorService.shutdown();
            executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    /**
     * Removes noise pixels, noise blobs and the like
     *
     * @param srcDirectory  directory of original CAPTCHA images
     * @param destDirectory directory that receives the denoised images (must end with a path separator)
     */
    public static void processNoise(String srcDirectory, String destDirectory) {
        File file = new File(srcDirectory);
        File[] imgFileArray = file.listFiles();
        for (File imgFile : imgFileArray) {
            try {
                BufferedImage image = ImageIO.read(imgFile);
                BufferedImage noiseCleanImage = W3cSchoolCaptchaUtil.noiseClean(image, 20);
                ImageIO.write(noiseCleanImage, "png", new File(destDirectory + imgFile.getName()));
                System.out.println("process noise...");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * Builds the character dictionary
     *
     * @param srcDirectory  directory of denoised CAPTCHA images
     * @param destDirectory directory that receives one image per distinct character (must end with a path separator)
     */
    public static void splitCharacter(String srcDirectory, String destDirectory) {
        File file = new File(srcDirectory);
        File[] imgFileArray = file.listFiles();
        // Keyed by image hash, so identical character images are stored only once
        Map<Integer, BufferedImage> charDictionary = new HashMap<>();
        for (File imgFile : imgFileArray) {
            BufferedImage image = null;
            try {
                image = ImageIO.read(imgFile);
            } catch (IOException e) {
                e.printStackTrace();
                // Skip unreadable files instead of passing null to mattingCharacter
                continue;
            }
            List<BufferedImage> charList = W3cSchoolCaptchaUtil.mattingCharacter(image);
            charList.forEach(x -> {
                int hashcode = W3cSchoolCaptchaUtil.imgHashCode(x);
                System.out.println(hashcode);
                charDictionary.put(hashcode, x);
            });
            System.out.println("split...");
        }

        charDictionary.forEach((k, v) -> {
            try {
                ImageIO.write(v, "png", new File(destDirectory + k + ".png"));
                System.out.println("write...");
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
    }

    /**
     * Generates the character dictionary from the labeled character images
     *
     * @param charDirectory directory whose file names are the characters shown in the images
     */
    public static void genDictionary(String charDirectory) {
        File[] charImgs = new File(charDirectory).listFiles();
        for (File charImgFile : charImgs) {
            try {
                BufferedImage charBufferedImage = ImageIO.read(charImgFile);
                int charHashCode = W3cSchoolCaptchaUtil.imgHashCode(charBufferedImage);
                // Print one line of Map initialization code per character
                System.out.printf("charMapping.put(%d, '%c');\n", charHashCode, charImgFile.getName().split("\\.")[0].charAt(0));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {

        // The stages are run one at a time; uncomment the one you need
//        download("D:/test/ocr/w3cschool/original/", 5000);
//        processNoise("D:/test/ocr/w3cschool/original", "D:/test/ocr/w3cschool/stage01/");
//        splitCharacter("D:/test/ocr/w3cschool/stage01", "D:/test/ocr/w3cschool/stage02/");
        genDictionary("D:/test/ocr/w3cschool/stage03");

    }

}
W3cSchoolCaptchaUtil.java:
package bar.ocr.w3cschool;

import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.joining;

/**
 * @author CC11001100
 */
public class W3cSchoolCaptchaUtil {

    private static Map<Integer, Character> charMapping = new HashMap<>();

    static {
        charMapping.put(1844796036, '1');
        charMapping.put(1594429278, '2');
        charMapping.put(-222305694, '3');
        charMapping.put(452270032, '4');
        charMapping.put(-1898118878, '5');
        charMapping.put(999670338, '6');
        charMapping.put(-965770966, '7');
        charMapping.put(-337170896, '8');
        charMapping.put(585835558, '9');
        charMapping.put(-724014232, 'A');
        charMapping.put(-428164778, 'B');
        charMapping.put(-886387444, 'C');
        charMapping.put(1946490946, 'D');
        charMapping.put(416715843, 'E');
        charMapping.put(-917974862, 'F');
        charMapping.put(-764688176, 'G');
        charMapping.put(28434468, 'H');
        charMapping.put(10891004, 'I');
        charMapping.put(-2084516900, 'J');
        charMapping.put(259070252, 'K');
        charMapping.put(1209338035, 'L');
        charMapping.put(486706942, 'M');
        charMapping.put(983181712, 'N');
        charMapping.put(1065112842, 'P');
        charMapping.put(183746070, 'Q');
        charMapping.put(782513722, 'R');
        charMapping.put(-984311436, 'S');
        charMapping.put(-1276745734, 'T');
        charMapping.put(-796848932, 'U');
        charMapping.put(-967446486, 'V');
        charMapping.put(331594374, 'W');
        charMapping.put(1503060590, 'X');
        charMapping.put(-507424510, 'Y');
        charMapping.put(468466871, 'Z');
    }

    /**
     * Removes noise, judging by connected-component size
     *
     * @param originalCaptcha the original CAPTCHA image
     * @param areaSizeFilter  connected components of this size or smaller are filtered out
     * @return the denoised image on a white background
     */
    public static BufferedImage noiseClean(BufferedImage originalCaptcha, int areaSizeFilter) {

        // There are some interfering border lines, so crop the edges away
        int edgeDropWidth = 15;
        BufferedImage captcha = originalCaptcha.getSubimage(edgeDropWidth / 2, edgeDropWidth / 2,
                originalCaptcha.getWidth() - edgeDropWidth, originalCaptcha.getHeight() - edgeDropWidth);

        int w = captcha.getWidth();
        int h = captcha.getHeight();
        // book[x][y] holds the component label of each pixel; 0 means not visited yet
        int[][] book = new int[w][h];

        // The largest connected component is taken to be the background,
        // which is how the background color is detected automatically
        Map<Integer, Integer> flagAreaSizeMap = new HashMap<>();
        int currentFlag = 1;
        int maxAreaSizeFlag = currentFlag;
        int maxAreaSizeColor = 0XFFFFFFFF;

        // Label the components
        for (int i = 0; i < w; i++) {
            for (int j = 0; j < h; j++) {
                if (book[i][j] != 0) {
                    continue;
                }

                book[i][j] = currentFlag;
                int currentColor = captcha.getRGB(i, j);
                int areaSize = waterFlow(captcha, book, i, j, currentColor, currentFlag);

                if (areaSize > flagAreaSizeMap.getOrDefault(maxAreaSizeFlag, 0)) {
                    maxAreaSizeFlag = currentFlag;
                    maxAreaSizeColor = currentColor;
                }

                flagAreaSizeMap.put(currentFlag, areaSize);
                currentFlag++;
            }
        }

        // Copy: background, background-colored pixels and small components become white
        BufferedImage resultImage = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < w; i++) {
            for (int j = 0; j < h; j++) {
                int currentColor = captcha.getRGB(i, j);
                if (book[i][j] == maxAreaSizeFlag
                        || (currentColor & 0XFFFFFF) == (maxAreaSizeColor & 0XFFFFFF)
                        || flagAreaSizeMap.get(book[i][j]) <= areaSizeFilter) {
                    resultImage.setRGB(i, j, 0XFFFFFFFF);
                } else {
                    resultImage.setRGB(i, j, currentColor);
                }
            }
        }
        return resultImage;
    }

    /**
     * Treats the image as a color matrix and flood-fills the 8-connected
     * region of identical color that contains (x, y)
     *
     * @param img   the image
     * @param book  label matrix, 0 means not visited
     * @param x     column of the current pixel
     * @param y     row of the current pixel
     * @param color color of the region being filled
     * @param flag  label to write into book
     * @return the number of pixels in the region
     */
    private static int waterFlow(BufferedImage img, int[][] book, int x, int y, int color, int flag) {

        if (x < 0 || x >= img.getWidth() || y < 0 || y >= img.getHeight()) {
            return 0;
        }

        // This 1 counts the current pixel
        int areaSize = 1;
        for (int i = -1; i <= 1; i++) {
            for (int j = -1; j <= 1; j++) {
                int nextX = x + i;
                int nextY = y + j;

                if (nextX < 0 || nextX >= img.getWidth() || nextY < 0 || nextY >= img.getHeight()) {
                    continue;
                }

                // If this pixel has not been visited and has the same color
//                if (book[nextX][nextY] == 0 && isSimilar(img.getRGB(nextX, nextY), color, 0)) {
                if (book[nextX][nextY] == 0 && (img.getRGB(nextX, nextY) & 0XFFFFFF) == (color & 0XFFFFFF)) {
                    book[nextX][nextY] = flag;
                    areaSize += waterFlow(img, book, nextX, nextY, color, flag);
                }
            }
        }

        return areaSize;
    }

//    /**
//     * Judges whether two pixels are similar (unused; kept for reference).
//     * The masks are parenthesized because >> binds tighter than & in Java.
//     *
//     * @param rgb1
//     * @param rgb2
//     * @param distance
//     * @return
//     */
//    private static boolean isSimilar(int rgb1, int rgb2, int distance) {
//        int r1 = (rgb1 & 0XFF0000) >> 16;
//        int g1 = (rgb1 & 0X00FF00) >> 8;
//        int b1 = rgb1 & 0X0000FF;
//
//        int r2 = (rgb2 & 0XFF0000) >> 16;
//        int g2 = (rgb2 & 0X00FF00) >> 8;
//        int b2 = rgb2 & 0X0000FF;
//
//        return (Math.abs(r1 - r2) <= distance) && (Math.abs(g1 - g2) <= distance) && (Math.abs(b1 - b2) <= distance);
//    }

    /**
     * Cuts the image into characters
     *
     * @param img the denoised image on a white background
     * @return one image per character, from left to right
     */
    public static List<BufferedImage> mattingCharacter(BufferedImage img) {
        List<BufferedImage> list = new ArrayList<>();
        int w = img.getWidth();
        int h = img.getHeight();

        // Note: despite the "IsBlack" names, true means the column is entirely white, i.e. blank
        boolean lastColumnIsBlack = true;
        int beginColumn = -1;

        for (int i = 0; i < w; i++) {

            boolean currentColumnIsBlack = true;
            for (int j = 0; j < h; j++) {
                if ((img.getRGB(i, j) & 0XFFFFFF) != 0XFFFFFF) {
                    currentColumnIsBlack = false;
                }
            }

            // Entering a character region
            if (lastColumnIsBlack && !currentColumnIsBlack) {
                beginColumn = i;
            } else if (!lastColumnIsBlack && currentColumnIsBlack) {
                // Leaving a character region
                BufferedImage charImage = img.getSubimage(beginColumn, 0, i - beginColumn, h);
                BufferedImage trimCharImage = trimUpAndDown(charImage);
                list.add(trimCharImage);
            }

            lastColumnIsBlack = currentColumnIsBlack;
        }

        return list;
    }

    private static BufferedImage trimUpAndDown(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();

        // Find the first non-blank row from the top
        int upBeginLine = -1;
        for (int i = 0; i < h; i++) {
            boolean currentColumnIsBlack = true;
            for (int j = 0; j < w; j++) {
                if ((img.getRGB(j, i) & 0XFFFFFF) != 0XFFFFFF) {
                    currentColumnIsBlack = false;
                }
            }
            if (!currentColumnIsBlack) {
                upBeginLine = i;
                break;
            }
        }

        // Find the first non-blank row from the bottom
        int downBeginLine = -1;
        for (int i = h - 1; i >= 0; i--) {
            boolean currentColumnIsBlack = true;
            for (int j = 0; j < w; j++) {
                if ((img.getRGB(j, i) & 0XFFFFFF) != 0XFFFFFF) {
                    currentColumnIsBlack = false;
                }
            }
            if (!currentColumnIsBlack) {
                downBeginLine = i;
                break;
            }
        }

        return img.getSubimage(0, upBeginLine, w, downBeginLine - upBeginLine + 1);
    }

    /**
     * Computes a hash of the image, i.e. compresses the image content into one integer
     * <p>
     * NOTE: suitable for small images only
     *
     * @param img the character image
     * @return the hash code of the pixel coordinates and colors
     */
    public static int imgHashCode(BufferedImage img) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < img.getWidth(); i++) {
            for (int j = 0; j < img.getHeight(); j++) {
                sb.append(i).append("|").append(j).append("|").append(img.getRGB(i, j) & 0XFFFFFF).append("|");
            }
        }
        return sb.toString().hashCode();
    }

    /**
     * Parses the given CAPTCHA
     *
     * @param captcha the original CAPTCHA image
     * @return the recognized text
     */
    public static String ocr(BufferedImage captcha) {
        BufferedImage noiseCleaned = noiseClean(captcha, 20);
        List<BufferedImage> charImageList = mattingCharacter(noiseCleaned);
        // Note: a character image whose hash is not in charMapping makes get() return null,
        // so toString() throws a NullPointerException
        return charImageList.stream().map(x -> charMapping.get(imgHashCode(x)).toString()).collect(joining());
    }

}
References:
1. https://www.w3cschool.cn/checkmphone?type=findpwd
2. https://www.w3cschool.cn/scode