<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>tesseract-platform</artifactId>
    <version>3.04.01-1.3</version>
</dependency>
Create a tessdata directory under the resources directory and get an ENG.traineddata file from tessdata. Then create a configs directory inside tessdata and add a few config files:
api_config
tessedit_zero_rejection T
This turns on tessedit_zero_rejection.
digits
This file can set a character whitelist.
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
hocr
tessedit_create_hocr 1
Setting tessedit_create_hocr to 1 means the result is output as HTML.
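The directory layout and the three config files described above can also be generated from code. The following is only a minimal sketch using the JDK alone: the class name TessdataConfigs and the method names are made up for illustration, and the resources directory is passed in by the caller. It does not fetch ENG.traineddata, which still has to be placed in tessdata by hand.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tesseract or JavaCPP.
class TessdataConfigs {

    // Config file name -> its single "name value" line, copied from the text above.
    static Map<String, String> configLines() {
        Map<String, String> lines = new LinkedHashMap<>();
        lines.put("api_config", "tessedit_zero_rejection T");
        lines.put("digits", "tessedit_char_whitelist "
                + "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
        lines.put("hocr", "tessedit_create_hocr 1");
        return lines;
    }

    // Creates <resourcesDir>/tessdata/configs and writes one file per entry.
    static Path writeConfigs(Path resourcesDir) throws IOException {
        Path configs = resourcesDir.resolve("tessdata").resolve("configs");
        Files.createDirectories(configs);
        for (Map.Entry<String, String> entry : configLines().entrySet()) {
            byte[] content = (entry.getValue() + "\n").getBytes(StandardCharsets.UTF_8);
            Files.write(configs.resolve(entry.getKey()), content);
        }
        return configs;
    }
}
```

Calling TessdataConfigs.writeConfigs(Paths.get("src/main/resources")) would produce the same structure as the manual steps, assuming a standard Maven project layout.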
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacpp.lept;
import org.bytedeco.javacpp.tesseract;

import static org.bytedeco.javacpp.lept.pixDestroy;
import static org.bytedeco.javacpp.lept.pixRead;

public class OcrUtil {

    public static String recognize(String fileName) {
        BytePointer outText;
        tesseract.TessBaseAPI api = new tesseract.TessBaseAPI();
        // Classpath root, i.e. the directory that contains tessdata
        String path = OcrUtil.class.getClassLoader().getResource("").getPath();
        if (api.Init(path, "ENG") != 0) {
            System.err.println("Could not initialize tesseract.");
            System.exit(1);
        }
        // Optional: restrict the recognized characters and treat the image as a single line
        // api.SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
        // api.SetPageSegMode(tesseract.PSM_SINGLE_LINE);

        // Open input image with leptonica library
        lept.PIX image = pixRead(fileName);
        api.SetImage(image);

        // Get OCR result
        outText = api.GetUTF8Text();
        String captcha = outText.getString();

        // Destroy used object and release memory
        api.End();
        outText.deallocate();
        pixDestroy(image);
        api.close();
        return captcha.trim();
    }
}
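One detail of the path lookup is worth checking: URL.getPath() returns the path still percent-encoded, so a project directory whose name contains a space comes back with %20 in it, and that string no longer names a real directory. A possible workaround, sketched here with only JDK classes, is to go through a URI instead. The names ResourcePaths and toFilesystemPath are hypothetical, and the sketch assumes the resource is a plain file: URL (not an entry inside a jar).

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tesseract or JavaCPP.
class ResourcePaths {

    // Converts a file: URL into a plain filesystem path, decoding escapes such as %20.
    // Assumption: the URL uses the file scheme; other schemes make Paths.get throw.
    static String toFilesystemPath(URL url) throws URISyntaxException {
        return Paths.get(url.toURI()).toString();
    }
}
```

With this helper, the path passed to api.Init could be obtained as ResourcePaths.toFilesystemPath(OcrUtil.class.getClassLoader().getResource("")) instead of calling getPath() directly.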