Atitit. A roundup of OCR frameworks and libraries: attilax's summary
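With some free time on my hands, I found that Baidu offers an OCR text recognition API. It looked interesting, so I decided to look into it.
About the Baidu service: text recognition is Baidu's natural-scene OCR service. Built on what Baidu describes as its industry-leading OCR algorithms, it provides whole-image text detection and recognition, whole-image text recognition, whole-image text-line localization, and single-character image recognition.
Enough talk, let's go straight to the demo!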
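As a rough illustration of what such a demo involves, here is a minimal sketch of sending an image to an HTTP OCR service from Java. It is not the original demo. It assumes the service accepts a form-encoded POST whose `image` field carries the Base64-encoded image; that field name, the class `BaiduOcrSketch`, and both method names are assumptions for illustration. The endpoint URL (including whatever credentials the service requires) is deliberately left to the caller, so check the current Baidu documentation before relying on any of this.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper class; not part of any Baidu SDK.
class BaiduOcrSketch {

    // Builds the form body: the image bytes are Base64-encoded, then URL-encoded.
    // ASSUMPTION: the service expects the image in a form field named "image".
    static String buildFormBody(byte[] imageBytes) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "image=" + URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }

    // Posts the image to the caller-supplied endpoint and returns the raw
    // response body (typically JSON) without interpreting it.
    static String recognize(String endpointUrl, Path imageFile) throws Exception {
        String body = buildFormBody(Files.readAllBytes(imageFile));
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpointUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.body();
    }
}
```

The URL-encoding step matters because Base64 output can contain `+`, `/` and `=`, which would otherwise be misread in a form body. This sketch needs Java 11 or later for `java.net.http`.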
The J4L OCR tools are a set of components that can be used to add OCR capabilities to Java applications. That means you can receive faxes, PDF files or scanned documents and extract business information from the images. The three main components are:
a Java wrapper for the Tesseract OCR engine. The Tesseract OCR engine itself is delivered under the Apache 2.0 license, and we support a version compiled for Windows only.
a PDF-to-text converter.
a text document parser.
The document recognition process can therefore be divided into two steps:
In the first step, the component takes an image file (TIFF, PNG, JPG, ...) or a PDF file and returns the text contained in it. The Java wrapper performs this operation using Tesseract; alternatively, you can use any other OCR engine. If the input is a PDF file, however, you use our PDF-to-text converter instead.
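The first step can be sketched without the J4L wrapper by calling the Tesseract command-line tool directly. This is not the J4L API: the class `TesseractCliSketch` and its methods are made up for illustration, and the code assumes a `tesseract` executable is installed and on the `PATH`. It relies on the standard CLI form `tesseract <image> stdout -l <lang>`, where `stdout` as the output base makes Tesseract print the recognized text instead of writing a file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; a stand-in for a real Java wrapper around Tesseract.
class TesseractCliSketch {

    // Builds the command line: tesseract <image> stdout -l <language>
    static List<String> buildCommand(Path image, String language) {
        return List.of("tesseract", image.toString(), "stdout", "-l", language);
    }

    // Runs Tesseract and returns the recognized text.
    // ASSUMPTION: "tesseract" is on the PATH and the language data is installed.
    static String recognize(Path image, String language)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, language))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore diagnostics
                .start();
        // Read all output before waiting, so a full pipe cannot block the child.
        String text = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }
}
```

A call such as `TesseractCliSketch.recognize(Path.of("scan.png"), "eng")` would then return the page text as a single string, ready to hand to the second step.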
In the second step, your Java application needs to understand the text returned by the OCR engine or PDF converter. This is done by the document parser. The document parser takes as input a text string (the data) and an XML file that describes the structure of the document; the output is a business document, either as a Java object or as an XML file.
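To make the second step concrete, here is a minimal sketch of a parser driven by an XML structure description. The XML layout used here (`<field name="..." pattern="..."/>` elements whose `pattern` is a regular expression with one capture group) is invented for this example and is not J4L's actual format; the class `DocumentParserSketch` is likewise hypothetical. The "business document" is simplified to a map from field name to extracted value.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical parser; the XML structure format is an assumption, not J4L's.
class DocumentParserSketch {

    // Extracts one value per <field> element by applying its regex to the text.
    // Fields whose pattern does not match are simply left out of the result.
    static Map<String, String> parse(String ocrText, String structureXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(structureXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid structure XML", e);
        }
        NodeList fields = root.getElementsByTagName("field");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            Matcher matcher = Pattern.compile(field.getAttribute("pattern"))
                    .matcher(ocrText);
            if (matcher.find()) {
                result.put(field.getAttribute("name"), matcher.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<document name=\"invoice\">"
                + "<field name=\"invoiceNumber\" pattern=\"Invoice No[.:]?\\s*(\\S+)\"/>"
                + "<field name=\"total\" pattern=\"Total[.:]?\\s*([0-9.,]+)\"/>"
                + "</document>";
        String text = "ACME Ltd\nInvoice No: A-1027\nTotal: 1,250.00\n";
        System.out.println(parse(text, xml));
    }
}
```

Keeping the field definitions in XML rather than in code means a new document layout only needs a new description file. A real parser would also have to cope with OCR noise, which plain regular expressions handle poorly.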
Reference: Implementing Baidu OCR text recognition in Java (JAVA實現百度OCR文字識別功能), 張榮珍's column, blog channel, CSDN.NET
Author: nickname 老哇的爪子 (full name: Attilax Akbar Al Rapanui, 阿提拉克斯 阿克巴 阿爾 拉帕努伊)
Chinese name: 艾提拉 (艾龍), email: 1466519819@qq.com
Please credit the source when reposting: http://www.cnblogs.com/attilax/
Atiend