Atitit. A roundup of OCR frameworks and libraries (attilax summary)

Date: 2019-11-16

Tags: atitit, ocr, framework roundup, attilax, summary

Tesseract

Asprise JavaOCR

With some free time on my hands, I noticed that Baidu offers an OCR text recognition API. It looked interesting, so I decided to try it out.

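About the Baidu service: text recognition is Baidu's natural-scene OCR service. Built on Baidu's industry-leading OCR algorithms, it provides whole-image text detection and recognition, whole-image text recognition, whole-image text-line localization, and single-character image recognition.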

Enough talk; let's go straight to the demo!
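
As a rough sketch of what calling an HTTP OCR service like this from Java can look like, assuming the service accepts a Base64-encoded image in a form-encoded POST body. The class name `OcrHttpSketch`, the form field name `image`, and the caller-supplied endpoint URL are all illustrative assumptions, not Baidu's documented API; authentication is left out because it depends on the service.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper: not part of any Baidu SDK.
final class OcrHttpSketch {

    // Base64-encode the image bytes, then URL-encode the result so it is
    // safe inside a form body. The field name "image" is an assumption.
    static String buildFormBody(byte[] imageBytes) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "image=" + URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }

    // Build a form-encoded POST request. The endpoint is supplied by the
    // caller; take the real URL from the service's own documentation.
    static HttpRequest buildRequest(String endpoint, byte[] imageBytes) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(buildFormBody(imageBytes)))
                .build();
    }

    // Read the image file, send the request, and return the raw response body.
    static String recognize(String endpoint, Path imageFile) throws Exception {
        byte[] imageBytes = Files.readAllBytes(imageFile);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(endpoint, imageBytes), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With an endpoint and credentials from the service's documentation, `OcrHttpSketch.recognize(endpointUrl, Path.of("scan.png"))` would return the response body as a string for further parsing. This needs Java 11 or later for `java.net.http`.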

java4less

The J4L OCR tools are a set of components for adding OCR capabilities to Java applications. That means you can receive faxes, PDF files, or scanned documents and extract business information from the images. The three main components are:

A Java wrapper for the Tesseract OCR engine. Tesseract itself is delivered under the Apache 2.0 license, and we support only a version compiled for Windows.

A PDF-to-text converter.

A text document parser.

The document recognition process can therefore be divided into two steps:

In the first step, the component takes an image file (TIF, PNG, JPG, ...) or a PDF file and returns the text contained in it. The Java wrapper performs this operation by using Tesseract. Alternatively, you can use any other OCR engine. If you are working with a PDF file, however, you will use our PDF-to-text converter.

In the second step, your Java application needs to understand the text returned by the OCR engine or PDF converter. This is done by the document parser. The document parser takes as input a text string (the data) and an XML file that describes the structure of the document; the output is a business document, either as a Java object or as an XML file.
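
The two steps above can be sketched in plain Java. This is not the J4L API: the class `TwoStepOcrSketch` and its methods are hypothetical. Step one assumes a `tesseract` executable is installed and on the PATH, and step two stands in for the XML-driven document parser with a simple map of field names to regular expressions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the two-step flow; not the J4L components.
final class TwoStepOcrSketch {

    // Step 1: image -> text, by running the Tesseract command-line tool.
    // Passing "stdout" as the output base makes Tesseract print the text.
    static String imageToText(Path image) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("tesseract", image.toString(), "stdout")
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop diagnostics
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    // Step 2: text -> business fields. Each rule maps a field name to a
    // pattern whose first capture group holds the value. Fields whose
    // pattern does not match are simply left out of the result.
    static Map<String, String> parseFields(String text, Map<String, Pattern> rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher matcher = rule.getValue().matcher(text);
            if (matcher.find()) {
                fields.put(rule.getKey(), matcher.group(1).trim());
            }
        }
        return fields;
    }
}
```

A caller would build the rule map once per document type (for example an `invoiceNumber` rule such as `Invoice\s*No\.?\s*:?\s*(\S+)`), then run `parseFields(imageToText(path), rules)`. Keeping the two steps separate means the OCR engine can be swapped without touching the parsing rules, which mirrors the split described above.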

Implementing Baidu OCR text recognition in Java - Zhang Rongzhen's column - Blog channel - CSDN.NET

Author: nickname 老哇的爪子 (full name: Attilax Akbar Al Rapanui, 阿提拉克斯阿克巴阿爾拉帕努伊)

Chinese name: 艾提拉 (艾龍); email: 1466519819@qq.com

When reposting, please credit the source: http://www.cnblogs.com/attilax/

Atiend