A Brief Guide to Installing and Using pyocr

Date: 2019-11-19

pyocr is an OCR library for Python. Its GitHub repository is https://github.com/jflesch/pyocr .

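To use this library, the environment needs the following dependencies:

tesseract-ocr: an open-source OCR engine; version 3.01 or later is required (the version can be checked with the tesseract --version command)

PIL: Python's image processing library

Python version >= 2.7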
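
The version requirement above can also be checked from a script. The following is a minimal sketch, not part of pyocr: the helper names parse_tesseract_version and installed_tesseract_version are made up for illustration, the code assumes Python 3, and it assumes the banner printed by tesseract --version begins with the word "tesseract" followed by a dotted version number.

```python
import re
import shutil
import subprocess

MINIMUM_VERSION = (3, 1)  # the guide asks for tesseract-ocr 3.01 or later


def parse_tesseract_version(output):
    """Extract a numeric version tuple from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_tesseract_version():
    """Return the installed version tuple, or None if it cannot be determined."""
    if shutil.which("tesseract") is None:
        return None
    # Merge stderr into stdout so the banner is captured whichever
    # stream this tesseract build writes it to.
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("tesseract not found, or its version could not be parsed")
    elif version >= MINIMUM_VERSION:
        print("tesseract %s is new enough" % ".".join(map(str, version)))
    else:
        print("tesseract %s is older than 3.01" % ".".join(map(str, version)))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.9".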

My test environment is Ubuntu 15.10.

You can use the help command in the Python interpreter to check whether the required PIL module is present:

Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
[GCC 5.2.1 20151010] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> help("modules")
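
Scanning the full help("modules") listing by eye is slow. As an alternative, here is a small sketch that asks the import system directly. The function name has_module is made up for illustration, and the code assumes Python 3, where importlib.util.find_spec is available.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PIL is the module this guide depends on; other names can be checked the same way.
    print("PIL: %s" % ("found" if has_module("PIL") else "missing"))
```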

If PIL is not present, install it with:

$ sudo apt-get install python-imaging

Install tesseract-ocr.

This installation step is relatively simple.

Download the source package, extract it, then compile and install it. The steps are as follows:

./autogen.sh
./configure
make
sudo make install
sudo ldconfig

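Install the language packs. These can be installed directly from the package repository; I installed only the English and Chinese packs.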

$ sudo apt-get install tesseract-ocr-eng tesseract-ocr-chi-sim

Then configure the environment:

export TESSDATA_PREFIX="/path/to/tessdata"
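
To confirm that the variable points somewhere useful, a script can list the language data files it finds there. This is only a sketch under stated assumptions: find_language_data is a hypothetical helper, language data is assumed to be stored as files ending in .traineddata, and because the variable may need to name either the tessdata directory itself or its parent depending on the Tesseract release, the sketch checks both locations.

```python
import glob
import os

SUFFIX = ".traineddata"


def find_language_data(prefix):
    """Return sorted language codes that have a .traineddata file under `prefix`.

    Both `prefix` and `prefix/tessdata` are searched (an assumption, see above).
    """
    languages = set()
    for directory in (prefix, os.path.join(prefix, "tessdata")):
        for path in glob.glob(os.path.join(directory, "*" + SUFFIX)):
            languages.add(os.path.basename(path)[: -len(SUFFIX)])
    return sorted(languages)


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        print("TESSDATA_PREFIX is not set")
    else:
        found = find_language_data(prefix)
        print("Language data under %s: %s" % (prefix, ", ".join(found) or "none"))
```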

Now you can run a test:

$ tesseract t2.png out -l chi_sim
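
The same test can be driven from Python, which is handy when checking several images. The sketch below assumes Python 3 and that tesseract writes its result to the output base name with .txt appended; build_tesseract_command and run_tesseract are illustrative names, not part of any library.

```python
import os
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang):
    """Build the argument list for `tesseract IMAGE OUTPUT_BASE -l LANG`."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_tesseract(image_path, output_base, lang):
    """Run tesseract and return the recognized text.

    Returns None if tesseract or the image is missing, or if the run fails.
    """
    if shutil.which("tesseract") is None or not os.path.exists(image_path):
        return None
    result = subprocess.run(build_tesseract_command(image_path, output_base, lang))
    if result.returncode != 0:
        return None
    # The recognized text is expected in OUTPUT_BASE.txt
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    text = run_tesseract("t2.png", "out", "chi_sim")
    print(text if text is not None else "tesseract or t2.png is not available")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.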

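At this point the environment is ready, and pyocr can be installed by following its instructions.

Download the pyocr source package, extract it, and run the install command: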

$ sudo python ./setup.py install

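If nothing unexpected happens, the installation has succeeded. You can try a demo to verify that the installation and configuration work.

Below is a demo that recognizes Chinese text in an image; adjust it to suit your own environment: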

from PIL import Image
import sys
import pyocr
import pyocr.builders

# Path of the image to recognize, passed as the first command-line argument
image_path = sys.argv[1]

# Find the OCR tools that pyocr can use on this machine
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'tesseract'

langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
# Index 1 depends on which language packs are installed; adjust as needed
lang = langs[1]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'

# Run OCR on the image and return the result as plain text
txt = tool.image_to_string(
    Image.open(image_path),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)
print(txt)
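
The demo picks langs[1], so the language it uses depends on the order of the list returned by get_available_languages(), and it fails if only one language pack is installed. One possible alternative is to choose by name. The helper below is a hypothetical sketch, not part of pyocr; the default preference order (chi_sim, then eng) is an assumption based on the language packs installed earlier in this guide.

```python
def pick_language(available, preferred=("chi_sim", "eng")):
    """Return the first preferred language code that is available.

    Falls back to the first available language, or None if the list is empty.
    """
    for code in preferred:
        if code in available:
            return code
    return available[0] if available else None


if __name__ == "__main__":
    # In the demo this would replace `lang = langs[1]`:
    #     lang = pick_language(tool.get_available_languages())
    print(pick_language(["eng", "chi_sim"]))
```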

Note that building tesseract-ocr requires a working build toolchain. I did not check for this beforehand, so I ran into the following errors:

1. autogen.sh: 60: autogen.sh: aclocal: not found

Missing: automake

2. autogen.sh: 65: autogen.sh: libtoolize: not found

autogen.sh: 65: autogen.sh: glibtoolize: not found

Missing: libtool

3. configure: error: leptonica not found

Missing: the leptonica development package, which can be installed with:

sudo apt-get install libleptonica-dev
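
The first two errors can be caught before running autogen.sh by checking that the commands are on the PATH. The following is a rough sketch, assuming Python 3; the command-to-package mapping is taken from the errors listed here, missing_packages is an illustrative name, and leptonica is not covered because it is a library rather than a command.

```python
import shutil

# Command -> Ubuntu package, taken from the errors described in this guide.
BUILD_TOOLS = {
    "aclocal": "automake",
    "libtoolize": "libtool",
}


def missing_packages(lookup=shutil.which):
    """Return the sorted package names whose command `lookup` cannot find."""
    return sorted(pkg for cmd, pkg in BUILD_TOOLS.items() if lookup(cmd) is None)


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Try: sudo apt-get install " + " ".join(missing))
    else:
        print("aclocal and libtoolize are both available")
```

The lookup parameter exists only so the function can be exercised without touching the real PATH.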