OCR in Python: the pytesseract library

Date: 2020-01-15


pip install pytesseract

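Calling it at this point raises the error: tesseract is not installed or it's not in your path. pytesseract is only a Python wrapper; it needs the Tesseract-OCR program itself.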

Download and install Tesseract-OCR
- https://pan.baidu.com/s/1qXumxdltxOnb0geaE_1U-Q
Edit the path in the pytesseract source
- File location: <Python install directory>\Lib\site-packages\pytesseract\pytesseract.py
- Set tesseract_cmd to <Tesseract-OCR install path>\tesseract.exe
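
As an alternative to editing the installed source file, the same tesseract_cmd variable can be assigned at runtime through pytesseract.pytesseract.tesseract_cmd. The following is a minimal sketch, not the method used in this post: the helper names find_tesseract and configure_pytesseract are made up for illustration, and the candidate paths are whatever you pass in.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first usable tesseract executable, or None if none is found."""
    # Prefer an executable that is already on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations, in order.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates):
    """Point pytesseract at the located executable and return its path."""
    import pytesseract  # imported here so find_tesseract works without it

    cmd = find_tesseract(candidates)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    # Same variable the post edits in pytesseract.py, set at runtime instead.
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

A call might look like configure_pytesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"]), where the path is only an assumed install location and should be replaced with your own. Setting the variable this way survives a reinstall or upgrade of pytesseract, which overwrites edits made to the package source.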

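Recognizing Chinese requires an additional language data file

https://pan.baidu.com/s/1GfspC5uef73B2Oa8YudBgQ
Put the downloaded Chinese data file in the tessdata folder under the Tesseract-OCR install directory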
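
To confirm the data file ended up in the right place, a quick check of the tessdata folder can help. This is a rough sketch under two assumptions: that the data files follow Tesseract's <lang>.traineddata naming (so chi_sim.traineddata for lang='chi_sim'), and that you pass in your own tessdata path. The function name missing_languages is hypothetical.

```python
import os


def missing_languages(tessdata_dir, langs):
    """Return the language codes from langs that have no .traineddata file."""
    missing = []
    for lang in langs:
        # Tesseract looks for <lang>.traineddata inside the tessdata folder.
        data_file = os.path.join(tessdata_dir, lang + ".traineddata")
        if not os.path.isfile(data_file):
            missing.append(lang)
    return missing
```

For example, missing_languages(tessdata_dir, ["eng", "chi_sim"]) returns an empty list once both files are present, and lists chi_sim if the Chinese file was put somewhere else.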

Image: English.png

Image: Chinese.png

Recognition

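import pytesseract
from PIL import Image

# Load the two sample images (English and Chinese text)
im_en = Image.open('English.png')
im_ch = Image.open('Chinese.png')

# With no lang argument, the default language (English) is used
print('======== English ========')
print(pytesseract.image_to_string(im_en), '\n\n')

# lang='chi_sim' selects the Simplified Chinese data file
print('======== Chinese ========')
print(pytesseract.image_to_string(im_ch, lang='chi_sim'))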

Result
