Recognizing text in images with Python 3.7

Date: 2019-11-05


I wanted to pull the text out of a Baidu document, so I put together this Python OCR setup.

1. Basic environment
Operating system: Windows 7, 64-bit

Python version: 3.7

2. Install the supporting environment
2.1 First install the Tesseract OCR engine. Download page: https://digi.bib.uni-mannheim.de/tesseract/
I downloaded: tesseract-ocr-w64-setup-v4.0.0-beta.4.20180912.exe

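2.2 After downloading, double-click the installer. Because we want to recognize Chinese characters, extra languages have to be ticked on the installer screen: expand Additional language data. (Adding languages here may fail with a language pack install error; in that case, download the language pack separately and put it in the tessdata folder under the install directory.)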
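To confirm that the language data actually landed in tessdata, a check like the one below may help. It is only a minimal sketch: `missing_languages` is a hypothetical helper name, and the folder in the example is assumed from the install path used later in step 2.4, so adjust it to your machine.

```python
from pathlib import Path


def missing_languages(tessdata_dir, languages):
    """Return the language codes that have no .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (folder / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    # Assumed install location; change it to wherever Tesseract-OCR was installed.
    tessdata = "D:/Program Files (x86)/Tesseract-OCR/tessdata"
    print(missing_languages(tessdata, ["chi_sim", "eng"]))
```

An empty list means every requested language file is present; any code that is printed still needs its .traineddata file copied into the folder.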

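Then tick the languages as shown in the screenshot. The test in step 3 uses chi_sim, so Chinese (Simplified) has to be among them.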

2.3 Install the Python packages
pip install Pillow
pip install pytesseract
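A quick way to see whether both installs succeeded is to ask Python whether it can find the modules. This is a small sketch with a made-up helper name, `missing_modules`. Note that Pillow is imported under the name PIL.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow is imported as "PIL", not "Pillow".
    print(missing_modules(["PIL", "pytesseract"]))
```

An empty list means both packages are importable from the interpreter that ran the check.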

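2.4 Edit pytesseract.py (I found it under python37\Scripts; with a standard pip install it is usually under python37\Lib\site-packages\pytesseract) and point tesseract_cmd at the installed executable:
tesseract_cmd = 'D:/Program Files (x86)/Tesseract-OCR/tesseract.exe'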
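As an alternative to editing the installed file, the same variable can be set from your own script through `pytesseract.pytesseract.tesseract_cmd`, which leaves the package untouched. The sketch below assumes the install path from this step; `find_tesseract` and `configure_pytesseract` are hypothetical helper names, not part of pytesseract.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first candidate path that exists, else whatever is on PATH (or None)."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which("tesseract")


def configure_pytesseract(candidates):
    """Point pytesseract at the Tesseract executable for this process only."""
    import pytesseract  # imported lazily so find_tesseract works without it

    exe = find_tesseract(candidates)
    if exe is None:
        raise FileNotFoundError("tesseract executable not found; check the install path")
    # Same setting that step 2.4 changes in the file, applied at run time instead.
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe


# Example call (requires pytesseract to be installed):
# configure_pytesseract(["D:/Program Files (x86)/Tesseract-OCR/tesseract.exe"])
```

Setting the path at run time survives a reinstall or upgrade of pytesseract, whereas an edit to the installed file is lost.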

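3. Test (when recognizing Chinese, crop the image so that the characters are reasonably large and sit near the center of the picture; if the result contains many wrong characters, crop the image again and retry)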
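# coding=utf-8
from PIL import Image
import pytesseract

# Run OCR on the image with the Simplified Chinese language data
text = pytesseract.image_to_string(Image.open('H:/2.png'), lang='chi_sim')
# Print line by line, removing the spaces that appear between characters
for line in text.split("\n"):
    print(line.replace(" ", ""))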
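The cropping advice from this step can also be approximated in code. The sketch below enlarges the image, converts it to grayscale, and pads it with a white margin so the text sits away from the edges. The scale factor, the margin, and the helper names are illustrative assumptions rather than values from this article, and whether it helps depends on the image.

```python
def scaled_size(size, scale):
    """Return (width, height) multiplied by scale, rounded to whole pixels (at least 1)."""
    width, height = size
    return max(1, round(width * scale)), max(1, round(height * scale))


def prepare_for_ocr(image, scale=2.0, margin=20):
    """Enlarge a PIL image, convert it to grayscale, and add a white border."""
    from PIL import Image, ImageOps  # imported lazily so scaled_size works without Pillow

    gray = image.convert("L")
    bigger = gray.resize(scaled_size(gray.size, scale), Image.LANCZOS)
    # fill=255 is white in grayscale ("L") mode
    return ImageOps.expand(bigger, border=margin, fill=255)
```

To try it, pass `prepare_for_ocr(Image.open('H:/2.png'))` to `pytesseract.image_to_string` in place of the plain `Image.open(...)` call.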

If the error message complains about a missing language pack, it can be downloaded from:
https://github.com/tesseract-ocr/tessdata
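Fetching one language file can be scripted as well. The following is a rough sketch under stated assumptions: it assumes the repository serves raw files from a branch named main, and `traineddata_url` and `download_language` are hypothetical helper names. If the URL does not resolve, download the file through the browser instead.

```python
import urllib.request
from pathlib import Path

# Assumption: raw files are served from the repository's "main" branch.
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/main"


def traineddata_url(lang):
    """Build the download URL for one language file, e.g. chi_sim.traineddata."""
    return f"{BASE_URL}/{lang}.traineddata"


def download_language(lang, tessdata_dir):
    """Download lang.traineddata into tessdata_dir unless it is already there."""
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    if not target.exists():
        urllib.request.urlretrieve(traineddata_url(lang), str(target))
    return target


# Example call (assumed install path; writing there may need administrator rights):
# download_language("chi_sim", "D:/Program Files (x86)/Tesseract-OCR/tessdata")
```

The existence check keeps the function from downloading again a file that is already in place.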

References:
I had already closed some of the other pages without copying their URLs; searching Baidu or Google turns up plenty.
https://blog.csdn.net/a519395243/article/details/80447038
