Python 3 automatic image recognition (OCR)

Date: 2020-05-08



1. Install the dependencies

pip install pytesseract

pip install pillow
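To confirm that both packages were installed into the same interpreter that will run the scripts below, a quick check like this can help. It is a small sketch that uses only the standard library. Pillow is imported under the name PIL, which is why the check looks for PIL rather than pillow.

```python
import importlib.util

# Import names to check: Pillow installs as "PIL", pytesseract as "pytesseract".
REQUIRED_MODULES = ("PIL", "pytesseract")


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing), "- install them with pip")
    else:
        print("All dependencies found")
```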

2. Install the OCR engine tesseract-ocr

https://pan.baidu.com/s/1QaYJc4ggpqhljf4sq_-WQw
Password: 2v4a

Download tesseract-ocr-setup-4.00.00dev.exe from the link above and run the installer.
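To check that the installer worked, you can ask tesseract for its version from Python. This is a sketch that assumes the default install path shown in the comment (adjust it to your machine); it returns None when the executable cannot be started.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `tesseract --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing executable or a path we are not allowed to run.
        return None
    # Some Tesseract builds print the version banner to stderr, so accept either stream.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    # Assumed default install location; change it to wherever tesseract.exe ended up.
    print(tesseract_version(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"))
```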

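3. Point pytesseract at the tesseract executable

(1) Find the installation path of Python 3.

(2) Open pytesseract.py, located in the Lib\site-packages\pytesseract folder under that path.

(3) Change the tesseract_cmd setting from its default value, 'tesseract', to the full path of the installed tesseract executable, for example C:\Program Files (x86)\Tesseract-OCR\tesseract.exe.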
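The steps above can also be done from Python. The sketch below prints where the interpreter and the pytesseract.py file live. Then, as an alternative to editing the library file by hand, it assigns pytesseract.pytesseract.tesseract_cmd at runtime, which pytesseract supports and which is not lost when the package is upgraded. The tesseract.exe path is an assumption based on the default Windows install location.

```python
import importlib.util
import sys
from pathlib import Path

# Assumed install location of tesseract.exe; change it to match your machine.
TESSERACT_EXE = Path(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe")


def pytesseract_source():
    """Return the path of pytesseract/pytesseract.py, or None if pytesseract is missing."""
    if importlib.util.find_spec("pytesseract") is None:
        return None
    spec = importlib.util.find_spec("pytesseract.pytesseract")
    return spec.origin if spec else None


def use_tesseract(exe=TESSERACT_EXE):
    """Point pytesseract at `exe` for this process only; return True on success."""
    if not Path(exe).is_file():
        return False
    import pytesseract  # imported lazily so the check above works without it

    pytesseract.pytesseract.tesseract_cmd = str(exe)
    return True


if __name__ == "__main__":
    print("Python interpreter:", sys.executable)
    print("pytesseract.py:", pytesseract_source())
    print("tesseract_cmd set:", use_tesseract())
```

Calling use_tesseract() once at the top of an OCR script has the same effect as the file edit, without touching site-packages.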

4. Test OCR

(1) Test image (1.png)

(2) Code

from PIL import Image
from pytesseract import image_to_string

# Tell Tesseract which folder holds the .traineddata language files
tessdata_dir_config = '--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata"'

img = Image.open("1.png")
# lang='eng' selects the English language model
text = image_to_string(img, lang='eng', config=tessdata_dir_config)
print(text)

(3) Result

5. Chinese support

All language data files are available at:

https://github.com/tesseract-ocr/tessdata
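Each language model in that repository is a single .traineddata file, and the file name without the extension is the value passed as lang (for example eng or chi_sim). To see which languages a local install already has, listing the tessdata folder is enough. This stdlib-only sketch assumes the same tessdata folder the article's code uses.

```python
from pathlib import Path

# Assumed tessdata folder (the one passed via --tessdata-dir in this article's code).
TESSDATA_DIR = Path("C:/Program Files (x86)/Tesseract-OCR/tessdata")


def installed_languages(tessdata_dir=TESSDATA_DIR):
    """Return the sorted language codes found as *.traineddata files."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


if __name__ == "__main__":
    print(installed_languages())
```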

(1) Download the Simplified Chinese language data file

https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata

(2) Copy the downloaded chi_sim.traineddata file into the tessdata folder of the Tesseract-OCR installation directory.
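The download and copy steps above can be combined in a short script. This is a sketch using urllib from the standard library; the URL pattern matches the chi_sim link given above, while the destination folder is an assumption (the default install path used in this article's code). Writing into C:\Program Files (x86) usually requires running Python as administrator.

```python
import shutil
import urllib.request
from pathlib import Path

# URL pattern of the tessdata repository linked above.
URL_TEMPLATE = "https://github.com/tesseract-ocr/tessdata/raw/master/{lang}.traineddata"
# Assumed Tesseract install location; adjust to your machine.
TESSDATA_DIR = Path("C:/Program Files (x86)/Tesseract-OCR/tessdata")


def install_language(lang, tessdata_dir=TESSDATA_DIR, url=None):
    """Download <lang>.traineddata into tessdata_dir and return the saved path."""
    url = url or URL_TEMPLATE.format(lang=lang)
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of holding the whole model in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For the Chinese model used here, the call would be install_language("chi_sim").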

(3) Test with a Chinese image

Chinese test image (3.png)

Test code

from PIL import Image
from pytesseract import image_to_string

tessdata_dir_config = '--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata"'
img = Image.open("3.png")
text = image_to_string(img, lang='chi_sim', config=tessdata_dir_config)  # name of the Chinese language pack installed above
print(text)

Test result

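(4) Image binarization

Converting the image to grayscale and then to pure black and white (binarization) often makes the text easier for Tesseract to recognize. The image after grayscale conversion and binarization:

Code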

from PIL import Image
from pytesseract import image_to_string

tessdata_dir_config = '--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata"'
img = Image.open("3.png")

# Grayscale: mode 'L' stores one 0-255 brightness value per pixel
image = img.convert('L')
pixels = image.load()
threshold = 200  # threshold: pixels brighter than this become white

# Binarization: every pixel becomes either pure white (255) or pure black (0)
for x in range(image.width):
    for y in range(image.height):
        if pixels[x, y] > threshold:
            pixels[x, y] = 255
        else:
            pixels[x, y] = 0

image.show()  # preview the binarized image
text = image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
print(text)
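The nested loop above visits every pixel from Python, which gets slow on large images. Pillow's Image.point applies a per-value mapping through a lookup table, so the same threshold rule can be expressed as below. This is a sketch; binarize is a hypothetical helper name, and the default threshold of 200 is taken from the code above and will likely need tuning for other images.

```python
from PIL import Image


def binarize(img, threshold=200):
    """Return a grayscale copy of `img` with pixels > threshold set to 255, the rest to 0."""
    gray = img.convert("L")
    # For mode "L", point() calls the function once per possible value (0-255)
    # to build a lookup table, then applies that table to every pixel.
    return gray.point(lambda value: 255 if value > threshold else 0)


if __name__ == "__main__":
    demo = Image.new("L", (2, 1))
    demo.putpixel((0, 0), 250)
    demo.putpixel((1, 0), 100)
    out = binarize(demo)
    print([out.getpixel((x, 0)) for x in range(out.width)])  # [255, 0]
```

The result can be passed straight to image_to_string in place of the loop-processed image.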