Python Image CAPTCHA Recognition

Date: 2019-12-11

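1. OCR

OCR, or Optical Character Recognition, is the process of scanning characters, analyzing their shapes, and translating them into electronic text. tesserocr is an OCR library for Python, but it is really a thin wrapper around tesseract. tesseract must therefore be installed before tesserocr.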

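2. Preparing the tools

Install the tesserocr library. On Windows, tesseract must be downloaded and installed first.

tesseract download page: https://digi.bib.uni-mannheim.de/tesseract/windows

The page lists many versions. Those with dev in the name are development builds, and those without dev are stable releases. The stable release is recommended.

During installation, tick the Additional language data option to install the language packs that OCR supports, so that multiple languages can be recognized. Then keep clicking Next until the installer finishes.

Next, install tesserocr: pip3 install tesserocr pillow

whl package download page: https://github.com/simonflueckiger/tesserocr-windows_build/releases

Pick the build that matches your Python version and platform, download it, and install it: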

　　pip3 install tesserocr-2.2.2-cp36-cp36m-win_amd64.whl
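
After both installs, a quick check along the following lines can confirm that Python is able to load tesserocr and see the underlying tesseract engine. This is only a minimal sketch: the helper name `tesseract_info` is made up for illustration, and it assumes that `tesserocr.tesseract_version()` is available in the installed build.

```python
def tesseract_info():
    """Return the tesseract version string, or None if tesserocr cannot be imported.

    `tesseract_info` is a hypothetical helper name, not part of tesserocr.
    """
    try:
        import tesserocr
    except ImportError:
        # tesserocr (or the tesseract DLLs it depends on) is not available.
        return None
    return tesserocr.tesseract_version()


info = tesseract_info()
print(info if info else "tesserocr is not installed or could not be loaded")
```

If the message about tesserocr not being installed appears, the whl file probably does not match the local Python version or architecture.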

3. Code

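import tesserocr
from PIL import Image

# First attempt: run OCR on the raw CAPTCHA image
image = Image.open('code.png')
res = tesserocr.image_to_text(image)
print(image, res)

# Binarize: convert to grayscale, then map each gray level to 0 or 1
image = image.convert('L')
threshold = 127
# Lookup table with one entry per gray level (0-255)
table = [0 if i < threshold else 1 for i in range(256)]

# Mode '1' produces a 1-bit black-and-white image
image = image.point(table, '1')
image.show()

# Second attempt: run OCR on the cleaned-up image
result = tesserocr.image_to_text(image)
print(result)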
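
The binarization step above relies on a 256-entry lookup table, and the right threshold usually differs from one CAPTCHA style to another. The sketch below isolates that table in plain Python so different thresholds can be tried on a few sample gray values without loading an image. The function names `build_threshold_table` and `binarize_pixels` are illustrative assumptions, not part of Pillow or tesserocr.

```python
def build_threshold_table(threshold=127):
    """Return a 256-entry table: 0 for gray levels below threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize_pixels(gray_pixels, threshold=127):
    """Map an iterable of grayscale values (0-255) to 0/1 using the table."""
    table = build_threshold_table(threshold)
    return [table[p] for p in gray_pixels]


sample = [0, 100, 126, 127, 200, 255]
print(binarize_pixels(sample))       # [0, 0, 0, 1, 1, 1]
print(binarize_pixels(sample, 160))  # [0, 0, 0, 0, 1, 1]
```

The list returned by `build_threshold_table` has the same shape as the `table` passed to `image.point(table, '1')` in the code above, so raising or lowering the threshold there is a matter of changing one number.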
