Recognizing image CAPTCHAs with Python

Date: 2019-12-01

Installing the PIL image-processing library

Download link for 32-bit Windows: https://pypi.python.org/pypi/Pillow/2.1.0#id2
Download link for 64-bit Windows: https://pypi.python.org/pypi/Pillow/2.1.0#downloads

Image processing example:

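from PIL import Image
from pytesser import *  # provides image_file_to_string and image_to_string

image = Image.open('7039.jpg')
# Recognize the text directly from the image file on disk
print(image_file_to_string('7039.jpg'))
# Recognize the text from the already opened PIL image object
print(image_to_string(image))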

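Note: if you get the error "ImportError: The _imaging C module is not installed", the likely cause is that the wrong build was downloaded. Install the build that matches your Python, for example the 64-bit build for a 64-bit Python.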
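To decide which build to download, it helps to know whether the Python interpreter itself is 32-bit or 64-bit. The following is a minimal sketch using only the standard library; the function name interpreter_bits is made up for illustration and is not part of PIL or pytesser.

```python
import platform
import struct


def interpreter_bits():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes (4 or 8).
    return struct.calcsize("P") * 8


# Report the interpreter version and bitness so the matching build can be chosen.
print("Python %s, %d-bit" % (platform.python_version(), interpreter_bits()))
```

A 64-bit Windows system can still run a 32-bit Python, so the value reported here, not the Windows edition, is what the downloaded build should match.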

Image recognition

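pytesser is a Python wrapper module for Google's open-source OCR project, Tesseract. Importing it in Python lets you convert the text in an image into a string. Because pytesser calls tesseract internally, tesseract has to be installed first.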

tesseract download location: https://bitbucket.org/3togo/python-tesseract/downloads/ . Choose the version that suits your system, then download and install it.
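After installation, one way to confirm that tesseract can be reached, and to see roughly what a wrapper has to do to call it, is sketched below. It assumes the executable is named tesseract and is on PATH, and that the command line has the form `tesseract <image> <output_base>`, which writes the text to <output_base>.txt. The helper names (find_executable, build_tesseract_command, recognize) are hypothetical and are not pytesser's API.

```python
import os
import subprocess


def find_executable(name):
    """Search the PATH folders for an executable; return its path or None."""
    extensions = [""]
    if os.name == "nt":
        # On Windows also try the usual executable suffixes (.EXE, .BAT, ...).
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        for ext in extensions:
            candidate = os.path.join(folder, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Build the argument list for `tesseract <image> <output_base>`."""
    return [exe, image_path, output_base]


def recognize(image_path, output_base="ocr_result"):
    """Run tesseract on image_path and return the text it wrote to disk."""
    exe = find_executable("tesseract")
    if exe is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.check_call(build_tesseract_command(image_path, output_base, exe))
    # tesseract appends ".txt" to the output base name.
    with open(output_base + ".txt") as handle:
        return handle.read().strip()
```

If find_executable("tesseract") returns None, the installation folder is probably missing from the PATH environment variable.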

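Installing pytesser
- Download location: http://code.google.com/p/pytesser/ . The downloaded module is not a conventional installer package, so a few manual setup steps are needed.
- Extract the archive and create an empty __init__.py file in the extracted folder.
- Download the Tesseract OCR engine: http://code.google.com/p/tesseract-ocr/ . After extracting it, copy its tessdata folder into the pytesser folder, replacing the existing one.
- Copy the pytesser folder into Lib\site-packages under the Python installation directory and add it to the environment variables. (If these steps feel too complicated, you can simply place the source files next to your own code.)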
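The __init__.py and path steps above can also be done from Python. This is a minimal sketch, assuming the extracted folder already exists; make_importable is a hypothetical helper, and it changes sys.path only for the running process rather than editing the system environment variables.

```python
import os
import sys


def make_importable(package_dir):
    """Create an empty __init__.py in package_dir if it is missing, and put
    the parent folder on sys.path so the folder can be imported by name."""
    init_path = os.path.join(package_dir, "__init__.py")
    if not os.path.exists(init_path):
        # Opening in append mode creates the file without touching existing content.
        open(init_path, "a").close()
    parent = os.path.dirname(os.path.abspath(package_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return init_path
```

For example, calling make_importable on the extracted pytesser folder before the import statements avoids copying anything into site-packages.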

Image recognition source code

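from PIL import Image
from pytesser import *  # provides image_file_to_string and image_to_string

image = Image.open('7039.jpg')
# Recognize the text directly from the image file on disk
print(image_file_to_string('7039.jpg'))
# Recognize the text from the already opened PIL image object
print(image_to_string(image))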