The code is as follows:
```python
#coding:utf-8
from PIL import Image
import pytesseract


def test():
    im = Image.open(r"pic.gif")
    # OCR the CAPTCHA image with Tesseract
    vcode = pytesseract.image_to_string(im)
    print(vcode)


if __name__ == "__main__":
    test()
```
Running the code above to recognize a simple CAPTCHA raises an exception:
```
Traceback (most recent call last):
  File "D:\test\vcode.py", line 15, in <module>
    main()
  File "D:\test\vcode.py", line 9, in main
    test()
  File "D:\test\test.py", line 8, in test
    vcode = pytesseract.image_to_string(im)
  File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 143, in image_to_string
  File "D:\Program Files (x86)\Python\Python27\lib\site-packages\PIL\Image.py", line 1749, in split
    self.load()
  File "D:\Program Files (x86)\Python\Python27\lib\site-packages\PIL\ImageFile.py", line 232, in load
    "(%d bytes not processed)" % len(b))
IOError: image file is truncated (5 bytes not processed)
```
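The message says the image file is truncated, which usually means the GIF's data ends before the decoder expects it to. A GIF data stream starts with a `GIF87a` or `GIF89a` signature and ends with a single trailer byte, `0x3B`. One quick heuristic is therefore to look for that trailer before passing the file to OCR. The helper name `looks_like_truncated_gif` is hypothetical, and the check is only a sketch. A cut-off file can still happen to end in `0x3B`, and a complete file with extra bytes after the trailer would be flagged.

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
GIF_TRAILER = b"\x3b"  # ';' marks the end of a GIF data stream


def looks_like_truncated_gif(data: bytes) -> bool:
    """Heuristic check: a complete GIF should end with the 0x3B trailer byte.

    Raises ValueError if the bytes do not start with a GIF signature.
    """
    if not data.startswith(GIF_SIGNATURES):
        raise ValueError("not a GIF file")
    return not data.endswith(GIF_TRAILER)
```

For example, read `pic.gif` in binary mode and pass its bytes in. If the result is `True`, the CAPTCHA download was probably incomplete. In that case, fetching the image again may be preferable to decoding a partial one.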
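The exception comes from PIL, not from Tesseract. pytesseract calls `split()` on the image, which makes PIL decode the file in `load()`, and the image data ends before decoding is finished. The fix is to add the following two lines, which tell PIL to load truncated images anyway: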
```python
from PIL import ImageFile
# Let PIL decode images whose data ends early instead of raising IOError
ImageFile.LOAD_TRUNCATED_IMAGES = True
```
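The sketch below shows what the flag changes. It uses Pillow, the maintained fork that provides the `PIL` package. It builds a PNG in memory, cuts off the second half of its bytes to simulate an interrupted download, and then tries to decode the result with the flag off and on. PNG is used here instead of the article's GIF only because its truncation behavior is easy to reproduce. The helpers `make_truncated_png` and `try_load` are hypothetical names. In Python 3 the error is reported as `OSError`, since `IOError` is an alias of `OSError` there.

```python
import io

from PIL import Image, ImageFile


def make_truncated_png(size=(64, 64), keep_ratio=0.5):
    """Build a PNG in memory, then drop its tail to simulate a partial download."""
    im = Image.new("RGB", size)
    # Non-uniform pixels so the compressed image data is not trivially small
    im.putdata([((x * 7) % 256, (y * 13) % 256, (x * y) % 256)
                for y in range(size[1]) for x in range(size[0])])
    buf = io.BytesIO()
    im.save(buf, format="PNG")
    data = buf.getvalue()
    return data[: int(len(data) * keep_ratio)]


def try_load(data, allow_truncated):
    """Return True if Pillow decodes the bytes, False if it raises OSError."""
    old = ImageFile.LOAD_TRUNCATED_IMAGES
    ImageFile.LOAD_TRUNCATED_IMAGES = allow_truncated
    try:
        with Image.open(io.BytesIO(data)) as im:
            im.load()  # pixel data is decoded here, not in Image.open()
        return True
    except OSError:
        return False
    finally:
        ImageFile.LOAD_TRUNCATED_IMAGES = old  # restore the global setting


data = make_truncated_png()
print(try_load(data, allow_truncated=False))
print(try_load(data, allow_truncated=True))
```

`Image.open()` only parses the header, so the flag only needs to be set before the image is decoded. That is why the original error surfaces inside `split()`. With the flag on, the part of the image that never arrived contains no real data, so OCR results on such an image may be unreliable.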
The final, complete code is as follows:
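```python
#coding:utf-8
from PIL import Image
from PIL import ImageFile
import pytesseract

# Let PIL decode images whose data ends early instead of raising IOError
ImageFile.LOAD_TRUNCATED_IMAGES = True


def test():
    im = Image.open(r"pic.gif")
    # OCR the CAPTCHA image with Tesseract
    vcode = pytesseract.image_to_string(im)
    print(vcode)


if __name__ == "__main__":
    test()
```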
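The code above sets the flag globally at import time, so every image the program loads afterwards is decoded leniently. You may prefer to tolerate truncation only for the CAPTCHA. One option is to set the flag just around the decoding step. The sketch below is a minimal version using a hypothetical helper, `temporarily_set`, built on `contextlib.contextmanager`. The standard library's `unittest.mock.patch.object` can do the same job. Because the flag is a module-level global, this approach is not safe if other threads load images at the same time.

```python
import contextlib


@contextlib.contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> = value inside a with-block, restoring the old value afterwards."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, old)  # restored even if the block raises


# Hypothetical usage with the article's code. Decoding (and OCR) must happen
# inside the block, because PIL only checks the flag while load() runs:
#
# from PIL import Image, ImageFile
# import pytesseract
#
# with temporarily_set(ImageFile, "LOAD_TRUNCATED_IMAGES", True):
#     im = Image.open(r"pic.gif")
#     vcode = pytesseract.image_to_string(im)
```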
Some thoughts on CAPTCHA recognition with Python: http://www.cnblogs.com/xiaowuyi/archive/2012/09/10/2675286.html
Image text recognition in Python with the pytesser module: http://www.jinglingshu.org/?p=9281
Two Python approaches to recognizing characters in CAPTCHA images: http://vipscu.blog.163.com/blog/static/18180837220134234528457/
Simulating a login in Python, part 1: keeping the CAPTCHA and cookies in sync: http://www.dabu.info/python-login-crawler-captcha-cookies.html
Original article: http://www.cnblogs.com/hongfei/p/4436767.html