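Today I'll walk through recognizing a simple CAPTCHA.
The CAPTCHA uses a standard format with no distortion or warping, so we'll simply run pytesseract on it.
CAPTCHA URL: http://wscx.gjxfj.gov.cn/zfp/webroot/xfsxcx.html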
The CAPTCHA to recognize is:
Because this CAPTCHA contains noise dots, recognizing it directly gives very poor results.
First, binarize and denoise the CAPTCHA.
The result looks like this:
Recognition result:
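The recognition rate is only 40%. One way to improve such a low rate is to segment the image into individual characters and classify each one separately; this CAPTCHA is easy to segment, so that approach should raise the recognition rate.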
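The segmentation idea above can be sketched with a vertical projection: count the ink pixels in each column and cut wherever a column is empty. This is only a minimal sketch, not the author's code. The function name `split_columns` and the sample grid are made up for illustration, and it assumes the image has already been binarized with 0 for ink and 1 for background, as in the binarization code in this post.

```python
def split_columns(grid, ink=0):
    """Split a binarized image into character spans by vertical projection.

    grid: list of rows, each a list of 0/1 pixels (assumed: 0 = ink,
          1 = background).
    Returns a list of (start, end) column ranges, with end exclusive.
    """
    if not grid:
        return []
    width = len(grid[0])
    # Count the ink pixels in each column.
    counts = [sum(1 for row in grid if row[x] == ink) for x in range(width)]

    spans = []
    start = None
    for x, count in enumerate(counts):
        if count > 0 and start is None:
            start = x                 # a run of inked columns begins
        elif count == 0 and start is not None:
            spans.append((start, x))  # the run ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))  # the run touches the right edge
    return spans


# Hand-made sample: two "characters" separated by one blank column.
demo = [
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
]
print(split_columns(demo))  # [(1, 3), (4, 5)]
```

Each returned span can then be cropped out and recognized on its own. Characters that touch each other would need a smarter cut than this empty-column rule.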
Binarization code:
# coding:utf-8
from PIL import Image, ImageDraw

# Binarized pixel map: (x, y) -> 0 (dark) or 1 (light)
t2val = {}


def twoValue(image, G):
    # Pixels brighter than the threshold G become 1, all others 0
    for y in range(0, image.size[1]):
        for x in range(0, image.size[0]):
            g = image.getpixel((x, y))
            if g > G:
                t2val[(x, y)] = 1
            else:
                t2val[(x, y)] = 0


# Compare the value of a point A with the values of its 8 neighbours.
# Given a value N (0 < N < 8), A is treated as noise when fewer than N
# of its neighbours have the same value as A.
# G: Integer, binarization threshold
# N: Integer, denoise rate, 0 < N < 8
# Z: Integer, number of denoise passes
# The function updates t2val in place and returns nothing.
def clearNoise(image, N, Z):
    for i in range(0, Z):
        t2val[(0, 0)] = 1
        t2val[(image.size[0] - 1, image.size[1] - 1)] = 1

        for x in range(1, image.size[0] - 1):
            for y in range(1, image.size[1] - 1):
                nearDots = 0
                L = t2val[(x, y)]
                if L == t2val[(x - 1, y - 1)]:
                    nearDots += 1
                if L == t2val[(x - 1, y)]:
                    nearDots += 1
                if L == t2val[(x - 1, y + 1)]:
                    nearDots += 1
                if L == t2val[(x, y - 1)]:
                    nearDots += 1
                if L == t2val[(x, y + 1)]:
                    nearDots += 1
                if L == t2val[(x + 1, y - 1)]:
                    nearDots += 1
                if L == t2val[(x + 1, y)]:
                    nearDots += 1
                if L == t2val[(x + 1, y + 1)]:
                    nearDots += 1

                if nearDots < N:
                    t2val[(x, y)] = 1


def saveImage(filename, size):
    # Write the binarized pixel map out as a 1-bit image
    image = Image.new("1", size)
    draw = ImageDraw.Draw(image)

    for x in range(0, size[0]):
        for y in range(0, size[1]):
            draw.point((x, y), t2val[(x, y)])

    image.save(filename)


for i in range(1, 11):
    path = "5/" + str(i) + ".jpg"
    image = Image.open(path).convert("L")
    twoValue(image, 222)
    clearNoise(image, 3, 6)
    path1 = "5/" + str(i) + ".png"
    saveImage(path1, image.size)
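To see what the 8-neighbour rule does, here is a small standalone sketch that runs one pass on a hand-made grid. It is a variant for illustration, not the code above: it reads from the original grid and writes to a copy, whereas `clearNoise` updates `t2val` in place. The name `denoise_once` and the sample grid are made up.

```python
def denoise_once(grid, n):
    """One pass of the 8-neighbour rule on a 0/1 grid (1 = background).

    An interior pixel is treated as noise, and set to 1, when fewer than n
    of its 8 neighbours share its value. Border pixels are left untouched.
    """
    height, width = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # write to a copy, read from the original
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            same = sum(
                1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dx, dy) != (0, 0) and grid[y + dy][x + dx] == grid[y][x]
            )
            if same < n:
                out[y][x] = 1
    return out


# An isolated dark dot at (x=1, y=1) and a 2x2 dark block at x=4..5, y=1..2.
noisy = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = denoise_once(noisy, 3)
for row in cleaned:
    print(row)
```

With N = 3, the isolated dot has no matching neighbours and is cleared, while each pixel of the 2x2 block has exactly three matching neighbours and survives. Strokes thinner than that are at risk of being erased, which is worth keeping in mind when choosing N and the number of passes.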
Recognition code:
# coding:utf-8
# from common.contest import *  # project-local module; nothing below uses it
from PIL import Image
import pytesseract


def recognize_captcha(img_path):
    im = Image.open(img_path)
    # Point Tesseract at its language data; adjust the path to your installation
    tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    num = pytesseract.image_to_string(im, config=tessdata_dir_config)
    return num


if __name__ == '__main__':
    for i in range(1, 11):
        img_path = "5/" + str(i) + ".png"
        res = recognize_captcha(img_path)
        # Keep only the first line of the OCR output and drop spaces
        strs = res.split("\n")
        print(strs[0].replace(" ", ''))
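A recognition rate like the one quoted earlier can be measured by comparing the OCR output with hand-typed labels. The helper below is a minimal sketch. The name `recognition_rate` is hypothetical, and the sample strings are placeholders, not results from the CAPTCHA in this post.

```python
def recognition_rate(predicted, expected):
    """Return the fraction of predictions that exactly match their labels.

    Spaces and surrounding whitespace are removed from each prediction
    before comparing, mirroring the clean-up done in the recognition code.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        1
        for p, e in zip(predicted, expected)
        if p.strip().replace(" ", "") == e
    )
    return hits / len(expected)


# Placeholder data for illustration only.
ocr_output = ["ab12", "c d34", "xx99", "zz00 "]
labels = ["ab12", "cd34", "xy99", "zz00"]
print(recognition_rate(ocr_output, labels))  # 0.75
```

Running the same measurement before and after a change, such as a different threshold or per-character segmentation, shows whether that change actually helps.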