Python CAPTCHA Recognition Examples (5): Recognizing a Simple CAPTCHA

 

Today I'll walk through recognizing a simple CAPTCHA.

The characters are in a standard format, with no distortion or deformation, so we will simply try recognizing them with pytesseract.

 

CAPTCHA URL: http://wscx.gjxfj.gov.cn/zfp/webroot/xfsxcx.html

 

The CAPTCHA to be recognized is:

Because this CAPTCHA contains noise dots, recognizing it directly gives very poor results.

So the first step is to binarize and denoise the CAPTCHA.

The result looks like this:

 

Recognition results:

 

 

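The recognition rate is only 40%. To improve on such a low rate, the characters can be segmented and classified one by one; this particular CAPTCHA is very easy to segment.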
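As a rough illustration of the segmentation idea, here is a minimal sketch that splits a binarized image into character columns using a vertical projection. This is one common approach, not necessarily the one used later in this series. The function name `column_segments`, the list-of-rows input format, and the demo grid are all assumptions made for this example.

```python
def column_segments(grid, foreground=0):
    """Return (start, end) column ranges (end exclusive) containing foreground pixels.

    grid is a list of rows; each row is a list of 0/1 pixel values.
    Hypothetical helper for illustration only.
    """
    width = len(grid[0]) if grid else 0
    # A column counts as "inked" if any row has a foreground pixel in it.
    inked = [any(row[x] == foreground for row in grid) for x in range(width)]

    segments = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                      # a character begins here
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


# Tiny hand-made example: 0 = dark (character), 1 = light (background).
demo = [
    [1, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 0],
]
print(column_segments(demo))
```

Each returned range could then be cut out of the cleaned image with Pillow's `Image.crop((start, 0, end, height))` and passed to a per-character classifier. This only works when the characters do not touch or overlap horizontally.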

 

Binarization code:

# coding:utf-8
from PIL import Image, ImageDraw

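# Binarized pixel map: (x, y) -> 1 if the pixel is brighter than the threshold, else 0
t2val = {}


def twoValue(image, G):
    """Binarize a grayscale ("L" mode) image into t2val using threshold G."""
    for y in range(0, image.size[1]):
        for x in range(0, image.size[0]):
            g = image.getpixel((x, y))
            if g > G:
                t2val[(x, y)] = 1
            else:
                t2val[(x, y)] = 0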


# Compare the binarized value of a point A with the values of its 8 neighbors.
# Given a threshold N (0 < N < 8), if fewer than N neighbors have the same
# value as A, A is treated as noise and set to 1 (the light value).
# N: Integer, denoising threshold, 0 < N < 8
# Z: Integer, number of denoising passes
# The binarization threshold G is applied earlier, in twoValue.
# t2val is modified in place; nothing is returned.
def clearNoise(image, N, Z):
    # Offsets of the 8 neighbors around a pixel
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    for i in range(0, Z):
        # Force two opposite corners to the light value
        t2val[(0, 0)] = 1
        t2val[(image.size[0] - 1, image.size[1] - 1)] = 1

        # Border pixels are skipped because they lack a full 8-neighborhood
        for x in range(1, image.size[0] - 1):
            for y in range(1, image.size[1] - 1):
                L = t2val[(x, y)]
                nearDots = 0
                for dx, dy in neighbors:
                    if L == t2val[(x + dx, y + dy)]:
                        nearDots += 1

                if nearDots < N:
                    t2val[(x, y)] = 1


def saveImage(filename, size):
    # Write the binarized pixel map t2val out as a 1-bit image
    image = Image.new("1", size)
    draw = ImageDraw.Draw(image)

    for x in range(0, size[0]):
        for y in range(0, size[1]):
            draw.point((x, y), t2val[(x, y)])

    image.save(filename)


# Process 5/1.jpg ... 5/10.jpg and save the cleaned results as PNG files
for i in range(1, 11):
    path = "5/" + str(i) + ".jpg"
    image = Image.open(path).convert("L")
    twoValue(image, 222)
    clearNoise(image, 3, 6)
    path1 = "5/" + str(i) + ".png"
    saveImage(path1, image.size)

 

 

 

Recognition code:

# coding:utf-8
from PIL import Image
import pytesseract


def recognize_captcha(img_path):
    im = Image.open(img_path)
    # Location of the Tesseract language data; adjust for your installation
    tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    num = pytesseract.image_to_string(im, config=tessdata_dir_config)
    return num


if __name__ == '__main__':
    for i in range(1, 11):
        img_path = "5/" + str(i) + ".png"
        res = recognize_captcha(img_path)
        # Keep only the first line of the OCR output and drop any spaces
        strs = res.split("\n")
        print(strs[0].replace(" ", ''))
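
A recognition rate like the 40% quoted above can be measured by comparing the OCR output against hand-labeled answers. The sketch below shows one way to do that. The function name `recognition_rate` and the sample strings are made up for illustration; they are not the actual CAPTCHAs or results from this experiment.

```python
def recognition_rate(predicted, expected):
    """Fraction of CAPTCHAs whose predicted text exactly matches the label.

    Hypothetical helper for illustration only.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / float(len(expected))


# Invented sample data, NOT real results from this CAPTCHA.
sample_predicted = ["a1b2", "c3d4", "x9", "e5f", "zz00"]
sample_expected = ["a1b2", "c3d4", "x9k2", "e5f6", "m7n8"]
print(recognition_rate(sample_predicted, sample_expected))
```

Exact-match scoring is strict: a CAPTCHA with a single wrong character counts as a failure, which matches how a real login form would treat it.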