Python CAPTCHA Recognition Examples (5): Recognizing a Simple CAPTCHA

Date: 2019-11-06

Tags: python, CAPTCHA recognition, example, simple. Category: Python


Today I'll show how to recognize a simple CAPTCHA.

The characters use a standard layout with no warping or distortion, so we simply run pytesseract on them.
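Because the characters are clean, Tesseract can also be told to expect a single line of text. The following is a minimal sketch, not part of the original script: `tesseract_config` is a hypothetical helper that builds the `config` string passed to `pytesseract.image_to_string`. `--psm 7` (treat the image as one text line) and the `tessedit_char_whitelist` variable are standard Tesseract options, but any whitelist you pass is an assumption, since the article does not state the CAPTCHA's character set, and whether it is honored depends on the Tesseract version and engine.

```python
def tesseract_config(tessdata_dir=None, psm=7, whitelist=None):
    """Build a config string for pytesseract (hypothetical helper).

    psm=7 asks Tesseract to treat the image as a single line of text.
    whitelist (an assumption; the real character set is unknown) limits
    which characters Tesseract may output.
    """
    parts = []
    if tessdata_dir:
        # Same flag the original recognition code passes via tessdata_dir_config
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    parts.append(f"--psm {psm}")
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


if __name__ == "__main__":
    print(tesseract_config(whitelist="0123456789"))
```

For example, `pytesseract.image_to_string(im, config=tesseract_config(tessdata_dir=r"C:\Program Files (x86)\Tesseract-OCR\tessdata"))` keeps the original tessdata path and adds single-line mode.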

CAPTCHA source: http://wscx.gjxfj.gov.cn/zfp/webroot/xfsxcx.html

The CAPTCHA to be recognized:

Because this CAPTCHA contains noise dots, recognizing it directly gives very poor results.

So we first binarize and denoise the CAPTCHA.
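Binarization maps every grayscale pixel to pure black or pure white around a threshold. As a sketch (an alternative to the per-pixel loop in the script further down, not the author's code), Pillow's `Image.point` can apply the same `g > threshold` rule through a lookup table. `binarize` is a hypothetical name; 222 is the threshold the original script uses.

```python
from PIL import Image


def binarize(image, threshold=222):
    """Return a grayscale ("L") image containing only 0 (black) and 255 (white).

    Pixels strictly brighter than threshold become white, the rest black,
    which is the same test twoValue() uses (g > G).
    """
    gray = image.convert("L")
    # point() evaluates the function once per possible pixel value (0-255)
    # to build a lookup table, then applies it to the whole image.
    return gray.point(lambda g: 255 if g > threshold else 0)
```

Using a lookup table avoids calling `getpixel` once per pixel from Python, which matters when processing many CAPTCHAs.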

The result after preprocessing:

Recognition results:

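The recognition rate is only 40%. To raise such a low rate, one can segment the CAPTCHA into single characters and classify each one separately; this CAPTCHA is easy to segment, so that is a practical way to improve the recognition rate.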
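A common way to segment a CAPTCHA like this is a vertical projection: scan the binarized image column by column and cut wherever a column contains no dark pixels. The sketch below is a hypothetical helper, not the author's code, and it assumes black text on a white background with characters that do not touch or overlap.

```python
from PIL import Image


def split_columns(image, min_width=1):
    """Split a binarized CAPTCHA into per-character crops by vertical projection.

    Assumes dark (< 128) text on a light background and a blank column
    between neighbouring characters. min_width is a hypothetical knob for
    discarding runs narrower than that (e.g. leftover 1-pixel specks).
    """
    bw = image.convert("L")
    width, height = bw.size
    pixels = bw.load()
    # True for every column that contains at least one dark pixel
    has_ink = [any(pixels[x, y] < 128 for y in range(height)) for x in range(width)]

    crops, start = [], None
    # The trailing False acts as a sentinel that closes a run at the right edge
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            if x - start >= min_width:
                crops.append(bw.crop((start, 0, x, height)))
            start = None
    return crops
```

Each crop could then be passed to Tesseract in single-character mode (`--psm 10`) or fed to a trained classifier.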

Binarization and denoising code:

# coding:utf-8
from PIL import Image, ImageDraw

# Binary map: t2val[(x, y)] is 1 for a white pixel and 0 for a black pixel
t2val = {}


def twoValue(image, G):
    # Binarize: pixels brighter than threshold G become white (1), the rest black (0)
    for y in range(0, image.size[1]):
        for x in range(0, image.size[0]):
            g = image.getpixel((x, y))
            if g > G:
                t2val[(x, y)] = 1
            else:
                t2val[(x, y)] = 0


# Denoising: compare pixel A with its 8 neighbours. Given a threshold N (0 < N < 8),
# if fewer than N neighbours have the same binary value as A, A is treated as a
# noise dot and set to white.
# G (used by twoValue): Integer, binarization threshold
# N: Integer, denoising threshold, 0 < N < 8
# Z: Integer, number of denoising passes
# image is only used for its size; the result is written back into t2val and
# the function returns nothing.
def clearNoise(image, N, Z):
    width, height = image.size
    for _ in range(Z):
        # Force two corner pixels to white, as in the original script
        t2val[(0, 0)] = 1
        t2val[(width - 1, height - 1)] = 1

        # Border pixels are skipped so every visited pixel has 8 neighbours
        for x in range(1, width - 1):
            for y in range(1, height - 1):
                L = t2val[(x, y)]
                # Count the neighbours whose value equals A's value
                nearDots = sum(
                    1
                    for dx in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0) and t2val[(x + dx, y + dy)] == L
                )
                if nearDots < N:
                    t2val[(x, y)] = 1

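def saveImage(filename, size):
    # Draw t2val onto a new 1-bit image (0 = black, 1 = white) and save it
    image = Image.new("1", size)
    draw = ImageDraw.Draw(image)

    for x in range(0, size[0]):
        for y in range(0, size[1]):
            draw.point((x, y), t2val[(x, y)])

    image.save(filename)


# Process 5/1.jpg ... 5/10.jpg: convert to grayscale, binarize with threshold 222,
# denoise with N=3 over 6 passes, and save the result as 5/<i>.png
for i in range(1, 11):
    path = "5/" + str(i) + ".jpg"
    image = Image.open(path).convert("L")
    twoValue(image, 222)
    clearNoise(image, 3, 6)
    path1 = "5/" + str(i) + ".png"
    saveImage(path1, image.size)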
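A median filter is a widely used alternative to the neighbour-count denoising above: each pixel is replaced by the median of its surrounding window, so a lone dark dot on a white background disappears. This is a different technique from the author's `clearNoise`, shown only as a sketch; `median_denoise` is a hypothetical name and the 3×3 window is an assumption to tune per CAPTCHA.

```python
from PIL import Image, ImageFilter


def median_denoise(image, size=3):
    """Remove isolated noise dots with a size x size median filter.

    size must be odd. A larger window removes bigger specks but also
    erodes thin character strokes, so it should be kept small.
    """
    return image.convert("L").filter(ImageFilter.MedianFilter(size))
```

Unlike `clearNoise`, the median filter can also turn isolated white pixels black, which helps fill small holes inside characters.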

Recognition code:

#coding:utf-8
from PIL import Image
import pytesseract

def recognize_captcha(img_path):
    im = Image.open(img_path)
    # Point Tesseract at its language data (Windows install path from the original setup)
    tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    num = pytesseract.image_to_string(im, config=tessdata_dir_config)
    return num

if __name__ == '__main__':
    # OCR the ten cleaned images and print the first line of each result without spaces
    for i in range(1, 11):
        img_path = "5/" + str(i) + ".png"
        res = recognize_captcha(img_path)
        strs = res.split("\n")
        print(strs[0].replace(" ", ''))
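The 40% figure quoted earlier is the share of sample CAPTCHAs that were read correctly. To measure that kind of rate yourself, one option is to compare each OCR result against a hand-labelled answer. The helper below is a hypothetical sketch: it assumes a CAPTCHA counts as recognized only when the whole cleaned string matches exactly, and the ground-truth labels have to be supplied by you.

```python
def recognition_rate(predictions, answers):
    """Fraction of CAPTCHAs whose OCR output exactly matches the true text.

    predictions and answers are parallel lists of strings; spaces and
    surrounding whitespace are stripped from the OCR output first, the
    same cleanup the recognition script applies before printing.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    hits = sum(p.strip().replace(" ", "") == a for p, a in zip(predictions, answers))
    # float() keeps the division exact under Python 2 as well
    return float(hits) / len(answers)
```

Exact matching is strict; a per-character rate would be more forgiving, but a CAPTCHA is only useful when every character is right.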