Building a Captcha Recognition Service with Python and Deep Learning: Part 1

Note: do not use this for commercial purposes.

Python environment: Python 3.5

Section 1: Preparation

1. Introduction

The project is based on Python + CNN + TensorFlow, and training uses the CPU build of TensorFlow. As long as your machine has at least 8 GB of RAM, you can follow this article, swap in your own training samples, tweak a few model parameters, and train a model that meets your needs.

2. Common character captcha formats

The captcha images shown above do not represent any actual website; any resemblance is purely coincidental. This project may only be used for learning and exchange and must not be used for illegal purposes. Common verification rules are to type the characters shown in the image, or to type only the characters of a specified color among colored characters.

3. Obtaining training samples

Model accuracy depends directly on the quality and quantity of the training samples. In my experience training several captcha recognition models, for captchas made of upper/lower-case letters plus digits, a bit over ten thousand labeled samples is usually enough to reach roughly 80% accuracy. If you first apply some simple image processing tailored to the specific captcha before training and prediction, even fewer samples are needed: for the fifth and sixth captcha types in the figure above, about 2,000 samples after preprocessing were enough to reach 90% accuracy.

3.1 Manual labeling

Manual labeling is the most common approach. Several companies offer captcha-solving (labeling) services, and you can use them to annotate your samples in bulk. The drawbacks are obvious: the platforms charge a fee, and a portion of the returned labels are wrong, which hurts the final model. Bad labels can be filtered out with some extra logic, for example by verifying whether a label is actually correct; the details are left to the reader, but a minimal cross-check sketch follows below.
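For example, one simple way to filter out bad labels is to have two independent sources label the same image and keep only the samples where they agree. The snippet below is just a sketch of that idea; the two labeling functions are hypothetical placeholders, not a real platform API:

# Minimal label cross-check sketch. label_with_platform_a / label_with_platform_b
# are hypothetical callables that take image bytes and return the labeler's answer.
def cross_checked_label(image_bytes, label_with_platform_a, label_with_platform_b):
    a = label_with_platform_a(image_bytes)
    b = label_with_platform_b(image_bytes)
    # keep the sample only when both sources agree (case-insensitive)
    if a and b and a.lower() == b.lower():
        return a.lower()
    return None  # disagreement: discard the sample or send it back for re-labeling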

3.2 Simulated generation

Analyze the characteristics of the captcha and write a program that generates similar, or even identical, captchas. This demands more technical effort, but it gives you an effectively unlimited supply of training samples.

3.3 Generating captchas with Python

A real captcha from an actual website (found online)

A captcha generated by code

Points to pay attention to when generating captchas in code:

Font file: find the font library actually used in the real images
Font size: determine the font size by observation and comparison
Noise: analyze the noise in the real images and generate similar noise

The Python code used for generation:

Other captcha types can be generated with small modifications to the code below.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2019/10/12 10:01
# @Author : shm
# @Site : 
# @File : create_yzm.py
# @Software: PyCharm
import os
import random
from PIL import Image,ImageDraw,ImageFont
def getRandomColor():
    '''Generate a random RGB color.'''
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    return (r,g,b)

def getRandomChar():
    '''Generate a random character.'''
    charlist = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
    random_char = random.choice(charlist)
    return random_char

def genImg(width,height,font_size,chr_num):
    '''Generate a width*height image. :param width: image width :param height: image height :param font_size: font size :param chr_num: number of characters :return: '''
    #bg_color = getRandomColor()
    bg_color = (255,255,255)   # white background
    # create an image filled with the background color
    img = Image.new(mode="RGB",size=(width,height),color=bg_color)
    # get a drawing handle used to render the characters
    draw = ImageDraw.Draw(img)
    # set the font
    font = ImageFont.truetype(font="Action Jackson",size=font_size)
    #font = ImageFont.truetype(font="華文彩雲", size=font_size)
    for i in range(chr_num):
        # pick a random character (a random color could also be used)
        random_txt = getRandomChar()
        #txt_color = getRandomColor()
        txt_color = (0,0,255)   # blue text
        # while txt_color == bg_color:
        # txt_color = getRandomColor()
        draw.text((36+16*i,5),text=random_txt,fill=txt_color,font=font)
    # draw interference lines
    drawLine(draw,width,height)
    # draw noise points
    drawPoint(draw,width,height)
    return img
def drawLine(draw,width,height):
    '''Draw random interference lines.'''
    for i in range(10):
        x1 = random.randint(0, width)
        #x2 = random.randint(0,width-x1)
        x2 = x1+random.randint(0,25)
        y1 = random.randint(0, height)
        y2 = y1
        #y2 = random.randint(0, height)
        #draw.line((x1, y1, x2, y2), fill=getRandomColor())
        draw.line((x1, y1, x2, y2), fill=(0,0,255))
def drawPoint(draw,width,height):
    '''Add random noise points.'''
    for i in range(5):
        x = random.randint(0, 40)
        y = random.randint(0, height)
        #draw.point((x, y), fill=getRandomColor())
        draw.point((x, y), fill=(0,0,255))
def drawOther(draw):
    '''Add custom noise (not implemented).'''
    pass
def genyzm():
    '''Generate a batch of captcha images.'''
    # image width
    width = 106
    # image height
    height = 30
    # font size
    font_size = 20
    # number of characters
    chr_num = 4
    # output directory for the generated captchas
    path = "./yzm_pic/"
    # make sure the output directory exists
    os.makedirs(path, exist_ok=True)
    for i in range(10):
        img = genImg(width,height,font_size,chr_num)
        filepath = path + str(i) + ".png"
        with open(filepath,"wb") as fp:
            img.save(fp,format="png")

if __name__=="__main__":
    try:
        genyzm()
    except Exception as e:
        print(e)


Note: real captchas often avoid characters that are easily confused, such as 1, 0, O, z and 2, so you can leave these characters out when generating samples; this also reduces the number of label classes the model has to learn. A possible reduced character set is sketched below.
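For example, a reduced character set can simply drop the ambiguous characters; exactly which characters to drop depends on the captcha you are imitating, so the set below is only an assumption:

# Reduced character set: remove characters that are easily confused with each other.
AMBIGUOUS = set("10OoIl2zZ")
charlist = "".join(c for c in
                   "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
                   if c not in AMBIGUOUS)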

The generation code for the second format (Chinese characters + digits + letters) is not published here, because that captcha is still in use on a live website and I do not want to interfere with its normal operation. The overall approach is the same as the code above; the only differences are that the background is not a single color and that Chinese characters are added to the character set.

Section 2: Model training

2.1 Captcha recognition approach

The following two captcha types are used as examples.

First, check whether the captcha can be split into individual characters. Recognizing single characters requires far fewer training samples, and it is easy to push the accuracy very high. For the two captchas above, splitting into single characters means that simulating only 2,000 captchas yields 4 * 2000 = 8,000 training samples after splitting, which is enough for over 90% accuracy.

2.1.1 Captcha characteristics

The captcha in figure 1 is 106x30 and the characters are concentrated on the right. Opening it with the Paint tool that ships with Windows shows that the characters fall within the 36-100 pixel range horizontally, so we first crop the image to that region; the cropped image is 64x30, as shown below:

Code to crop the main region of the image:

def screen_shot(src,dstpath):
    '''Preprocess: crop the main region of the image. :param src: source image path :param dstpath: destination directory :return: '''
    try:
        img = Image.open(src)
        s = os.path.split(src)
        fn = s[1].split(".")
        basename = fn[0]
        ext = fn[-1]
        box = (36, 0, 100, 30)
        dstdir = dstpath + basename + "." + ext
        img.crop(box).save(dstdir)
    except Exception as e:
        print("screenshot:",e)

The captcha in figure 2 is 100x38 with evenly distributed characters, so no extra processing is needed.

2.1.2 Image splitting

Split the preprocessed figure-1 image evenly, dividing each captcha into four smaller images; the result is shown below:

Splitting figure 2 directly into four separate images gives the following result:
After splitting, each character is cleanly isolated; the figure-1 pieces are 16x30 and the figure-2 pieces are 25x38.

Image splitting code:

def split_image(src,rownum,colnum,dstpath):
    '''Split an image into rownum x colnum pieces and save each piece. :param src: :param rownum: :param colnum: :param dstpath: :return: '''
    try:
        img = Image.open(src)
        w,h = img.size
        if rownum <= h and colnum<=w:
            s = os.path.split(src)
            fn = s[1].split(".")
            basename = fn[0]
            ext = fn[-1]
            rowheight = h // rownum
            colwidth = w // colnum
            num = 0
            for r in range(rownum):
                for c in range(colnum):
                    name = str(basename[c:c+1])
                    t = str(int(time.time()*100000))
                    box = (c*colwidth,r*rowheight,(c+1)*colwidth,(r+1)*rowheight)
                    img.crop(box).save(dstpath+name+"/"+name+"#"+t+"."+ext)
                    num = num + 1
            print("圖片切割完畢,共生成%s張小圖片" % num)
        else:
            print("不合法的行列切割參數")
    except Exception as e:
        print("e:",e)
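A small driver that ties screen_shot and split_image together might look like the sketch below. The directory names are placeholders, and it assumes the raw file names start with the 4-character label and that one sub-directory per character class already exists under the output path (as implied by the save path inside split_image):

# Hypothetical preprocessing pipeline: crop every raw captcha, then split each
# cropped image into 4 single-character images stored per character class.
import os

raw_dir = "./yzm_pic/"          # raw captchas, file name starts with the label
crop_dir = "./crop_img/"        # cropped 64x30 images
split_dir = "./split_img/yzm/"  # output, one sub-directory per character class

for fname in os.listdir(raw_dir):
    screen_shot(os.path.join(raw_dir, fname), crop_dir)
for fname in os.listdir(crop_dir):
    split_image(os.path.join(crop_dir, fname), 1, 4, split_dir)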

2.2 Deep learning model

The model is based on AlexNet; see the relevant literature for a detailed introduction to the architecture. The model code is given here:

This version of the model outputs a single character:

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains a models definition for AlexNet. This work was first described in: ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton and later refined in: One weird trick for parallelizing convolutional neural networks Alex Krizhevsky, 2014 Here we provide the implementation proposed in "One weird trick" and not "ImageNet Classification", as per the paper, the LRN layers have been removed. Usage: with slim.arg_scope(alexnet.alexnet_v2_arg_scope()): outputs, end_points = alexnet.alexnet_v2(inputs) @@alexnet_v2 """
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def alexnet_v2_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      biases_initializer=tf.constant_initializer(0.1),
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope([slim.conv2d], padding='SAME'):
      with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
        return arg_sc
def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2. Described in: http://arxiv.org/pdf/1404.5997v2.pdf Parameters from: github.com/akrizhevsky/cuda-convnet2/blob/master/layers/ layers-imagenet-1gpu.cfg Note: All the fully_connected layers have been transformed to conv2d layers. To use in classification mode, resize input to 224x224. To use in fully convolutional mode, set spatial_squeeze to false. The LRN layers have been removed and change the initializers from random_normal_initializer to xavier_initializer. Args: inputs: a tensor of size [batch_size, height, width, channels]. num_classes: number of predicted classes. is_training: whether or not the models is being trained. dropout_keep_prob: the probability that activations are kept in the dropout layers during training. spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. Returns: the last op containing the log predictions and end_points dict. """
  with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.name + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=[end_points_collection]):
      net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                        scope='conv1')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = slim.conv2d(net, 192, [5, 5], scope='conv2')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = slim.conv2d(net, 384, [3, 3], scope='conv3')
      net = slim.conv2d(net, 384, [3, 3], scope='conv4')
      net = slim.conv2d(net, 256, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with slim.arg_scope([slim.conv2d],
                          weights_initializer=trunc_normal(0.005),
                          biases_initializer=tf.constant_initializer(0.1)):
        net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
                          scope='fc6')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout6')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout7')
        net0 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_0')

      # Convert end_points_collection into a end_point dict.
      end_points = slim.utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net0 = tf.squeeze(net0, [1, 2], name='fc8_0/squeezed')
        end_points[sc.name + '/fc8_0'] = net0
      return net0, end_points
alexnet_v2.default_image_size = 224

The model file needs a few adjustments depending on the number of characters in the captcha being recognized:

Model that outputs a four-character captcha:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)


def alexnet_v2_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      biases_initializer=tf.constant_initializer(0.1),
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope([slim.conv2d], padding='SAME'):
      with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
        return arg_sc


def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2. Described in: http://arxiv.org/pdf/1404.5997v2.pdf Parameters from: github.com/akrizhevsky/cuda-convnet2/blob/master/layers/ layers-imagenet-1gpu.cfg Note: All the fully_connected layers have been transformed to conv2d layers. To use in classification mode, resize input to 224x224. To use in fully convolutional mode, set spatial_squeeze to false. The LRN layers have been removed and change the initializers from random_normal_initializer to xavier_initializer. Args: inputs: a tensor of size [batch_size, height, width, channels]. num_classes: number of predicted classes. is_training: whether or not the model is being trained. dropout_keep_prob: the probability that activations are kept in the dropout layers during training. spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. Returns: the last op containing the log predictions and end_points dict. """
  with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.name + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=[end_points_collection]):
      net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                        scope='conv1')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = slim.conv2d(net, 192, [5, 5], scope='conv2')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = slim.conv2d(net, 384, [3, 3], scope='conv3')
      net = slim.conv2d(net, 384, [3, 3], scope='conv4')
      net = slim.conv2d(net, 256, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with slim.arg_scope([slim.conv2d],
                          weights_initializer=trunc_normal(0.005),
                          biases_initializer=tf.constant_initializer(0.1)):
        net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
                          scope='fc6')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout6')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout7')
        net0 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_0')
        net1 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_1')
        net2 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_2')
        net3 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_3')

      # Convert end_points_collection into a end_point dict.
      end_points = slim.utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net0 = tf.squeeze(net0, [1, 2], name='fc8_0/squeezed')
        end_points[sc.name + '/fc8_0'] = net0
        net1 = tf.squeeze(net1, [1, 2], name='fc8_1/squeezed')
        end_points[sc.name + '/fc8_1'] = net1
        net2 = tf.squeeze(net2, [1, 2], name='fc8_2/squeezed')
        end_points[sc.name + '/fc8_2'] = net2
        net3 = tf.squeeze(net3, [1, 2], name='fc8_3/squeezed')
        end_points[sc.name + '/fc8_3'] = net3
      return net0,net1,net2,net3,end_points
alexnet_v2.default_image_size = 224

The differences:

For five- or six-character captchas, simply extend the part highlighted in red in the figure above in the same way; a sketch of the pattern is given below.
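As a sketch of that pattern (this helper is not part of the original code), the per-character heads can also be generated in a loop, so a five- or six-character captcha just means a larger num_chars; each head is the same 1x1 convolution and squeeze as fc8_0 above:

# Sketch: build one classification head per captcha character instead of writing
# fc8_0 ... fc8_n by hand. Assumes net, slim and tf as in the model code above;
# the spatial_squeeze handling is folded in for brevity.
def build_heads(net, num_classes, num_chars, scope_prefix='fc8'):
    heads = []
    for i in range(num_chars):
        head = slim.conv2d(net, num_classes, [1, 1],
                           activation_fn=None,
                           normalizer_fn=None,
                           biases_initializer=tf.zeros_initializer(),
                           scope='%s_%d' % (scope_prefix, i))
        heads.append(tf.squeeze(head, [1, 2], name='%s_%d/squeezed' % (scope_prefix, i)))
    return heads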

2.3 Generating training data in TFRecord format

For a detailed introduction to the TFRecord file format, see the TensorFlow documentation.

Code to generate the TFRecord training data:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import os
import random
import math
import sys
from PIL import Image
import numpy as np

_NUM_TEST = 500
_RANDOM_SEED = 0
MAX_CAPTCHA = 1
# directory of the single-character images produced by splitting; file names start with the character they contain
DATASET_DIR = "./split_img/yzm"
# output directory for the TFRecord training data
TFRECORD_DIR = './TFrecord/'
def _dataset_exists(dataset_dir):
    for split_name in ['train', 'test']:
        output_filename = os.path.join(dataset_dir, split_name + '.tfrecords')
        if not tf.gfile.Exists(output_filename):
            return False
    return True

def _get_filenames_and_classes(dataset_dir):
    photo_filenames = []
    for filename in os.listdir(dataset_dir):
        path = os.path.join(dataset_dir, filename)
        photo_filenames.append(path)
    return photo_filenames

def int64_feature(values):
    if not isinstance(values, (tuple, list)):
        values = [values]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values]))

def image_to_tfexample(image_data, label0):
    # Abstract base class for protocol messages.
    return tf.train.Example(features=tf.train.Features(feature={
        'image': bytes_feature(image_data),
        'label0': int64_feature(label0)
    }))


def char2pos(c):
    if c == '_':
        k = 62
        return k
    k = ord(c) - 48
    if k > 9:
        k = ord(c) - 55
        if k > 35:
            k = ord(c) - 61
            if k > 61:
                raise ValueError('No Map')
    return k

def char2pos1(c):
    if c == '_':
        k = 36
        return k
    k = ord(c) - 48
    if k > 9:
        k = ord(c) - 55
        if k > 35:
            k = ord(c) - (61 + 26)
            if k > 36:
                raise ValueError('No Map')
    return k
def _convert_dataset(split_name, filenames, dataset_dir):
    assert split_name in ['train', 'test']

    with tf.Session() as sess:
        output_filename = os.path.join(TFRECORD_DIR, split_name + '.tfrecords')
        with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
            for i, filename in enumerate(filenames):
                try:
                    sys.stdout.write('\r>> Converting image %d/%d' % (i + 1, len(filenames)))
                    sys.stdout.flush()
                    image_data = Image.open(filename)
                    image_data = image_data.resize((224, 224))
                    image_data = np.array(image_data.convert('L'))
                    image_data = image_data.tobytes()

                    # the first character of the file name is the label
                    # (note: splitting on '\\' assumes Windows-style paths)
                    labels = filename.split('\\')[-1][0:1]
                    print(labels)
                    num_labels = []
                    num_labels.append(int(char2pos1(labels)))
                    example = image_to_tfexample(image_data, num_labels[0])
                    tfrecord_writer.write(example.SerializeToString())
                    # for a four-character captcha, use instead:
                    # for j in range(4):
                    #     num_labels.append(int(char2pos1(labels[j])))
                    # example = image_to_tfexample(image_data, num_labels[0], num_labels[1], num_labels[2], num_labels[3])
                    # tfrecord_writer.write(example.SerializeToString())

                except IOError as e:
                    print('Could not read:', filename)
                    print('Error:', e)
                    print('Skip it\n')
    sys.stdout.write('\n')
    sys.stdout.flush()

if _dataset_exists(TFRECORD_DIR):
    print('tfrecord file exists')
else:
    # make sure the output directory exists
    if not os.path.exists(TFRECORD_DIR):
        os.makedirs(TFRECORD_DIR)
    photo_filenames = _get_filenames_and_classes(DATASET_DIR)
    random.seed(_RANDOM_SEED)
    random.shuffle(photo_filenames)
    training_filenames = photo_filenames[_NUM_TEST:]
    testing_filenames = photo_filenames[:_NUM_TEST]
    _convert_dataset('train', training_filenames, DATASET_DIR)
    _convert_dataset('test', testing_filenames, DATASET_DIR)
    print('Done')

2.4 Model training:

The training code, using figure 1 as the example, is as follows:

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2019/4/30 10:59
# @Author : shm
# @Site : 
# @File : MyTensorflowTrain.py
# @Software: PyCharm

import os
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np

# number of character classes
CHAR_SET_LEN = 36
# image height
IMAGE_HEIGHT = 30
# image width
IMAGE_WIDTH = 16
# batch size
BATCH_SIZE = 100
# path of the tfrecord file
TFRECORD_FILE = "./TFrecord/train.tfrecords"
# placeholder
x = tf.placeholder(tf.float32, [None, 224, 224])
y0 = tf.placeholder(tf.float32, [None])
# learning rate
lr = tf.Variable(0.003, dtype=tf.float32)
# read data from the tfrecord file
def read_and_decode(filename):
    # build a filename queue
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    # returns the file name and the serialized example
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'image': tf.FixedLenFeature([], tf.string),
                                           'label0': tf.FixedLenFeature([], tf.int64)
                                       })
    # decode the image data
    image = tf.decode_raw(features['image'], tf.uint8)
    # tf.train.shuffle_batch requires a fixed shape
    image = tf.reshape(image, [224, 224])
    # normalize the image to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    # get the label
    label0 = tf.cast(features['label0'], tf.int32)
    return image, label0

# get image data and labels
image, label0 = read_and_decode(TFRECORD_FILE)
# shuffle_batch randomly shuffles the samples
image_batch, label_batch0 = tf.train.shuffle_batch(
    [image, label0], batch_size=BATCH_SIZE,
    capacity=50000, min_after_dequeue=10000, num_threads=1)

# define the network
train_network_fn = nets_factory.get_network_fn(
    'alexnet_v2',
    num_classes=CHAR_SET_LEN,
    weight_decay=0.0005,
    is_training=True)

with tf.Session() as sess:
    # inputs: a tensor of size [batch_size, height, width, channels]
    X = tf.reshape(x, [BATCH_SIZE, 224, 224, 1])
    # feed the data through the network to get the logits
    logits0,end_points = train_network_fn(X)

    # convert the labels to one-hot
    one_hot_labels0 = tf.one_hot(indices=tf.cast(y0, tf.int32), depth=CHAR_SET_LEN)

    # compute the loss
    loss0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits0, labels=one_hot_labels0))
    # total loss
    total_loss = (loss0)
    # optimize the total loss
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss)

    # compute the accuracy
    correct_prediction0 = tf.equal(tf.argmax(one_hot_labels0, 1), tf.argmax(logits0, 1))
    accuracy0 = tf.reduce_mean(tf.cast(correct_prediction0, tf.float32))

    # saver for writing checkpoints
    saver = tf.train.Saver()
    # initialize all variables
    sess.run(tf.global_variables_initializer())

    # coordinator that manages the queue threads
    coord = tf.train.Coordinator()
    # start the QueueRunners; the filename queue is now populated
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    for i in range(60001):
        # fetch one batch of images and labels
        b_image, b_label0 = sess.run([image_batch, label_batch0])
        # run one optimization step
        sess.run(optimizer, feed_dict={x: b_image, y0: b_label0})
        # every 20 iterations, report the loss and accuracy
        if i % 20 == 0:
            # every 2000 iterations, decay the learning rate
            if i % 2000 == 0:
                sess.run(tf.assign(lr, lr / 3))
            acc0, loss_ = sess.run([accuracy0, total_loss],feed_dict={x: b_image,y0: b_label0})
            learning_rate = sess.run(lr)
            print("Iter:%d Loss:%.3f Accuracy:%.2f Learning_rate:%.4f" % (i, loss_, acc0, learning_rate))
            # save the model
            if acc0 > 0.99:
                saver.save(sess, "./models/crack_captcha_model", global_step=i)
                break
            if i == 60000:
                saver.save(sess, "./models/crack_captcha_model", global_step=i)
                break

    # ask the other threads to stop
    coord.request_stop()
    # join returns only after all other threads have stopped
    coord.join(threads)
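For the four-character model, the training script changes in the obvious way: four label placeholders, four logits returned by the four-head network, and the four cross-entropy losses summed into one total loss. A minimal sketch of just the loss part, assuming placeholders y0..y3 and outputs logits0..logits3 (this part is not shown in the original article):

# Sketch for the 4-character model: one softmax cross-entropy loss per head,
# summed into the total loss that the optimizer minimizes.
losses = []
for y_i, logits_i in zip([y0, y1, y2, y3], [logits0, logits1, logits2, logits3]):
    one_hot_i = tf.one_hot(indices=tf.cast(y_i, tf.int32), depth=CHAR_SET_LEN)
    losses.append(tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits_i, labels=one_hot_i)))
total_loss = tf.add_n(losses)
optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss)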

2.5 Model accuracy test code:

#coding=utf-8
import os
import tensorflow as tf 
from PIL import Image
from nets import nets_factory
import numpy as np
import matplotlib.pyplot as plt  

CHAR_SET_LEN = 36
IMAGE_HEIGHT = 30
IMAGE_WIDTH =16

BATCH_SIZE = 1
TFRECORD_FILE = "./TFrecord/test.tfrecords"
# placeholder
x = tf.placeholder(tf.float32, [None, 224, 224])  

def read_and_decode(filename):

    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()

    _, serialized_example = reader.read(filename_queue)   
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'image' : tf.FixedLenFeature([], tf.string),
                                           'label0': tf.FixedLenFeature([], tf.int64),
                                       })

    image = tf.decode_raw(features['image'], tf.uint8)

    # keep an un-normalized copy for inspection
    image_raw = tf.reshape(image, [224, 224])
    # reshape to a fixed size
    image = tf.reshape(image, [224, 224])
    # normalize to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    # get the label
    label0 = tf.cast(features['label0'], tf.int32)

    return image, image_raw, label0

image, image_raw, label0 = read_and_decode(TFRECORD_FILE)

# batch and shuffle the test data
image_batch, image_raw_batch, label_batch0 = tf.train.shuffle_batch([image, image_raw, label0], batch_size = BATCH_SIZE,capacity = 50000, min_after_dequeue=10000, num_threads=1)

train_network_fn = nets_factory.get_network_fn('alexnet_v2',num_classes=CHAR_SET_LEN,weight_decay=0.0005, is_training=False)

with tf.Session() as sess:
    # inputs: a tensor of size [batch_size, height, width, channels]
    X = tf.reshape(x, [BATCH_SIZE, 224, 224, 1])
    # feed the data through the network
    logits0,end_points = train_network_fn(X)

    # predicted class index
    predict0 = tf.reshape(logits0, [-1, CHAR_SET_LEN])
    predict0 = tf.argmax(predict0, 1)
    # initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    # restore the trained checkpoint
    saver = tf.train.Saver()
    saver.restore(sess, './models/crack_captcha_model-1080')
    # coordinator for the queue threads
    coord = tf.train.Coordinator()
    # start the queue runners
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    count = 0
    for i in range(500):
        # fetch one test sample
        try:
            b_image, b_image_raw, b_label0 = sess.run([image_batch,image_raw_batch, label_batch0])
        except Exception as e:
            print(e)
        # convert the raw bytes back into an image (for inspection)
        img=Image.fromarray(b_image_raw[0],'L')
        print('label:',b_label0)
        # get the predicted value
        label0 = sess.run(predict0, feed_dict={x: b_image})
        print('predict:',label0)
        if b_label0[0] == label0[0]:
            count = count + 1
    print(count)
    # ask the queue threads to stop
    coord.request_stop()
    # wait for the threads to finish
    coord.join(threads)

Section 3: API development

The code loads two model files directly; a single endpoint uses the module parameter to decide which model is called for which captcha type.

3.1 Captcha recognition service based on Flask:

API service code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2019/10/14 10:25
# @Author : shm
# @Site : 
# @File : YZM_Service.py
# @Software: PyCharm

from flask import Flask, request, render_template
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np
import base64
from io import BytesIO

def num2char(num):
    '''Convert a class index to a character code (to be used with chr()).'''
    if num < 10:
        return (num + ord('0'))
    elif num < 36:
        return (num - 10 + ord('a'))
    elif num == 36:
        return (ord('_'))
    else:
        raise ValueError('Error')

def splitimage(img, rownum, colnum):
    '''Split an image into colnum equal vertical pieces.'''
    w, h = img.size
    if rownum <= h and colnum <= w:
        rowheight = h // rownum
        colwidth = w // colnum
        r = 0
        imlist = []
        for c in range(colnum):
            box = (c * colwidth, r * rowheight, (c + 1) * colwidth, (r + 1) * rowheight)
            imlist.append(img.crop(box))
        return imlist
def ImageReshap(img):
    '''Preprocess: resize to 224x224 and convert to grayscale.'''
    image_data = img.resize((224, 224))
    image_data = np.array(image_data.convert('L'))
    return image_data

class LoadModel_v1:
    def __init__(self,model_path,char_set_len=36):
        ''':param model_path: path to the model checkpoint file :param char_set_len: number of character classes '''
        self.char_set_len = char_set_len
        g = tf.Graph()
        with g.as_default():
            self.sess = tf.Session(graph=g)
            self.graph = self.build_graph()
            BATCH_SIZE = 1
            self.x = tf.placeholder(tf.float32, [None, 224, 224])
            self.img = tf.placeholder(tf.float32, None)
            image_data1 = tf.cast(self.img, tf.float32) / 255.0
            image_data2 = tf.subtract(image_data1, 0.5)
            image_data3 = tf.multiply(image_data2, 2.0)
            self.image_batch = tf.reshape(image_data3, [1, 224, 224])
            X = tf.reshape(self.x, [BATCH_SIZE, 224, 224, 1])
            self.logits0, self.end_points = self.graph(X)
            self.sess.run(tf.global_variables_initializer())
            saver = tf.train.Saver()
        saver.restore(self.sess,model_path)

    def build_graph(self):
        '''Build the network definition used by this model.'''
        train_network_fn = nets_factory.get_network_fn('alexnet_v2',num_classes=self.char_set_len,weight_decay=0.0005,is_training=False)
        return train_network_fn
    def recognize(self,image):
        '''Recognize a single character image.'''
        try:
            inputdata = self.sess.run(self.image_batch, feed_dict={self.img: image})
            predict0 = tf.reshape(self.logits0, [-1, self.char_set_len])
            predict0 = tf.argmax(predict0, 1)
            label = self.sess.run(predict0, feed_dict={self.x: inputdata})
            text = chr(num2char(label))
            return text
        except Exception as e:
            print("recognize",e)
            return ""

    def screen_shot(self,img):
        '''Preprocess: crop the main region of the image.'''
        try:
            box = (36, 0, 100, 30)
            return img.crop(box)
        except Exception as e:
            print("screenshot:", e)
            return None
    def img_to_text(self,imgdata):
        '''Convert a captcha image to its recognized text. :return: the recognized string '''
        yzmstr = ""
        with BytesIO() as iofile:
            iofile.write(imgdata)
            with Image.open(iofile) as img:
                img = self.screen_shot(img)
                imglist = splitimage(img, 1, 4)
            text = []
            for im in imglist:
                imgreshap = ImageReshap(im)
                yzmstr = self.recognize(imgreshap)
                text.append(yzmstr)
            yzmstr = "".join(text)
        return yzmstr
class LoadModel_v2(LoadModel_v1):
    def __init__(self,model_path):
        super(LoadModel_v2, self).__init__(model_path)
    def img_to_text(self,imgdata):
        yzmstr = ""
        with BytesIO() as iofile:
            iofile.write(imgdata)
            with Image.open(iofile) as img:
                imglist = splitimage(img, 1, 4)
            text = []
            for im in imglist:
                imgreshap = ImageReshap(im)
                yzmstr = self.recognize(imgreshap)
                text.append(yzmstr)
            print(yzmstr)
            yzmstr = "".join(text)
        return yzmstr

app = Flask(__name__)
@app.route('/')
def index():
    return render_template('index.html')
@app.route('/Recognition',methods=['POST'])
def recognition():
    try:
        imgdata = request.form.get('imgdata')
        module = request.form.get("module","")
        if module == "v1":
            decodeData = base64.b64decode(imgdata)
            yzmstr = loadModel_model1.img_to_text(decodeData)
            return yzmstr
        elif module == "v2":
            decodeData = base64.b64decode(imgdata)
            yzmstr = loadModel_model2.img_to_text(decodeData)
            return yzmstr
        else:
            return "unkonw channel"
    except Exception as e:
        return repr(e)
if __name__ == "__main__":
    # initialize model 1
    loadModel_model1 = LoadModel_v1("./models/crack_captcha_model-1080")
    # initialize model 2
    loadModel_model2 = LoadModel_v2("./models/crack_captcha.model-2140")
    app.run(host='0.0.0.0', port=2002, debug=True)

3.2 API client code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2019/5/6 18:46
# @Author : shm
# Site : 
# @File : test.py
# @Software: PyCharm
import base64
import requests
import os

# recognition API endpoint
url = "http://127.0.0.1:2002/Recognition"

# directory with the captcha images used for testing
path = "./image/pic"

# model version to call (matches the module parameter of the API)
model = "v1"   
#model = "v2"

imglist = os.listdir(path)
count = 0
nums = len(imglist)
for file in imglist:
    try:
        filepath = os.path.join(path, file)
        with open(filepath,"rb") as fp:
            database64 = base64.b64encode(fp.read())
        form = {
            'module':model,
            'imgdata': database64
        }
        r = requests.post(url, data=form)
        res = r.text
        # the expected label is the first four characters of the file name
        expected = file[0:4]
        if expected.lower() == res:
            count = count + 1
            print("Success")
        else:
            print(file[0:4],"==",res)
    except Exception as e:
        print(e)
print("%s平臺-----總共:%s-----正確識別:%s" % (model,nums,count))

Summary

This article covered how to generate simulated captcha samples for training and how to split captchas for recognition. Follow-up articles will build on this one to cover training a whole-image model without splitting, recognizing variable-length captchas, and a general-purpose deep-learning captcha recognition model. The project code will be pushed to a Git repository later and the link will be added here; that's all for today. If anything is unclear or you have questions, feel free to add QQ 1071830794 to discuss; everyone is welcome to learn and grow together.

Do not use for commercial purposes

Thanks for reading
