CAPTCHA Recognition with TensorFlow

Date: 2021-01-24

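In this section we use TensorFlow to build a deep learning model for CAPTCHA recognition. The CAPTCHAs we recognize here are image CAPTCHAs. First we train a model on labeled data, and then we use that model to recognize CAPTCHAs.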

CAPTCHAs

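First, let's look at what a CAPTCHA looks like. We generate them with Python's captcha library. It is not installed by default, so we install it first, together with the pillow library. Both can be installed with pip3: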

pip3 install captcha pillow

Once they are installed, the following code generates a simple image CAPTCHA:

from captcha.image import ImageCaptcha
from PIL import Image

text = '1234'
image = ImageCaptcha()
captcha = image.generate(text)
captcha_image = Image.open(captcha)
captcha_image.show()

Running it pops up a window that displays the generated CAPTCHA image.

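The text in the image is exactly the text we defined. This gives us an image together with its ground-truth text, so we can use the same approach to generate a batch of training and test data.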

Preprocessing

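Before training, the data of course has to be preprocessed. We first define the CAPTCHA text, which effectively gives us the label, and then generate the CAPTCHA image from that text, which gives us the input x. To start, we define the input vocabulary. A vocabulary of uppercase letters, lowercase letters, and digits is fairly large: with four-character CAPTCHAs drawn from it there are (26 + 26 + 10) ^ 4 = 14776336 possible combinations, which is a lot to train on. So we simplify and train only on digit CAPTCHAs, which reduces the number of combinations to 10 ^ 4 = 10000, far fewer.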
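The combination counts above can be double-checked in a few lines of Python. The string module constants are used here only as a convenient way to spell out the two vocabularies:

```python
import string

CAPTCHA_LENGTH = 4

# Alphanumeric vocabulary: 26 lowercase + 26 uppercase letters + 10 digits = 62 symbols
alnum_vocab = string.ascii_lowercase + string.ascii_uppercase + string.digits
# Digits-only vocabulary used in this article: 10 symbols
digit_vocab = string.digits

# Each of the CAPTCHA_LENGTH positions is chosen independently, so the
# number of distinct CAPTCHAs is vocab_size ** CAPTCHA_LENGTH.
alnum_combinations = len(alnum_vocab) ** CAPTCHA_LENGTH
digit_combinations = len(digit_vocab) ** CAPTCHA_LENGTH

print(len(alnum_vocab), alnum_combinations)  # 62 14776336
print(len(digit_vocab), digit_combinations)  # 10 10000
```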

So we first define the vocabulary and its length:

VOCAB = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)

VOCAB holds the vocabulary, the 10 digits 0 to 9. CAPTCHA_LENGTH, the number of characters per CAPTCHA, is 4, and VOCAB_LENGTH, the length of VOCAB, is 10.

Next we define a function that generates CAPTCHA data. The flow is the same as above, except that the image is returned as a NumPy array:

from PIL import Image
from captcha.image import ImageCaptcha
import numpy as np

def generate_captcha(captcha_text):
    """
    generate a captcha image for the given text and return it as a numpy array
    :param captcha_text: source text
    :return: captcha image as a (height, width, 3) RGB array
    """
    image = ImageCaptcha()
    captcha = image.generate(captcha_text)
    captcha_image = Image.open(captcha)
    captcha_array = np.array(captcha_image)
    return captcha_array

Calling this function returns a NumPy array, which essentially converts the CAPTCHA into the RGB values of every pixel. Let's try it:

captcha = generate_captcha('1234')
print(captcha, captcha.shape)

The output is:

[[[239 244 244]
  [239 244 244]
  [239 244 244]
  ...,
 ...,
  [239 244 244]
  [239 244 244]
  [239 244 244]]] (60, 160, 3)

Its shape is (60, 160, 3): the CAPTCHA image is 60 pixels high and 160 pixels wide, and every pixel has an RGB value, which forms the last dimension.
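One detail worth keeping straight: Pillow reports an image's size as (width, height), i.e. (160, 60) here, whereas the NumPy array is laid out as (height, width, channels). The sketch below uses a synthetic array in place of generate_captcha('1234'), filled with the background colour seen in the printed output (an illustrative stand-in, not a real CAPTCHA), to show the axis order and how many values each image contributes:

```python
import numpy as np

# Stand-in for generate_captcha('1234'): a 60 x 160 RGB array filled with the
# background colour from the printed output (illustrative only).
captcha = np.full((60, 160, 3), (239, 244, 244), dtype=np.uint8)

# NumPy image arrays are laid out as (height, width, channels) ...
height, width, channels = captcha.shape
# ... and indexed as [row (y), column (x), channel]
top_left_rgb = captcha[0, 0]            # RGB triple of the top-left pixel
bottom_right_red = captcha[59, 159, 0]  # red channel of the bottom-right pixel

# Number of input values the model sees per image: 60 * 160 * 3
values_per_image = captcha.size

print(height, width, channels, values_per_image)  # 60 160 3 28800
```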

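Next we define the labels. Since we train a deep learning model, the labels are best One-Hot encoded: for each character of the CAPTCHA text, the position matching that character's index in the vocabulary is set to 1. For the text 1234 this gives four one-hot vectors of length 10, 40 values in total. Here is code that converts between text and One-Hot encoding: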

def text2vec(text):
    """
    text to one-hot vector
    :param text: source text
    :return: np array
    """
    if len(text) > CAPTCHA_LENGTH:
        return False
    vector = np.zeros(CAPTCHA_LENGTH * VOCAB_LENGTH)
    for i, c in enumerate(text):
        index = i * VOCAB_LENGTH + VOCAB.index(c)
        vector[index] = 1
    return vector

def vec2text(vector):
    """
    vector to captcha text
    :param vector: np array
    :return: text
    """
    if not isinstance(vector, np.ndarray):
        vector = np.asarray(vector)
    vector = np.reshape(vector, [CAPTCHA_LENGTH, -1])
    text = ''
    for item in vector:
        text += VOCAB[np.argmax(item)]
    return text

Here text2vec() converts the ground-truth text into its One-Hot encoding, and vec2text() converts a One-Hot encoding back into text.

For example, let's convert the text 1234 to its One-Hot encoding and then back again:

vector = text2vec('1234')
text = vec2text(vector)
print(vector, text)

The output is:

[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.] 1234

So we can now convert between text and One-Hot encoding in both directions.
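To see why the printed vector has its 1s at indices 1, 12, 23, and 34, here is the index arithmetic from text2vec() on its own. one_hot_positions() is a hypothetical helper written for illustration, not part of the original code:

```python
VOCAB = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)

def one_hot_positions(text):
    """Indices that text2vec() sets to 1: slot i * VOCAB_LENGTH + vocab index."""
    return [i * VOCAB_LENGTH + VOCAB.index(c) for i, c in enumerate(text)]

positions = one_hot_positions('1234')
print(positions)  # [1, 12, 23, 34]

# Decoding reverses the arithmetic: modulo gives the vocab index,
# integer division gives the character slot.
decoded = ''.join(VOCAB[p % VOCAB_LENGTH] for p in positions)
slots = [p // VOCAB_LENGTH for p in positions]
print(decoded, slots)  # 1234 [0, 1, 2, 3]
```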

Now we can build a batch of data: x is the NumPy array of each CAPTCHA image, and y is the One-Hot encoding of its text. The generation code is:

import random
from os.path import join, exists
import pickle
import numpy as np
from os import makedirs

DATA_LENGTH = 10000
DATA_PATH = 'data'

def get_random_text():
    text = ''
    for i in range(CAPTCHA_LENGTH):
        text += random.choice(VOCAB)
    return text

def generate_data():
    print('Generating Data...')
    data_x, data_y = [], []
    # generate data x and y
    for i in range(DATA_LENGTH):
        text = get_random_text()
        # get captcha array
        captcha_array = generate_captcha(text)
        # get vector
        vector = text2vec(text)
        data_x.append(captcha_array)
        data_y.append(vector)
    # write data to pickle
    if not exists(DATA_PATH):
        makedirs(DATA_PATH)
    x = np.asarray(data_x, np.float32)
    y = np.asarray(data_y, np.float32)
    with open(join(DATA_PATH, 'data.pkl'), 'wb') as f:
        pickle.dump(x, f)
        pickle.dump(y, f)

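Here we define a get_random_text() function that generates random CAPTCHA text, use that text to produce the corresponding x and y data, and finally write the data to a pickle file. That completes preprocessing.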
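Since generate_data() holds all 10,000 images in memory and pickles x as float32, it is worth estimating how large that gets. The following back-of-the-envelope sketch is pure arithmetic. It ignores pickle and Python object overhead and also ignores the temporary list of uint8 arrays that generate_data() builds before converting:

```python
# Shape of one CAPTCHA array and dataset size, taken from the article
HEIGHT, WIDTH, CHANNELS = 60, 160, 3
DATA_LENGTH = 10000
CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10

values_per_image = HEIGHT * WIDTH * CHANNELS  # 28800 values per image

# generate_data() stores x as float32: 4 bytes per value
x_bytes_float32 = DATA_LENGTH * values_per_image * 4
# Keeping the raw uint8 pixels instead would take 1 byte per value
x_bytes_uint8 = DATA_LENGTH * values_per_image * 1
# y holds DATA_LENGTH one-hot vectors of length 40, also float32
y_bytes_float32 = DATA_LENGTH * CAPTCHA_LENGTH * VOCAB_LENGTH * 4

print(x_bytes_float32, x_bytes_uint8, y_bytes_float32)  # 1152000000 288000000 1600000
```

So x alone is roughly 1.15 GB as float32. Storing uint8 pixels and converting to float just before training would cut that to about a quarter; this is a possible design alternative, not what the original code does.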

Building the Model

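With the data in hand, let's build the model. Here we again use scikit-learn's train_test_split() to divide the data into three parts, a training set, a dev set, and a test set: 40% is held out first and then split in half, which gives a 60% / 20% / 20% split: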

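import pickle
from os.path import join
from sklearn.model_selection import train_test_split

def load_data():
    # standardize() is a normalization helper whose definition is not shown in the original
    with open(join(DATA_PATH, 'data.pkl'), 'rb') as f:
        data_x = pickle.load(f)
        data_y = pickle.load(f)
        return standardize(data_x), data_y

data_x, data_y = load_data()
train_x, test_x, train_y, test_y = train_test_split(data_x, data_y, test_size=0.4, random_state=40)
dev_x, test_x, dev_y, test_y = train_test_split(test_x, test_y, test_size=0.5, random_state=40)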
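For the 10,000 generated samples, the two calls translate into concrete set sizes. The sketch below assumes scikit-learn's rounding rule for a float test_size (the test part is rounded up and the remainder goes to training); split_sizes() is a hypothetical helper, not a scikit-learn function:

```python
import math

DATA_LENGTH = 10000

def split_sizes(n_samples, test_size):
    """Assumed rounding for a float test_size: test part rounded up, rest to train."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train, n_holdout = split_sizes(DATA_LENGTH, 0.4)  # first split
n_dev, n_test = split_sizes(n_holdout, 0.5)         # second split of the held-out part
print(n_train, n_dev, n_test)  # 6000 2000 2000
```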

Next we build a Dataset object from each of these three splits:

# train, dev and test datasets
# FLAGS holds command-line hyperparameters; their definitions are not shown here
train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).shuffle(10000)
train_dataset = train_dataset.batch(FLAGS.train_batch_size)
dev_dataset = tf.data.Dataset.from_tensor_slices((dev_x, dev_y))
dev_dataset = dev_dataset.batch(FLAGS.dev_batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y))
test_dataset = test_dataset.batch(FLAGS.test_batch_size)

Then we create a reinitializable iterator and bind it to these datasets:

# a reinitializable iterator
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_initializer = iterator.make_initializer(train_dataset)
dev_initializer = iterator.make_initializer(dev_dataset)
test_initializer = iterator.make_initializer(test_dataset)

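Now comes the key part. We build the network from three convolutional layers and two fully connected layers, and to keep the code short we use TensorFlow's layers module directly: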

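import tensorflow as tf  # TensorFlow 1.x graph-mode API

# input layer
with tf.variable_scope('inputs'):
    # x.shape = [-1, 60, 160, 3]
    x, y_label = iterator.get_next()
# dropout keep probability: FLAGS.keep_prob when training, 1 when evaluating
keep_prob = tf.placeholder(tf.float32, [])
y = tf.cast(x, tf.float32)
# 3 CNN layers
for _ in range(3):
    y = tf.layers.conv2d(y, filters=32, kernel_size=3, padding='same', activation=tf.nn.relu)
    y = tf.layers.max_pooling2d(y, pool_size=2, strides=2, padding='same')
    # y = tf.nn.dropout(y, keep_prob=keep_prob)
# 2 dense layers
y = tf.layers.flatten(y)
y = tf.layers.dense(y, 1024, activation=tf.nn.relu)
# tf.layers.dropout() takes a drop *rate* and does nothing unless training=True,
# so tf.nn.dropout() with keep_prob is used to match the feed_dict values below
y = tf.nn.dropout(y, keep_prob=keep_prob)
# one logit per (character position, vocabulary entry): 4 * 10 = 40 outputs
y = tf.layers.dense(y, CAPTCHA_LENGTH * VOCAB_LENGTH)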

Each convolution uses a kernel size of 3, SAME padding, and the ReLU activation, and is followed by 2 x 2 max pooling with stride 2.
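It helps to trace the tensor shapes through these layers. With padding='same', TensorFlow computes the output size as ceil(input_size / stride), so the stride-1 convolutions keep the spatial size and each stride-2 pooling halves it, rounding up. A small Python sketch of that arithmetic, assuming the 60 x 160 input and 32 filters used above:

```python
import math

HEIGHT, WIDTH = 60, 160
FILTERS = 32          # filters per conv layer, as in the model code
NUM_CONV_BLOCKS = 3
CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10

h, w = HEIGHT, WIDTH
for block in range(NUM_CONV_BLOCKS):
    # conv2d (stride 1, padding='same') keeps h and w unchanged;
    # max_pooling2d (pool 2, stride 2, padding='same') gives ceil(n / 2)
    h, w = math.ceil(h / 2), math.ceil(w / 2)
    print('after block', block + 1, (h, w, FILTERS))
# after block 1 (30, 80, 32)
# after block 2 (15, 40, 32)
# after block 3 (8, 20, 32)

flat_features = h * w * FILTERS                # input size of the 1024-unit dense layer
n_outputs = CAPTCHA_LENGTH * VOCAB_LENGTH      # logits needed for the reshape to line up
print(flat_features, n_outputs)  # 5120 40
```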

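After the fully connected layers, y has shape [batch_size, n_classes], where n_classes = CAPTCHA_LENGTH * VOCAB_LENGTH = 40. Our label is CAPTCHA_LENGTH One-Hot vectors concatenated together. We want to use cross-entropy as the loss, but when computing cross-entropy, the elements along the last dimension of the labels argument must sum to 1; otherwise the gradients will be computed incorrectly. See the TensorFlow documentation for details, https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits :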

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

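Our labels, however, are CAPTCHA_LENGTH One-Hot vectors concatenated together, so their elements sum to CAPTCHA_LENGTH. We therefore reshape both the logits and the labels so that the elements along the last dimension sum to 1: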

y_reshape = tf.reshape(y, [-1, VOCAB_LENGTH])
y_label_reshape = tf.reshape(y_label, [-1, VOCAB_LENGTH])

Now the last dimension has length VOCAB_LENGTH and each row is a single One-Hot vector, so its elements are guaranteed to sum to 1.
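A quick NumPy check of this reshape argument. It uses a simplified text2vec() that relies on VOCAB being the digits '0' to '9' in order (so int(c) equals VOCAB.index(c)); the two label strings are arbitrary examples:

```python
import numpy as np

CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10

def text2vec(text):
    # Same layout as the article's text2vec(): slot i occupies indices [i*10, i*10 + 10)
    vector = np.zeros(CAPTCHA_LENGTH * VOCAB_LENGTH, dtype=np.float32)
    for i, c in enumerate(text):
        vector[i * VOCAB_LENGTH + int(c)] = 1
    return vector

# A batch of two labels with shape [batch_size, CAPTCHA_LENGTH * VOCAB_LENGTH] = [2, 40]
y_label = np.stack([text2vec('1234'), text2vec('9071')])
print(y_label.shape, y_label.sum(axis=-1))  # (2, 40) [4. 4.]

# After reshaping, every row is one character's one-hot vector
y_label_reshape = y_label.reshape(-1, VOCAB_LENGTH)
print(y_label_reshape.shape)  # (8, 10)
print(y_label_reshape.sum(axis=-1))  # [1. 1. 1. 1. 1. 1. 1. 1.]
```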

Then the loss and accuracy are easy to compute:

# loss
cross_entropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=y_reshape, labels=y_label_reshape))
# accuracy
max_index_predict = tf.argmax(y_reshape, axis=-1)
max_index_label = tf.argmax(y_label_reshape, axis=-1)
correct_predict = tf.equal(max_index_predict, max_index_label)
accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))
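The same loss and accuracy can be mirrored in NumPy on toy data, which makes two points visible: softmax cross-entropy per row is -sum(labels * log_softmax(logits)), and `accuracy` as defined above counts correct *characters*, because it averages over the reshaped rows. The whole-CAPTCHA accuracy computed at the end is an extra comparison, not something the original code reports. The logits are made up for illustration:

```python
import numpy as np

CAPTCHA_LENGTH, VOCAB_LENGTH = 4, 10

def softmax_cross_entropy(logits, labels):
    """Per-row softmax cross-entropy: -sum(labels * log_softmax(logits))."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_softmax).sum(axis=-1)

def digits_to_rows(texts):
    """One-hot rows, already reshaped to [-1, VOCAB_LENGTH]."""
    return np.eye(VOCAB_LENGTH)[[int(c) for t in texts for c in t]]

labels = digits_to_rows(['1234', '5678'])  # shape (8, 10)
# Made-up logits: confident and correct everywhere except the last character
# of the second CAPTCHA, which is predicted as '0' instead of '8'.
logits = 5.0 * digits_to_rows(['1234', '5670'])

loss = softmax_cross_entropy(logits, labels).sum()  # like tf.reduce_sum(...)

correct = logits.argmax(axis=-1) == labels.argmax(axis=-1)  # one entry per character
char_accuracy = correct.mean()  # what `accuracy` measures
captcha_accuracy = correct.reshape(-1, CAPTCHA_LENGTH).all(axis=-1).mean()  # all 4 right

print(char_accuracy, captcha_accuracy)  # 0.875 0.5
```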

Next, we simply run the training:

# train
global_step = tf.train.get_or_create_global_step()
train_op = tf.train.RMSPropOptimizer(FLAGS.learning_rate).minimize(cross_entropy, global_step=global_step)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
# batches per epoch (assumed here: split size divided by batch size)
train_steps = len(train_x) // FLAGS.train_batch_size
dev_steps = len(dev_x) // FLAGS.dev_batch_size

for epoch in range(FLAGS.epoch_num):
    # train
    sess.run(train_initializer)
    for step in range(int(train_steps)):
        loss, acc, gstep, _ = sess.run([cross_entropy, accuracy, global_step, train_op],
                                       feed_dict={keep_prob: FLAGS.keep_prob})
        # print log
        if step % FLAGS.steps_per_print == 0:
            print('Global Step', gstep, 'Step', step, 'Train Loss', loss, 'Accuracy', acc)
    if epoch % FLAGS.epochs_per_dev == 0:
        # dev: evaluate every batch, print every steps_per_print batches
        sess.run(dev_initializer)
        for step in range(int(dev_steps)):
            dev_acc = sess.run(accuracy, feed_dict={keep_prob: 1})
            if step % FLAGS.steps_per_print == 0:
                print('Dev Accuracy', dev_acc, 'Step', step)

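Here we first run train_initializer to bind the iterator to the training dataset, then run train_op, fetch loss, acc, gstep and so on, and print them. Every FLAGS.epochs_per_dev epochs we rebind the iterator to the dev dataset with dev_initializer and evaluate the accuracy with dropout disabled (keep_prob = 1).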

Training

Running the training produces output similar to the following:

...
Dev Accuracy 0.9580078 Step 0
Dev Accuracy 0.9472656 Step 2
Dev Accuracy 0.9501953 Step 4
Dev Accuracy 0.9658203 Step 6
Global Step 3243 Step 0 Train Loss 1.1920928e-06 Accuracy 1.0
Global Step 3245 Step 2 Train Loss 1.5497207e-06 Accuracy 1.0
Global Step 3247 Step 4 Train Loss 1.1920928e-06 Accuracy 1.0
Global Step 3249 Step 6 Train Loss 1.7881392e-06 Accuracy 1.0
...

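Accuracy on the dev set is above 95%. Note that this accuracy is computed per character (over the reshaped rows), so the rate of getting all four characters of a CAPTCHA right is at most this figure.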

Testing

During training we can also save the model every few epochs:

# save model (saver = tf.train.Saver() is assumed to be created once before the training loop)
if epoch % FLAGS.epochs_per_save == 0:
    saver.save(sess, FLAGS.checkpoint_dir, global_step=gstep)

Alternatively, you can save only the model with the highest accuracy on the dev set.
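Keeping only the best model comes down to comparing each dev accuracy with the best one seen so far. A minimal pure-Python sketch of that bookkeeping follows. BestCheckpointTracker is a hypothetical helper; in the TensorFlow loop you would call saver.save(...) whenever update() returns True. The accuracies below are made up:

```python
class BestCheckpointTracker:
    """Hypothetical helper: track the best dev accuracy seen so far."""

    def __init__(self):
        self.best_accuracy = float('-inf')
        self.best_epoch = None

    def update(self, epoch, dev_accuracy):
        """Return True (meaning: save a checkpoint) only when dev accuracy improves."""
        if dev_accuracy > self.best_accuracy:
            self.best_accuracy = dev_accuracy
            self.best_epoch = epoch
            return True
        return False

tracker = BestCheckpointTracker()
# Made-up dev accuracies for illustration only
decisions = [tracker.update(epoch, acc) for epoch, acc in enumerate([0.91, 0.95, 0.94, 0.96])]
print(decisions, tracker.best_epoch)  # [True, True, False, True] 3
```

Using a strict `>` means a tie does not trigger a new save, so the earliest checkpoint with the best accuracy is kept.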

For testing, we can reload the saved model and evaluate it:

# load model
ckpt = tf.train.get_checkpoint_state('ckpt')
if ckpt:
    saver.restore(sess, ckpt.model_checkpoint_path)
    print('Restore from', ckpt.model_checkpoint_path)
    sess.run(test_initializer)
    # batches in the test set (assumed here: split size divided by batch size)
    test_steps = len(test_x) // FLAGS.test_batch_size
    for step in range(int(test_steps)):
        test_acc = sess.run(accuracy, feed_dict={keep_prob: 1})
        if step % FLAGS.steps_per_print == 0:
            print('Test Accuracy', test_acc, 'Step', step)
else:
    print('No Model Found')

The test accuracy turns out to be roughly the same as on the dev set.

To run inference on new CAPTCHAs, replace test_x with the new images, preprocessed with the same standardize() step used for training.
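For inference, the tensor to fetch is y, the [batch_size, CAPTCHA_LENGTH * VOCAB_LENGTH] logits, rather than accuracy. Turning those logits into text is a batched version of vec2text(). The NumPy sketch below shows that decoding step on fake model output; batch_vec2text() is a hypothetical helper, not part of the original code:

```python
import numpy as np

VOCAB = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)

def batch_vec2text(logits):
    """Decode a [batch_size, CAPTCHA_LENGTH * VOCAB_LENGTH] array of model outputs
    (logits or one-hot vectors) into CAPTCHA strings."""
    logits = np.asarray(logits).reshape(-1, CAPTCHA_LENGTH, VOCAB_LENGTH)
    indices = logits.argmax(axis=-1)  # [batch_size, CAPTCHA_LENGTH]
    return [''.join(VOCAB[i] for i in row) for row in indices]

# Fake model output for two images: one-hot rows for '1234' and '0987'
fake_output = np.eye(VOCAB_LENGTH)[[1, 2, 3, 4, 0, 9, 8, 7]].reshape(2, -1)
print(batch_vec2text(fake_output))  # ['1234', '0987']
```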