TensorFlow Project Practice: CNN Image CAPTCHA Recognition

Date: 2019-11-24

Tags: tensorflow, project practice, cnn, image CAPTCHA recognition


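Foreword: After learning some machine learning and deep learning fundamentals and getting a basic feel for the TensorFlow framework, I wanted to build a small project for practice. I ran into far more pitfalls than expected, so I am writing the project up here for future reference.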

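1. Data mining workflow

2. Dataset

Captcha-dataset: a CAPTCHA dataset

Feature:

Target:

After analyzing the dataset, I decided to write the cleaned data to a TFRecords file first and then train on it.

3. Data cleaning

To delete records with empty targets, duplicates and bad data, and to rename the images to the form n.png so that they line up with the target values in the CSV file, I defined a DataClean class.

3.1 Read the txt file and split file names from target values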

def readTxt(self):
        """Read the txt file and split each line into file name and target value."""
        with open("./yzm_labels.txt", "r") as f:
            text_list = f.readlines()
            for text in text_list:
                # Take the target value; the last character is "\n", so [:-1] drops it
                feature = text.split(",")[1][:-1]
                # print(feature)
                # Take the file name
                file_name = text.split(",")[0].split("/")[1]
                # print(file_name)
                # Store the pair in the dictionary
                self.file_dic[file_name] = feature
                # print(len(self.file_dic))

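As a small, self-contained illustration of the parsing step above, the sketch below splits lines of the form dir/name.png,LABEL without touching the file system. The sample lines and the helper name parse_label_lines are made up for illustration; the layout of yzm_labels.txt is assumed from the code above.

```python
import os


def parse_label_lines(lines):
    """Map file name -> label for lines shaped like 'dir/name.png,LABEL'."""
    file_dic = {}
    for line in lines:
        line = line.strip()  # drops the trailing newline (and stray spaces)
        if not line:
            continue  # skip blank lines
        path, _, label = line.partition(",")
        # basename keeps only the file name, whatever the directory depth
        file_dic[os.path.basename(path)] = label
    return file_dic


# Hypothetical sample lines; the second one has an empty label
sample = ["yzm/a1b2c3.png,95m8\n", "yzm/d4e5f6.png,\n", "yzm/g7h8i9.png,sr3e\n"]
print(parse_label_lines(sample))
```

Using strip() rather than [:-1] also keeps the last character of a final line that has no trailing newline.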

3.2 Rename the files and create a new target-value file

def renameFile(self):
        """Rename the files and create a new target-value file."""
        # Counter used for the new image names
        n = 0
        with open("labels.csv", "a") as f:
            for key in self.file_dic:
                oldname = "./yzm/" + key
                newname = "./yzm/" + str(n) + ".png"
                # Drop images whose target is empty, wrong, duplicated or shorter than 4 characters
                if len(self.file_dic[key]) == 4 and self.file_dic[key] not in self.value_list:
                    try:
                        os.rename(oldname, newname)
                    except FileNotFoundError:
                        print("An error occurred")
                        continue
                    self.value_list.append(self.file_dic[key])
                    f.write(self.file_dic[key] + ",")
                    n += 1
                else:
                    os.remove("./yzm/{}".format(key))
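
The filtering rule above (keep a label only if it has exactly 4 characters and has not been seen before) can be tried out without renaming or deleting anything. This is a minimal sketch under that reading of the code; plan_renames and the demo dictionary are hypothetical names introduced only for illustration.

```python
def plan_renames(file_dic, label_len=4):
    """Split entries into (renames, removals) without touching the disk.

    renames:  list of (old_name, new_name, label) for entries worth keeping
    removals: old names whose label is empty, the wrong length, or a repeat
    """
    seen = set()  # set membership is O(1); a list lookup is O(n)
    renames, removals = [], []
    for old_name, label in file_dic.items():
        if len(label) == label_len and label not in seen:
            seen.add(label)
            renames.append((old_name, "{}.png".format(len(renames)), label))
        else:
            removals.append(old_name)
    return renames, removals


demo = {"a.png": "95m8", "b.png": "", "c.png": "95m8", "d.png": "sr3e", "e.png": "abc"}
renames, removals = plan_renames(demo)
print(renames)
print(removals)
```

Separating the decision from the os.rename and os.remove calls makes it possible to inspect the plan before any file is changed.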

3.3 Delete images that have no matching label

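def delFile(self):
        """Delete images that have no matching label."""
        name_list = os.listdir("./yzm")
        for name in name_list:
            # Renamed files (n.png) have short names; a name longer than 10 characters was never renamed
            if len(name) > 10:
                os.remove("./yzm/{}".format(name))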

3.4 Transpose the target-value file

def martixT(self):
        """Transpose labels.csv so that each label sits on its own row."""
        df = pd.read_csv("./labels.csv")
        df.T.to_csv("./newlabels.csv")
        os.remove("./labels.csv")
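
The transpose turns a single comma-separated row into one label per line. Assuming that layout is the only goal, it could also be written directly. The sketch below is an alternative I did not use in the project; write_labels is a hypothetical helper, and it writes to an in-memory buffer so that it runs anywhere.

```python
import csv
import io


def write_labels(labels, handle):
    """Write one label per row, so that row i belongs to image i.png."""
    writer = csv.writer(handle, lineterminator="\n")
    for label in labels:
        writer.writerow([label])


buffer = io.StringIO()
write_labels(["95m8", "sr3e", "NZPP"], buffer)
print(buffer.getvalue())
```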

Data after processing

Feature:

Target:

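4. Reading the data

There are two big pitfalls here, both of which cause an error at sess.run(label_bat/image_bat).

Searching Google for

OutOfRangeError (see above for traceback): FIFOQueue '_2_batch/fifo_queue' is closed and has insufficient elements (requested 8789, current size 0)

turns up all sorts of suggested fixes. I spent a whole afternoon on them without success, and finally found that the problem was in the data. Here is the code first.

4.1 Read the image data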

def get_captcha_image():
    """Get the CAPTCHA image data.

    :return: image_batch
    """
    # Build the file names
    filename = []

    for i in range(8789):
        string = str(i) + ".png"
        filename.append(string)

    # Build the full paths
    file_list = [os.path.join(r'D:\My_project\CNN_captcha\yzm', file) for file in filename]
    # print(file_list)

    # Build the file queue
    file_queue = tf.train.string_input_producer(file_list, shuffle=False)

    # Build the reader
    reader = tf.WholeFileReader()

    # Read the image file contents
    _, value = reader.read(file_queue)

    # Decode the image data
    image = tf.image.decode_png(value)
    # Image size: 120 * 48 * 4. Loading one of the PNGs with PIL shows a four-channel image, i.e. RGBA: Red, Green, Blue and Alpha.
    image.set_shape([48, 120, 4])
    print(image)

    # Batch the data: [3000, 48, 120, 4]
    image_batch = tf.train.batch([image], batch_size=3000, num_threads=1, capacity=3000)

    return image_batch

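This code failed at sess.run(image_batch) because I had initially set the image shape with image.set_shape([48, 120, 3]) instead of image.set_shape([48, 120, 4]). The decoded data is a four-channel image, since these PNG files carry an alpha (transparency) channel. The same error also appears when batch_size is set too large.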
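
One way to avoid guessing the channel count is to read it from the PNG header before building the graph. The sketch below relies only on the PNG file layout (an 8-byte signature followed by the IHDR chunk); png_shape is a hypothetical helper, and the header is built by hand here instead of being read from one of the project's files.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Channels per PNG color type: 0 gray, 2 RGB, 3 palette index, 4 gray+alpha, 6 RGBA
CHANNELS_BY_COLOR_TYPE = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_shape(data):
    """Return (height, width, channels) from the first 26 bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    color_type = data[25]  # data[24] holds the bit depth
    return height, width, CHANNELS_BY_COLOR_TYPE[color_type]


# Hand-built header of a 120x48 RGBA image, used instead of a real file
header = PNG_SIGNATURE + struct.pack(">I4sIIBBBBB", 13, b"IHDR", 120, 48, 8, 6, 0, 0, 0)
print(png_shape(header))
```

With a real file, reading the first 26 bytes in binary mode and passing them to png_shape would be enough. tf.image.decode_png also accepts a channels argument (for example channels=3) that forces the channel count at decode time.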

4.2 Read the CAPTCHA label data

def get_captcha_label():
    """Read the CAPTCHA label data.

    :return: label_batch
    """
    # Build the file queue
    file_queue = tf.train.string_input_producer([r"D:\My_project\CNN_captcha\newlabels.csv"], shuffle=False)
    # Build the reader
    reader = tf.TextLineReader()

    _, value = reader.read(file_queue)
    # record_defaults depends on the data being read: one entry per column, where [1] means int, [1.] means float and ["None"] means string
    records = [["None"]]

    label = tf.decode_csv(records=value, record_defaults=records)
    # print(label)

    # [b'95m8'],[b'sr3e']
    label_batch = tf.train.batch([label], batch_size=3000, num_threads=1, capacity=3000)

    return label_batch

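This code failed because records = [["None"]] was set incorrectly. The records value depends on the data being read: it needs one entry per column, where [1] means an integer column, [1.] a float column and ["None"] a string column.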
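
To make the "one entry per column" rule concrete, here is a pure-Python imitation of the idea. It is not TensorFlow's implementation (it ignores quoting, for example), and decode_csv_line is a hypothetical name. It only shows how each default fixes the type of its column and why a column-count mismatch is an error.

```python
def decode_csv_line(line, record_defaults):
    """Pure-Python imitation of the record_defaults idea.

    record_defaults holds one single-element list per column; the element's
    type sets the column type and its value fills in empty fields.
    """
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(record_defaults):
        raise ValueError(
            "expected {} columns, got {}".format(len(record_defaults), len(fields))
        )
    row = []
    for field, (default,) in zip(fields, record_defaults):
        row.append(type(default)(field) if field else default)
    return row


print(decode_csv_line("95m8\n", [["None"]]))
print(decode_csv_line("7,0.5,sr3e\n", [[1], [1.0], ["None"]]))
```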

4.3 Convert the label data to numbers

def dealWithLabel(self, label_str):
        """Convert the label data to numbers.

        :param label_str: batch of byte-string labels, e.g. [[b"NZPP"], ...]
        :return: label tensor
        """
        # Build the index-to-character map {0: 'A', 1: 'B', ...}
        num_letter = dict(enumerate(list(self.letter)))

        # Invert it to a character-to-index map {'A': 0, 'B': 1, ...}
        letter_num = dict(zip(num_letter.values(), num_letter.keys()))

        # print(letter_num)

        # List that collects all labels
        array = []

        # Process the label data [[b"NZPP"], ...]
        for string in label_str:

            letter_list = []  # [1,2,3,4]

            # Decode b'FVQJ' to a string, then look up the index of each character of the CAPTCHA
            for letter in string[0].decode('utf-8'):
                letter_list.append(letter_num[letter])

            array.append(letter_list)

        # [[13, 25, 15, 15], [22, 10, 7, 10], [22, 15, 18, 9], [16, 6, 13, 10], [1, 0, 8, 17], [0, 9, 24, 14].....]
        # print(array[:10])

        # Convert array to a tensor
        label = tf.constant(array)

        return label

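There is a pitfall in the conversion label = tf.constant(array): if array contains bad data (for example a label such as ABC, where a 4-character CAPTCHA is missing one character), the rows have different lengths and the call raises ValueError: Argument must be a dense tensor.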
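
A cheap guard is to check every label's length while encoding, so a bad row is reported by index instead of surfacing later as a tensor error. This is only a sketch: encode_labels is a hypothetical helper, and the A-Z alphabet is a stand-in, since the project's self.letter is not shown here.

```python
def encode_labels(labels, alphabet, label_len=4):
    """Turn strings such as 'NZPP' into index lists, rejecting bad rows early."""
    letter_num = {letter: index for index, letter in enumerate(alphabet)}
    encoded = []
    for row, text in enumerate(labels):
        if len(text) != label_len:
            # A short row would make the nested list ragged, and a ragged
            # list cannot become a rectangular [batch, 4] tensor.
            raise ValueError("row {}: {!r} has {} characters".format(row, text, len(text)))
        encoded.append([letter_num[letter] for letter in text])
    return encoded


ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical stand-in for self.letter
print(encode_labels(["NZPP", "WKHK"], ALPHABET))
```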

4.4 Write the packed data to a TFRecords file

Here 3,000 samples are written in one go.

def write_to_tfr(self, image_batch, label_deal):
        """Write the data to a TFRecords file.

        :param image_batch: feature values
        :param label_deal: target values
        :return: None
        """
        # Cast the labels
        label_uint8 = tf.cast(label_deal, tf.uint8)

        # Evaluate each tensor once. Calling .eval() inside the loop would re-run the
        # graph, and so dequeue a fresh image batch, on every iteration.
        images = image_batch.eval()
        labels = label_uint8.eval()

        # Create the TFRecords writer
        writer = tf.python_io.TFRecordWriter(self.dir)

        # Build an Example protocol block for each image, serialize it and write it
        for i in range(3000):
            # The i-th image's feature values as raw bytes
            image_string = images[i].tobytes()

            # The i-th label as raw bytes
            label_string = labels[i].tobytes()
            print(i)
            # Build the protocol block
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_string])),
                "label": tf.train.Feature(bytes_list=tf.train.BytesList(value=[label_string]))
            }))

            writer.write(example.SerializeToString())

        # Close the file
        writer.close()

5. Training a fully connected neural network model

5.1 Reading the TFRecords file

def read_captcha_tfrecords(self):
        """Read the CAPTCHA feature and target data.

        :return: image_batch, label_batch
        """
        # 1. Build the file queue
        file_queue = tf.train.string_input_producer([self.dir])

        # 2. Read the TFRecords data with tf.TFRecordReader
        reader = tf.TFRecordReader()

        # A single serialized sample
        _, value = reader.read(file_queue)

        # 3. Parse the Example protocol block
        feature = tf.parse_single_example(value, features={
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.string)
        })

        # 4. Decode the raw bytes, then fix dtype and shape
        image = tf.decode_raw(feature["image"], tf.uint8)
        label = tf.decode_raw(feature["label"], tf.uint8)

        # Set the shapes
        # Image shape [48, 120, 4]
        # Target shape [4]
        image_reshape = tf.reshape(image, [self.height, self.width, self.channel])
        label_reshape = tf.reshape(label, [self.label_num])

        # Set the dtypes
        image_type = tf.cast(image_reshape, tf.float32)
        label_type = tf.cast(label_reshape, tf.int32)

        # 5. Batch the data
        # print(image_type, label_type)
        # Number of samples supplied per training batch
        image_batch, label_batch = tf.train.batch([image_type, label_type],
                                                  batch_size=self.train_batch,
                                                  num_threads=1,
                                                  capacity=self.train_batch)
        print(image_batch, label_batch)
        return image_batch, label_batch
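
The reshape calls are needed because a raw byte string carries neither shape nor dtype: the writer stores plain uint8 bytes and the reader gets back a flat run of values. The sketch below shows the same round trip with built-in bytes objects; the all-zero image is a stand-in, not project data.

```python
HEIGHT, WIDTH, CHANNELS = 48, 120, 4

# Writing: a uint8 label row becomes 4 raw bytes, an image 48*120*4 raw bytes
label = [13, 25, 15, 15]
label_bytes = bytes(label)
image_bytes = bytes(HEIGHT * WIDTH * CHANNELS)  # all-zero stand-in image

# Reading: raw bytes come back as a flat run of uint8 values ...
flat_label = list(label_bytes)
flat_image = list(image_bytes)

# ... so the shape has to be restored by hand, here row by row
row_len = WIDTH * CHANNELS
rows = [flat_image[i:i + row_len] for i in range(0, len(flat_image), row_len)]

print(len(label_bytes), len(image_bytes))
print(flat_label, len(rows), len(rows[0]))
```

Storing labels as uint8 limits each value to 0..255, which is enough for the character indices used here.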

5.2 Loss computation

def loss(self, y_true, y_predict):
        """Build the loss over the 4 target characters of each CAPTCHA.

        :param y_true: true values
        :param y_predict: predicted values
        :return: loss
        """
        with tf.variable_scope("loss"):
            # Softmax first turns the network outputs into probabilities, then the cross-entropy loss is computed
            # y_true:[100, 4, 63]------>[100, 252]
            # y_predict:[100, 252]
            y_reshape = tf.reshape(y_true,
                                   [self.train_batch, self.label_num * self.feature_num])

            all_loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_reshape,
                                                               logits=y_predict,
                                                               name="compute_loss")
            # Average the loss over the batch
            loss = tf.reduce_mean(all_loss)

        return loss
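
The shape comments assume y_true is one-hot encoded as [batch, 4, 63] and then flattened to [batch, 252]. The turn_to_onehot helper that section 7 calls is not shown in this post, so the following pure-Python sketch is only my reading of that layout; one_hot and flatten_rows are hypothetical names.

```python
def one_hot(labels, num_classes):
    """[batch, 4] integer labels -> [batch, 4, num_classes] nested lists."""
    return [
        [[1.0 if k == index else 0.0 for k in range(num_classes)] for index in row]
        for row in labels
    ]


def flatten_rows(y_true):
    """[batch, 4, num_classes] -> [batch, 4 * num_classes], like the reshape above."""
    return [[value for char in sample for value in char] for sample in y_true]


y_true = one_hot([[13, 25, 15, 15]], num_classes=63)
flat = flatten_rows(y_true)
print(len(flat[0]), sum(flat[0]))
print([i for i, v in enumerate(flat[0]) if v == 1.0])
```

Each flattened row holds four 1s, one per character, which matters for the choice of loss function in section 6.1.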

5.3 Gradient descent optimization

def sgd(loss):
        """Minimize the loss (despite the name, this uses the Adam optimizer).

        :param loss: loss tensor
        :return: train_op
        """
        with tf.variable_scope("sgd"):
            train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

        return train_op

5.4 Accuracy computation

A sample counts as correct (True) only if all four target characters match.

def acc(self, y_true, y_predict):
        """Compute the accuracy.

        :param y_true: true values
        :param y_predict: predicted values
        :return: accuracy
        """
        with tf.variable_scope("acc"):
            # y_true:[100, 4, 63]
            # y_predict: [100, 252] --> [100,4,63]
            y_predict_reshape = tf.reshape(y_predict, [self.train_batch, self.label_num, self.feature_num])

            # Compare the positions of the maximum values
            equal_list = tf.equal(tf.argmax(y_true, 2),
                                  tf.argmax(y_predict_reshape, 2))

            # Each sample has to be judged as a whole
            # equal_list:[True, True,True, True], [True, False,True, True]
            # x = tf.constant([[True, True], [True, False]])
            # tf.reduce_all(x, 1) is a logical AND per row: True only if all four characters are True, giving [True, False]
            accuracy = tf.reduce_mean(tf.cast(tf.reduce_all(equal_list, 1), tf.float32))

        return accuracy

5.5 Build the fully connected neural network

def model_nn(self, image_batch):
        """Build the fully connected model.

        :param image_batch: feature values
        :return: y_predict
        """
        # Fully connected layer
        # [100,48,120,4] --> [100,48*120*4]
        # y_pre:[100,48*120*4] * [48*120*4,252] = [100,252]
        with tf.variable_scope("model"):
            # Initialize the weights and bias
            weight = self.weight_variables([48 * 120 * 4, 252])
            bias = self.bias_variables([252])
            # Flatten the 4-D features to 2-D
            x_re = tf.reshape(image_batch, [self.train_batch, 48 * 120 * 4])
            y_predict = tf.matmul(x_re, weight) + bias
        return y_predict
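
To put numbers on the shapes in the comments, the arithmetic below works out the input width, the output width and the parameter count of this single layer. Only the sizes already quoted in the code are used; the variable names are mine.

```python
height, width, channels = 48, 120, 4
label_num, feature_num = 4, 63

inputs = height * width * channels      # flattened pixels per image
outputs = label_num * feature_num       # 4 characters x 63 classes
parameters = inputs * outputs + outputs  # weight matrix plus bias vector

print(inputs, outputs, parameters)
```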

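Training results for the fully connected model: poor.

With the fully connected model, accuracy tops out at about 20%.

6. Training a convolutional neural network model

At first I reused the approach above and only replaced the fully connected network from 5.5 with a two-layer convolutional network, but the training accuracy stayed at 0. After some reading, I changed the loss function.

6.1 Loss function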

def loss(self, y_true, y_predict):
        """Build the loss over the 4 target characters of each CAPTCHA.

        :param y_true: true values
        :param y_predict: predicted values
        :return: loss
        """
        with tf.variable_scope("loss"):
            # Sigmoid first turns the network outputs into probabilities, then the cross-entropy loss is computed
            # y_true:[100, 4, 63]------>[100, 252]
            # y_predict:[100, 252]
            y_reshape = tf.reshape(y_true,
                                   [self.train_batch, self.label_num * self.feature_num])

            all_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=y_predict, labels=y_reshape, name="compute_loss")
            # Average the loss
            loss = tf.reduce_mean(all_loss)

        return loss

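Switching the loss from softmax to sigmoid improved training dramatically.

The reason is as follows.

sigmoid: computes the sigmoid cross-entropy loss between the network's output logits and the labels. It measures the error in discrete classification tasks whose classes are independent but not mutually exclusive. Take multi-object detection as an example: a single image can contain several kinds of instance, say a dog and a cat. For a detection task with five classes in total, suppose the output layer has 5 nodes and a label looks like [1,1,0,0,1], where 1 means the image contains that kind of instance and 0 means it does not. Whether each kind of instance is present is an independent event, yet several can appear in the same image, so the events are not mutually exclusive. The network output can therefore be passed directly to this function as logits to compute the cross-entropy loss against the label.

softmax: computes the softmax cross-entropy loss between the network's output logits and the labels. It measures the error in discrete classification tasks whose classes are mutually exclusive, such as VOC classification, ImageNet, CIFAR-10 and even MNIST. These are all multi-class tasks, but each image corresponds to exactly one class: an image can contain only one class, so the events are mutually exclusive.

Clearly, the sigmoid loss is the better fit for this project: once a label is flattened to 252 values it contains four 1s (one per character), so the outputs are not mutually exclusive.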
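
The difference shows up even on a toy label. The sketch below reimplements both losses for a single sample in plain Python, using the standard numerically stable formulas rather than TensorFlow itself, and feeds them a flattened label with two 1s (two characters over a made-up 3-letter alphabet) together with logits that get both characters right.

```python
import math


def softmax_cross_entropy(logits, labels):
    """-sum(labels * log(softmax(logits))) for one sample."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(y * (z - log_sum) for z, y in zip(logits, labels))


def sigmoid_cross_entropy(logits, labels):
    """Mean over outputs of the per-output binary cross entropy."""
    losses = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, labels)
    ]
    return sum(losses) / len(losses)


# Two characters over a 3-letter alphabet, flattened: the label has two ones
labels = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
confident = [20.0, -20.0, -20.0, -20.0, -20.0, 20.0]  # both characters right

print(softmax_cross_entropy(confident, labels))
print(sigmoid_cross_entropy(confident, labels))
```

Even for this confident, correct prediction the softmax loss stays near 2 * ln 2 (about 1.386), because a single softmax has to split its probability mass between the two 1s, while the sigmoid loss goes to practically zero.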

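6.2 Accuracy computation

In addition, I improved the accuracy function: besides the per-sample accuracy it now also computes the per-character accuracy, which makes the training progress easier to follow. Per-character accuracy can be understood as splitting every sample into its 4 characters and computing the accuracy over all characters.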

def acc(self, y_true, y_predict):
        """Compute the accuracy.

        :param y_true: true values
        :param y_predict: predicted values
        :return: accuracy_char, accuracy_image
        """
        with tf.variable_scope("acc"):
            # y_true:[100, 4, 63]
            # y_predict: [100, 252] --> [100,4,63]
            y_predict_reshape = tf.reshape(y_predict, [self.train_batch, self.label_num, self.feature_num])

            # Compare the positions of the maximum values
            equal_list = tf.equal(tf.argmax(y_true, 2),
                                  tf.argmax(y_predict_reshape, 2))
            # Character accuracy
            # Simply average equal_list
            # equal_list:[True, True,True, True], [True, False,True, True]
            accuracy_char = tf.reduce_mean(tf.cast(equal_list, tf.float32))
            # Image accuracy
            # Each sample has to be judged as a whole
            # equal_list:[True, True,True, True], [True, False,True, True]
            # x = tf.constant([[True, True], [True, False]])
            # tf.reduce_all(x, 1) is a logical AND per row: True only if all four characters are True, giving [True, False]
            accuracy_image = tf.reduce_mean(tf.cast(tf.reduce_all(equal_list, 1), tf.float32))
        return accuracy_char, accuracy_image
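
The two metrics are easy to check by hand on the example from the comments. This pure-Python sketch mirrors the reduce_mean and reduce_all logic on plain lists; accuracies is a hypothetical helper name.

```python
def accuracies(equal_list):
    """Return (character accuracy, image accuracy) from per-character matches."""
    total_chars = sum(len(row) for row in equal_list)
    char_acc = sum(sum(row) for row in equal_list) / total_chars
    image_acc = sum(all(row) for row in equal_list) / len(equal_list)
    return char_acc, image_acc


equal_list = [[True, True, True, True], [True, False, True, True]]
print(accuracies(equal_list))
```

Seven of the eight characters match, but only one of the two images is fully correct, so the character accuracy is always at least as high as the image accuracy.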

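6.3 Training results 1

Network structure

No.	Layer
Input	input
1	Convolution + pooling + downsampling + ReLU
2	Fully connected + sigmoid
Output	output

Training results

The results are still not ideal, so let's add another convolutional layer!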

6.4 Training results 2

Network structure

No.	Layer
Input	input
1	Convolution + pooling + downsampling + ReLU
2	Convolution + pooling + downsampling + ReLU
3	Fully connected + sigmoid
Output	output
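
The kernel sizes, strides and filter counts of these layers are not listed in this post. Purely as an illustration, the sketch below assumes stride-1 convolutions with SAME padding, each followed by a stride-2 pooling layer, and hypothetical filter counts of 32 and 64, and works out the size of the tensor that would reach the fully connected layer.

```python
import math


def same_pool_output(size, stride=2):
    """Spatial size after a pooling layer with the given stride and SAME padding."""
    return math.ceil(size / stride)


height, width = 48, 120
filters = [32, 64]  # hypothetical filter counts, not taken from the project

for layer, channels in enumerate(filters, start=1):
    # A stride-1 SAME convolution keeps height and width; pooling halves them
    height, width = same_pool_output(height), same_pool_output(width)
    print("after block", layer, (height, width, channels))

print("fully connected input size:", height * width * filters[-1])
```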

Training results

The accuracy has reached 100%.

7. Recognition

Run prediction on the test set and display the results.

def predict(self):
        """Run prediction and print the results."""
        # Build the index-to-character map {0: 'A', 1: 'B', ...}
        num_letter = dict(enumerate(list(self.letter)))
        # Change the number of samples to fetch
        self.get_batch = 10
        # Get the feature and target values through the reader
        # image_batch:[100, 48, 120, 4]
        # label_batch: [100, 4]
        # [[13, 25, 15, 15], [22, 10, 7, 10]]
        image_batch, label_batch = self.read_captcha_tfrecords(self.testdir)
        # Build the convolutional model, y_predict:[100,252]
        # CNN
        y_predict = self.model_cnn(image_batch)
        # Convert label_batch to one-hot encoding
        # y_true:[100, 4, 63]
        y_true = self.turn_to_onehot(label_batch)
        # Compute the accuracy and get the reshaped y_predict: [100, 4, 63]
        char_acc, image_acc, y_predict = self.acc(y_true, y_predict)
        # Create the op that loads the model
        saver = tf.train.Saver()
        # Run in a session
        with tf.Session() as sess:
            # Initialize the variables
            sess.run(tf.global_variables_initializer())
            # Create the thread coordinator
            coord = tf.train.Coordinator()
            # Start the child threads that read the data
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)

            # Load the saved model: ops with the same names as in the current code are found in the checkpoint and their values overwritten
            ckpt = tf.train.latest_checkpoint(self.modeldir)
            if ckpt:
                saver.restore(sess, ckpt)
            # Get the accuracies, the predictions and the true values
            char_run, image_run, predict, label = sess.run([char_acc, image_acc, tf.argmax(y_predict, 2), label_batch])
            print("Image accuracy: %f, character accuracy: %f" % (image_run, char_run))
            # Convert the numeric targets back to characters
            # Print the predictions
            array_true = []  # list of true values
            array_predict = []  # list of predicted values
            for i in range(self.get_batch):
                array_true_single = []  # true value of a single sample
                array_predict_sin = []  # predicted value of a single sample
                for num in label[i]:
                    # Convert the number to a character and add it to the sample's list
                    array_true_single.append(num_letter[num])
                # Add the sample to the overall list
                array_true.append(array_true_single)
                for num in predict[i]:
                    # Convert the number to a character and add it to the sample's list
                    array_predict_sin.append(num_letter[num])
                # Add the sample to the overall list
                array_predict.append(array_predict_sin)
                # Print the prediction
                print("Prediction %d, true:" % i, array_true[i], ",", "predicted:", array_predict[i])

            # Stop and join the threads
            coord.request_stop()
            coord.join(threads)

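The purpose of the following excerpt is to convert the samples, which were packed in numeric form, back to their original characters and to compare the true values with the predictions.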

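            # Convert the numeric targets back to characters
            # Print the predictions
            array_true = []  # list of true values
            array_predict = []  # list of predicted values
            for i in range(self.get_batch):
                array_true_single = []  # true value of a single sample
                array_predict_sin = []  # predicted value of a single sample
                for num in label[i]:
                    # Convert the number to a character and add it to the sample's list
                    array_true_single.append(num_letter[num])
                # Add the sample to the overall list
                array_true.append(array_true_single)
                for num in predict[i]:
                    # Convert the number to a character and add it to the sample's list
                    array_predict_sin.append(num_letter[num])
                # Add the sample to the overall list
                array_predict.append(array_predict_sin)
                # Print the prediction
                print("Prediction %d, true:" % i, array_true[i], ",", "predicted:", array_predict[i])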
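
The same conversion can be written more compactly and tried without a session. In this sketch decode_indices is a hypothetical helper, the A-Z alphabet is a stand-in for self.letter, and the two small arrays are made-up values in the shape that sess.run would return.

```python
def decode_indices(rows, alphabet):
    """Turn index rows such as [13, 25, 15, 15] back into strings."""
    return ["".join(alphabet[int(index)] for index in row) for row in rows]


ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical stand-in for self.letter
label = [[13, 25, 15, 15], [22, 10, 7, 10]]    # made-up true values
predict = [[13, 25, 15, 15], [22, 10, 7, 9]]   # made-up predictions

true_text = decode_indices(label, ALPHABET)
predicted_text = decode_indices(predict, ALPHABET)
for i, (truth, guess) in enumerate(zip(true_text, predicted_text)):
    print("sample %d, true: %s, predicted: %s, match: %s" % (i, truth, guess, truth == guess))
```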