Learning CNN Image Recognition "the Hard Way" (Part 3): Training and Prediction with ResNet

Date: 2019-11-12


In this article you will learn:

Calling the ResNet network in TensorFlow
Training the network and saving the model
Loading the model to predict results

Introduction

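In deep learning, networks become harder to optimize as they get deeper: gradients can vanish or explode, and beyond a certain depth even the training error of a plain network gets worse instead of better. ResNet (Residual Networks) addresses this with shortcut connections, so that each block only has to learn a residual on top of its input. Many articles explain ResNet's internals in detail, so here we simply call a ready-made ResNet and train it.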
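The shortcut idea can be illustrated without any framework: a block computes some transformation F(x) and outputs F(x) + x, so if the extra layers are not useful, driving F toward zero turns the block into an identity map. The following is a minimal pure-Python sketch under that assumption; `relu`, `dense` and `residual_block` are hypothetical toy helpers (fully connected layers on plain lists), not the convolutional blocks that slim actually builds.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, t) for t in v]

def dense(v, weights):
    """Toy fully connected layer: `weights` holds one row per output unit."""
    return [sum(w * t for w, t in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """Return F(x) + x, where F is dense -> ReLU -> dense (output size must match x)."""
    f = dense(relu(dense(x, w1)), w2)
    return [fi + xi for fi, xi in zip(f, x)]

x = [1.0, -2.0, 3.0]
zero = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# All-zero weights make the residual F(x) zero, so the block passes x through unchanged.
print(residual_block(x, zero, zero))          # [1.0, -2.0, 3.0]
# Identity weights give F(x) = relu(x), so the output is x + relu(x).
print(residual_block(x, identity, identity))  # [2.0, -2.0, 6.0]
```

Because the input is added back unchanged, the gradient of the output with respect to x always contains an identity term, which is what keeps very deep stacks of such blocks trainable.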

Building the training network

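If you have gone through the preparation in the previous articles (image preprocessing and creating the tfrecord files), you should already have the data in tfrecord format. Next we build the network for a 100-class logo image classification problem, feeding the prepared tfrecord data into ResNet through TensorFlow's queue system for training.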

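First, import the required libraries (this code relies on tf.contrib, so it needs TensorFlow 1.x; tf.contrib was removed in TensorFlow 2.0):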

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

The nets library bundles many existing networks (AlexNet, Inception, ResNet, VGG) that can be called directly. Here we use ResNet_50, the 50-layer version.

Next, define a function that reads and decodes the tfrecord files:

def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.cast(features['label'], tf.int32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0      # normalize pixel values to [0, 1]
    return img, label
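What `decode_raw`, `reshape` and the division by 255 do to a single record can be mirrored in plain Python. This is a minimal sketch, assuming each record's `img_raw` field holds exactly 224×224×3 uint8 bytes in row-major height/width/channel order (as the `reshape` call implies); `decode_image_bytes` is a hypothetical helper, not a TensorFlow API, and the input bytes are synthetic.

```python
H, W, C = 224, 224, 3  # shape assumed by tf.reshape(img, [224, 224, 3])

def decode_image_bytes(raw):
    """Mimic decode_raw(uint8) -> reshape([H, W, C]) -> cast to float / 255.0."""
    if len(raw) != H * W * C:
        # tf.reshape would fail as well: the byte count must equal 224*224*3
        raise ValueError("expected %d bytes, got %d" % (H * W * C, len(raw)))
    flat = [b / 255.0 for b in raw]  # iterating over bytes yields ints 0..255 (uint8)
    # Row-major (HWC) reshape: pixel (h, w) starts at flat index (h * W + w) * C
    return [[flat[(h * W + w) * C:(h * W + w + 1) * C] for w in range(W)]
            for h in range(H)]

# Synthetic stand-in for one record's img_raw field (hypothetical data).
raw = bytes(i % 256 for i in range(H * W * C))
img = decode_image_bytes(raw)
print(len(img), len(img[0]), len(img[0][0]))  # 224 224 3
print(img[0][85][0])                          # 1.0 (this position holds byte value 255)
```

The byte-count check is the practical takeaway: if the preprocessing step stored images of a different size, the `reshape` in `read_and_decode_tfrecord` fails at run time.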

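Now define where the model is saved and the batch size (I found that a small batch_size trained better here), and put the tfrecord files from the current directory into a list: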

save_dir = r"./train_image_63.model"  # checkpoint path prefix
batch_size_ = 2
lr = tf.Variable(0.0001, dtype=tf.float32)  # learning rate
x = tf.placeholder(tf.float32, [None, 224, 224, 3])  # input images are 224*224*3
y_ = tf.placeholder(tf.float32, [None])
train_list = ['traindata_63.tfrecords-000', 'traindata_63.tfrecords-001', 'traindata_63.tfrecords-002','traindata_63.tfrecords-003', 'traindata_63.tfrecords-004', 'traindata_63.tfrecords-005','traindata_63.tfrecords-006', 'traindata_63.tfrecords-007', 'traindata_63.tfrecords-008','traindata_63.tfrecords-009', 'traindata_63.tfrecords-010', 'traindata_63.tfrecords-011','traindata_63.tfrecords-012', 'traindata_63.tfrecords-013', 'traindata_63.tfrecords-014',
'traindata_63.tfrecords-015', 'traindata_63.tfrecords-016', 'traindata_63.tfrecords-017','traindata_63.tfrecords-018', 'traindata_63.tfrecords-019', 'traindata_63.tfrecords-020','traindata_63.tfrecords-021']    # all generated tfrecord shards; each holds at most 1000 images

# read the shards and shuffle the examples
img, label = read_and_decode_tfrecord(train_list)
img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_, capacity=10000,min_after_dequeue=9900)
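The 22 shard names in `train_list` follow one fixed pattern, so they can be generated rather than typed. Typing them by hand is error-prone: a missing comma between two string literals makes Python silently concatenate them into one nonexistent file name. A minimal sketch, assuming the shards are numbered 000 to 021 as listed above (`num_shards` is a hypothetical name):

```python
num_shards = 22  # shards traindata_63.tfrecords-000 ... -021
train_list = ['traindata_63.tfrecords-%03d' % i for i in range(num_shards)]
print(train_list[0], train_list[-1])  # traindata_63.tfrecords-000 traindata_63.tfrecords-021
```

Alternatively, `sorted(glob.glob('traindata_63.tfrecords-*'))` from Python's standard `glob` module picks up whichever shards actually exist in the current directory.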

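Note that tf.train.shuffle_batch is used here to shuffle the examples in the queue. num_threads is the number of threads that enqueue examples, and capacity is the maximum number of elements the queue can hold, 10000 here. min_after_dequeue is the minimum number of elements that must remain in the queue after a dequeue, and it controls how well the data is mixed: a batch is only dequeued, by picking elements at random, once the queue holds at least min_after_dequeue + batch_size elements, so with 9900 every batch is drawn from a pool of at least 9902 examples while the threads keep refilling the queue. To feed the data in order instead, switch to tf.train.batch and drop the min_after_dequeue argument. Tune these values to your machine: each decoded image here is 224×224×3 float32 values, about 0.6 MB, so a 10000-element queue needs roughly 6 GB of memory.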
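To build intuition for how `capacity` and `min_after_dequeue` interact, the queue can be simulated in plain Python: a producer fills the buffer up to `capacity`, and a batch is removed by picking random elements only while at least `min_after_dequeue + batch_size` are buffered. This is a simplified single-threaded sketch with small illustrative numbers, not TensorFlow's RandomShuffleQueue; `shuffle_batches` is a hypothetical name, and draining the leftovers at the end is an assumption of the sketch (the training input above never ends).

```python
import random

def shuffle_batches(stream, batch_size, capacity, min_after_dequeue, seed=0):
    """Simulate a shuffle queue: fill up to `capacity`, then dequeue random batches.

    A batch is taken only while the buffer holds at least
    min_after_dequeue + batch_size items. When the stream ends, the
    leftovers are drained (the last batch may be smaller).
    """
    if capacity < min_after_dequeue + batch_size:
        raise ValueError("capacity must be >= min_after_dequeue + batch_size")
    rng = random.Random(seed)
    it = iter(stream)
    buffer = []
    exhausted = False
    while True:
        # Producer side: enqueue until the queue is full or the input runs out.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(it))
            except StopIteration:
                exhausted = True
        # Consumer side: dequeue only if enough items stay behind, unless input is done.
        if len(buffer) >= min_after_dequeue + batch_size:
            n = batch_size
        elif exhausted and buffer:
            n = min(batch_size, len(buffer))
        else:
            return
        batch = []
        for _ in range(n):
            # Remove a random element: swap it with the last one, then pop.
            j = rng.randrange(len(buffer))
            buffer[j], buffer[-1] = buffer[-1], buffer[j]
            batch.append(buffer.pop())
        yield batch

batches = list(shuffle_batches(range(100), batch_size=2, capacity=20, min_after_dequeue=18))
flat = [v for b in batches for v in b]
print(len(batches), sorted(flat) == list(range(100)))  # 50 True
```

The first batch here can only contain items 0 to 19, because that is all the buffer holds at that point; a larger `min_after_dequeue` widens that window, which is why the article sets it close to `capacity`.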

Next, one-hot encode the labels with tf.one_hot. Since there are 100 classes, depth is set to 100:

# one-hot encode the labels
one_hot_labels = tf.one_hot(indices=tf.cast(y_, tf.int32), depth=100)
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=100, is_training=True)
pred = tf.reshape(pred, shape=[-1, 100])

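nets.resnet_v2.resnet_v2_50 builds the ResNet_50 network directly, and num_classes is again the total number of classes. is_training switches the batch normalization layers between training mode (normalize with the statistics of the current batch) and inference mode (use the stored moving averages); it does not decide which layers are trained, because the optimizer updates all trainable variables either way. In training mode, slim's batch norm places its moving-average updates in tf.GraphKeys.UPDATE_OPS by default, and they only take effect if run explicitly (for example by wrapping minimize in tf.control_dependencies). The training code here does not do that, and the prediction script below likewise builds the network with is_training=True.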
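The `tf.one_hot` step turns each integer label into a 100-element vector with a single 1. A pure-Python sketch of the same mapping, assuming the labels stored in the tfrecords are the integers 0 to 99 (`one_hot` here is a hypothetical helper, not the TensorFlow function):

```python
def one_hot(labels, depth):
    """Pure-Python counterpart of tf.one_hot(indices=labels, depth=depth)."""
    rows = []
    for label in labels:
        row = [0.0] * depth
        if 0 <= label < depth:  # index -1 gives an all-zero row, as with tf.one_hot
            row[label] = 1.0
        rows.append(row)
    return rows

encoded = one_hot([0, 2, 99], depth=100)
print([row.index(1.0) for row in encoded])  # [0, 2, 99]
print(sum(one_hot([-1], depth=100)[0]))     # 0.0
```

This is also why `y_` is cast to int32 before `tf.one_hot`: the indices must be integers, even though the placeholder itself is declared as float32.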

With the network in place, define the loss function and the optimizer: the loss is sigmoid cross-entropy and the optimizer is Adam:

# loss function and optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=one_hot_labels))
optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
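`tf.nn.sigmoid_cross_entropy_with_logits` treats each of the 100 outputs as an independent yes/no decision; for a logit x and target z its documentation gives the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), and `tf.reduce_mean` then averages over both the batch and the 100 classes. Because exactly one class is correct per image, softmax cross-entropy (`tf.nn.softmax_cross_entropy_with_logits_v2`) is the more common choice for this kind of problem. The sketch below compares the two formulas on one made-up example in plain Python; the helper names are hypothetical.

```python
import math

def sigmoid_xent(logits, targets):
    """Mean element-wise sigmoid cross-entropy, using TF's documented stable form."""
    terms = [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
             for x, z in zip(logits, targets)]
    return sum(terms) / len(terms)

def softmax_xent(logits, targets):
    """Softmax cross-entropy for one example: -sum(z * log_softmax(x))."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(z * (x - log_sum) for x, z in zip(logits, targets))

logits = [2.0, -1.0, 0.5]   # made-up scores for 3 classes
targets = [1.0, 0.0, 0.0]   # one-hot label: class 0
print(sigmoid_xent(logits, targets))
print(softmax_xent(logits, targets))
```

The sigmoid version penalizes every wrong class on its own, while the softmax version only looks at the probability assigned to the correct class after normalizing across all classes.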

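Define the accuracy. tf.argmax returns the index of the largest value along an axis, so comparing the predicted index with the true label's index tells us whether each prediction is correct: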

# accuracy
a = tf.argmax(pred, 1)
b = tf.argmax(one_hot_labels, 1)
correct_pred = tf.equal(a, b)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
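The accuracy ops compute: argmax of each prediction row, argmax of each one-hot label row, then the fraction of rows where they match. A pure-Python sketch of the same computation on made-up numbers (`argmax` and `accuracy` are hypothetical helpers; note that `tf.argmax` does not guarantee which index it returns on ties, whereas Python's `max` keeps the first):

```python
def argmax(row):
    """Index of the largest value; Python's max() keeps the first one on ties."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(logits, one_hot_labels):
    """Fraction of rows where argmax(prediction) == argmax(label)."""
    correct = [argmax(p) == argmax(t) for p, t in zip(logits, one_hot_labels)]
    return sum(correct) / len(correct)

logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]   # made-up network outputs
labels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # one-hot targets
print(accuracy(logits, labels))  # 0.5
```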

Finally, create a Session and run the network:

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # create a coordinator to manage the threads
    coord = tf.train.Coordinator()
    # start the QueueRunners; from now on the filename queue is being filled
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    while True:
        i += 1
        b_image, b_label = sess.run([img_batch, label_batch])
        _, loss_, y_t, y_p, a_, b_ = sess.run([optimizer, loss, one_hot_labels, pred, a, b], feed_dict={x: b_image,y_: b_label})
        print('step: {}, train_loss: {}'.format(i, loss_))
        if i % 20 == 0:
            _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
            print('--------------------------------------------------------')
            print('step: {}  train_acc: {}  loss: {}'.format(i, acc_train, _loss))
            print('--------------------------------------------------------')
            if i == 200000:
                saver.save(sess, save_dir, global_step=i)
            elif i == 300000:
                saver.save(sess, save_dir, global_step=i)
            elif i == 400000:
                saver.save(sess, save_dir, global_step=i)
                break
    coord.request_stop()
    # join() returns only after all the other threads have stopped
    coord.join(threads)

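When using the queue system, the Session code must call tf.train.start_queue_runners, and should create a Coordinator so the threads can be stopped and joined cleanly; without the queue runners nothing fills the queue, and the first sess.run on img_batch blocks forever. We print the accuracy every 20 steps and save the model automatically at steps 200000, 300000 and 400000. With batch_size = 2, each printed training accuracy can only be 0, 0.5 or 1.0, so individual readings are very noisy.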
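The Coordinator/QueueRunner pattern boils down to producer threads filling a bounded queue plus a shared stop flag that the main loop sets before joining the threads. The sketch below is a rough analogue using only Python's standard library, assuming `threading.Event` stands in for `tf.train.Coordinator` and `queue.Queue` for the example queue; `producer` is a hypothetical function, and this illustrates the pattern rather than how TensorFlow implements it.

```python
import queue
import threading

def producer(q, stop_event, worker_id):
    """Keep enqueuing items until the coordinator asks every thread to stop."""
    n = 0
    while not stop_event.is_set():
        try:
            # Time out regularly so the thread re-checks the stop flag instead of blocking forever.
            q.put((worker_id, n), timeout=0.1)
            n += 1
        except queue.Full:
            continue

stop_event = threading.Event()   # plays the role of tf.train.Coordinator
q = queue.Queue(maxsize=10)      # plays the role of the example queue
threads = [threading.Thread(target=producer, args=(q, stop_event, k)) for k in range(2)]
for t in threads:                # like tf.train.start_queue_runners
    t.start()

consumed = [q.get() for _ in range(50)]  # the training loop pulls data
stop_event.set()                         # like coord.request_stop()
for t in threads:                        # like coord.join(threads)
    t.join()
print(len(consumed), all(not t.is_alive() for t in threads))  # 50 True
```

Without the final `set()` and `join()`, the producer threads would keep running after the loop ends, which is exactly what `coord.request_stop()` and `coord.join(threads)` prevent in the training script.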

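When training finishes you get the model files shown below; I kept only the 300000-step model. Each checkpoint consists of train_image_63.model-300000.index, train_image_63.model-300000.data-00000-of-00001 and train_image_63.model-300000.meta, plus a checkpoint file in the same directory; saver.restore takes the common prefix ./train_image_63.model-300000 without any extension.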

(Figure: the saved model files)

The complete training code:

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets



def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.cast(features['label'], tf.int32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0        # normalize pixel values to [0, 1]
    return img, label

save_dir = r"./train_image_63.model"
batch_size_ = 2
lr = tf.Variable(0.0001, dtype=tf.float32)
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
y_ = tf.placeholder(tf.float32, [None])

train_list = ['traindata_63.tfrecords-000','traindata_63.tfrecords-001','traindata_63.tfrecords-002','traindata_63.tfrecords-003','traindata_63.tfrecords-004','traindata_63.tfrecords-005','traindata_63.tfrecords-006','traindata_63.tfrecords-007','traindata_63.tfrecords-008','traindata_63.tfrecords-009','traindata_63.tfrecords-010','traindata_63.tfrecords-011','traindata_63.tfrecords-012','traindata_63.tfrecords-013','traindata_63.tfrecords-014','traindata_63.tfrecords-015','traindata_63.tfrecords-016','traindata_63.tfrecords-017','traindata_63.tfrecords-018','traindata_63.tfrecords-019','traindata_63.tfrecords-020','traindata_63.tfrecords-021']

# read the shards and shuffle the examples
img, label = read_and_decode_tfrecord(train_list)
img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_, capacity=10000,min_after_dequeue=9900)

# one-hot encode the labels
one_hot_labels = tf.one_hot(indices=tf.cast(y_, tf.int32), depth=100)
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=100, is_training=True)
pred = tf.reshape(pred, shape=[-1, 100])


# loss function and optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=one_hot_labels))
optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)

# accuracy
a = tf.argmax(pred, 1)
b = tf.argmax(one_hot_labels, 1)
correct_pred = tf.equal(a, b)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # create a coordinator to manage the threads
    coord = tf.train.Coordinator()
    # start the QueueRunners; from now on the filename queue is being filled
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    while True:
        i += 1
        b_image, b_label = sess.run([img_batch, label_batch])
        _, loss_, y_t, y_p, a_, b_ = sess.run([optimizer, loss, one_hot_labels, pred, a, b], feed_dict={x: b_image,y_: b_label})
        print('step: {}, train_loss: {}'.format(i, loss_))
        if i % 20 == 0:
            _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
            print('--------------------------------------------------------')
            print('step: {}  train_acc: {}  loss: {}'.format(i, acc_train, _loss))
            print('--------------------------------------------------------')
            if i == 200000:
                saver.save(sess, save_dir, global_step=i)
            elif i == 300000:
                saver.save(sess, save_dir, global_step=i)
            elif i == 400000:
                saver.save(sess, save_dir, global_step=i)
                break
    coord.request_stop()
    # join() returns only after all the other threads have stopped
    coord.join(threads)

Prediction

We evaluate the model on 1000 test images. Here is the code:

import os

import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
from PIL import Image

test_dir = r'./test'     # original test folder containing the images to predict
model_dir = r'./train_image_63.model-300000'     # checkpoint prefix of the saved model
test_txt_dir = r'./test.txt'     # original test.txt file listing the test image names
result_dir = r'./result.txt'     # output file for the predictions
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
classes = ['1', '10', '100', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21', '22', '23', '24','25', '26', '27', '28', '29', '3', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '4', '40','41', '42', '43', '44', '45', '46', '47', '48', '49', '5', '50', '51', '52', '53', '54', '55', '56', '57','58', '59', '6', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '7', '70', '71', '72', '73','74', '75', '76', '77', '78', '79', '8', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '9','90', '91', '92', '93', '94', '95', '96', '97', '98', '99']  # label order (class folder names sorted as strings)

# same network definition as in the training script
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=100, is_training=True)
pred = tf.reshape(pred, shape=[-1, 100])
a = tf.argmax(pred, 1)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_dir)
    with open(test_txt_dir, 'r') as f:
        data = f.readlines()
    with open(result_dir, 'a') as f1:
        for line in data:
            if not line.strip():
                continue  # skip blank lines
            test_name = line.split()[0]
            img_path = os.path.join(test_dir, test_name)
            if not os.path.isfile(img_path):
                continue  # listed in test.txt but missing from test_dir
            # Preprocess with PIL/NumPy: calling tf.reshape/tf.cast here would add new
            # ops to the graph for every image and make the loop slower and slower.
            # convert('RGB') guards against grayscale or RGBA images.
            img = Image.open(img_path).convert('RGB').resize((224, 224))
            b_image = np.asarray(img, dtype=np.float32).reshape(1, 224, 224, 3) / 255.0
            t_label = sess.run(a, feed_dict={x: b_image})
            predict = classes[t_label[0]]
            print(test_name, predict, file=f1)

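Note that the test set was not converted to tfrecord format; the images are loaded and predicted one at a time, and the result file is mainly meant for submitting to the competition. I've put the raw data and the model here (password: 8xbi) for anyone interested.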
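The `classes` list in the prediction script is not in numeric order: it is exactly the folder names '1' to '100' sorted as strings, which suggests the labels 0 to 99 were assigned by sorted directory name when the tfrecords were made (an assumption; that order is decided in the preprocessing step, not here). If so, the list can be regenerated instead of typed out:

```python
# String sort: '1' < '10' < '100' < '11' < ... < '99'
classes = sorted(str(i) for i in range(1, 101))
print(classes[:4], classes[-1])  # ['1', '10', '100', '11'] 99
```

Whatever method is used, the list must match the label order used when the tfrecords were written; otherwise every prediction is mapped to the wrong class name.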

With that, we have completed a CNN image recognition project.