In the previous post, 【Transfer learning cat-vs-dog classification with TensorFlow】, I showed how to do transfer learning for image classification with TensorLayer's 【VGG16 model】. That raises a question: what about models TensorLayer does not provide? No need to worry: TensorLayer can import TensorFlow's 【slim models】, and a code example ships as tutorial_inceptionV3_tfslim.
So what exactly is slim, and what is it for?
slim is a library that makes building, training, and evaluating neural networks simple. It eliminates much of the repetitive boilerplate of raw TensorFlow, making code more compact and readable. It also ships many well-known computer-vision models (VGG, AlexNet, and so on) that you can use directly or even extend in all sorts of ways. (My note: in short, much the same role as TensorLayer.) For a fuller introduction, see 【Tensorflow】輔助工具篇——tensorflow slim(TF-Slim)介紹.
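To make the "less boilerplate" point concrete, here is a toy sketch of my own (not taken from the article above): one 3×3 convolution written with slim next to its raw-TensorFlow spelling. The shapes are invented for the example.

```python
import tensorflow as tf

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 224, 224, 3])

# slim: one line; weight creation, bias, and the ReLU are handled for you
net = slim.conv2d(images, 64, [3, 3], scope='conv1')

# raw TensorFlow: every piece spelled out by hand
with tf.variable_scope('conv1_raw'):
    w = tf.get_variable('weights', [3, 3, 3, 64])
    b = tf.get_variable('biases', [64], initializer=tf.zeros_initializer())
    net_raw = tf.nn.relu(
        tf.nn.conv2d(images, w, strides=[1, 1, 1, 1], padding='SAME') + b)
```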
To do transfer learning you first need the slim model code plus the pretrained weights, and Google provides both for download; the TF-Slim models page lists each model together with the URL of its ImageNet-pretrained checkpoint.
The table also gives each model's top-1 and top-5 accuracy, and there are plenty of models to choose from.
So: download Inception-ResNet-v2 (the model's .py file) and inception_resnet_v2_2016_08_30.tar.gz, then put the .py file and the extracted .ckpt file in the project root. Why not Inception V3, as in the TensorLayer example? Because Inception-ResNet-v2 has higher accuracy. (Haha, the real reason comes at the end.)
We will do cat-vs-dog classification again. If you import the model as in the tutorial, change num_classes, load the training data, and train directly, it errors out: the last few parameters of the Logits layer have mismatched dimensions at restore time.
Those last few parameters simply cannot be restored, and I did not find a TensorFlow call for selectively restoring .ckpt parameters. What to do? Luckily a friend in the chat group shared a workaround, see 【Tensorflow transfer learning】:
The idea: first restore all of the .ckpt parameters and save them in npz format, then selectively restore parameters from the npz; restoring from npz works exactly as in the previous post.
So the whole process is two steps:
1. Restore the parameters and save them in npz format.
Here is the code:
```python
import os
import time

from recordutil import *
import numpy as np
# from tensorflow.contrib.slim.python.slim.nets.resnet_v2 import resnet_v2_152
# from tensorflow.contrib.slim.python.slim.nets.vgg import vgg_16
import skimage
import skimage.io
import skimage.transform
import tensorflow as tf
from tensorlayer.layers import *
# from scipy.misc import imread, imresize
# from tensorflow.contrib.slim.python.slim.nets.alexnet import alexnet_v2
from inception_resnet_v2 import (inception_resnet_v2_arg_scope, inception_resnet_v2)
from scipy.misc import imread, imresize
from tensorflow.python.ops import variables
import tensorlayer as tl

slim = tf.contrib.slim

try:
    from data.imagenet_classes import *
except Exception as e:
    raise Exception(
        "{} / download the file from: https://github.com/zsdonghao/tensorlayer/tree/master/example/data".format(e))

n_epoch = 200
learning_rate = 0.0001
print_freq = 2
batch_size = 32

## InceptionV3 / All TF-Slim nets can be merged into TensorLayer
x = tf.placeholder(tf.float32, shape=[None, 299, 299, 3])
# labels
y_ = tf.placeholder(tf.int32, shape=[None, ], name='y_')

net_in = tl.layers.InputLayer(x, name='input_layer')
with slim.arg_scope(inception_resnet_v2_arg_scope()):
    network = tl.layers.SlimNetsLayer(
        prev_layer=net_in,
        slim_layer=inception_resnet_v2,
        slim_args={
            'num_classes': 1001,
            'is_training': True,
        },
        name='InceptionResnetV2'  # <-- the name should be the same with the ckpt model
    )
# network = fc_layers(net_cnn)

sess = tf.InteractiveSession()

network.print_params(False)
# network.print_layers()

saver = tf.train.Saver()

# load the pretrained parameters
# tl.files.assign_params(sess, npz, network)
tl.layers.initialize_global_variables(sess)
saver.restore(sess, "inception_resnet_v2.ckpt")
print("Model Restored")
all_params = sess.run(network.all_params)
np.savez('inception_resnet_v2.npz', params=all_params)
sess.close()
```
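Incidentally, if you want to see what the checkpoint contains before dumping it, TensorFlow's stock checkpoint reader will list every stored variable with its shape; a minimal sketch using the same .ckpt path as above:

```python
import tensorflow as tf

# list the variables stored in the checkpoint together with their shapes
reader = tf.train.NewCheckpointReader('inception_resnet_v2.ckpt')
shapes = reader.get_variable_to_shape_map()
for name in sorted(shapes)[:5]:  # print only the first few entries
    print(name, shapes[name])
```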
Once this runs successfully, we get all 908 parameters of the model.
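A quick sanity check of the dump, using plain numpy (my own addition, not a step from the referenced post):

```python
import numpy as np

# the npz stores one object array under the key 'params'
data = np.load('inception_resnet_v2.npz', allow_pickle=True)  # allow_pickle is required on numpy >= 1.16.3
print(len(data['params']))  # should print 908
```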
2. Partially restore the npz parameters, then train the model.
First modify the model's last layer; since this is 2-class learning, make the following change:
```python
with slim.arg_scope(inception_resnet_v2_arg_scope()):
    network = tl.layers.SlimNetsLayer(
        prev_layer=net_in,
        slim_layer=inception_resnet_v2,
        slim_args={
            'num_classes': 2,
            'is_training': True,
        },
        name='InceptionResnetV2'  # <-- the name should be the same with the ckpt model
    )
```
num_classes改成2,is_training爲True。
Next, define the inputs, outputs, and loss function:
```python
sess = tf.InteractiveSession()
# saver = tf.train.Saver()
y = network.outputs
y_op = tf.argmax(tf.nn.softmax(y), 1)
cost = tl.cost.cross_entropy(y, y_, name='cost')

correct_prediction = tf.equal(tf.cast(tf.argmax(y, 1), tf.float32), tf.cast(y_, tf.float32))
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
Now decide which parameters to train; we only train the last layer. Printing the parameters shows:
```
[TL] param 900: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights:0 (5, 5, 128, 768) float32_ref
[TL] param 901: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0 (768,) float32_ref
[TL] param 902: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean:0 (768,) float32_ref
[TL] param 903: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance:0 (768,) float32_ref
[TL] param 904: InceptionResnetV2/AuxLogits/Logits/weights:0 (768, 2) float32_ref
[TL] param 905: InceptionResnetV2/AuxLogits/Logits/biases:0 (2,) float32_ref
[TL] param 906: InceptionResnetV2/Logits/Logits/weights:0 (1536, 2) float32_ref
[TL] param 907: InceptionResnetV2/Logits/Logits/biases:0 (2,) float32_ref
[TL] num of params: 56940900
```
So it is enough to train from param 904 onward, and to restore parameters up through param 903.
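Hardcoding the index 904 works for this exact network, but it is fragile; if you would rather derive it, something like the following should do (a sketch of my own, assuming the `network` object built above):

```python
# find the first parameter belonging to a Logits classification head;
# everything from there on is what we retrain
logits_start = next(i for i, p in enumerate(network.all_params)
                    if '/Logits/' in p.name)
print(logits_start)  # expect 904 for this network
train_params = network.all_params[logits_start:]
```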
Below we define the training op, restore the partial parameters, and load the sample data:
```python
# define the optimizer: train only the last-layer parameters
train_params = network.all_params[904:]
print('trainable params:', train_params)
# # load the pretrained parameters
# tl.files.assign_params(sess, params, network)
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=train_params)

img, label = read_and_decode("D:\\001-Python\\train299.tfrecords")
# shuffle_batch randomly shuffles the input
X_train, y_train = tf.train.shuffle_batch([img, label], batch_size=batch_size,
                                          capacity=200, min_after_dequeue=100)

tl.layers.initialize_global_variables(sess)
params = tl.files.load_npz('', 'inception_resnet_v2.npz')
params = params[0:904]
print('number of restored params:', len(params))
tl.files.assign_params(sess, params=params, network=network)
```
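If you are paranoid (I was), you can verify the assignment actually took, for example by comparing one restored tensor against its npz copy; a small check of my own:

```python
import numpy as np

# the first conv kernel in the session should now be identical to the
# first entry that was dumped into the npz
w0 = sess.run(network.all_params[0])
assert np.allclose(w0, params[0]), 'parameter restore did not take'
```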
The training code below is the same as in the previous post:
```python
# train the model
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
step = 0
filelist = getfilelist()
for epoch in range(n_epoch):
    start_time = time.time()
    val, l = sess.run([X_train, y_train])  # next_data(filelist, batch_size)
    for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
        sess.run(train_op, feed_dict={x: X_train_a, y_: y_train_a})
    if epoch + 1 == 1 or (epoch + 1) % print_freq == 0:
        print("Epoch %d of %d took %fs" % (epoch + 1, n_epoch, time.time() - start_time))
        train_loss, train_acc, n_batch = 0, 0, 0
        for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
            err, ac = sess.run([cost, acc], feed_dict={x: X_train_a, y_: y_train_a})
            train_loss += err
            train_acc += ac
            n_batch += 1
        print("   train loss: %f" % (train_loss / n_batch))
        print("   train acc: %f" % (train_acc / n_batch))
# tl.files.save_npz(network.all_params, name='model_vgg_16_2.npz', sess=sess)
coord.request_stop()
coord.join(threads)
```
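The commented-out save line above is worth enabling once training looks good; something like this persists the fine-tuned weights in the same npz format as step 1 (the file name here is my own choice):

```python
# save all parameters, including the retrained 2-class heads
tl.files.save_npz(network.all_params, name='inception_resnet_v2_catdog.npz', sess=sess)
```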
Training for 200 epochs with a batch size of 20, part of the output looks like this:
```
Epoch 156 of 200 took 12.568609s
   train loss: 0.382517
   train acc: 0.950000
Epoch 158 of 200 took 12.457161s
   train loss: 0.382509
   train acc: 0.850000
Epoch 160 of 200 took 12.385407s
   train loss: 0.320393
   train acc: 1.000000
Epoch 162 of 200 took 12.489218s
   train loss: 0.480686
   train acc: 0.700000
Epoch 164 of 200 took 12.388841s
   train loss: 0.329189
   train acc: 0.850000
Epoch 166 of 200 took 12.446472s
   train loss: 0.379127
   train acc: 0.900000
Epoch 168 of 200 took 12.888571s
   train loss: 0.365938
   train acc: 0.900000
Epoch 170 of 200 took 12.850605s
   train loss: 0.353434
   train acc: 0.850000
Epoch 172 of 200 took 12.855129s
   train loss: 0.315443
   train acc: 0.950000
Epoch 174 of 200 took 12.906666s
   train loss: 0.460817
   train acc: 0.750000
Epoch 176 of 200 took 12.830738s
   train loss: 0.421025
   train acc: 0.900000
Epoch 178 of 200 took 12.852572s
   train loss: 0.418784
   train acc: 0.800000
Epoch 180 of 200 took 12.951322s
   train loss: 0.316057
   train acc: 0.950000
Epoch 182 of 200 took 12.866213s
   train loss: 0.363328
   train acc: 0.900000
Epoch 184 of 200 took 13.012520s
   train loss: 0.379462
   train acc: 0.850000
Epoch 186 of 200 took 12.934583s
   train loss: 0.472857
   train acc: 0.750000
Epoch 188 of 200 took 13.038168s
   train loss: 0.236005
   train acc: 1.000000
Epoch 190 of 200 took 13.056378s
   train loss: 0.266042
   train acc: 0.950000
Epoch 192 of 200 took 13.016137s
   train loss: 0.255430
   train acc: 0.950000
Epoch 194 of 200 took 13.013147s
   train loss: 0.422342
   train acc: 0.900000
Epoch 196 of 200 took 12.980659s
   train loss: 0.353984
   train acc: 0.900000
Epoch 198 of 200 took 13.033676s
   train loss: 0.320018
   train acc: 0.950000
Epoch 200 of 200 took 12.945982s
   train loss: 0.288049
   train acc: 0.950000
```
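Before wrapping up, a quick single-image sanity check can reuse the y_op defined earlier. This is only a sketch: the file name is hypothetical, and the preprocessing must match whatever read_and_decode did to the training images (here I simply rescale to [0, 1]):

```python
import skimage.io
import skimage.transform

im = skimage.io.imread('test.jpg') / 255.0                     # hypothetical test image
im = skimage.transform.resize(im, (299, 299), mode='reflect')  # network expects 299x299x3
pred = sess.run(y_op, feed_dict={x: [im]})
print('predicted class:', pred[0])                             # 0 or 1, per your TFRecord labels
```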
And that is transfer learning with Inception-ResNet-v2 done.
The TensorLayer author says SlimNetsLayer can import any slim model. I have verified that importing Inception-ResNet-v2 and VGG16 works. Inception V3, however, trained for two or three days with accuracy oscillating between 10% and 70% (about as stable as my mood), and I never found the cause. Exhausting. I hope some reader will go verify Inception V3 again.