A Very Detailed Guide to Saving and Loading TensorFlow Models (Theory and Hands-On Practice)

Date: 2019-11-24


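1. What does a TensorFlow model actually look like?

A TensorFlow model mainly consists of the network design (the graph) and the trained values of its parameters. Accordingly, a TensorFlow model has two main files:

a) Meta graph:

This is a protocol buffer that stores the complete TensorFlow graph, that is, all variables, operations, collections, and so on. The file has the .meta extension.

b) Checkpoint file: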

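This is a binary file containing the values of the weights, biases, and all other saved variables. It used to carry the .ckpt extension, but this changed from TensorFlow 0.11 onward: instead of a single .ckpt file there are now two files, .data and .index. The .data file holds the values of our training variables; more on that later.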
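To make the file layout concrete, here is a minimal sketch of the names a single save is expected to leave on disk. The helper `checkpoint_files` is hypothetical (it is not part of TensorFlow), the prefix `model_conv/my-model` and step 190 are only illustrative, and the sketch assumes the usual `.data-00000-of-00001` suffix of an unsharded checkpoint.

```python
def checkpoint_files(prefix, global_step=None, num_shards=1):
    """List the file names one save is expected to produce.

    Hypothetical helper: assumes the .meta/.index/.data layout and a
    '.data-XXXXX-of-YYYYY' shard suffix.
    """
    # With a global step, the step number is appended to the prefix.
    base = prefix if global_step is None else "{}-{}".format(prefix, global_step)
    files = [base + ".meta", base + ".index"]
    for shard in range(num_shards):
        files.append("{}.data-{:05d}-of-{:05d}".format(base, shard, num_shards))
    return files


files = checkpoint_files("model_conv/my-model", global_step=190)
for name in files:
    print(name)
```

The prefix without any extension is what is later handed to the restore call; the individual files are never passed directly.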

In addition, TensorFlow writes a file named checkpoint, which only keeps a record of the most recently saved checkpoint files.
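The checkpoint record is a small text file, so it can be inspected without TensorFlow. The sketch below parses a sample in the `key: "value"` text layout that TensorFlow 1.x writes; the sample contents and the function name `parse_checkpoint_record` are assumptions made for illustration, not output copied from a real run.

```python
import re

# Illustrative contents of a "checkpoint" record file (assumed paths).
SAMPLE = '''model_checkpoint_path: "my-model-190"
all_model_checkpoint_paths: "my-model-160"
all_model_checkpoint_paths: "my-model-170"
all_model_checkpoint_paths: "my-model-180"
all_model_checkpoint_paths: "my-model-190"
'''


def parse_checkpoint_record(text):
    """Return (latest_path, all_paths) from the record's text."""
    latest = None
    all_paths = []
    for line in text.splitlines():
        match = re.match(r'\s*(\w+):\s*"(.*)"\s*$', line)
        if not match:
            continue  # skip blank or unrecognised lines
        key, value = match.groups()
        if key == "model_checkpoint_path":
            latest = value
        elif key == "all_model_checkpoint_paths":
            all_paths.append(value)
    return latest, all_paths


latest, all_paths = parse_checkpoint_record(SAMPLE)
print(latest)
print(all_paths)
```

In practice `tf.train.latest_checkpoint(directory)` reads this record for you; the parser is only meant to show what the file holds.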

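2. Saving a TensorFlow model:

Say you are training a convolutional neural network for image classification and watching the loss and accuracy. Once the network has converged, you can stop training by hand, or you can simply train for a fixed number of iterations. When training is done, we want to save all variable values and the network graph to files for later use. To save the graph and all parameter values in TensorFlow, we create an instance of the tf.train.Saver() class.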

saver = tf.train.Saver()

Remember that TensorFlow variables only live inside a session, so you must save from within a session, by calling the save method of the saver object you created.

saver.save(sess, path+"model_conv/my-model", global_step=epoch)

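Here sess is the session object and path+"model_conv/my-model" is the path plus the name prefix you choose for your model. global_step is the step number appended to the checkpoint file name (with global_step=1000 the files are named my-model-1000.*); it does not decide how often the model is saved, which depends on where saver.save is called in the training loop. If you want to keep only the 4 most recent checkpoints and additionally retain one checkpoint for every 2 hours of training, pass max_to_keep=4 and keep_checkpoint_every_n_hours=2 when constructing tf.train.Saver.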
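The effect of max_to_keep can be pictured with a small pure-Python simulation. This is only a sketch of the retention rule, not TensorFlow code: `simulate_saves` is a hypothetical helper, the schedule of one save every 10 steps over 200 steps is an assumption, and keep_checkpoint_every_n_hours is deliberately not modelled.

```python
from collections import deque


def simulate_saves(prefix, steps, max_to_keep):
    """Return the checkpoint prefixes left after saving at each step.

    Assumes max_to_keep is a positive integer; the oldest checkpoint is
    dropped whenever the limit is exceeded.
    """
    kept = deque()
    for step in steps:
        kept.append("{}-{}".format(prefix, step))  # name = prefix-global_step
        while len(kept) > max_to_keep:
            kept.popleft()  # the oldest checkpoint is deleted
    return list(kept)


kept = simulate_saves("my-model", range(0, 200, 10), max_to_keep=4)
print(kept)
# ['my-model-160', 'my-model-170', 'my-model-180', 'my-model-190']
```

Twenty saves are made in total, but only the last four prefixes survive, which is why a restore normally targets the highest step number.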

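If we pass no arguments to tf.train.Saver(), it saves all variables. If we only want to save some of them, we can specify the variables or collections to save: when creating the tf.train.Saver instance, we pass it a list or dictionary of the variables we want saved. An example: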

import tensorflow as tf

path = '/data/User/zcc/'

v1 = tf.Variable(tf.constant(1.0, shape=[1]), name="v1")
v2 = tf.Variable(tf.constant(2.0, shape=[1]), name="v2")
result = v1 + v2

saver = tf.train.Saver([v1, v2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, path + "Model_new/model.ckpt")

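3. Hands-on walkthrough (a simple convolutional neural network)

The code below defines a simple convolutional neural network with two convolutional layers, two pooling layers, and two fully connected layers. The data it loads is meaningless: it simulates 10 RGB images of 32x32 pixels in 4 classes (0, 1, 2, 3). The point here is to learn how to save and reload a model, so where the data comes from and how accurate the model is do not matter.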

import tensorflow as tf
import numpy as np
import os

# Load the (custom) training set
def load_data(resultpath):
    datapath = os.path.join(resultpath, "data10_4.npz")
    # If the data file already exists, load it
    if os.path.exists(datapath):
        data = np.load(datapath)
        # Note how the arrays are extracted by key
        X, Y = data["X"], data["Y"]
    else:
        # The data is meaningless: it simulates 10 RGB images of 32x32 pixels in 4 classes: 0, 1, 2, 3
        # Reshape the 30720 numbers into a 10*32*32*3 tensor
        X = np.array(np.arange(30720)).reshape(10, 32, 32, 3)
        Y = [0, 0, 1, 1, 2, 2, 3, 3, 2, 0]
        X = X.astype('float32')
        Y = np.array(Y)
        # Save the data in .npz format (the file is data10_4.npz)
        np.savez(datapath, X=X, Y=Y)
        print('Saved dataset to dataset.npz')
    # A handy way to format the printed output
    print('X_shape:{}\nY_shape:{}'.format(X.shape, Y.shape))
    return X, Y

# Build the convolutional network: two convolutional layers, two pooling layers and two fully connected layers.
def define_model(x):
    x_image = tf.reshape(x, [-1, 32, 32, 3])
    print ('x_image.shape:',x_image.shape)

    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial, name="w")

    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial, name="b")

    # Despite its name, this helper performs a 2-D convolution (tf.nn.conv2d)
    def conv3d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2d(x):
        return tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1], padding='SAME')

    with tf.variable_scope("conv1"):  # [-1,32,32,3]
        weights = weight_variable([3, 3, 3, 32])
        biases = bias_variable([32])
        conv1 = tf.nn.relu(conv3d(x_image, weights) + biases)
        pool1 = max_pool_2d(conv1)  # [-1,11,11,32]

    with tf.variable_scope("conv2"):
        weights = weight_variable([3, 3, 32, 64])
        biases = bias_variable([64])
        conv2 = tf.nn.relu(conv3d(pool1, weights) + biases)
        pool2 = max_pool_2d(conv2)  # [-1,4,4,64]

    with tf.variable_scope("fc1"):
        weights = weight_variable([4 * 4 * 64, 128])  # [-1,1024]
        biases = bias_variable([128])
        fc1_flat = tf.reshape(pool2, [-1, 4 * 4 * 64])
        fc1 = tf.nn.relu(tf.matmul(fc1_flat, weights) + biases)
        # keep_prob is fixed at 0.5 in the graph, so dropout also stays active when the saved model is reloaded
        fc1_drop = tf.nn.dropout(fc1, 0.5)  # [-1,128]

    with tf.variable_scope("fc2"):
        weights = weight_variable([128, 4])
        biases = bias_variable([4])
        fc2 = tf.matmul(fc1_drop, weights) + biases  # [-1,4]

    return fc2
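The shape comments in define_model ([-1,11,11,32], [-1,4,4,64], and the 4 * 4 * 64 input of fc1) follow from the 'SAME' padding rule, under which the output size is ceil(input / stride). A minimal sketch of that arithmetic, with `same_out` as a hypothetical helper name:

```python
import math


def same_out(size, stride):
    """Spatial output size of a conv/pool layer with padding='SAME'."""
    return int(math.ceil(size / float(stride)))


conv1 = same_out(32, 1)      # convolution, stride 1: size unchanged
pool1 = same_out(conv1, 3)   # 3x3 max pool, stride 3
conv2 = same_out(pool1, 1)
pool2 = same_out(conv2, 3)
flat = pool2 * pool2 * 64    # 64 channels after the second convolution

print(conv1, pool1, conv2, pool2, flat)
```

This gives 32, 11, 11, 4 and 1024, matching the [4 * 4 * 64, 128] weight matrix of the first fully connected layer.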

path = '/data/User/zcc/'

# Train the model
def train_model():
    # Placeholders for the training data
    x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name="x")
    y_ = tf.placeholder('int64', shape=[None], name="y_")
    # Learning rate
    initial_learning_rate = 0.001
    # Define the network, run the forward pass and get the predicted output
    y_fc2 = define_model(x)
    # One-hot labels for the training set
    y_label = tf.one_hot(y_, 4, name="y_labels")
    # Loss function
    loss_temp = tf.losses.softmax_cross_entropy(onehot_labels=y_label, logits=y_fc2)
    cross_entropy_loss = tf.reduce_mean(loss_temp)
    # Optimizer used for training
    train_step = tf.train.AdamOptimizer(learning_rate=initial_learning_rate, beta1=0.9, beta2=0.999, epsilon=1e-08).minimize(cross_entropy_loss)
    # True where the prediction equals the label, False otherwise
    correct_prediction = tf.equal(tf.argmax(y_fc2, 1), tf.argmax(y_label, 1))
    # Cast correct_prediction to tf.float32
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Saver for the model; keep at most 4 checkpoints
    saver = tf.train.Saver(max_to_keep=4)
    # Add the prediction to the "predict" collection
    tf.add_to_collection("predict", y_fc2)
    tf.add_to_collection("acc", accuracy )
    # Define the session
    with tf.Session() as sess:
        # Initialize all variables
        sess.run(tf.global_variables_initializer())
        print ("------------------------------------------------------")
        # Load the training data; it is synthetic, the goal is to learn saving/loading a model
        X, Y = load_data(path+"model_conv/")  # this folder must be created beforehand
        X = np.multiply(X, 1.0 / 255.0)
        for epoch in range(200):
            if epoch % 10 == 0:
                print ("------------------------------------------------------")
                train_accuracy = accuracy.eval(feed_dict={x: X, y_: Y})
                train_loss = cross_entropy_loss.eval(feed_dict={x: X, y_: Y})
                print ("after epoch %d, the loss is %6f" % (epoch, train_loss))
                # The accuracy here is computed over the whole training set
                print ("after epoch %d, the acc is %6f" % (epoch, train_accuracy))
                saver.save(sess, path+"model_conv/my-model", global_step=epoch)
                print ("save the model")
            train_step.run(feed_dict={x: X, y_: Y})
        print ("------------------------------------------------------")

# Train the model
train_model()

Training output:

('x_image.shape:', TensorShape([Dimension(None), Dimension(32), Dimension(32), Dimension(3)]))
------------------------------------------------------
Saved dataset to dataset.npz
X_shape:(10, 32, 32, 3)
Y_shape:(10,)
------------------------------------------------------
after epoch 0, the loss is 91.338860
after epoch 0, the acc is 0.200000
save the model
------------------------------------------------------
after epoch 10, the loss is 19.594559
after epoch 10, the acc is 0.200000
save the model
------------------------------------------------------
after epoch 20, the loss is 5.181785
after epoch 20, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 30, the loss is 2.592906
after epoch 30, the acc is 0.400000
save the model
------------------------------------------------------
after epoch 40, the loss is 1.611863
after epoch 40, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 50, the loss is 1.317069
after epoch 50, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 60, the loss is 1.313013
after epoch 60, the acc is 0.400000
save the model
------------------------------------------------------
after epoch 70, the loss is 1.268448
after epoch 70, the acc is 0.200000
save the model
------------------------------------------------------
after epoch 80, the loss is 1.323944
after epoch 80, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 90, the loss is 1.276046
after epoch 90, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 100, the loss is 1.284416
after epoch 100, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 110, the loss is 1.254741
after epoch 110, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 120, the loss is 1.354204
after epoch 120, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 130, the loss is 1.253812
after epoch 130, the acc is 0.300000
save the model
------------------------------------------------------
after epoch 140, the loss is 1.169439
after epoch 140, the acc is 0.200000
save the model
------------------------------------------------------
after epoch 150, the loss is 1.263069
after epoch 150, the acc is 0.500000
save the model
------------------------------------------------------
after epoch 160, the loss is 1.257510
after epoch 160, the acc is 0.400000
save the model
------------------------------------------------------
after epoch 170, the loss is 1.223609
after epoch 170, the acc is 0.500000
save the model
------------------------------------------------------
after epoch 180, the loss is 1.214603
after epoch 180, the acc is 0.500000
save the model
------------------------------------------------------
after epoch 190, the loss is 1.237759
after epoch 190, the acc is 0.500000
save the model
------------------------------------------------------
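As a reference point for reading these loss values: softmax cross-entropy for a model that assigns equal probability to all 4 classes is ln 4, about 1.386. The sketch below computes that baseline in plain Python; `softmax_cross_entropy` is a hypothetical single-example helper, not the TensorFlow function used in the training code.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: log-sum-exp of the logits minus the label's logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


# Equal logits mean a uniform prediction over the 4 classes.
chance_loss = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], 2)
print(round(chance_loss, 6))
```

The logged loss settles around 1.2 to 1.3, only slightly below this baseline, which fits the earlier remark that accuracy on this synthetic data is not the point.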

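The folder holding the saved model then contains the following. With max_to_keep=4 and one save every 10 epochs, only the four most recent checkpoints, my-model-160 through my-model-190, should remain (a .meta, an .index and a .data file each), together with the checkpoint record file and data10_4.npz.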

# Use the saved model to predict new values and compute the accuracy
path = '/data/User/zcc/'

def load_model():
    # Build the test data: simulate 2 RGB images of 32x32 pixels
    X = np.array(np.arange(6144, 12288)).reshape(2, 32, 32, 3)   # 2: images, 32*32: image size, 3: RGB
    Y = [3, 1]
    Y = np.array(Y)
    X = X.astype('float32')
    X = np.multiply(X, 1.0 / 255.0)
    with tf.Session() as sess:
        # Load the meta graph and the weights
        saver = tf.train.import_meta_graph(path+'model_conv/my-model-190.meta')
        saver.restore(sess, tf.train.latest_checkpoint(path+"model_conv/"))
        # Get the weights
        graph = tf.get_default_graph()  # the current default graph
        # A tensor name is the op name plus an output index, so ":0" must be appended
        fc2_w = graph.get_tensor_by_name("fc2/w:0")
        fc2_b = graph.get_tensor_by_name("fc2/b:0")
        print ("------------------------------------------------------")
        # print ('fc2_w:',sess.run(fc2_w)) would show the weights; skipped here because the output is too long
        print ("#######################################")
        print ('fc2_b:',sess.run(fc2_b))
        print ("------------------------------------------------------")
        # Predicted output
        feed_dict = {"x:0":X, "y_:0":Y}
        y = graph.get_tensor_by_name("y_labels:0")
        yy = sess.run(y, feed_dict)  # convert Y to one-hot
        print ('yy:',yy)
        print ("the answer is: ", sess.run(tf.argmax(yy, 1)))
        print ("------------------------------------------------------")
        # Fetch "predict" from the original model, i.e. its output y_fc2 (get_collection returns a list)
        pred_y = tf.get_collection("predict")
        print('Predicting new inputs with the loaded model!')
        # Run the original y_fc2 computation on the new data in feed_dict
        pred = sess.run(pred_y, feed_dict)[0]
        print ('pred:',pred, '\n')  # predictions for the new data
        pred = sess.run(tf.argmax(pred, 1))
        print ("the predict is: ", pred)
        print ("------------------------------------------------------")
        # Likewise, use the accuracy op of the original graph on the new predictions
        acc = tf.get_collection("acc")
        #acc = graph.get_operation_by_name("acc")
        acc = sess.run(acc, feed_dict)  # accuracy on the new data
        #print(acc.eval())
        print ("the accuracy is: ", acc)
        print ("------------------------------------------------------")
load_model()
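The strings passed to get_tensor_by_name and used as feed_dict keys ("fc2/w:0", "x:0", "y_labels:0") follow the pattern scope/op_name:output_index. A minimal sketch of how such a name decomposes, with `split_tensor_name` as a hypothetical helper rather than a TensorFlow function:

```python
def split_tensor_name(name):
    """Split 'scope/op:0' into the op name and the integer output index."""
    op_name, sep, index = name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError("expected '<op_name>:<output_index>', got %r" % name)
    return op_name, int(index)


for tensor_name in ["fc2/w:0", "x:0", "y_labels:0"]:
    print(split_tensor_name(tensor_name))
```

Here "fc2/w" is the variable created as "w" inside the "fc2" variable scope, and index 0 selects the first output of that op. Leaving out the ":0" suffix names the operation rather than one of its output tensors.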

Output:

------------------------------------------------------
#######################################
('fc2_b:', array([0.10513018, 0.07008364, 0.15466481, 0.06231203], dtype=float32))
------------------------------------------------------
('yy:', array([[0., 0., 0., 1.],
       [0., 1., 0., 0.]], dtype=float32))
('the answer is: ', array([3, 1]))
------------------------------------------------------
Predicting new inputs with the loaded model!
('pred:', array([[ 0.54676336, -0.07104626, -0.02205519, -0.24077414],
       [ 0.10513018,  0.07008364,  0.15466481,  0.06231203]],
      dtype=float32), '\n')
('the predict is: ', array([0, 2]))
------------------------------------------------------
('the accuracy is: ', [0.0, 0.0])
------------------------------------------------------
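The reported values can be checked by hand from the printed logits. The sketch below repeats the argmax and accuracy computation in plain Python, using the numbers copied from the output above; the helper name `argmax` is introduced only for this illustration.

```python
# Logits ("pred") and labels ("the answer is") copied from the output above.
logits = [
    [0.54676336, -0.07104626, -0.02205519, -0.24077414],
    [0.10513018, 0.07008364, 0.15466481, 0.06231203],
]
labels = [3, 1]


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])


predicted = [argmax(row) for row in logits]
correct = [p == t for p, t in zip(predicted, labels)]
accuracy = sum(correct) / float(len(labels))

print(predicted)  # [0, 2]
print(accuracy)   # 0.0
```

Neither prediction matches its label, so the accuracy on these two synthetic images is 0.0, consistent with the printed result.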

4. Fine-tuning the model

To fine-tune an already pre-trained model yourself, first obtain the graph structure of the pre-trained model:

saver = tf.train.import_meta_graph(path+'my_test_model-1000.meta')

Then load the parameters:

saver.restore(sess,tf.train.latest_checkpoint(path))

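Prepare the feed_dict with new training or test data. This way the same model can be used to train or test on different data.

If you want to add new layers on top of the existing network, for example the convolutional network above, you can fetch fc2 and then add a fully connected layer and an output layer after it. (The added layers shown here have not been tested.)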

# pre-train and fine-tuning
fc2 = graph.get_tensor_by_name("fc2/add:0")
fc2 = tf.stop_gradient(fc2)  # freeze this part of the model
fc2_shape = fc2.get_shape().as_list()
# fine-tuning
new_nums = 6
weights = tf.Variable(tf.truncated_normal([fc2_shape[1], new_nums], stddev=0.1), name="w")
biases = tf.Variable(tf.constant(0.1, shape=[new_nums]), name="b")
conv2 = tf.matmul(fc2, weights) + biases
output2 = tf.nn.softmax(conv2)