AI - TensorFlow - Example 05: Saving and restoring models

Date: 2019-12-12

Tags: tensorflow, example, save, restore, model


Save and restore models

Official example: https://www.tensorflow.org/tutorials/keras/save_and_restore_models

Save checkpoints during training

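Checkpoints can be saved automatically during training or at the end of training.
The weights are stored in a collection of checkpoint-format files that contain only the trained weights, in binary format.
This lets you use a trained model without retraining it, or pick up training where it left off if the training process was interrupted.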

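Checkpoint callback usage: create a checkpoint callback (tf.keras.callbacks.ModelCheckpoint) and pass it to the model when training. This produces a collection of checkpoint files that can be used to share the weights.
Checkpoint callback options: the callback provides several options, for example giving each generated checkpoint a unique name and adjusting how often checkpoints are created.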
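As a rough illustration of these two options, the stdlib-only sketch below shows how a template such as cp-{epoch:04d}.ckpt expands into unique names, and which epochs would get a checkpoint if one is written every 5 epochs. It does not call TensorFlow; checkpoint_names is a hypothetical helper, and it assumes epochs are numbered from 1, as in the run output shown later.

```python
def checkpoint_names(template, epochs, every):
    """Hypothetical helper: names a callback would write if it saved every `every` epochs.

    It only mimics the naming and frequency behaviour with str.format; it is not a Keras API.
    """
    return [template.format(epoch=e) for e in range(1, epochs + 1) if e % every == 0]


names = checkpoint_names("training_2/cp-{epoch:04d}.ckpt", epochs=50, every=5)
print(len(names))  # 10
print(names[0])    # training_2/cp-0005.ckpt
print(names[-1])   # training_2/cp-0050.ckpt
```

The {epoch:04d} field zero-pads the epoch number to four digits, so the names sort correctly as plain strings.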

Manually save weights

Weights can also be saved manually with the Model.save_weights method.
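To see why a weights-only save has to be restored into a model with the same architecture, here is a toy, stdlib-only analogy. Only the parameter values are written to disk (as JSON here, not TensorFlow's checkpoint format), so loading needs a model object whose layers have exactly the same shapes. make_model, save_weights_json and load_weights_json are hypothetical stand-ins, not Keras APIs.

```python
import json
import os
import tempfile


def make_model():
    """Hypothetical stand-in for a model: layer name -> weight shape and values."""
    return {"dense": {"shape": [3, 2], "values": [[0.0] * 2 for _ in range(3)]},
            "dense_1": {"shape": [2, 1], "values": [[0.0] for _ in range(2)]}}


def save_weights_json(model, path):
    # Only the numbers are saved; the architecture itself is not.
    with open(path, "w") as f:
        json.dump({name: layer["values"] for name, layer in model.items()}, f)


def load_weights_json(model, path):
    with open(path) as f:
        weights = json.load(f)
    for name, layer in model.items():
        values = weights[name]
        rows, cols = layer["shape"]
        # Loading only works if the new model has the same layers and shapes.
        if len(values) != rows or any(len(r) != cols for r in values):
            raise ValueError("shape mismatch for layer " + name)
        layer["values"] = values


trained = make_model()
trained["dense"]["values"][0][0] = 0.5  # pretend training changed a weight

path = os.path.join(tempfile.mkdtemp(), "weights.json")
save_weights_json(trained, path)

fresh = make_model()  # same architecture, untrained
load_weights_json(fresh, path)
print(fresh["dense"]["values"][0][0])  # 0.5
```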

Save the entire model

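The entire model can be saved to a single file that contains the weight values, the model configuration (architecture), and the optimizer configuration.
This lets you checkpoint a model and later resume training from exactly the same state, without access to the original code.
Keras saves a model by inspecting its architecture, and uses the HDF5 standard as the basic save format.
Important:

At the time of writing, TensorFlow optimizers (from tf.train) cannot be saved this way.
When using such an optimizer, you need to recompile the model after loading it, and the optimizer's state is lost.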
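The effect of losing the optimizer state can be seen with a toy, stdlib-only example. It uses plain SGD with momentum (a simpler optimizer than the Adam used in the script, chosen only for illustration) to minimise f(w) = w**2. Resuming with the saved velocity continues the original trajectory, while resuming with a freshly initialised optimizer (velocity reset to 0, as after recompiling) takes a different step even though the weight is identical.

```python
def momentum_step(w, v, lr=0.1, beta=0.9):
    """One SGD-with-momentum update for f(w) = w**2, whose gradient is 2*w."""
    grad = 2 * w
    v = beta * v + grad
    return w - lr * v, v


# Train for 3 steps, keeping both the weight and the optimizer state (velocity).
w, v = 1.0, 0.0
for _ in range(3):
    w, v = momentum_step(w, v)

# Resume with the saved optimizer state vs. with a fresh optimizer (state reset to 0).
w_keep, _ = momentum_step(w, v)
w_reset, _ = momentum_step(w, 0.0)
print(w_keep != w_reset)  # True: same weights, different next step
```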

MNIST dataset

MNIST (Mixed National Institute of Standards and Technology database) is a computer vision dataset.

Official download: http://yann.lecun.com/exdb/mnist/
It contains 70,000 grayscale images of handwritten digits: 60,000 training images and 10,000 test images.
Each image is a 28x28-pixel grayscale image.
Keras: https://keras.io/datasets/#mnist-database-of-handwritten-digits
TensorFlow: https://www.tensorflow.org/api_docs/python/tf/keras/datasets/mnist
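The script feeds each 28x28 image to a Dense layer as a flat vector of 28 * 28 = 784 values and scales the 0 to 255 pixel intensities into [0, 1] (the reshape(-1, 28 * 28) / 255.0 step). The stdlib-only sketch below does the same for a single made-up image without NumPy; the image contents are invented for illustration.

```python
def flatten_and_scale(image):
    """Flatten a 2-D list of 0-255 pixel values row by row and scale it to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


# A made-up 28x28 "image": all zeros except one white pixel.
image = [[0] * 28 for _ in range(28)]
image[0][1] = 255

vector = flatten_and_scale(image)
print(len(vector))  # 784, matching input_shape=(784,) in create_model()
print(vector[1])    # 1.0
```

Row-by-row flattening matches NumPy's default (C-order) reshape, so pixel (row, col) ends up at index row * 28 + col.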

Example

Script

GitHub: https://github.com/anliven/Hello-AI/blob/master/Google-Learn-and-use-ML/5_save_and_restore_models.py

# coding=utf-8
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide TensorFlow C++ INFO/WARNING logs (set before importing tensorflow)

import pathlib

import numpy as np
import tensorflow as tf
from tensorflow import keras

print("# TensorFlow version: {}  - tf.keras version: {}".format(tf.__version__, tf.keras.__version__))  # check versions

# ### Get the example dataset

ds_path = str(pathlib.Path.cwd() / "datasets" / "mnist" / "mnist.npz")  # local copy of the dataset (portable path)
np_data = np.load(ds_path)  # load the NumPy-format archive
print("# np_data keys: ", list(np_data.keys()))  # list all keys in the archive

# Load the MNIST dataset; only the first 1000 examples are kept so the example runs quickly
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data(path=ds_path)
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0  # flatten 28x28 images to 784 values in [0, 1]
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0


# ### Define the model
def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(784,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation=tf.nn.softmax)
    ])  # build a simple model
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.sparse_categorical_crossentropy,
                  metrics=['accuracy'])
    return model


mod = create_model()
mod.summary()

# ### Save checkpoints during training

# Checkpoint callback usage
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)  # directory that holds the checkpoints
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=2)  # create the checkpoint callback
model1 = create_model()
model1.fit(train_images, train_labels,
           epochs=10,
           validation_data=(test_images, test_labels),
           verbose=0,
           callbacks=[cp_callback]  # pass the ModelCheckpoint callback to the model
           )  # train; creates TensorFlow checkpoint files that are updated at the end of each epoch

model2 = create_model()  # a new, untrained model (same architecture, so it can share the weights)
loss, acc = model2.evaluate(test_images, test_labels)  # evaluate on the test set
print("# Untrained model2, accuracy: {:5.2f}%".format(100 * acc))  # untrained model: roughly 10% (chance level)

model2.load_weights(checkpoint_path)  # load the weights from the checkpoint
loss, acc = model2.evaluate(test_images, test_labels)  # evaluate again on the test set
print("# Restored model2, accuracy: {:5.2f}%".format(100 * acc))  # accuracy improves substantially

# Checkpoint callback options
checkpoint_path2 = "training_2/cp-{epoch:04d}.ckpt"  # str.format-style template gives each checkpoint a unique name
checkpoint_dir2 = os.path.dirname(checkpoint_path2)  # derive the directory from checkpoint_path2, not checkpoint_path
cp_callback2 = tf.keras.callbacks.ModelCheckpoint(checkpoint_path2,
                                                  verbose=1,
                                                  save_weights_only=True,
                                                  period=5  # save every 5 epochs (TF 1.x argument; later versions use save_freq)
                                                  )  # create the checkpoint callback
model3 = create_model()
model3.fit(train_images, train_labels,
           epochs=50,
           callbacks=[cp_callback2],  # pass the ModelCheckpoint callback to the model
           validation_data=(test_images, test_labels),
           verbose=0)  # train a new model, saving a uniquely named checkpoint every 5 epochs
latest = tf.train.latest_checkpoint(checkpoint_dir2)
print("# latest checkpoint: {}".format(latest))  # show the most recent checkpoint

model4 = create_model()  # create another brand-new model
loss, acc = model4.evaluate(test_images, test_labels)  # evaluate the untrained model4 on the test set
print("# Untrained model4, accuracy: {:5.2f}%".format(100 * acc))  # untrained model: roughly 10%

model4.load_weights(latest)  # load the latest checkpoint
loss, acc = model4.evaluate(test_images, test_labels)  # evaluate again with the restored weights
print("# Restored model4, accuracy: {:5.2f}%".format(100 * acc))  # accuracy improves substantially

# ### Manually save weights
model5 = create_model()
model5.fit(train_images, train_labels,
           epochs=10,
           validation_data=(test_images, test_labels),
           verbose=0)  # train the model
model5.save_weights('./training_3/my_checkpoint')  # save the weights manually

model6 = create_model()
loss, acc = model6.evaluate(test_images, test_labels)
print("# Untrained model6, accuracy: {:5.2f}%".format(100 * acc))  # before loading any weights
model6.load_weights('./training_3/my_checkpoint')
loss, acc = model6.evaluate(test_images, test_labels)
print("# Restored model6, accuracy: {:5.2f}%".format(100 * acc))

# ### Save the entire model
model7 = create_model()
model7.fit(train_images, train_labels, epochs=5)
model7.save('my_model.h5')  # save the entire model to an HDF5 file (requires h5py)

model8 = keras.models.load_model('my_model.h5')  # recreate exactly the same model, including weights and optimizer
model8.summary()
loss, acc = model8.evaluate(test_images, test_labels)
print("Restored model8, accuracy: {:5.2f}%".format(100 * acc))

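Run output

Note: in the run below, checkpoint_dir2 was computed from checkpoint_path rather than checkpoint_path2, and the first model4 evaluation actually called model2.evaluate. That is why "latest checkpoint" points to training_1\cp.ckpt and the "Untrained model4" line reports 86.40%. Likewise, the first "Restored model6" line (18.20%) is the accuracy of model6 before its weights were loaded.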

C:\Users\anliven\AppData\Local\conda\conda\envs\mlcc\python.exe D:/Anliven/Anliven-Code/PycharmProjects/Google-Learn-and-use-ML/5_save_and_restore_models.py
# TensorFlow version: 1.12.0  - tf.keras version: 2.1.6-tf
# np_data keys:  ['x_test', 'x_train', 'y_train', 'y_test']
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________

Epoch 00001: saving model to training_1/cp.ckpt
Epoch 00002: saving model to training_1/cp.ckpt
Epoch 00003: saving model to training_1/cp.ckpt
Epoch 00004: saving model to training_1/cp.ckpt
Epoch 00005: saving model to training_1/cp.ckpt
Epoch 00006: saving model to training_1/cp.ckpt
Epoch 00007: saving model to training_1/cp.ckpt
Epoch 00008: saving model to training_1/cp.ckpt
Epoch 00009: saving model to training_1/cp.ckpt
Epoch 00010: saving model to training_1/cp.ckpt

  32/1000 [..............................] - ETA: 3s
1000/1000 [==============================] - 0s 140us/step
# Untrained model2, accuracy:  8.20%

  32/1000 [..............................] - ETA: 0s
1000/1000 [==============================] - 0s 40us/step
# Restored model2, accuracy: 86.40%

Epoch 00005: saving model to training_2/cp-0005.ckpt
Epoch 00010: saving model to training_2/cp-0010.ckpt
Epoch 00015: saving model to training_2/cp-0015.ckpt
Epoch 00020: saving model to training_2/cp-0020.ckpt
Epoch 00025: saving model to training_2/cp-0025.ckpt
Epoch 00030: saving model to training_2/cp-0030.ckpt
Epoch 00035: saving model to training_2/cp-0035.ckpt
Epoch 00040: saving model to training_2/cp-0040.ckpt
Epoch 00045: saving model to training_2/cp-0045.ckpt
Epoch 00050: saving model to training_2/cp-0050.ckpt

# latest checkpoint: training_1\cp.ckpt

  32/1000 [..............................] - ETA: 3s
1000/1000 [==============================] - 0s 140us/step
# Untrained model4, accuracy: 86.40%

  32/1000 [..............................] - ETA: 2s
1000/1000 [==============================] - 0s 110us/step
# Restored model4, accuracy: 86.40%

  32/1000 [..............................] - ETA: 5s
1000/1000 [==============================] - 0s 220us/step
# Restored model6, accuracy: 18.20%

  32/1000 [..............................] - ETA: 0s
1000/1000 [==============================] - 0s 40us/step
# Restored model6, accuracy: 87.40%
Epoch 1/5

  32/1000 [..............................] - ETA: 9s - loss: 2.4141 - acc: 0.0625
 320/1000 [========>.....................] - ETA: 0s - loss: 1.8229 - acc: 0.4469
 576/1000 [================>.............] - ETA: 0s - loss: 1.4932 - acc: 0.5694
 864/1000 [========================>.....] - ETA: 0s - loss: 1.2624 - acc: 0.6481
1000/1000 [==============================] - 1s 530us/step - loss: 1.1978 - acc: 0.6620
Epoch 2/5

  32/1000 [..............................] - ETA: 0s - loss: 0.5490 - acc: 0.8750
 320/1000 [========>.....................] - ETA: 0s - loss: 0.4832 - acc: 0.8594
 576/1000 [================>.............] - ETA: 0s - loss: 0.4630 - acc: 0.8715
 864/1000 [========================>.....] - ETA: 0s - loss: 0.4356 - acc: 0.8808
1000/1000 [==============================] - 0s 200us/step - loss: 0.4298 - acc: 0.8790
Epoch 3/5

  32/1000 [..............................] - ETA: 0s - loss: 0.1681 - acc: 0.9688
 320/1000 [========>.....................] - ETA: 0s - loss: 0.2826 - acc: 0.9437
 576/1000 [================>.............] - ETA: 0s - loss: 0.2774 - acc: 0.9340
 832/1000 [=======================>......] - ETA: 0s - loss: 0.2740 - acc: 0.9327
1000/1000 [==============================] - 0s 200us/step - loss: 0.2781 - acc: 0.9280
Epoch 4/5

  32/1000 [..............................] - ETA: 0s - loss: 0.1589 - acc: 0.9688
 288/1000 [=======>......................] - ETA: 0s - loss: 0.2169 - acc: 0.9410
 608/1000 [=================>............] - ETA: 0s - loss: 0.2186 - acc: 0.9457
 864/1000 [========================>.....] - ETA: 0s - loss: 0.2231 - acc: 0.9479
1000/1000 [==============================] - 0s 200us/step - loss: 0.2164 - acc: 0.9480
Epoch 5/5

  32/1000 [..............................] - ETA: 0s - loss: 0.1095 - acc: 1.0000
 352/1000 [=========>....................] - ETA: 0s - loss: 0.1631 - acc: 0.9744
 608/1000 [=================>............] - ETA: 0s - loss: 0.1671 - acc: 0.9638
 864/1000 [========================>.....] - ETA: 0s - loss: 0.1545 - acc: 0.9688
1000/1000 [==============================] - 0s 210us/step - loss: 0.1538 - acc: 0.9670
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_14 (Dense)             (None, 512)               401920    
_________________________________________________________________
dropout_7 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_15 (Dense)             (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________

  32/1000 [..............................] - ETA: 3s
1000/1000 [==============================] - 0s 150us/step
Restored model8, accuracy: 86.10%

Process finished with exit code 0
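The Param # column in the two model summaries above can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit, and Dropout has no trainable parameters.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


hidden = dense_params(784, 512)  # first Dense layer
output = dense_params(512, 10)   # output Dense layer
print(hidden, output, hidden + output)  # 401920 5130 407050
```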

Checking the generated files

anliven@ANLIVEN MINGW64 /d/Anliven/Anliven-Code/PycharmProjects/Google-Learn-and-use-ML
$ ll training_1
total 1601
-rw-r--r-- 1 anliven 197121      71 5月   5 23:36 checkpoint
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:36 cp.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:36 cp.ckpt.index

anliven@ANLIVEN MINGW64 /d/Anliven/Anliven-Code/PycharmProjects/Google-Learn-and-use-ML
$ ls -l training_2
total 16001
-rw-r--r-- 1 anliven 197121      81 5月   5 23:37 checkpoint
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:36 cp-0005.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:36 cp-0005.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:36 cp-0010.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:36 cp-0010.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:36 cp-0015.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:36 cp-0015.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:36 cp-0020.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:36 cp-0020.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:36 cp-0025.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:36 cp-0025.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:37 cp-0030.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:37 cp-0030.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:37 cp-0035.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:37 cp-0035.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:37 cp-0040.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:37 cp-0040.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:37 cp-0045.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:37 cp-0045.ckpt.index
-rw-r--r-- 1 anliven 197121 1631508 5月   5 23:37 cp-0050.ckpt.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:37 cp-0050.ckpt.index

anliven@ANLIVEN MINGW64 /d/Anliven/Anliven-Code/PycharmProjects/Google-Learn-and-use-ML
$ ls -l training_3
total 1601
-rw-r--r-- 1 anliven 197121      83 5月   5 23:37 checkpoint
-rw-r--r-- 1 anliven 197121 1631517 5月   5 23:37 my_checkpoint.data-00000-of-00001
-rw-r--r-- 1 anliven 197121     647 5月   5 23:37 my_checkpoint.index

anliven@ANLIVEN MINGW64 /d/Anliven/Anliven-Code/PycharmProjects/Google-Learn-and-use-ML
$ ls -l my_model.h5
-rw-r--r-- 1 anliven 197121 4909112 5月   5 23:37 my_model.h5
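Each checkpoint directory above holds data files (*.data-00000-of-00001), an index file (*.index) and a small text file named checkpoint that records the most recent checkpoint prefix; tf.train.latest_checkpoint relies on that file. The stdlib-only sketch below mimics the lookup, assuming the file contains lines such as model_checkpoint_path: "cp-0050.ckpt". latest_prefix and epoch_from_prefix are hypothetical helpers, and unlike the real function this sketch does not check that the data files exist.

```python
import os
import re
import tempfile


def latest_prefix(checkpoint_dir):
    """Hypothetical stand-in for tf.train.latest_checkpoint.

    Parses only the model_checkpoint_path line of checkpoint_dir/checkpoint and
    returns the recorded prefix, or None if the state file is missing.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            match = re.match(r'model_checkpoint_path:\s*"(.*)"', line)
            if match:
                return os.path.join(checkpoint_dir, match.group(1))
    return None


def epoch_from_prefix(prefix):
    """Extract the epoch number from a name like cp-0050.ckpt (None if absent)."""
    match = re.search(r"cp-(\d+)\.ckpt$", prefix)
    return int(match.group(1)) if match else None


# Recreate a state file like the one in training_2 and look it up.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "checkpoint"), "w") as f:
    f.write('model_checkpoint_path: "cp-0050.ckpt"\n')
    f.write('all_model_checkpoint_paths: "cp-0050.ckpt"\n')

latest = latest_prefix(demo_dir)
print(os.path.basename(latest))   # cp-0050.ckpt
print(epoch_from_prefix(latest))  # 50
```

When resuming training, an epoch number obtained this way could be passed as initial_epoch to model.fit, so that epoch numbering (and therefore the checkpoint names) continues from the restored checkpoint.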

Troubleshooting

Problem: the following warning is displayed.

WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x00000280FD318780>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.

Consider using a TensorFlow optimizer from `tf.train`.

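Solution:

This warning is expected. It only means that the state of the Keras Adam optimizer is not included in the checkpoint written by save_weights; the weights themselves are saved. Because the script reloads the weights only for evaluation, the warning does not affect the run or its results and can be ignored here.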
