My tensorflow + keras versions:

    import tensorflow as tf

    print(tf.VERSION)            # '1.10.0'
    print(tf.keras.__version__)  # '2.1.6-tf'
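tf.keras does not implement AdamW, i.e. Adam with weight decay. The paper "Decoupled Weight Decay Regularization" shows that, when Adam is used, weight decay is not equivalent to L2 regularization. For details, see 當前訓練神經網絡最快的方式:AdamW優化算法+超級收斂 (The fastest way to train neural networks today: the AdamW optimization algorithm plus super-convergence) or L2正則=Weight Decay?並非這樣 (L2 regularization = weight decay? Not quite).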
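To make the difference concrete, here is a minimal NumPy sketch of a single Adam step with the penalty applied in each of the two ways. The function names (`adam_step`, `step_l2`, `step_decoupled`) and the parameters `lam` and `wd` are illustrative, not part of any library, and the decay term is applied here as a plain `wd * w`; real implementations may scale it differently.

```python
import numpy as np


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def step_l2(w, grad, m, v, t, lam):
    # L2 regularization: the penalty gradient lam * w is added to the loss
    # gradient, so it passes through Adam's adaptive rescaling.
    return adam_step(w, grad + lam * w, m, v, t)


def step_decoupled(w, grad, m, v, t, wd):
    # Decoupled weight decay: Adam sees only the loss gradient; the decay is
    # subtracted from the weights directly (illustrative scaling: wd * w).
    new_w, m, v = adam_step(w, grad, m, v, t)
    return new_w - wd * w, m, v


w0 = np.array([1.0, 1.0])
grad = np.array([0.1, 10.0])  # two coordinates with very different gradients
zeros = np.zeros(2)

print(step_l2(w0, grad, zeros, zeros, t=1, lam=0.01)[0])
print(step_decoupled(w0, grad, zeros, zeros, t=1, wd=0.01)[0])
```

On this first step the L2 variant moves both weights by roughly the learning rate, since Adam's normalization absorbs the penalty term, whereas the decoupled variant additionally shrinks each weight by `wd * w`.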
Keras does not provide an AdamW optimizer, but TensorFlow does, so it is enough to pass TensorFlow's optimizer to tf.keras.
As shown below:
    import tensorflow as tf
    from tensorflow.contrib.opt import AdamWOptimizer

    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])

    # adam = tf.train.AdamOptimizer()

    # adam with weight decay
    adamw = AdamWOptimizer(weight_decay=1e-4)

    model.compile(optimizer=adamw,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=10, validation_split=0.1)
    print(model.evaluate(x_test, y_test))
If you only use the optimizer as above, everything works. But once you add certain items from tf.keras.callbacks, such as tf.keras.callbacks.ReduceLROnPlateau(), you may hit the exception AttributeError: 'TFOptimizer' object has no attribute 'lr'.
The following code raises AttributeError: 'TFOptimizer' object has no attribute 'lr' precisely because tf.keras.callbacks.ReduceLROnPlateau() is included; the other two callbacks do not trigger the exception.
    import os

    import tensorflow as tf
    from tensorflow.contrib.opt import AdamWOptimizer

    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])

    # Make sure the checkpoint directory exists before training starts
    os.makedirs('checkpoints', exist_ok=True)

    # Save the model weights according to val_acc; new weights are saved only when val_acc improves
    ck_callback = tf.keras.callbacks.ModelCheckpoint(
        'checkpoints/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5',
        monitor='val_acc', mode='max', verbose=1,
        save_best_only=True, save_weights_only=True)

    # Monitor the training process with TensorBoard
    tb_callback = tf.keras.callbacks.TensorBoard(log_dir='logs')

    # If the monitored val_loss has not decreased for `patience` epochs,
    # reduce the learning rate: lr = factor * lr_old
    lr_callback = tf.keras.callbacks.ReduceLROnPlateau(patience=3)

    adam = tf.train.AdamOptimizer()

    # adam with weight decay
    # adamw = AdamWOptimizer(weight_decay=1e-4)

    model.compile(optimizer=adam,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=10, validation_split=0.1,
              callbacks=[ck_callback, tb_callback, lr_callback])
    print(model.evaluate(x_test, y_test))
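The error message points at the cause: the callback looks up an `lr` attribute on the model's optimizer, and the wrapper object Keras puts around a native TensorFlow optimizer does not have one. The following pure-Python sketch reproduces that situation with stand-in classes. `NativeOptimizerStandIn`, `WrapperStandIn` and `read_lr_like_callback` are hypothetical names made up for illustration, not TensorFlow APIs.

```python
class NativeOptimizerStandIn:
    """Hypothetical stand-in for a tf.train optimizer: keeps its rate privately."""

    def __init__(self, learning_rate=0.001):
        self._lr = learning_rate


class WrapperStandIn:
    """Hypothetical stand-in for the Keras-side wrapper: exposes no `lr`."""

    def __init__(self, optimizer):
        self.optimizer = optimizer


def read_lr_like_callback(optimizer):
    # Mimics the attribute lookup a learning-rate callback performs.
    return optimizer.lr


wrapped = WrapperStandIn(NativeOptimizerStandIn())

try:
    read_lr_like_callback(wrapped)
except AttributeError as err:
    print(err)

# Attaching the attribute afterwards lets the lookup succeed.
wrapped.lr = 0.001
print(read_lr_like_callback(wrapped))
```

This only illustrates the missing attribute; it says nothing about whether the attached value is actually used by the wrapped optimizer.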
The fix is shown below:
    import os

    import tensorflow as tf
    from tensorflow.contrib.opt import AdamWOptimizer
    from tensorflow.keras import backend as K
    from tensorflow.python.keras.optimizers import TFOptimizer

    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])

    # Make sure the checkpoint directory exists before training starts
    os.makedirs('checkpoints', exist_ok=True)

    # Save the model weights according to val_acc; new weights are saved only when val_acc improves
    ck_callback = tf.keras.callbacks.ModelCheckpoint(
        'checkpoints/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5',
        monitor='val_acc', mode='max', verbose=1,
        save_best_only=True, save_weights_only=True)

    # Monitor the training process with TensorBoard
    tb_callback = tf.keras.callbacks.TensorBoard(log_dir='logs')

    # If the monitored val_loss has not decreased for `patience` epochs,
    # reduce the learning rate: lr = factor * lr_old
    lr_callback = tf.keras.callbacks.ReduceLROnPlateau(patience=3)

    # Keep the learning rate in a backend variable so the callback can read and update it
    learning_rate = 0.001
    learning_rate = K.variable(learning_rate)

    # adam = tf.train.AdamOptimizer()
    # # In tensorflow 1.10, TFOptimizer can be found in tensorflow.python.keras.optimizers
    # # but not in tensorflow.keras.optimizers
    # adam = TFOptimizer(adam)
    # adam.lr = learning_rate

    # adam with weight decay
    # The same variable is also passed as learning_rate (an addition to the original code),
    # so that values written by the callback should reach the optimizer itself
    adamw = AdamWOptimizer(weight_decay=1e-4, learning_rate=learning_rate)
    adamw = TFOptimizer(adamw)
    adamw.lr = learning_rate

    model.compile(optimizer=adamw,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=10, validation_split=0.1,
              callbacks=[ck_callback, tb_callback, lr_callback])
    print(model.evaluate(x_test, y_test))
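Wrapping the optimizer in TFOptimizer and giving the wrapper an lr attribute is all it takes; after that, tf.keras.callbacks.ReduceLROnPlateau() no longer raises the exception.
When importing TFOptimizer, pay attention to where it lives. In tensorflow 1.10 there are two ways to import keras, tensorflow.keras and tensorflow.python.keras, which is somewhat confusing, and TFOptimizer can only be imported from the latter. (Oddly, tensorflow 1.14 seems to have dropped the first import path, while tensorflow 2.0 has it again.)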
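Since the import location appears to vary between versions, one option is to try several module paths in turn. The helper below is a small sketch of that idea. `import_first` is a hypothetical name, and the module paths in the usage comment are simply the two mentioned above; whether either of them provides TFOptimizer depends on the installed TensorFlow version.

```python
import importlib


def import_first(candidates, attr):
    """Return `attr` from the first module in `candidates` that provides it."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not available in this installation, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("%s not found in any of: %s" % (attr, ", ".join(candidates)))


# Usage (paths taken from the text above):
# TFOptimizer = import_first(
#     ["tensorflow.keras.optimizers", "tensorflow.python.keras.optimizers"],
#     "TFOptimizer",
# )
```

If none of the candidates provides the name, the helper raises ImportError with the list of paths it tried, which makes a version mismatch easier to diagnose.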
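當前訓練神經網絡最快的方式:AdamW優化算法+超級收斂 (The fastest way to train neural networks today: the AdamW optimization algorithm plus super-convergence) -- 機器之心
L2正則=Weight Decay?並非這樣 (L2 regularization = weight decay? Not quite) -- 楊鎰銘
ReduceLROnPlateau with native optimizer: 'TFOptimizer' object has no attribute 'lr' #20619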