To make a trained model perform better on test data, a new technique can be introduced: the moving average model. It maintains a shadow variable for each trained parameter and uses the shadow values, instead of the final trained values, when validating the model.
TensorFlow provides ExponentialMovingAverage to implement the moving average model. The model maintains a shadow variable for each tracked variable, computed as:
shadow_variable = decay * shadow_variable + (1 - decay) * variable
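To make the update rule concrete, here is a small plain-Python sketch (the decay value 0.9 and the variable values are made up for illustration; TensorFlow initializes each shadow variable to its variable's initial value):

```python
# Plain-Python illustration of the shadow-variable update rule.
# decay = 0.9 and the sequence of variable values are illustrative only.
decay = 0.9
shadow_variable = 0.0                  # shadow value starts at the variable's initial value
for variable in [5.0, 10.0, 10.0]:     # successive values of the trained variable
    shadow_variable = decay * shadow_variable + (1 - decay) * variable
    print(shadow_variable)             # 0.5, 1.45, 2.305 -> lags behind but smooths the trajectory
```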
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use the averaged values sometimes produce significantly better results than the final trained values.
The apply() method adds a shadow copy of the trained variables and adds ops that maintain a moving average of the trained variables in their shadow copies. This op is typically run after each training step.
The average() and average_name() methods give access to the shadow variables and their names. They are useful when building an evaluation model or when restoring a model from a checkpoint file, because they make it possible to use the moving-average values in place of the last trained values during evaluation. Using the moving average model takes three steps:
1. Create a moving average object
```python
step = tf.Variable(initial_value=0, dtype=tf.float32, trainable=False)
ema = tf.train.ExponentialMovingAverage(decay=0.99, num_updates=step)
```
Here decay is the decay factor from the formula above; a reasonable decay value is close to 1.0, for example 0.999 or 0.9999. num_updates is an optional argument; when it is supplied, the effective decay is determined by:
min(decay, (1 + num_updates) / (10 + num_updates)). The goal is to make the shadow variables update faster at the beginning of training, so num_updates is usually an increasing training-step variable.
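To see how num_updates changes the effective decay over the course of training, here is a small plain-Python sketch (the step values 0, 10, 100, 10000 are arbitrary):

```python
# Effective decay = min(decay, (1 + num_updates) / (10 + num_updates)):
# small step counts keep it low, so the shadow variables track the trained
# variables closely early on; it then rises toward the configured decay.
decay = 0.99
for num_updates in [0, 10, 100, 10000]:
    effective_decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
    print(num_updates, effective_decay)  # 0 -> 0.1, 10 -> 0.55, 100 -> ~0.918, 10000 -> 0.99
```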
2. Add the list of trained parameters to the model for maintenance
Create two trained parameters and add them to the moving average object; apply() accepts a list of variables.
```python
var0 = tf.Variable(initial_value=0, dtype=tf.float32, trainable=False)
var1 = tf.Variable(initial_value=0, dtype=tf.float32, trainable=False)
maintain_averages_op = ema.apply([var0, var1])
```
3. After each round of training, update the values of the shadow variables in the moving average model
```python
sess.run(maintain_averages_op)
print(sess.run([var0, ema.average(var0), var1, ema.average(var1)]))
# Output (from the third update in the full example below): [10, 4.555, 10, 9.01]
```
The complete moving average model test example is as follows:
```python
# Import the TensorFlow library
import tensorflow as tf

# Create a moving average object
step = tf.Variable(initial_value=0, dtype=tf.float32, trainable=False)
ema = tf.train.ExponentialMovingAverage(decay=0.99, num_updates=step)

# Create two trained parameters and add them to the moving average object;
# the object creates a shadow variable for each of them.
# Shadow variable: shadow_variable = decay * shadow_variable + (1 - decay) * variable
# If num_updates was given when the object was created, then
# decay = min(decay, (1 + num_updates) / (10 + num_updates))
var0 = tf.Variable(initial_value=0, dtype=tf.float32, trainable=False)
var1 = tf.Variable(initial_value=0, dtype=tf.float32, trainable=False)
maintain_averages_op = ema.apply([var0, var1])

# Test updating the shadow variable values
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)

    # First (initial) moving average update
    sess.run(maintain_averages_op)
    # decay = min(0.99, 0.1) = 0.1
    # Initially:
    # shadow_variable_var0 = var0 = 0
    # shadow_variable_var1 = var1 = 0
    print(sess.run([var0, ema.average(var0), var1, ema.average(var1)]))

    # Second moving average update
    sess.run(tf.assign(var0, 5.0))
    sess.run(tf.assign(var1, 10.0))
    # decay = min(0.99, (1 + 0) / (10 + 0)) = 0.1
    # shadow_variable_var0 = decay * shadow_variable + (1 - decay) * variable = 0.1*0 + (1-0.1)*5 = 4.5
    # shadow_variable_var1 = 9.0
    sess.run(maintain_averages_op)
    print(sess.run([var0, ema.average(var0), var1, ema.average(var1)]))
    # Output: [5.0, 4.5, 10, 9.0]

    # Third moving average update
    sess.run(tf.assign(step, 10000))
    sess.run(tf.assign(var0, 10))
    # decay = min(0.99, (1 + 10000) / (10 + 10000)) = 0.99
    # shadow_variable_var0 = decay * shadow_variable + (1 - decay) * variable = 0.99*4.5 + (1-0.99)*10 = 4.555
    # shadow_variable_var1 = 0.99*9.0 + (1-0.99)*10 = 9.01
    sess.run(maintain_averages_op)
    print(sess.run([var0, ema.average(var0), var1, ema.average(var1)]))
    # Output: [10, 4.555, 10, 9.01]

    # Fourth moving average update
    # decay = min(0.99, (1 + 10000) / (10 + 10000)) = 0.99
    # shadow_variable_var0 = decay * shadow_variable + (1 - decay) * variable = 0.99*4.555 + (1-0.99)*10 = 4.60945
    # shadow_variable_var1 = 0.99*9.01 + (1-0.99)*10 = 9.0199
    sess.run(maintain_averages_op)
    print(sess.run([var0, ema.average(var0), var1, ema.average(var1)]))
    # Output: [10, 4.60945, 10, 9.0199]
```
Below are the two usage scenarios (training and evaluation) given in the official TensorFlow documentation:
Example usage when creating a training model:

```python
# Create variables.
var0 = tf.Variable(...)
var1 = tf.Variable(...)
# ... use the variables to build a training model...
...
# Create an op that applies the optimizer. This is what we usually
# would use as a training op.
opt_op = opt.minimize(my_loss, [var0, var1])

# Create an ExponentialMovingAverage object
ema = tf.train.ExponentialMovingAverage(decay=0.9999)

with tf.control_dependencies([opt_op]):
    # Create the shadow variables, and add ops to maintain moving averages
    # of var0 and var1. This also creates an op that will update the moving
    # averages after each training step. This is what we will use in place
    # of the usual training op.
    training_op = ema.apply([var0, var1])

...train the model by running training_op...
```

There are two ways to use the moving averages for evaluations:

* Build a model that uses the shadow variables instead of the variables. For this, use the `average()` method which returns the shadow variable for a given variable.
* Build a model normally but load the checkpoint files to evaluate by using the shadow variable names. For this use the `average_name()` method. See `tf.train.Saver` for more information on restoring saved variables.

Example of restoring the shadow variable values:

```python
# Create a Saver that loads variables from their saved shadow values.
shadow_var0_name = ema.average_name(var0)
shadow_var1_name = ema.average_name(var1)
saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1})
saver.restore(...checkpoint filename...)
# var0 and var1 now hold the moving average values
```
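As a complement to the explicit average_name() mapping above, ExponentialMovingAverage also provides variables_to_restore(), which builds the same name-to-variable map automatically. A minimal sketch, assuming the var0/var1 graph from the example above and a hypothetical checkpoint path "model.ckpt":

```python
# Restore moving-average values into var0 and var1 using the mapping
# produced by variables_to_restore() (equivalent to the manual
# average_name() dictionary above).
variables_to_restore = ema.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)
with tf.Session() as sess:
    saver.restore(sess, "model.ckpt")  # hypothetical checkpoint path
    # var0 and var1 now hold their moving-average values
```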