Learning rate: controls the step size of each parameter update. If the learning rate is too large, the parameters being optimized will oscillate around the minimum and fail to converge; if it is too small, the parameters will converge very slowly.
During training, the parameters are updated along the direction of gradient descent of the loss function. The update rule is:

w_new = w_old - learning_rate * ∇
Suppose the loss function is loss = (w + 1)². Its gradient ∇ is the derivative of loss: ∇ = 2w + 2. With the parameter initialized to 5 and a learning rate of 0.2, the parameter and loss update as follows:
Step 1: w = 5 - 0.2 * (2 * 5 + 2) = 2.6
Step 2: w = 2.6 - 0.2 * (2 * 2.6 + 2) = 1.16
Step 3: w = 1.16 - 0.2 * (2 * 1.16 + 2) = 0.296
Step 4: w = 0.296 - 0.2 * (2 * 0.296 + 2) = ...
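This hand calculation can be verified with a few lines of plain Python (a minimal sketch without TensorFlow; the names w, lr, and grad are illustrative):

# Reproduce the hand calculation: loss = (w + 1)^2, so the gradient is 2*w + 2.
w = 5.0
lr = 0.2                              # learning rate
for step in range(1, 5):
    grad = 2 * w + 2                  # derivative of (w + 1)^2 with respect to w
    w = w - lr * grad                 # gradient-descent update
    print("Step %d: w = %.4f, loss = %.4f" % (step, w, (w + 1) ** 2))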
The graph of the loss function loss = (w + 1)²:
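The figure itself is not reproduced here; a minimal matplotlib sketch (assuming numpy and matplotlib are available, which are not needed elsewhere in this example) regenerates it:

# Plot loss = (w + 1)^2, a parabola with its minimum at (-1, 0).
import numpy as np
import matplotlib.pyplot as plt

w = np.linspace(-4, 2, 200)   # range of w values around the minimum
plt.plot(w, (w + 1) ** 2)
plt.xlabel("w")
plt.ylabel("loss")
plt.show()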
As the graph shows, the loss function attains its minimum at (-1, 0), where its derivative is 0, so the final parameter is w = -1. The code is as follows:
#coding:utf-8
# Set the loss function loss = (w+1)^2, with w initialized to 5.
# Back-propagation finds the optimal w, i.e. the w that minimizes loss.
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: w is %f, loss is %f." % (i, w_val, loss_val))
The output is:
After 35 steps: w is -1.000000, loss is 0.000000.
After 36 steps: w is -1.000000, loss is 0.000000.
After 37 steps: w is -1.000000, loss is 0.000000.
After 38 steps: w is -1.000000, loss is 0.000000.
After 39 steps: w is -1.000000, loss is 0.000000.
If the learning rate in the code above is set to 1, the output is:
After 35 steps: w is 5.000000, loss is 36.000000.
After 36 steps: w is -7.000000, loss is 36.000000.
After 37 steps: w is 5.000000, loss is 36.000000.
After 38 steps: w is -7.000000, loss is 36.000000.
After 39 steps: w is 5.000000, loss is 36.000000.
If the learning rate is set to 0.001, the output is:
After 35 steps: w is 4.582785, loss is 31.167484.
After 36 steps: w is 4.571619, loss is 31.042938.
After 37 steps: w is 4.560476, loss is 30.918892.
After 38 steps: w is 4.549355, loss is 30.795341.
After 39 steps: w is 4.538256, loss is 30.672281.
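The three behaviours (convergence, oscillation, slow convergence) can also be checked without TensorFlow by running the manual update from the sketch above for 40 iterations with each learning rate (illustrative only; the final values should agree with the logs above up to rounding):

# Compare the effect of the three learning rates on the manual gradient-descent update.
for lr in (0.2, 1.0, 0.001):
    w = 5.0
    for _ in range(40):
        w = w - lr * (2 * w + 2)      # gradient of (w + 1)^2 is 2w + 2
    print("lr = %.3f -> w after 40 steps: %f, loss: %f" % (lr, w, (w + 1) ** 2))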
Exponentially decaying learning rate: the learning rate is updated dynamically as the number of training steps grows.
The learning rate is computed as:

learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)
Expressed with the TensorFlow function:
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True/False)
Here, LEARNING_RATE_BASE is the initial learning rate, LEARNING_RATE_DECAY is the learning-rate decay rate, and global_step records the current number of training steps and is marked as non-trainable. The learning rate learning_rate is updated once every LEARNING_RATE_STEP steps, usually set to the total number of samples in the dataset divided by the number of samples fed in each batch. If staircase is set to True, global_step / LEARNING_RATE_STEP is rounded down to an integer, so the learning rate decays in a staircase pattern; if staircase is set to False, the learning rate follows a smooth decay curve.
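To make the staircase option concrete, the same formula can be sketched in plain Python (an illustrative sketch only; the constants 0.1, 0.99, 4, and 10 below are chosen to show the difference and are not the values used in the example that follows):

# Sketch of the formula implemented by tf.train.exponential_decay.
import math

def decayed_lr(base, decay, decay_steps, global_step, staircase):
    exponent = global_step / float(decay_steps)
    if staircase:
        exponent = math.floor(exponent)   # floor -> staircase (piecewise-constant) decay
    return base * decay ** exponent

print(decayed_lr(0.1, 0.99, 4, 10, staircase=True))    # exponent = floor(10 / 4) = 2
print(decayed_lr(0.1, 0.99, 4, 10, staircase=False))   # exponent = 10 / 4 = 2.5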
For example, in this case the model is not trained with a fixed learning rate; an exponentially decaying learning rate is used instead. The initial learning rate is set to 0.1, the decay rate to 0.99, and BATCH_SIZE to 1.
#coding:utf-8
# Set the loss function loss = (w+1)^2, with w initialized to 5.
# Back-propagation finds the optimal w, i.e. the w that minimizes loss.
# An exponentially decaying learning rate gives a fast descent in the early iterations,
# so effective convergence can be reached within fewer training steps.
import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
LEARNING_RATE_STEP = 1      # update the learning rate after this many BATCH_SIZE rounds,
                            # usually: total number of samples / BATCH_SIZE

# Counter for how many BATCH_SIZE rounds have been run; starts at 0 and is not trainable.
global_step = tf.Variable(0, trainable=False)

# Define the exponentially decaying learning rate.
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: global_step is %f, w is %f, learning rate is %f, loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val))
The output is:
After 35 steps: global_step is 36.000000, w is -0.992297, learning rate is 0.069641, loss is 0.000059
After 36 steps: global_step is 37.000000, w is -0.993369, learning rate is 0.068945, loss is 0.000044
After 37 steps: global_step is 38.000000, w is -0.994284, learning rate is 0.068255, loss is 0.000033
After 38 steps: global_step is 39.000000, w is -0.995064, learning rate is 0.067573, loss is 0.000024
After 39 steps: global_step is 40.000000, w is -0.995731, learning rate is 0.066897, loss is 0.000018
As the results show, the learning rate keeps decreasing as the number of training steps increases.