TensorFlow Notes on Neural Network Optimization: Learning Rate

Learning rate: the size of each parameter update step. If the learning rate is too large, the parameters being optimized will oscillate around the minimum and fail to converge; if it is too small, the parameters will converge very slowly.
       During training, the parameters are updated along the direction of gradient descent on the loss function. The update rule is:

       w(n+1) = w(n) - learning_rate * ∇,  where ∇ = ∂loss/∂w

       Suppose the loss function is loss = (w + 1)^2, so its gradient is ∇ = 2w + 2. With an initial value w = 5 and a learning rate of 0.2, the parameter and the loss are updated as follows:

       Step 1     w:  5 - 0.2 * (2 * 5 + 2) = 2.6
       Step 2     w:  2.6 - 0.2 * (2 * 2.6 + 2) = 1.16
       Step 3     w:  1.16 - 0.2 * (2 * 1.16 + 2) = 0.296
       Step 4     w:  0.296 - 0.2 * (2 * 0.296 + 2) = -0.2224
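
       These hand computations can be reproduced with a few lines of plain Python. The sketch below is not part of the TensorFlow example; it simply applies the same update rule w = w - lr * (2w + 2) as a check on the numbers above.

# Plain-Python sketch of the gradient descent steps listed above
# loss = (w + 1)^2, gradient = 2w + 2, learning rate 0.2, initial w = 5
w = 5.0
lr = 0.2
for step in range(1, 5):
    grad = 2 * w + 2       # derivative of (w + 1)^2 with respect to w
    w = w - lr * grad      # gradient descent update
    print("Step %d: w = %.4f" % (step, w))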


       The graph of the loss function loss = (w + 1)^2 is:

       [Figure: parabola loss = (w + 1)^2 with its minimum at w = -1]

       As the graph shows, the loss function reaches its minimum at the point (-1, 0), where its derivative 2(w + 1) equals 0, giving the final parameter w = -1. The code is as follows:

#coding:utf-8
#Let the loss function be loss=(w+1)^2 with w initialized to 5. Back-propagation finds the optimal w, i.e. the w that minimizes loss
import tensorflow as tf
 
w = tf.Variable(tf.constant(5, dtype=tf.float32))
 
loss = tf.square(w+1)
 
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
 
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print "After %s steps: w is %f, loss is %f." %(i, w_val, loss_val)


       The output is as follows:

After 35 steps: w is -1.000000, loss is 0.000000.
After 36 steps: w is -1.000000, loss is 0.000000.
After 37 steps: w is -1.000000, loss is 0.000000.
After 38 steps: w is -1.000000, loss is 0.000000.
After 39 steps: w is -1.000000, loss is 0.000000.


       If the learning rate in the code above is set to 1 instead, the output is as follows:

After 35 steps: w is 5.000000, loss is 36.000000.
After 36 steps: w is -7.000000, loss is 36.000000.
After 37 steps: w is 5.000000, loss is 36.000000.
After 38 steps: w is -7.000000, loss is 36.000000.
After 39 steps: w is 5.000000, loss is 36.000000.


       If the learning rate is set to 0.001, the output is as follows:

After 35 steps: w is 4.582785, loss is 31.167484.
After 36 steps: w is 4.571619, loss is 31.042938.
After 37 steps: w is 4.560476, loss is 30.918892.
After 38 steps: w is 4.549355, loss is 30.795341.
After 39 steps: w is 4.538256, loss is 30.672281.
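
       These three behaviours follow directly from the update rule. For this loss, each step gives w_new = w - lr * (2w + 2), which can be rewritten as (w_new + 1) = (1 - 2 * lr) * (w + 1): the distance to the optimum w = -1 is multiplied by (1 - 2 * lr) at every step. With lr = 0.2 the factor is 0.6, so the error shrinks geometrically and w settles at -1 within a few dozen steps; with lr = 1 the factor is -1, so w bounces between 5 and -7 forever; with lr = 0.001 the factor is 0.998, so the error shrinks so slowly that after 40 steps w is still above 4.5.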

Exponentially decaying learning rate: a learning rate that is updated dynamically as the number of training steps grows.
The learning rate is computed as:

       learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)
Expressed with the TensorFlow function:

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True/False)


Here, LEARNING_RATE_BASE is the initial learning rate and LEARNING_RATE_DECAY is the decay rate. global_step records the current number of training steps and is marked as non-trainable. LEARNING_RATE_STEP, the interval at which the learning rate is updated, is usually the total number of samples in the dataset divided by the number of samples fed in per batch. When staircase is set to True, global_step / LEARNING_RATE_STEP is truncated to an integer, so the learning rate decays in a staircase pattern; when staircase is set to False, the learning rate follows a smooth decaying curve.
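
As a rough check on the formula above, the decayed learning rate can also be computed by hand in plain Python. This is only a sketch of what tf.train.exponential_decay evaluates in the staircase=True case, reusing the constant names from the example below.

# Sketch of the exponential decay formula with staircase=True
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 1

def decayed_lr(global_step):
    # staircase=True: integer division before exponentiation
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // LEARNING_RATE_STEP)

print(decayed_lr(0))   # 0.1
print(decayed_lr(40))  # about 0.0669, matching the last line of the run below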
       For example, in this case the model is not trained with a fixed learning rate; an exponentially decaying learning rate is used instead. The initial learning rate is set to 0.1, the decay rate to 0.99, and BATCH_SIZE to 1.

#coding:utf-8
#Let the loss function be loss=(w+1)^2 with w initialized to 5. Back-propagation finds the optimal w, i.e. the w that minimizes loss
#Using an exponentially decaying learning rate gives large steps early in training, so effective convergence can be reached within fewer training steps.
import tensorflow as tf
 
LEARNING_RATE_BASE = 0.1 #initial learning rate
LEARNING_RATE_DECAY = 0.99 #learning rate decay rate
LEARNING_RATE_STEP = 1 #number of BATCH_SIZE batches fed in before the learning rate is updated, usually: total number of samples / BATCH_SIZE
 
#Counter for how many batches of BATCH_SIZE have been run; initialized to 0 and set as non-trainable
global_step = tf.Variable(0, trainable=False)
#Define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)
 
w = tf.Variable(tf.constant(5, dtype=tf.float32))
 
loss = tf.square(w+1)
 
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print "After %s steps: gloabal_step is %f, w is %f, learning rate is %f, loss is %f" %(i, global_step_val, w_val, learning_rate_val, loss_val


       The output is as follows:

After 35 steps: global_step is 36.000000, w is -0.992297, learning rate is 0.069641, loss is 0.000059
After 36 steps: global_step is 37.000000, w is -0.993369, learning rate is 0.068945, loss is 0.000044
After 37 steps: global_step is 38.000000, w is -0.994284, learning rate is 0.068255, loss is 0.000033
After 38 steps: global_step is 39.000000, w is -0.995064, learning rate is 0.067573, loss is 0.000024
After 39 steps: global_step is 40.000000, w is -0.995731, learning rate is 0.066897, loss is 0.000018


       The results show that the learning rate keeps decreasing as the number of training steps increases.
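
       As a quick check against the decay formula, after 40 training steps the learning rate should be 0.1 * 0.99^40 ≈ 0.0669, which matches the value 0.066897 printed in the last line of the output above.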
