Generally, we want the learning rate to decrease as training progresses, so that the model converges better and settles at the point of lowest loss.
TensorFlow provides several ways to adjust the learning rate. Searching for "decay" at https://www.tensorflow.org/api_docs/python/tf/compat/v1/train shows the various learning rate decay strategies available.
This article covers two of them: exponential decay and polynomial decay.
tf.compat.v1.train.exponential_decay(
    learning_rate,
    global_step,
    decay_steps,
    decay_rate,
    staircase=False,
    name=None
)
learning_rate: the initial learning rate.
global_step: the total number of training steps taken so far.
decay_steps: the learning rate changes once every decay_steps steps.
decay_rate: the factor used to compute the decayed learning rate.
staircase: whether global_step/decay_steps is kept as a float (smooth decay) or floored to an integer (stepwise decay).
The learning rate is computed as: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
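To make the formula and the staircase flag concrete, here is a minimal pure-Python sketch (no TensorFlow; the helper name decayed_lr is ours):

import math

def decayed_lr(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    # decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
    p = global_step / decay_steps
    if staircase:
        p = math.floor(p)  # with staircase, the rate only changes every decay_steps steps
    return learning_rate * decay_rate ** p

print(decayed_lr(0.5, 5, 10, 0.9))                  # ~0.4743, smooth decay
print(decayed_lr(0.5, 5, 10, 0.9, staircase=True))  # 0.5, still on the first "stair"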
We can plot how the learning rate changes with a short test script.
#coding=utf-8
import matplotlib.pyplot as plt
import tensorflow as tf

x = []
y = []
N = 200  # train for 200 steps in total

num_epoch = tf.Variable(0, name='global_step', trainable=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for num_epoch in range(N):
        # initial learning rate 0.5, updated every 10 steps
        learning_rate_decay = tf.train.exponential_decay(learning_rate=0.5,
                                                         global_step=num_epoch,
                                                         decay_steps=10,
                                                         decay_rate=0.9,
                                                         staircase=False)
        y.append(sess.run(learning_rate_decay))

x = range(N)
fig = plt.figure()
ax = fig.add_subplot(111)  # the original referenced ax without creating it
ax.set_xlabel('step')
ax.set_ylabel('learning rate')
plt.plot(x, y, 'r', linewidth=2)
plt.show()
The result is shown in the figure: the learning rate decays smoothly from the initial 0.5.
tf.compat.v1.train.polynomial_decay(
    learning_rate,
    global_step,
    decay_steps,
    end_learning_rate=0.0001,
    power=1.0,
    cycle=False,
    name=None
)
You set an initial learning rate and an end learning rate, and the rate decays linearly between them (with the default power=1.0). The cycle argument controls what happens after the rate reaches end_learning_rate: either it stays at that minimum, or it cycles back up and decays again. A learning rate that is too small can cause the model to converge to a local optimum; cycling back to a larger rate can mitigate this to some extent.
The computation differs depending on whether cycle is true (these are the formulas from the API documentation linked above). With cycle=False, global_step is first clamped to decay_steps, so the learning rate stops at end_learning_rate:

global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) *
                        (1 - global_step / decay_steps) ^ power
                        + end_learning_rate

With cycle=True, decay_steps is first stretched to the nearest multiple of itself at or above global_step, which restarts the schedule:

decay_steps = decay_steps * ceil(global_step / decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) *
                        (1 - global_step / decay_steps) ^ power
                        + end_learning_rate
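As a quick sanity check of these two formulas, here is a minimal pure-Python sketch (our own helper, not a TensorFlow API; the max(..., 1) guard for cycle is an assumption to avoid dividing by zero at step 0):

import math

def polynomial_lr(learning_rate, global_step, decay_steps,
                  end_learning_rate=0.0001, power=1.0, cycle=False):
    if cycle:
        # restart the schedule: stretch decay_steps to the next multiple
        # at or above global_step
        decay_steps = decay_steps * math.ceil(max(global_step, 1) / decay_steps)
    else:
        global_step = min(global_step, decay_steps)  # clamp at the end of the schedule
    frac = 1 - global_step / decay_steps
    return (learning_rate - end_learning_rate) * frac ** power + end_learning_rate

print(polynomial_lr(0.5, 25, 10))              # 0.0001: clamped once past decay_steps
print(polynomial_lr(0.5, 25, 10, cycle=True))  # ~0.0834: third cycle, decaying again

Plotting the two schedules with TensorFlow itself: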
#coding=utf-8
import matplotlib.pyplot as plt
import tensorflow as tf

x = []
y = []
z = []
N = 200  # train for 200 steps in total

num_epoch = tf.Variable(0, name='global_step', trainable=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for num_epoch in range(N):
        # initial learning rate 0.5, decay period of 10 steps
        learning_rate_decay = tf.train.polynomial_decay(learning_rate=0.5,
                                                        global_step=num_epoch,
                                                        decay_steps=10,
                                                        end_learning_rate=0.0001,
                                                        cycle=False)
        y.append(sess.run(learning_rate_decay))

        learning_rate_decay2 = tf.train.polynomial_decay(learning_rate=0.5,
                                                         global_step=num_epoch,
                                                         decay_steps=10,
                                                         end_learning_rate=0.0001,
                                                         cycle=True)
        z.append(sess.run(learning_rate_decay2))

x = range(N)
fig = plt.figure()
ax = fig.add_subplot(111)  # the original referenced ax without creating it
ax.set_xlabel('step')
ax.set_ylabel('learning rate')
plt.plot(x, y, 'r', linewidth=2)  # cycle=False in red
plt.plot(x, z, 'g', linewidth=2)  # cycle=True in green
plt.show()
The plotted result:
With cycle=False (red line), the learning rate drops to 0.0001 and then stays there. With cycle=True (green line), after dropping to 0.0001 it jumps back up to a larger value, decays again, and repeats.
In real training code, the decay strategy is usually selected through command-line parameters, for example:
def _configure_learning_rate(num_samples_per_epoch, global_step):
  """Configures the learning rate.

  Args:
    num_samples_per_epoch: The number of samples in each epoch of training.
    global_step: The global_step tensor.

  Returns:
    A `Tensor` representing the learning rate.

  Raises:
    ValueError: if
  """
  # Note: when num_clones is > 1, this will actually have each clone to go
  # over each epoch FLAGS.num_epochs_per_decay times. This is different
  # behavior from sync replicas and is expected to produce different results.
  decay_steps = int(num_samples_per_epoch * FLAGS.num_epochs_per_decay /
                    FLAGS.batch_size)

  if FLAGS.sync_replicas:
    decay_steps /= FLAGS.replicas_to_aggregate

  if FLAGS.learning_rate_decay_type == 'exponential':
    return tf.train.exponential_decay(FLAGS.learning_rate,
                                      global_step,
                                      decay_steps,
                                      FLAGS.learning_rate_decay_factor,
                                      staircase=True,
                                      name='exponential_decay_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'fixed':
    return tf.constant(FLAGS.learning_rate, name='fixed_learning_rate')
  elif FLAGS.learning_rate_decay_type == 'polynomial':
    return tf.train.polynomial_decay(FLAGS.learning_rate,
                                     global_step,
                                     decay_steps,
                                     FLAGS.end_learning_rate,
                                     power=1.0,
                                     cycle=False,
                                     name='polynomial_decay_learning_rate')
  else:
    raise ValueError('learning_rate_decay_type [%s] was not recognized' %
                     FLAGS.learning_rate_decay_type)
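This function only runs if the script defines the flags it reads. A minimal sketch of those definitions using tf.app.flags (flag names are taken from the function above; the default values here are illustrative):

import tensorflow as tf

tf.app.flags.DEFINE_string('learning_rate_decay_type', 'exponential',
                           'One of "fixed", "exponential", or "polynomial".')
tf.app.flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
tf.app.flags.DEFINE_float('learning_rate_decay_factor', 0.94,
                          'Learning rate decay factor.')
tf.app.flags.DEFINE_float('end_learning_rate', 0.0001,
                          'The minimal learning rate used by polynomial decay.')
tf.app.flags.DEFINE_float('num_epochs_per_decay', 2.0,
                          'Number of epochs between learning rate decays.')
tf.app.flags.DEFINE_integer('batch_size', 32, 'Samples per batch.')
tf.app.flags.DEFINE_boolean('sync_replicas', False,
                            'Whether to synchronize replicas during training.')
tf.app.flags.DEFINE_integer('replicas_to_aggregate', 1,
                            'Number of gradients to collect before updating parameters.')
FLAGS = tf.app.flags.FLAGS

With these in place, the training loop simply calls _configure_learning_rate(num_samples, global_step) and hands the returned tensor to its optimizer.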
Recommended reading: http://www.javashuo.com/article/p-yiehgvxm-gq.html, which describes the various learning rate decay strategies in detail, with plots that show intuitively how the learning rate changes under each strategy.