Original article:
https://blog.csdn.net/qq_23981335/article/details/89097757
---------------------
Author: 周衛林
Source: CSDN
-----------------------------------------------------------------------------------------------
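1. Building the LSTM
TensorFlow 1.x exposes the basic LSTM cell under two names, tf.nn.rnn_cell.BasicLSTMCell and tf.contrib.rnn.BasicLSTMCell. The most commonly used parameter is num_units, the dimensionality of the LSTM hidden state. state_is_tuple controls whether the state is represented as a (c, h) tuple.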
lstm_cell=tf.nn.rnn_cell.BasicLSTMCell(num_units=hidden_size)
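To make the role of num_units and the (c, h) state concrete, here is a minimal NumPy sketch of a single LSTM step. It is an illustration, not TensorFlow's implementation: the gate order (i, j, f, o), the forget_bias of 1.0 and the names lstm_step, kernel and bias are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, c, h, kernel, bias, forget_bias=1.0):
    """One LSTM step. x: [batch, input_dim]; c and h: [batch, num_units]."""
    # All four gates are computed from the concatenated input and hidden state.
    gates = np.concatenate([x, h], axis=1) @ kernel + bias
    i, j, f, o = np.split(gates, 4, axis=1)  # assumed gate order
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    # The output of the step is h; the state is the pair (c, h).
    return new_h, (new_c, new_h)

batch_size, input_dim, num_units = 2, 8, 64
rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(input_dim + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
c0 = np.zeros((batch_size, num_units))
h0 = np.zeros((batch_size, num_units))
x_t = rng.normal(size=(batch_size, input_dim))

output, (c1, h1) = lstm_step(x_t, c0, h0, kernel, bias)
print(output.shape, c1.shape, h1.shape)  # (2, 64) (2, 64) (2, 64)
```

Both parts of the state have shape [batch_size, num_units], which is why num_units is the one parameter that has to be chosen up front.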
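2. Initializing the hidden state
An LSTM takes not only the data input but also the state from the previous time step, so the input state has to be initialized.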
initial_state=lstm_cell.zero_state(batch_size,dtype=tf.float32)
3. Adding a dropout layer
A dropout layer can be added on top of the basic LSTM.
lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.keep_prob)
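As a rough picture of what output_keep_prob does to the cell output, the following NumPy sketch applies inverted dropout: each element is kept with probability keep_prob and the kept elements are scaled by 1/keep_prob. The helper name dropout_output is hypothetical and only serves this illustration.

```python
import numpy as np

def dropout_output(output, keep_prob, rng):
    """Keep each element with probability keep_prob; scale survivors by 1/keep_prob."""
    mask = rng.random(output.shape) < keep_prob
    return np.where(mask, output / keep_prob, 0.0), mask

rng = np.random.default_rng(0)
h = np.ones((2, 4))                      # stand-in for a cell output
dropped, mask = dropout_output(h, keep_prob=0.5, rng=rng)
print(dropped)                           # entries are either 0.0 or 2.0

# With keep_prob = 1.0 nothing is dropped, which is the usual setting at inference time.
same, _ = dropout_output(h, keep_prob=1.0, rng=rng)
```

This is why keep_prob is normally fed as a placeholder: a value below 1 during training and 1.0 during evaluation.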
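4. Multi-layer LSTM
cell = tf.nn.rnn_cell.MultiRNNCell([tf.nn.rnn_cell.BasicLSTMCell(num_units=hidden_size) for _ in range(hidden_layer_num)])
Here hidden_layer_num is the number of LSTM layers. Each layer gets its own cell object; writing [lstm_cell]*hidden_layer_num would put the same cell object into every layer, which, depending on the TensorFlow version, either shares one set of weights across the layers or raises an error.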
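One thing to watch in the list passed to MultiRNNCell is how the list is built. In Python, [obj] * n repeats a reference to one object, while a list comprehension creates n separate objects. The sketch below shows the difference with a hypothetical ToyCell class that stands in for an RNN cell; it is not a TensorFlow class.

```python
class ToyCell:
    """Hypothetical stand-in for an RNN cell; it only carries a weight list."""
    def __init__(self):
        self.weights = [0.0]

shared = [ToyCell()] * 3                    # three references to ONE object
separate = [ToyCell() for _ in range(3)]    # three independent objects

shared[0].weights[0] = 1.0
separate[0].weights[0] = 1.0

print([c.weights[0] for c in shared])       # [1.0, 1.0, 1.0]
print([c.weights[0] for c in separate])     # [1.0, 0.0, 0.0]
print(len({id(c) for c in shared}), len({id(c) for c in separate}))  # 1 3
```

For stacked layers that should each learn their own weights, the list-comprehension form is the safer choice.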
5. Complete code
(1) The construction that shows the principle most clearly, at a glance, is the following:
    import sys
    import tensorflow as tf
    import numpy as np

    batch_size = 2
    hidden_size = 64
    num_steps = 10
    input_dim = 8

    input_data = np.random.randn(batch_size, num_steps, input_dim)
    input_data[1, 6:] = 0  # zero-pad the second sequence from step 6 onward

    x = tf.placeholder(dtype=tf.float32, shape=[batch_size, num_steps, input_dim], name='input_x')
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=hidden_size)
    state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

    outputs = []
    with tf.variable_scope('RNN'):
        for i in range(num_steps):
            if i > 0:
                # every time step reuses the same LSTM weights
                tf.get_variable_scope().reuse_variables()
            # pass on the state of the previous step, not the initial state
            output, state = lstm_cell(x[:, i, :], state)
            outputs.append(output)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        np.set_printoptions(threshold=sys.maxsize)  # print arrays in full
        result = sess.run(outputs, feed_dict={x: input_data})
        print(result)
(2) A simplified construction
If writing the for loop by hand is tedious, you can use tf.nn.static_rnn, which implements the LSTM with exactly such a for loop. Pay attention to its parameters:
tf.nn.static_rnn( cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None )
Here cell is the LSTM cell. inputs must be a Python list of num_steps tensors, each of shape [batch_size, input_dim] (that is, time-major data of shape [num_steps, batch_size, input_dim]), and sequence_length holds the lengths of the batch_size input sequences.
The complete code is as follows:
    import sys
    import tensorflow as tf
    import numpy as np

    batch_size = 2
    num_units = 64
    num_steps = 10
    input_dim = 8

    input_data = np.random.randn(batch_size, num_steps, input_dim)
    input_data[1, 6:] = 0  # zero-pad the second sequence from step 6 onward

    x = tf.placeholder(dtype=tf.float32, shape=[batch_size, num_steps, input_dim], name='input_x')
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

    # x: [batch_size, num_steps, input_dim], type: placeholder
    # y: num_steps tensors of shape [batch_size, input_dim], type: list
    y = tf.unstack(x, axis=1)
    output, state = tf.nn.static_rnn(lstm_cell, y, sequence_length=[10, 6], initial_state=initial_state)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        np.set_printoptions(threshold=sys.maxsize)  # print arrays in full
        result1, result2 = sess.run([output, state], feed_dict={x: input_data})
        result1 = np.asarray(result1)
        result2 = np.asarray(result2)
        print(result1)
        print('*' * 100)
        print(result2)
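The shape requirement on the inputs of static_rnn can be checked without TensorFlow. The NumPy sketch below mimics what unstacking along axis 1 produces: a list with one [batch_size, input_dim] array per time step. It is only an analogy for tf.unstack, using the same sizes as the example above.

```python
import numpy as np

batch_size, num_steps, input_dim = 2, 10, 8
x = np.random.randn(batch_size, num_steps, input_dim)

# Split the batch-major array into a list of per-time-step slices.
steps = [x[:, t, :] for t in range(num_steps)]
print(len(steps), steps[0].shape)   # 10 (2, 8)

# Stacking the slices back along axis 1 recovers the original layout.
restored = np.stack(steps, axis=1)
print(restored.shape)               # (2, 10, 8)
```

The list length is fixed when the graph is built, which is the reason static_rnn needs a known num_steps.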
The same thing can also be done with tf.nn.dynamic_rnn:
tf.nn.dynamic_rnn( cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None )
cell is again the LSTM cell, and with the default time_major=False, inputs has shape [batch_size, num_steps, input_dim].
output,state=tf.nn.dynamic_rnn(lstm_cell,x,sequence_length=[10,6],initial_state=initial_state)
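The TensorFlow documentation describes sequence_length as being used to copy through the state and zero out the outputs once a batch element is past its sequence length. The NumPy sketch below imitates that behaviour for sequence_length=[10, 6]. The recurrent step toy_step is a hypothetical running sum, not an LSTM, chosen so that the effect is easy to read off.

```python
import numpy as np

def run_with_lengths(x, seq_len, step_fn, state):
    """x: [batch, num_steps, input_dim]; seq_len: [batch]; state: [batch, num_units]."""
    outputs = []
    for t in range(x.shape[1]):
        out, new_state = step_fn(x[:, t, :], state)
        active = (t < seq_len)[:, None]                  # [batch, 1], True while inside the sequence
        outputs.append(np.where(active, out, 0.0))       # zero the output past the length
        state = np.where(active, new_state, state)       # copy the old state through
    return np.stack(outputs, axis=1), state

def toy_step(x_t, state):
    """Hypothetical recurrent step: the state is a running sum of the inputs."""
    new_state = state + x_t
    return new_state, new_state

x = np.ones((2, 10, 3))
seq_len = np.array([10, 6])
outputs, final_state = run_with_lengths(x, seq_len, toy_step, np.zeros((2, 3)))

print(final_state[:, 0].tolist())   # [10.0, 6.0]
print(outputs[1, :, 0].tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.0, 0.0, 0.0, 0.0]
```

The second sequence stops accumulating after step 6, so its final state reflects only the real data and not the padding.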
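6. Differences between static_rnn and dynamic_rnn
With either dynamic_rnn or static_rnn, all sequences within one batch have the same length (shorter ones have to be padded by you). The difference is that dynamic_rnn stops computing according to sequence_length. Another difference is that dynamic_rnn builds the recurrence dynamically (as a tf.while_loop) instead of unrolling num_steps copies of the cell into the graph.
In addition, with dynamic_rnn different batches may have different sequence lengths, for example length 10 for the first batch and 20 for the second, whereas with static_rnn every batch must have the same sequence length, num_steps.
The following code uses dynamic_rnn with a different sequence length for each batch: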
    import sys
    import tensorflow as tf
    import numpy as np

    batch_size = 2
    num_units = 64
    num_steps = 10
    input_dim = 8

    input_data = np.random.randn(batch_size, num_steps, input_dim)
    input_data2 = np.random.randn(batch_size, num_steps * 2, input_dim)

    # None: the sequence length is not fixed
    x = tf.placeholder(dtype=tf.float32, shape=[batch_size, None, input_dim], name='input')
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    output, state = tf.nn.dynamic_rnn(lstm_cell, x, initial_state=initial_state)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        np.set_printoptions(threshold=sys.maxsize)  # print arrays in full

        # Sequence length 10: x is [batch_size, num_steps, input_dim],
        # so the LSTM cell is applied 10 times.
        result1, result2 = sess.run([output, state], feed_dict={x: input_data})
        result1 = np.asarray(result1)
        result2 = np.asarray(result2)
        print(result1)
        print('*' * 100)
        print(result2)

        # Sequence length 20: x is [batch_size, num_steps * 2, input_dim],
        # so the LSTM cell is applied 20 times.
        result1, result2 = sess.run([output, state], feed_dict={x: input_data2})
        result1 = np.asarray(result1)
        result2 = np.asarray(result2)
        print(result1)
        print('*' * 100)
        print(result2)
static_rnn, by contrast, cannot do this.
7. Performance of dynamic_rnn versus static_rnn
    import tensorflow as tf
    import numpy as np
    import time

    num_step = 100
    input_dim = 8
    batch_size = 2
    num_unit = 64

    input_data = np.random.randn(batch_size, num_step, input_dim)
    x = tf.placeholder(dtype=tf.float32, shape=[batch_size, num_step, input_dim])
    seq_len = tf.placeholder(dtype=tf.int32, shape=[batch_size])
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_unit)
    initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

    y = tf.unstack(x, axis=1)
    output1, state1 = tf.nn.static_rnn(lstm_cell, y, sequence_length=seq_len, initial_state=initial_state)
    output2, state2 = tf.nn.dynamic_rnn(lstm_cell, x, sequence_length=seq_len, initial_state=initial_state)

    print('begin train...')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # untimed warm-up runs (static_rnn graph only)
        for i in range(100):
            sess.run([output1, state1], feed_dict={x: input_data, seq_len: [10] * batch_size})
        time1 = time.time()

        for i in range(100):
            sess.run([output1, state1], feed_dict={x: input_data, seq_len: [10] * batch_size})
        time2 = time.time()
        print('static_rnn seq_len:10\t\t{}'.format(time2 - time1))

        for i in range(100):
            sess.run([output1, state1], feed_dict={x: input_data, seq_len: [100] * batch_size})
        time3 = time.time()
        print('static_rnn seq_len:100\t\t{}'.format(time3 - time2))

        for i in range(100):
            sess.run([output2, state2], feed_dict={x: input_data, seq_len: [10] * batch_size})
        time4 = time.time()
        print('dynamic_rnn seq_len:10\t\t{}'.format(time4 - time3))

        for i in range(100):
            sess.run([output2, state2], feed_dict={x: input_data, seq_len: [100] * batch_size})
        time5 = time.time()
        print('dynamic_rnn seq_len:100\t\t{}'.format(time5 - time4))
Result (seconds for 100 runs each):
    static_rnn seq_len:10       0.8497538566589355
    static_rnn seq_len:100      1.5897266864776611
    dynamic_rnn seq_len:10      0.4857025146484375
    dynamic_rnn seq_len:100     2.8693313598632812
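Short sequences run faster than long ones. At sequence length 10, dynamic_rnn is faster than static_rnn; the explanation offered here is that dynamic_rnn stops as soon as it reaches the sequence length, whereas static_rnn has to step through all num_steps before it stops. At sequence length 100 the result is the opposite of what this analysis predicts, possibly because the loop itself costs time and cannot match the performance of running directly on 100 unrolled LSTM steps.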
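The benchmark above takes back-to-back time.time() readings, and only the static_rnn graph gets an untimed warm-up loop, so the numbers are best read as rough indications. A small timing helper along the following lines makes it easier to give every configuration its own warm-up. The name time_calls and its parameters are hypothetical, and the lambda is a toy workload standing in for a sess.run call.

```python
import time

def time_calls(fn, repeats=100, warmup=10):
    """Return total wall-clock seconds for `repeats` calls of fn, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start

# Toy workload; in the benchmark this would wrap one sess.run(...) call.
calls = []
elapsed = time_calls(lambda: calls.append(sum(range(1000))), repeats=100, warmup=10)
print('toy workload: {:.6f} s for 100 calls'.format(elapsed))
```

Measuring each of the four configurations through the same helper would put them on an equal footing before drawing conclusions from the differences.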
-----------------------------------------------------------------------------------------------