This article mainly covers:

1. What is an LSTM
Long Short-Term Memory networks (LSTMs) are a kind of recurrent neural network (RNN) capable of learning long-term dependencies. All RNNs have the form of a chain of repeating neural-network modules. In a standard RNN, this repeating module has a very simple structure, for example a single tanh layer.
The figure above shows the standard RNN structure. The LSTM differs; its network structure is shown in the next figure.
The icons for the elements of the network diagram are:
The LSTM removes information from, or adds information to, the cell state through carefully designed structures called "gates". A gate is a way of letting information through selectively: it consists of a sigmoid neural-network layer and a pointwise multiplication. The LSTM has three gates that protect and control the cell state.
First, the forget gate:
As above, note two things about the forget gate: what is trained is a weight matrix W_f, and the previous moment's output is concatenated with the current input. The forget gate decides what information to discard from the cell state: since the sigmoid outputs values below 1, it effectively attenuates each dimension.
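Written out in the standard notation (a restatement of the figure, with $\sigma$ the sigmoid and $[h_{t-1}, x_t]$ the concatenation just mentioned):

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$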
Next is the input gate, which decides what new information enters the cell state:
Here the sigmoid decides which values need updating, and the tanh creates a candidate vector C̃_t for the new cell state; this step trains two weight matrices, W_i and W_C. After the first two gates, the deletions and additions of transmitted information are determined, and the cell state can be updated.
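In the same notation, the two trained transforms and the resulting state update are:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$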
The third gate is the output gate:
A sigmoid determines which part of the cell state will be output; a tanh squashes the cell state to values between -1 and 1, and multiplying this by the sigmoid gate's output yields the part that is actually emitted.
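That is:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t * \tanh(C_t)$$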
2. Curve fitting with an LSTM

2.1 Stock price prediction
Below is a common online example of using an LSTM for stock-price regression. The data:
As shown, each record contains the fields index_code, date, open, close, low, high, volume, money and change. The features from open through change are extracted as the network inputs; the output is the label. The full code:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# constants
rnn_unit = 10      # hidden layer units
input_size = 7
output_size = 1
lr = 0.0006        # learning rate

# —————————————————— load data ——————————————————
f = open('dataset_2.csv')
df = pd.read_csv(f)              # read the stock data
data = df.iloc[:, 2:10].values   # take columns 3-10

# build the training set
def get_train_data(batch_size=60, time_step=20, train_begin=0, train_end=5800):
    batch_index = []
    data_train = data[train_begin:train_end]
    # standardize
    normalized_train_data = (data_train - np.mean(data_train, axis=0)) / np.std(data_train, axis=0)
    train_x, train_y = [], []
    for i in range(len(normalized_train_data) - time_step):
        if i % batch_size == 0:
            batch_index.append(i)
        x = normalized_train_data[i:i + time_step, :7]
        y = normalized_train_data[i:i + time_step, 7, np.newaxis]
        train_x.append(x.tolist())
        train_y.append(y.tolist())
    batch_index.append((len(normalized_train_data) - time_step))
    return batch_index, train_x, train_y

# build the test set
def get_test_data(time_step=20, test_begin=5800):
    data_test = data[test_begin:]
    mean = np.mean(data_test, axis=0)
    std = np.std(data_test, axis=0)
    normalized_test_data = (data_test - mean) / std                   # standardize
    size = (len(normalized_test_data) + time_step - 1) // time_step   # number of samples
    test_x, test_y = [], []
    for i in range(size - 1):
        x = normalized_test_data[i * time_step:(i + 1) * time_step, :7]
        y = normalized_test_data[i * time_step:(i + 1) * time_step, 7]
        test_x.append(x.tolist())
        test_y.extend(y)
    test_x.append((normalized_test_data[(i + 1) * time_step:, :7]).tolist())
    test_y.extend((normalized_test_data[(i + 1) * time_step:, 7]).tolist())
    return mean, std, test_x, test_y

# —————————————————— network variables ——————————————————
# input/output layer weights and biases
weights = {
    'in': tf.Variable(tf.random_normal([input_size, rnn_unit])),
    'out': tf.Variable(tf.random_normal([rnn_unit, 1]))
}
biases = {
    'in': tf.Variable(tf.constant(0.1, shape=[rnn_unit, ])),
    'out': tf.Variable(tf.constant(0.1, shape=[1, ]))
}

# —————————————————— network definition ——————————————————
def lstm(X):
    batch_size = tf.shape(X)[0]
    time_step = tf.shape(X)[1]
    w_in = weights['in']
    b_in = biases['in']
    # flatten to 2-D for the matmul; the result feeds the hidden layer
    input = tf.reshape(X, [-1, input_size])
    input_rnn = tf.matmul(input, w_in) + b_in
    # back to 3-D as the input of the LSTM cell
    input_rnn = tf.reshape(input_rnn, [-1, time_step, rnn_unit])
    cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_unit)
    init_state = cell.zero_state(batch_size, dtype=tf.float32)
    # output_rnn holds the output of every step; final_states is the state of the last cell
    output_rnn, final_states = tf.nn.dynamic_rnn(cell, input_rnn, initial_state=init_state, dtype=tf.float32)
    output = tf.reshape(output_rnn, [-1, rnn_unit])   # input of the output layer
    w_out = weights['out']
    b_out = biases['out']
    pred = tf.matmul(output, w_out) + b_out
    return pred, final_states

# —————————————————— training ——————————————————
def train_lstm(batch_size=80, time_step=15, train_begin=2000, train_end=5800):
    X = tf.placeholder(tf.float32, shape=[None, time_step, input_size])
    Y = tf.placeholder(tf.float32, shape=[None, time_step, output_size])
    # samples 2001-5785 of the data, 15 steps at a time
    batch_index, train_x, train_y = get_train_data(batch_size, time_step, train_begin, train_end)
    print(np.array(train_x).shape)   # (3785, 15, 7)
    print(batch_index)
    # i.e. 3785 "sentences" of 15 "words" each, with 7 features (embedding) per word;
    # each step trains on 80 sentences
    pred, _ = lstm(X)
    # loss
    loss = tf.reduce_mean(tf.square(tf.reshape(pred, [-1]) - tf.reshape(Y, [-1])))
    train_op = tf.train.AdamOptimizer(lr).minimize(loss)
    saver = tf.train.Saver(tf.global_variables(), max_to_keep=15)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # 200 training epochs
        for i in range(200):
            # each epoch walks through the batches, batch_size samples at a time
            for step in range(len(batch_index) - 1):
                _, loss_ = sess.run([train_op, loss],
                                    feed_dict={X: train_x[batch_index[step]:batch_index[step + 1]],
                                               Y: train_y[batch_index[step]:batch_index[step + 1]]})
            print(i, loss_)
            if i % 200 == 0:
                print("model saved:", saver.save(sess, 'model/stock2.model', global_step=i))

train_lstm()

# —————————————————— prediction ——————————————————
def prediction(time_step=20):
    X = tf.placeholder(tf.float32, shape=[None, time_step, input_size])
    mean, std, test_x, test_y = get_test_data(time_step)
    # reuse the LSTM variables already created by train_lstm() in this same graph
    # (tf.AUTO_REUSE needs TF >= 1.4)
    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
        pred, _ = lstm(X)
    saver = tf.train.Saver(tf.global_variables())
    with tf.Session() as sess:
        # restore the parameters
        module_file = tf.train.latest_checkpoint('model')
        saver.restore(sess, module_file)
        test_predict = []
        for step in range(len(test_x) - 1):
            prob = sess.run(pred, feed_dict={X: [test_x[step]]})
            predict = prob.reshape((-1))
            test_predict.extend(predict)
        test_y = np.array(test_y) * std[7] + mean[7]
        test_predict = np.array(test_predict) * std[7] + mean[7]
        # mean relative deviation
        acc = np.average(np.abs(test_predict - test_y[:len(test_predict)]) / test_y[:len(test_predict)])
        # plot the result
        plt.figure()
        plt.plot(list(range(len(test_predict))), test_predict, color='b')
        plt.plot(list(range(len(test_y))), test_y, color='r')
        plt.show()

prediction()
```
The flow is not hard to follow; below we analyze the dimension transformations involved, which deepens the understanding of the LSTM.
The construction of an RNN is best understood through the dimensions of its input tensors. Here we use dynamic_rnn (note the differences in usage from tf.contrib.rnn.static_rnn):
```python
dynamic_rnn(
    cell,
    inputs,
    sequence_length=None,
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
)
```
where:
cell: an RNNCell instance.
inputs: the input of the RNN. If time_major == False (default), the shape is [batch_size, max_time, embedding_size]; if time_major == True, it is [max_time, batch_size, embedding_size].
initial_state: the initial state of the RNN. The network needs one; for a plain RNN its shape is [batch_size, cell.state_size].
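A minimal sketch (with assumed toy sizes) to make this shape contract concrete; it only builds the graph and inspects the static shapes:

```python
import tensorflow as tf

batch_size, max_time, embedding_size, rnn_unit = 4, 20, 7, 10
inputs = tf.placeholder(tf.float32, [batch_size, max_time, embedding_size])
cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_unit)
# for an LSTM the state is an LSTMStateTuple (c, h), each of shape (4, 10)
init_state = cell.zero_state(batch_size, dtype=tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         initial_state=init_state,
                                         time_major=False)
print(outputs.get_shape())        # (4, 20, 10): one hidden vector per time step
print(final_state.h.get_shape())  # (4, 10): hidden state after the last step
```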
2.2 Sine curve fitting

For curve fitting with an LSTM, following https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/5-09-RNN3/, we get this code:
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

BATCH_START = 0     # index used when building the batch data
TIME_STEPS = 20     # time_steps for backpropagation through time
BATCH_SIZE = 50
INPUT_SIZE = 1      # size of the x input
OUTPUT_SIZE = 1     # size of the cos output
CELL_SIZE = 10      # hidden unit size of the RNN
LR = 0.006          # learning rate

# data-generating function
def get_batch():
    # xs shape (50batch, 20steps)
    xs = np.arange(BATCH_START, BATCH_START + TIME_STEPS * BATCH_SIZE).reshape(
        (BATCH_SIZE, TIME_STEPS)) / (10 * np.pi)
    res = np.cos(xs)
    # returned xs and res: shape (batch, step, input)
    return [xs[:, :, np.newaxis], res[:, :, np.newaxis]]

# main body of the LSTM-RNN
class LSTMRNN(object):
    def __init__(self, n_steps, input_size, output_size, cell_size, batch_size):
        self.n_steps = n_steps
        self.input_size = input_size
        self.output_size = output_size
        self.cell_size = cell_size
        self.batch_size = batch_size
        with tf.name_scope('inputs'):
            self.xs = tf.placeholder(tf.float32, [None, n_steps, input_size], name='xs')
            self.ys = tf.placeholder(tf.float32, [None, n_steps, output_size], name='ys')
        with tf.variable_scope('in_hidden'):
            self.add_input_layer()
        with tf.variable_scope('LSTM_cell'):
            self.add_cell()
        with tf.variable_scope('out_hidden'):
            self.add_output_layer()
        with tf.name_scope('cost'):
            self.compute_cost()
        with tf.name_scope('train'):
            self.train_op = tf.train.AdamOptimizer(LR).minimize(self.cost)

    # input layer
    def add_input_layer(self):
        l_in_x = tf.reshape(self.xs, [-1, self.input_size], name='2_2D')   # (batch*n_steps, in_size)
        Ws_in = self._weight_variable([self.input_size, self.cell_size])   # (in_size, cell_size)
        bs_in = self._bias_variable([self.cell_size, ])                    # (cell_size, )
        with tf.name_scope('Wx_plus_b'):
            l_in_y = tf.matmul(l_in_x, Ws_in) + bs_in                      # (batch*n_steps, cell_size)
        # reshape l_in_y ==> (batch, n_steps, cell_size)
        self.l_in_y = tf.reshape(l_in_y, [-1, self.n_steps, self.cell_size], name='2_3D')

    # the cell; note self.cell_init_state, which matters during training
    def add_cell(self):
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
        with tf.name_scope('initial_state'):
            self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
        self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
            lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)

    # output layer
    def add_output_layer(self):
        l_out_x = tf.reshape(self.cell_outputs, [-1, self.cell_size], name='2_2D')   # (batch*steps, cell_size)
        Ws_out = self._weight_variable([self.cell_size, self.output_size])
        bs_out = self._bias_variable([self.output_size, ])
        with tf.name_scope('Wx_plus_b'):
            self.pred = tf.matmul(l_out_x, Ws_out) + bs_out   # (batch*steps, output_size)

    # the remaining pieces of the RNN
    def compute_cost(self):
        losses = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
            [tf.reshape(self.pred, [-1], name='reshape_pred')],
            [tf.reshape(self.ys, [-1], name='reshape_target')],
            [tf.ones([self.batch_size * self.n_steps], dtype=tf.float32)],
            average_across_timesteps=True,
            softmax_loss_function=self.ms_error,
            name='losses'
        )
        with tf.name_scope('average_cost'):
            self.cost = tf.div(
                tf.reduce_sum(losses, name='losses_sum'),
                self.batch_size,
                name='average_cost')
            tf.summary.scalar('cost', self.cost)

    def ms_error(self, labels, logits):
        return tf.square(tf.subtract(labels, logits))

    def _weight_variable(self, shape, name='weights'):
        initializer = tf.random_normal_initializer(mean=0., stddev=1., )
        return tf.get_variable(shape=shape, initializer=initializer, name=name)

    def _bias_variable(self, shape, name='biases'):
        initializer = tf.constant_initializer(0.1)
        return tf.get_variable(name=name, shape=shape, initializer=initializer)


# train the LSTMRNN
if __name__ == '__main__':
    # build the model
    model = LSTMRNN(TIME_STEPS, INPUT_SIZE, OUTPUT_SIZE, CELL_SIZE, BATCH_SIZE)
    sess = tf.Session()
    saver = tf.train.Saver(max_to_keep=3)
    sess.run(tf.global_variables_initializer())
    t = 0
    if t == 1:
        # restore a saved model and only predict
        model_file = tf.train.latest_checkpoint('model/')
        saver.restore(sess, model_file)
        xs, res = get_batch()               # fetch batch data
        feed_dict = {model.xs: xs}
        pred = sess.run(model.pred, feed_dict=feed_dict)
        xs.shape = (-1, 1)
        res.shape = (-1, 1)
        pred.shape = (-1, 1)
        print(xs.shape, res.shape, pred.shape)
        plt.figure()
        plt.plot(xs, res, '-r')
        plt.plot(xs, pred, '--g')
        plt.show()
    else:
        # matplotlib visualization
        plt.ion()    # continuous plotting
        plt.show()
        # train repeatedly
        for i in range(2500):
            xs, res = get_batch()           # fetch batch data
            feed_dict = {
                model.xs: xs,
                model.ys: res,
            }
            # train
            _, cost, state, pred = sess.run(
                [model.train_op, model.cost, model.cell_final_state, model.pred],
                feed_dict=feed_dict)
            # plotting
            x = xs.reshape(-1, 1)
            r = res.reshape(-1, 1)
            p = pred.reshape(-1, 1)
            plt.clf()
            plt.plot(x, r, 'r', x, p, 'b--')
            plt.ylim((-1.2, 1.2))
            plt.draw()
            plt.pause(0.3)                  # refresh every 0.3 s
            # save the model / print the cost periodically
            if i % 20 == 0:
                saver.save(sess, "model/lstem_text.ckpt", global_step=i)
                print('cost: ', round(cost, 4))
```
An interesting phenomenon can be observed. Below are the plots at two successive moments of training:
Points with small x converge first, while points with large x converge very slowly. The main cause lies in the differentiation done by BPTT, which descends quickly for the earlier time steps; see section 1.2 of http://www.javashuo.com/article/p-tapebxsi-gr.html. The BPTT gradient product behind this effect is sketched below.
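In BPTT, the gradient that the loss at step $t$ sends back to step $k$ contains one Jacobian factor per intervening step, so the longer the chain back in time, the more the gradient magnitude drifts:

$$\frac{\partial L_t}{\partial h_k} = \frac{\partial L_t}{\partial h_t} \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}$$

Now change the cell construction, wrapping it with tf.contrib.rnn.MultiRNNCell (a stacked-RNN container):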
```python
def add_cell(self):
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(self.cell_size, forget_bias=1.0, state_is_tuple=True)
    lstm_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell], 1)
    with tf.name_scope('initial_state'):
        self.cell_init_state = lstm_cell.zero_state(self.batch_size, dtype=tf.float32)
    self.cell_outputs, self.cell_final_state = tf.nn.dynamic_rnn(
        lstm_cell, self.l_in_y, initial_state=self.cell_init_state, time_major=False)
```
This converges somewhat faster. Still, the problem is mainly caused by overly large x values. Modify the code so that the raw values are fetched in segments:
```python
BATCH_START = 3000   # index used when building the batch data
TIME_STEPS = 20      # time_steps for backpropagation through time
BATCH_SIZE_r = 50
BATCH_SIZE = 10
INPUT_SIZE = 1       # size of the x input
OUTPUT_SIZE = 1      # size of the cos output
CELL_SIZE = 10       # hidden unit size of the RNN
LR = 0.006           # learning rate
ii = 0

# data-generating function: walk the 50-batch range in 5 segments of 10 batches
def get_batch():
    global ii
    xs_r = np.arange(BATCH_START, BATCH_START + TIME_STEPS * BATCH_SIZE_r)
    xs = xs_r[ii * BATCH_SIZE * TIME_STEPS:(ii + 1) * BATCH_SIZE * TIME_STEPS].reshape(
        (BATCH_SIZE, TIME_STEPS)) / (10 * np.pi)
    res = np.cos(xs)
    ii += 1
    if ii == 5:
        ii = 0
    # returned xs and res: shape (batch, step, input)
    return [xs[:, :, np.newaxis], res[:, :, np.newaxis]]
```
Then the convergence of one specific segment can be observed:
```python
# matplotlib visualization
plt.ion()    # continuous plotting
plt.show()
# train repeatedly
for i in range(200):
    xs, res, pred = [], [], []
    for j in range(5):
        xsj, resj = get_batch()   # fetch batch data
        if j != 0:
            continue              # only train and plot the first segment
        feed_dict = {
            model.xs: xsj,
            model.ys: resj,
        }
        # train
        _, cost, state, predj = sess.run(
            [model.train_op, model.cost, model.cell_final_state, model.pred],
            feed_dict=feed_dict)
        # collect values for plotting
        x = list(xsj.reshape(-1, 1))
        r = list(resj.reshape(-1, 1))
        p = list(predj.reshape(-1, 1))
        xs += x
        res += r
        pred += p
    plt.clf()
    plt.plot(xs, res, 'r', x, p, 'b--')
    plt.ylim((-1.2, 1.2))
    plt.draw()
    plt.pause(0.3)   # refresh every 0.3 s
    # save the model / print the cost periodically
    if i % 20 == 0:
        saver.save(sess, "model/lstem_text.ckpt", global_step=i)
        print('cost: ', round(cost, 4))
```
As can be seen, when the interval is set large, e.g. BATCH_START = 3000, it becomes very hard to converge.
Therefore, take care: when doing regression with an LSTM, the observations and the independent variable should not differ too much in scale. When we make the x values smaller, the effect looks as shown:
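One way to keep the input bounded without changing the target is to wrap the raw phase by the period of the cosine; a minimal sketch (an assumed variant of get_batch, not in the original code):

```python
import numpy as np

def get_batch_bounded():
    # same layout as get_batch(), but the value fed to the network is wrapped
    # into [0, 2*pi), so its magnitude stays small however large BATCH_START is
    xs_raw = np.arange(BATCH_START, BATCH_START + TIME_STEPS * BATCH_SIZE).reshape(
        (BATCH_SIZE, TIME_STEPS)) / (10 * np.pi)
    xs = np.mod(xs_raw, 2 * np.pi)   # bounded input; cos(xs) == cos(xs_raw)
    res = np.cos(xs)
    return [xs[:, :, np.newaxis], res[:, :, np.newaxis]]
```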
3. Classification with an LSTM

Classification works essentially the same as regression. Suppose, on top of the sinusoid above, y is labeled 1 where it is greater than 0 and 0 where it is less than 0. The output then becomes an n_class vector (n categories); in this example its two dimensions hold the probability of label 0 and of label 1. The parts that need modifying are:
First, the data-generating function: add a labeling step:
```python
# data-generating function, now with a labeling step
def get_batch():
    # xs shape (50batch, 20steps)
    xs = np.arange(BATCH_START, BATCH_START + TIME_STEPS * BATCH_SIZE).reshape(
        (BATCH_SIZE, TIME_STEPS)) / (200 * np.pi)
    res = np.where(np.cos(4 * xs) >= 0, 0, 1).tolist()
    # one-hot encode: label 1 -> [0, 1], label 0 -> [1, 0]
    for i in range(BATCH_SIZE):
        for j in range(TIME_STEPS):
            res[i][j] = [0, 1] if res[i][j] == 1 else [1, 0]
    # returned xs and res: shape (batch, step, input/output)
    return [xs[:, :, np.newaxis], np.array(res)]
```
Then modify the loss function: the least-squares loss used for regression no longer applies here, so a cross-entropy loss can be used:
```python
def compute_cost(self):
    # flatten the labels to (batch * steps, n_class) so they line up with
    # the shape of self.pred produced by the output layer
    self.cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.reshape(self.ys, [-1, self.output_size]),
            logits=self.pred))
```
Of course, just take care of the dimensions (the labels must be flattened to match the logits, as above) and it works. The result is as shown:
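For completeness, a hedged usage sketch (assumed context: the training script from section 2.2, with the output dimension set to the number of classes; this change is implied rather than shown in the original):

```python
# assumed change alongside the two above: one output unit per class
OUTPUT_SIZE = 2  # n_class; model.pred then has shape (batch * steps, 2)

# recover hard labels from the logits after training (sess, model, xs as in the main script)
pred_labels = sess.run(tf.argmax(model.pred, axis=1), feed_dict={model.xs: xs})
print(pred_labels.shape)  # (BATCH_SIZE * TIME_STEPS,)
```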
4. Why the LSTM helps eliminate vanishing gradients

To address the RNN gradient problem, leaky units were proposed first, adding skip connections along the time axis; this idea was later generalized into the LSTM. The LSTM's gate structure provides a way of selecting gradients.
If a gate stays closed, the earlier information is kept intact, which in effect shortens the chain of derivatives.
For example, if for certain input tensors the trained f_t stays at 1, the information in C_{t-1} is carried forward until some input x drives f_t to 0, after which the earlier information no longer matters. This solves the long-term dependency problem. Thanks to the gating mechanism, by controlling how gates open and close we create paths along which the product of gradients stays close to 1.
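Concretely, from the update $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ in section 1, the direct path through the cell state contributes

$$\frac{\partial C_t}{\partial C_{t-1}} = f_t$$

(setting aside the indirect dependence of $f_t$, $i_t$ and $\tilde{C}_t$ on $C_{t-1}$ through $h_{t-1}$); as long as $f_t \approx 1$, the gradient passes through this step unattenuated.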
As above, under gate control, along the paths marked by the red and blue arrows the gradient of y_{t+1} stays equal to the gradient at the previous moment.
As for the "+" operation that combines the input-gate and forget-gate branches, its differentiation is additive rather than multiplicative: that link has gradient 1 and adds no extra factor to the chain. In the derivation that follows, the green path and the blue path are summed, preserving the earlier gradients.
However, while vanishing gradients are thus mitigated, exploding gradients can still occur. For example, along the green path:
the weight w can still cause the gradient to explode.