Section 21: Implementing LSTM and GRU Networks with TensorFlow

Date: 2020-05-09


This section describes how to implement LSTM and GRU networks in TensorFlow.

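I. LSTM Networks

Long Short-Term Memory networks, usually just called LSTMs, are a special kind of RNN capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997) and were later refined and popularized by Alex Graves. LSTMs have been remarkably successful on a wide range of problems and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods is practically their default behavior, not something they have to struggle to learn.

The structure of an LSTM is as follows.

The core idea of this structure is a connection called the cell state, which stores what the network wants to remember. Three gates are added around it: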

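Forget gate: as the name suggests, it controls forgetting. In an LSTM it decides, with a weight between 0 and 1, how much of the cell state from the previous time step is discarded.
Input gate: it handles the input at the current sequence position and decides how much new information is written into the cell state.
Output gate: it decides how much of the cell state is exposed as the output (the hidden state) at the current step.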
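
The three gates can be made concrete with a small NumPy sketch of a single LSTM time step. This is an illustration under stated assumptions, not TensorFlow's implementation: the helper names `sigmoid` and `lstm_step`, the single stacked weight matrix `W`, and the gate ordering (forget, input, output, candidate) are all choices made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (hypothetical helper).

    x: [batch, n_input]; h_prev, c_prev: [batch, n_hidden]
    W: [n_input + n_hidden, 4 * n_hidden]; b: [4 * n_hidden]
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    f, i, o, g = np.split(z, 4, axis=1)   # assumed gate ordering
    f = sigmoid(f)        # forget gate: how much of c_prev to keep
    i = sigmoid(i)        # input gate: how much of the candidate to write
    o = sigmoid(o)        # output gate: how much of the cell state to expose
    g = np.tanh(g)        # candidate values
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
batch, n_input, n_hidden = 2, 5, 3
W = rng.normal(scale=0.1, size=(n_input + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
x = rng.normal(size=(batch, n_input))
h, c = lstm_step(x, np.zeros((batch, n_hidden)), np.zeros((batch, n_hidden)), W, b)
print(h.shape, c.shape)
```

The cell state `c` is only ever scaled and added to, which is what lets information survive across many steps.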

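II. LSTM Variants

What we have described so far is the standard LSTM. But not all LSTMs look the same. In fact, almost every paper that uses LSTMs adopts a slightly different version. The differences are small, but they are worth mentioning.

1. Peephole connections

One popular LSTM variant, introduced by Gers & Schmidhuber (2000), adds "peephole connections". This means that each gate also receives the cell state as an input.

In the diagram of this variant, peepholes are added to every gate, but many papers add them to only some of the gates.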
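
As a rough sketch of what "each gate also receives the cell state" means, the step below adds element-wise peephole weights to all three gates. The names `peephole_lstm_step`, `p_f`, `p_i` and `p_o`, the gate ordering, and the choice to let the output gate look at the updated cell state are assumptions for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x, h_prev, c_prev, W, b, p_f, p_i, p_o):
    """LSTM step with element-wise peephole weights (hypothetical helper)."""
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    f, i, o, g = np.split(z, 4, axis=1)
    f = sigmoid(f + p_f * c_prev)   # forget gate peeks at the old cell state
    i = sigmoid(i + p_i * c_prev)   # input gate peeks at the old cell state
    c = f * c_prev + i * np.tanh(g)
    o = sigmoid(o + p_o * c)        # output gate peeks at the new cell state
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
batch, n_input, n_hidden = 2, 5, 3
W = rng.normal(scale=0.1, size=(n_input + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
x = rng.normal(size=(batch, n_input))
h_prev = np.zeros((batch, n_hidden))
c_prev = np.ones((batch, n_hidden))
zeros = np.zeros(n_hidden)
peep = np.full(n_hidden, 0.5)

h, c = peephole_lstm_step(x, h_prev, c_prev, W, b, peep, peep, peep)
print(h.shape, c.shape)
```

Setting any of the peephole vectors to zero removes that peephole, which corresponds to the papers that add only some of them.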

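2. Coupled forget and input gates

Another variant uses coupled forget and input gates. Instead of deciding separately what to forget and what new information to add, the two decisions are made together. The cell forgets only when it is about to write something in its place, and it writes new values only into the state entries whose old content is being forgotten.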
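
The coupling can be written in one line: the input gate is tied to one minus the forget gate. The function name `coupled_cell_update` and the toy values are made up for this example.

```python
import numpy as np

def coupled_cell_update(f, c_prev, candidate):
    """Cell update with the input gate tied to 1 - f (hypothetical helper)."""
    return f * c_prev + (1.0 - f) * candidate

f = np.array([0.0, 0.5, 1.0])          # forget-gate activations
c_prev = np.array([2.0, 2.0, 2.0])     # old cell state
candidate = np.array([-1.0, -1.0, -1.0])
c_new = coupled_cell_update(f, c_prev, candidate)
print(c_new)
```

With `f = 0` the old value is fully replaced, with `f = 1` it is fully kept, and values in between blend the two.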

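3. GRU

A more substantially different variant is the Gated Recurrent Unit (GRU), introduced by Cho, et al. (2014). It combines the forget and input gates into a single update gate. It also merges the cell state and the hidden state and makes some other changes. The resulting model is simpler than the standard LSTM and has become very popular. Because a GRU has one fewer state output than an LSTM while performing almost as well, using a GRU also makes the code somewhat simpler.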
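
A minimal NumPy sketch of one GRU step is given here for comparison with the LSTM. The names `gru_step`, `W_gates` and `W_cand` are assumptions for the example, and so is the convention that an update gate close to 1 keeps the old state (some papers define it the other way round).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W_gates, b_gates, W_cand, b_cand):
    """One GRU time step (hypothetical helper).

    x: [batch, n_input]; h_prev: [batch, n_hidden]
    W_gates: [n_input + n_hidden, 2 * n_hidden]
    W_cand:  [n_input + n_hidden, n_hidden]
    """
    n_hidden = h_prev.shape[1]
    gates = sigmoid(np.concatenate([x, h_prev], axis=1) @ W_gates + b_gates)
    r, u = gates[:, :n_hidden], gates[:, n_hidden:]   # reset and update gates
    cand = np.tanh(np.concatenate([x, r * h_prev], axis=1) @ W_cand + b_cand)
    return u * h_prev + (1.0 - u) * cand              # single state, no separate cell

rng = np.random.default_rng(0)
batch, n_input, n_hidden = 2, 5, 3
W_gates = rng.normal(scale=0.1, size=(n_input + n_hidden, 2 * n_hidden))
W_cand = rng.normal(scale=0.1, size=(n_input + n_hidden, n_hidden))
b_gates = np.zeros(2 * n_hidden)
b_cand = np.zeros(n_hidden)
x = rng.normal(size=(batch, n_input))
h_prev = rng.normal(size=(batch, n_hidden))
h = gru_step(x, h_prev, W_gates, b_gates, W_cand, b_cand)
print(h.shape)
```

The function returns a single array, which mirrors the point above that a GRU has one fewer state output than an LSTM.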

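These are only some of the popular LSTM variants. There are many others, such as the Depth Gated RNN by Yao, et al. (2015). There are also entirely different approaches to handling long-term dependencies, such as the Clockwork RNN by Koutnik, et al. (2014).

Which variant is best? Do the differences really matter? Greff, et al. (2015) compared the popular variants and concluded that they perform about the same. Jozefowicz, et al. (2015) tested more than ten thousand RNN architectures and found some that outperformed LSTMs on certain tasks.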

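III. Introduction to Bi-RNN Networks

A Bi-RNN, also called a bidirectional RNN, uses RNNs running in two directions.

RNNs are good at processing sequential data. Since the data follows a sequential pattern, we can learn not only its forward pattern but also its backward pattern. A network that combines both directions can fit the data better than a unidirectional recurrent network.

Processing in a bidirectional RNN is very similar to a unidirectional RNN: in addition to the pass over the sequence in forward order, a second pass is made over the sequence in reverse order, and both passes are connected to the same output layer.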
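
The two passes can be sketched with a plain tanh RNN in NumPy. The helper names `rnn_forward` and `bidirectional_rnn`, the time-major layout `[T, batch, n_input]`, and the choice to join the two directions by concatenation are assumptions for this example.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Simple tanh RNN over xs: [T, batch, n_input] -> [T, batch, n_hidden]."""
    h = np.zeros((xs.shape[1], W_h.shape[0]))
    outputs = []
    for x_t in xs:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs)

def bidirectional_rnn(xs, fw_params, bw_params):
    out_fw = rnn_forward(xs, *fw_params)
    # Run over the reversed sequence, then flip back so step t lines up with out_fw[t]
    out_bw = rnn_forward(xs[::-1], *bw_params)[::-1]
    return np.concatenate([out_fw, out_bw], axis=2)

rng = np.random.default_rng(0)
T, batch, n_input, n_hidden = 3, 2, 5, 4

def make_params():
    return (rng.normal(scale=0.5, size=(n_input, n_hidden)),
            rng.normal(scale=0.5, size=(n_hidden, n_hidden)),
            np.zeros(n_hidden))

fw, bw = make_params(), make_params()
xs = rng.normal(size=(T, batch, n_input))
out = bidirectional_rnn(xs, fw, bw)
print(out.shape)   # (T, batch, 2 * n_hidden)
```

At every time step the first half of the features has seen only the past and the second half has seen only the future.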

IV. The cell classes in TensorFlow

TensorFlow defines five cell-related classes. A cell can be thought of as a hidden layer in a DNN, only a rather special one. They are as follows.

1. The BasicRNNCell class

The most basic RNN implementation:

  def __init__(self, num_units, activation=None, reuse=None)

num_units: the number of units in the RNN cell, that is, the number of hidden nodes.
activation: Nonlinearity to use. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.

2. The BasicLSTMCell class

The LSTM network:

def __init__(self, num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None):

num_units: the number of units in the LSTM cell, that is, the number of hidden nodes.
forget_bias: the bias added to the forget gate.
state_is_tuple: the cell state ct and the output ht are separate. When True they are returned in a tuple, (c=array([[]]),h=array([[]])); when False the two values are concatenated along the column axis into a single [batch,2n] array. True is recommended.
activation: Activation function of the inner states. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised. In short, whether to reuse variables within a scope.
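
The two state layouts controlled by state_is_tuple can be pictured with plain NumPy arrays. The variable names here are made up for this example, and no TensorFlow call is involved.

```python
import numpy as np

batch, n = 2, 3
c = np.arange(batch * n, dtype=np.float64).reshape(batch, n)   # cell state
h = -c                                                         # output

# state_is_tuple=True style: two separate [batch, n] arrays
state_tuple = (c, h)

# state_is_tuple=False style: one [batch, 2n] array, c first and h second
state_concat = np.concatenate([c, h], axis=1)

# The concatenated form has to be split again before c and h can be used
c_back, h_back = np.split(state_concat, 2, axis=1)
print(state_concat.shape)
```

The tuple form avoids this repeated concatenating and splitting.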

3. The LSTMCell class

A more advanced LSTM implementation.

def __init__(self, num_units, use_peepholes=False, cell_clip=None, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=None, num_proj_shards=None, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None):

num_units: the number of units in the LSTM cell, that is, the number of hidden nodes.
use_peepholes: defaults to False; True enables peephole connections.
cell_clip: if given, the cell state is clipped to this value before the output is computed.
initializer: (optional) The initializer to use for the weight and projection matrices.
num_proj: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed. This is the output dimension when a projection layer is used to compress the model.
proj_clip: (optional) A float value. If `num_proj > 0` and `proj_clip` is provided, then the projected values are clipped elementwise to within `[-proj_clip, proj_clip]`. That is, the projected output is clipped to the given proj_clip.
num_unit_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
num_proj_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
forget_bias: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
state_is_tuple: If True, accepted and returned states are 2-tuples of the `c_state` and `m_state`. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.
activation: Activation function of the inner states. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.
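
The effect of cell_clip, num_proj and proj_clip can be approximated in NumPy as follows. This is a sketch under assumptions and not the library code: the helper `lstm_output`, the projection matrix `W_proj`, and the toy values are invented for illustration.

```python
import numpy as np

def lstm_output(c, o, cell_clip=None, W_proj=None, proj_clip=None):
    """Final part of an LSTM step: clip the cell state, compute the output, project it."""
    if cell_clip is not None:
        c = np.clip(c, -cell_clip, cell_clip)       # cell_clip: bound the cell state
    h = o * np.tanh(c)
    if W_proj is not None:
        h = h @ W_proj                              # [batch, num_units] -> [batch, num_proj]
        if proj_clip is not None:
            h = np.clip(h, -proj_clip, proj_clip)   # proj_clip: bound the projected output
    return c, h

c = np.array([[5.0, -5.0, 0.5, 0.1]])    # num_units = 4
o = np.ones_like(c)                       # output gate fully open
W_proj = np.full((4, 2), 0.5)             # num_proj = 2
c_out, h_out = lstm_output(c, o, cell_clip=1.0, W_proj=W_proj, proj_clip=0.1)
print(c_out, h_out.shape)
```

The projection shrinks the output from num_units to num_proj features, which is how it reduces the size of the layers that follow.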

4. The GRUCell class

  def __init__(self, num_units, activation=None, reuse=None, kernel_initializer=None, bias_initializer=None):

num_units: the number of units in the GRU cell, that is, the number of hidden nodes.
activation: Nonlinearity to use. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.

5. MultiRNNCell

Implementation of a multi-layer RNN:

def __init__(self, cells, state_is_tuple=True)

cells: list of RNNCells that will be composed in this order. The cells in the list are stacked one on top of another: with cells=[cell1,cell2] there are two layers, and the data passes through cell1 and then through cell2.
state_is_tuple: If True, accepted and returned states are n-tuples, where `n = len(cells)`. If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated. In other words, when True the state is a tuple holding one state per layer; for LSTM cells each per-layer state is itself a (c, h) pair whose elements have shape [batch,num_units].
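
The stacking described for cells=[cell1,cell2] amounts to feeding each layer's output into the next layer while keeping one state per layer. The sketch below uses simple tanh cells in NumPy; `make_tanh_cell` and `stacked_step` are hypothetical helpers, not TensorFlow functions.

```python
import numpy as np

def make_tanh_cell(n_in, n_hidden, rng):
    """Return a cell function (x, h_prev) -> (output, new_state)."""
    W_x = rng.normal(scale=0.1, size=(n_in, n_hidden))
    W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    b = np.zeros(n_hidden)
    def cell(x, h_prev):
        h = np.tanh(x @ W_x + h_prev @ W_h + b)
        return h, h
    return cell

def stacked_step(cells, x, states):
    """One time step through all layers; returns the top output and an n-tuple of states."""
    new_states = []
    for cell, state in zip(cells, states):
        x, new_state = cell(x, state)   # this layer's output is the next layer's input
        new_states.append(new_state)
    return x, tuple(new_states)

rng = np.random.default_rng(0)
batch = 2
cells = [make_tanh_cell(5, 4, rng), make_tanh_cell(4, 3, rng)]   # two layers
states = (np.zeros((batch, 4)), np.zeros((batch, 3)))
x = rng.normal(size=(batch, 5))
out, new_states = stacked_step(cells, x, states)
print(out.shape, len(new_states))
```

Each layer's input size must match the output size of the layer below it, which is why the second cell here takes 4 inputs.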

V. Building an RNN from cell classes

Once the cells are defined, they still need to be connected to form an RNN. TensorFlow provides several ready-made construction patterns as wrapped functions that can be called directly:

1. Building a static RNN

def tf.contrib.rnn.static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None):

cell: an instantiated cell object.
inputs: A length T list of inputs, each a `Tensor` of shape `[batch_size, input_size]`, or a nested tuple of such elements. The input data is a list of tensors whose order is the time order. Each element holds the values of one time step and has shape [batch_size,input_size].
initial_state: (optional) An initial state for the RNN. If `cell.state_size` is an integer, this must be a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell.state_size`. The initial cell state.
dtype: (optional) The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype. The type of the expected output and of the initial state.
sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size `[batch_size]`, values in `[0, T)`. The length of each input sequence.
scope: VariableScope for the created subgraph; defaults to "rnn". The variable scope.

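There are two return values: the outputs and the cell state. We usually care only about the outputs, which are also a list with one element per input time step. Each element has shape [batch_size,num_units].

Note: the input must be converted from the single tensor we are used to into a list of tensors. Likewise, when consuming the output, take the output of the last time step for the subsequent computation.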
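
The conversion from one tensor to a list of per-step tensors can be pictured in NumPy. This only mimics the shapes involved; the variable names are made up for this example.

```python
import numpy as np

batch_size, n_steps, n_input = 4, 28, 28
x = np.arange(batch_size * n_steps * n_input, dtype=np.float32)
x = x.reshape(batch_size, n_steps, n_input)

# One [batch_size, n_input] array per time step, in time order
inputs = [x[:, t, :] for t in range(n_steps)]

# A static RNN returns a list of the same length; the last entry is the final time step
last_step = inputs[-1]
print(len(inputs), inputs[0].shape)
```

In the TensorFlow scripts later in this section, the same split is done with `tf.unstack(input_x,num=n_steps,axis=1)`.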

2. Building a dynamic RNN

def tf.nn.dynamic_rnn(cell, inputs, sequence_length=None,
        initial_state=None, dtype=None, parallel_iterations=None,
        swap_memory=False, time_major=False, scope=None):

cell: an instantiated cell object.
inputs: If `time_major == False` (default), this must be a `Tensor` of shape:`[batch_size, max_time, ...]`, or a nested tuple of such elements. If `time_major == True`, this must be a `Tensor` of shape: `[max_time, batch_size, ...]`, or a nested tuple of such elements. The input data is a tensor, by default three-dimensional, [batch_size,max_time,...], where batch_size is the number of samples in a batch, max_time is the total number of time steps, and the remaining dimension is the length of the input at each time step.
sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size `[batch_size]`, values in `[0, T)`. The length of each input sequence.
initial_state: (optional) An initial state for the RNN. If `cell.state_size` is an integer, this must be a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell.state_size`. The initial cell state.
dtype: the type of the expected output and of the initial state.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the `inputs` and `outputs` Tensors. If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`. If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "rnn". The variable scope.

Return values: the outputs and the cell state.

A pair (outputs, state) where:

outputs: The RNN output `Tensor`.

If time_major == False (default), this will be a `Tensor` shaped: `[batch_size, max_time, cell.output_size]`.

If time_major == True, this will be a `Tensor` shaped: `[max_time, batch_size, cell.output_size]`.

Note, if `cell.output_size` is a (possibly nested) tuple of integers or `TensorShape` objects, then `outputs` will be a tuple having the same structure as `cell.output_size`, containing Tensors having shapes
corresponding to the shape data in `cell.output_size`.

state: The final state. If `cell.state_size` is an int, this will be shaped `[batch_size, cell.state_size]`. If it is a `TensorShape`, this will be shaped `[batch_size] + cell.state_size`. If it is a (possibly nested) tuple of ints or `TensorShape`, this will be a tuple having the corresponding shapes.
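
When sequence_length is supplied, the outputs past each sequence's length are zero and the returned state is the one from the last valid step. The NumPy sketch below imitates that behaviour with a simple tanh RNN; `masked_rnn` and its weights are assumptions for illustration and not the TensorFlow implementation.

```python
import numpy as np

def masked_rnn(xs, seq_lengths, W_x, W_h):
    """xs: [batch, T, n_input] -> (outputs [batch, T, n_hidden], state [batch, n_hidden])."""
    batch, T, _ = xs.shape
    n_hidden = W_h.shape[0]
    lengths = np.asarray(seq_lengths)
    h = np.zeros((batch, n_hidden))
    outputs = np.zeros((batch, T, n_hidden))
    for t in range(T):
        h_new = np.tanh(xs[:, t] @ W_x + h @ W_h)
        active = (t < lengths)[:, None]               # which samples are still running
        h = np.where(active, h_new, h)                # finished samples keep their state
        outputs[:, t] = np.where(active, h_new, 0.0)  # finished samples output zeros
    return outputs, h

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 4, 5))
X[1, 1:] = 0                      # second sample has only one valid step
W_x = rng.normal(scale=0.5, size=(5, 3))
W_h = rng.normal(scale=0.5, size=(3, 3))
outputs, state = masked_rnn(X, [4, 1], W_x, W_h)
print(outputs[1])                 # rows after the first are all zeros
```

For the short sequence the final state equals its output at the first step, not at the last padded step.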

Because time_major defaults to False, the result is a tensor of the form [batch_size,max_time,...].

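Note: if the output has the form [batch_size,max_time,...], that is, a batch-major tensor, it must be transposed into time-major form before indexing, because we want the output of the last time step.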

outputs = tf.transpose(outputs,[1,0,2])
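
The same transpose can be checked with NumPy on a small array. The shapes and names below are chosen only for this illustration.

```python
import numpy as np

batch_size, max_time, n_hidden = 2, 4, 3
outputs = np.arange(batch_size * max_time * n_hidden).reshape(batch_size, max_time, n_hidden)

# Batch-major [batch, time, hidden] -> time-major [time, batch, hidden]
time_major = np.transpose(outputs, (1, 0, 2))

last_via_transpose = time_major[-1]     # [batch, hidden]
last_via_slice = outputs[:, -1, :]      # same values without a transpose
print(time_major.shape)
```

Both expressions select the final time step for every sample, so slicing the batch-major tensor directly is an alternative to transposing first.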

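3. Building bidirectional RNNs

A bidirectional RNN is a recurrent neural network that can learn both forward and backward patterns. TensorFlow provides four functions for it.

1. Static single-layer bidirectional RNN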

def tf.contrib.rnn.static_bidirectional_rnn(cell_fw, cell_bw, inputs, initial_state_fw=None, initial_state_bw=None, dtype=None, sequence_length=None, scope=None):

cell_fw: An instance of RNNCell, to be used for forward direction. An instantiated cell object for the forward direction.
cell_bw: An instance of RNNCell, to be used for backward direction. An instantiated cell object for the backward direction.
inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements. A list of inputs of length T, where each element is a tensor of shape [batch_size,input_size] and T is the total number of time steps.
initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape `[batch_size, cell_fw.state_size]`. If `cell_fw.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell_fw.state_size`. The initial forward cell state, 0 by default.
initial_state_bw: (optional) Same as for `initial_state_fw`, but using the corresponding properties of `cell_bw`. The initial backward cell state, 0 by default.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided. Specifies the type of the initial cell state.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences. The lengths of the input sequences.
scope: VariableScope for the created subgraph; defaults to "bidirectional_rnn". The variable scope.

The return value is a tuple (outputs, output_state_fw, output_state_bw). outputs is a list of length T in which every element already contains the forward and backward outputs concatenated, so there is no need to join them with tf.concat.

2. Static multi-layer bidirectional RNN

def tf.contrib.rnn.stack_bidirectional_rnn(cells_fw, cells_bw, inputs, initial_states_fw=None, initial_states_bw=None, dtype=None, sequence_length=None, scope=None):

cells_fw: List of instances of RNNCell, one per layer, to be used for forward direction. A list of instantiated cells for the forward direction.
cells_bw: List of instances of RNNCell, one per layer, to be used for backward direction. A list of instantiated cells for the backward direction.
inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements. A list of inputs of length T, where each element is a tensor of shape [batch_size,input_size] and T is the total number of time steps.
initial_states_fw: (optional) A list of the initial states (one per layer) for the forward RNN. Each tensor must has an appropriate type and shape `[batch_size, cell_fw.state_size]`. The initial forward cell states, 0 by default.
initial_states_bw: (optional) Same as for `initial_states_fw`, but using the corresponding properties of `cells_bw`. The initial backward cell states, 0 by default.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided. Specifies the type of the initial cell state.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences. The lengths of the input sequences.
scope: VariableScope for the created subgraph; defaults to None. The variable scope.

3. Dynamic single-layer bidirectional RNN

def tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None):

cell_fw: An instance of RNNCell, to be used for forward direction. An instantiated cell object for the forward direction.
cell_bw: An instance of RNNCell, to be used for backward direction. An instantiated cell object for the backward direction.
inputs: The RNN inputs. If time_major == False (default), this must be a tensor of shape: `[batch_size, max_time, ...]`, or a nested tuple of such elements. If time_major == True, this must be a tensor of shape:`[max_time, batch_size, ...]`, or a nested tuple of such elements. The input data is a tensor, by default three-dimensional, [batch_size,max_time,...], where batch_size is the number of samples in a batch, max_time is the total number of time steps, and the remaining dimension is the length of the input at each time step.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences in the batch. If not provided, all batch entries are assumed to be full sequences; and time reversal is applied from time `0` to `max_time` for each sequence. The sequence lengths.
initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape `[batch_size, cell_fw.state_size]`. If `cell_fw.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell_fw.state_size`. The initial forward cell state, 0 by default.
initial_state_bw: (optional) Same as for `initial_state_fw`, but using the corresponding properties of `cell_bw`. The initial backward cell state, 0 by default.
dtype: (optional) The data type for the initial states and expected output. Required if initial_states are not provided or RNN states have a heterogeneous dtype. The data type.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the `inputs` and `outputs` Tensors.If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`. If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "bidirectional_rnn". The variable scope.

The return value is a tuple (outputs, output_states). outputs is itself a tuple (output_fw, output_bw); by default (time_major=False) each is a tensor of shape [batch_size,max_time,layers_output]. To obtain the combined result, concatenate the forward and backward layers_output with tf.concat.

hiddens = tf.concat(hiddens,axis=2)

In addition, we usually need to convert the result into a time-major tensor.

hiddens = tf.transpose(hiddens,[1,0,2])

4. Dynamic multi-layer bidirectional RNN

def tf.contrib.rnn.stack_bidirectional_dynamic_rnn(cells_fw, cells_bw, inputs, initial_states_fw=None, initial_states_bw=None, dtype=None, sequence_length=None, parallel_iterations=None, scope=None):

cells_fw: List of instances of RNNCell, one per layer, to be used for forward direction. A list of instantiated cells for the forward direction.
cells_bw: List of instances of RNNCell, one per layer, to be used for backward direction. A list of instantiated cells for the backward direction.
inputs: The RNN inputs. this must be a tensor of shape:`[batch_size, max_time, ...]`, or a nested tuple of such elements. The input data is a tensor, by default three-dimensional, [batch_size,max_time,...], where batch_size is the number of samples in a batch, max_time is the total number of time steps, and the remaining dimension is the length of the input at each time step.
initial_states_fw: (optional) A list of the initial states (one per layer) for the forward RNN. Each tensor must has an appropriate type and shape `[batch_size, cell_fw.state_size]`. The initial forward cell states, 0 by default.
initial_states_bw: (optional) Same as for `initial_states_fw`, but using the corresponding properties of `cells_bw`. The initial backward cell states, 0 by default.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided. The data type.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences. The sequence lengths.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
scope: VariableScope for the created subgraph; defaults to None. The variable scope.

The return value is a tuple (outputs, output_state_fw, output_state_bw). outputs is a tensor of shape [batch_size,max_time,layers_output], where layers_output already contains the forward and backward outputs joined by tf.concat.

We usually also need to convert it into a time-major tensor.

hiddens = tf.transpose(hiddens,[1,0,2])

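VI. Implementing a Single-Layer Unidirectional RNN in TensorFlow

We use the MNIST dataset as the data source and build an RNN to classify it. Each image is 28x28, so we treat each image as a sequence of 28 time steps with 28 values per step, and feed that sequence into the RNN.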
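
The reshaping from flat 784-value images to 28 steps of 28 values can be checked with NumPy on dummy data. The array contents here are placeholders, not MNIST pixels.

```python
import numpy as np

n_steps, n_input = 28, 28
flat_batch = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)   # two fake images

# [batch, 784] -> [batch, n_steps, n_input]; step t is image row t
seq_batch = flat_batch.reshape(-1, n_steps, n_input)
print(seq_batch.shape)
```

Time step t of an image therefore holds pixels 28*t through 28*t+27 of the flattened vector, that is, the RNN reads the image one row at a time from top to bottom.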

# -*- coding: utf-8 -*-
"""
Created on Fri May 11 11:49:52 2018

@author: zy
"""

'''
Single-layer RNNs with the TensorFlow library, using LSTM cells, GRU cells, and the static_rnn and dynamic_rnn functions
'''

import tensorflow as tf
import numpy as np
tf.reset_default_graph()

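'''
Part 1: processing variable-length sequences with a dynamic RNN
'''
np.random.seed(0)

# Create normally distributed input data. 2: batch size, 4: number of time steps, 5: values per step
X = np.random.randn(2,4,5)

# The second sample has length 1: zero out every step after the first
X[1,1:] = 0
# Length of each input sequence
seq_lengths = [4,1]
print('X:\n',X)

# Build one LSTM cell and one GRU cell to compare their output states; 3 is the number of hidden nodes
cell = tf.contrib.rnn.BasicLSTMCell(num_units = 3,state_is_tuple = True)
gru = tf.contrib.rnn.GRUCell(3)

# If no initial_state is given, a dtype must be specified
outputs,last_states = tf.nn.dynamic_rnn(cell,X,seq_lengths,dtype =tf.float64 )
gruoutputs,grulast_states = tf.nn.dynamic_rnn(gru,X,seq_lengths,dtype =tf.float64 )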

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

result,sta,gruout,grusta = sess.run([outputs,last_states,gruoutputs,grulast_states])

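print('Full sequence:\n',result[0])
print('Short sequence:\n',result[1])

# state_is_tuple is True in BasicLSTMCell, so the LSTM state is (cell state ct, output h)
print('LSTM state:',len(sta),'\n',sta[1])

print('GRU full sequence:\n',gruout[0])
print('GRU short sequence:\n',gruout[1])
# A GRU has no separate state output; its state is its final output. The batch size is 2, so the length is 2
print('GRU state:',len(grusta),'\n',grusta[1])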




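'''
Part 2: building single-layer unidirectional RNNs to classify the MNIST dataset
'''
'''
Each MNIST sample is 28 x 28.
We can split a sample into 28 time steps of 28 values each and feed them to an LSTM or GRU network.
The number of hidden nodes is set to 128.
'''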


def single_layer_static_lstm(input_x,n_steps,n_hidden):
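    '''
    Return the outputs and the cell state of a static single-layer LSTM.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the number of hidden nodes
    '''

    # Split input_x along the time axis into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # This step is required by the static rnn functions; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as the hidden layer
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias=1.0)
    # The static rnn function takes a list of tensors, each of shape (batch_size,n_input)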
    hiddens,states = tf.contrib.rnn.static_rnn(cell=lstm_cell, inputs=input_x1, dtype=tf.float32)

    return hiddens,states


def single_layer_static_gru(input_x,n_steps,n_hidden):
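    '''
    Return the outputs and the cell state of a static single-layer GRU.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the GRU cell, i.e. the number of hidden nodes
    '''

    # Split input_x along the time axis into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # This step is required by the static rnn functions; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as the hidden layer
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden)
    # The static rnn function takes a list of tensors, each of shape (batch_size,n_input)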
    hiddens,states = tf.contrib.rnn.static_rnn(cell=gru_cell,inputs=input_x1,dtype=tf.float32)
        
    return hiddens,states


def single_layer_dynamic_lstm(input_x,n_steps,n_hidden):
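    '''
    Return the outputs and the cell state of a dynamic single-layer LSTM.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the number of hidden nodes
    '''
    # Can be viewed as the hidden layer
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias=1.0)
    # The dynamic rnn function takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=lstm_cell,inputs=input_x,dtype=tf.float32)

    # Note: the output must be transposed into time-major form
    hiddens = tf.transpose(hiddens,[1,0,2])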
    return hiddens,states



def single_layer_dynamic_gru(input_x,n_steps,n_hidden):
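    '''
    Return the outputs and the cell state of a dynamic single-layer GRU.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the GRU cell, i.e. the number of hidden nodes
    '''

    # Can be viewed as the hidden layer
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden)
    # The dynamic rnn function takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=gru_cell,inputs=input_x,dtype=tf.float32)

    # Note: the output must be transposed into time-major form
    hiddens = tf.transpose(hiddens,[1,0,2])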
    return hiddens,states


def  mnist_rnn_classfication(flag):
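    '''
    Classify MNIST.

    arg:
        flag: selects which RNN structure to build
            1: single-layer static LSTM
            2: single-layer static GRU
            3: single-layer dynamic LSTM
            4: single-layer dynamic GRU
    '''

    '''
    1. Load the dataset
    '''
    tf.reset_default_graph()
    from tensorflow.examples.tutorials.mnist import input_data

    # mnist is a lightweight class that stores the training, validation and test sets as numpy arrays; one_hot=True encodes each label as a 10-dimensional one-hot vector
    mnist = input_data.read_data_sets('MNIST-data',one_hot=True)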
    
    print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>
    
    print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
    print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
    print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
    print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)
    
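    '''
    2. Define the parameters and the network structure
    '''
    n_input = 28             # number of input nodes of the LSTM cell
    n_steps = 28             # sequence length
    n_hidden = 128           # number of output nodes of the LSTM cell (i.e. number of hidden nodes)
    n_classes = 10           # number of classes
    batch_size = 128         # mini-batch size
    training_step = 5000     # number of iterations
    display_step  = 200      # report interval in steps
    learning_rate = 1e-4     # learning rate


    # Define the placeholders
    # The first dimension is the number of samples in a batch, n_steps is the total number of time steps, n_input is the length of the data at each step: 28 steps in total, each feeding 28 values into the LSTM
    input_x = tf.placeholder(dtype=tf.float32,shape=[None,n_steps,n_input])
    input_y = tf.placeholder(dtype=tf.float32,shape=[None,n_classes])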


    # Can be viewed as the hidden layer
    if  flag == 1:
        print('Single-layer static LSTM network:')
        hiddens,states = single_layer_static_lstm(input_x,n_steps,n_hidden)
    elif flag == 2:
        print('Single-layer static GRU network:')
        hiddens,states = single_layer_static_gru(input_x,n_steps,n_hidden)
    elif  flag == 3:
        print('Single-layer dynamic LSTM network:')
        hiddens,states = single_layer_dynamic_lstm(input_x,n_steps,n_hidden)
    elif flag == 4:
        print('Single-layer dynamic GRU network:')
        hiddens,states = single_layer_dynamic_gru(input_x,n_steps,n_hidden)

    print('hidden:',hiddens[-1].shape)      # (?, 128): the batch dimension is unknown until data is fed

    # Take the output of the last time step and pass it through a fully connected layer to get the prediction
    output = tf.contrib.layers.fully_connected(inputs=hiddens[-1],num_outputs=n_classes,activation_fn = tf.nn.softmax)
    
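    '''
    3. Set up the log-likelihood loss function
    '''
    # Cost function J = -(Σ y . log(aL)) / n, where . denotes element-wise multiplication
    cost = tf.reduce_mean(-tf.reduce_sum(input_y*tf.log(output),axis=1))

    '''
    4. Optimize
    '''
    train = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    # Evaluate the predictions
    # tf.argmax(output,1) returns the index of the maximum value in each row
    correct = tf.equal(tf.argmax(output,1),tf.argmax(input_y,1))       # boolean vector marking each prediction as right or wrong
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))             # accuracy


    # Lists that store the result of each test iteration
    test_accuracy_list = []
    test_cost_list=[]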
    
    
    with tf.Session() as sess:
        # Run the graph in a session
        sess.run(tf.global_variables_initializer())   # initialize the variables

        # Start iterating: mini-batch gradient descent with the Adam optimizer
        for i in range(training_step):
            x_batch,y_batch = mnist.train.next_batch(batch_size = batch_size)
            # Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])

            # Run one training step
            train.run(feed_dict={input_x:x_batch,input_y:y_batch})
            if (i+1) % display_step == 0:
                # Report the accuracy on the current training batch
                training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
                print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))


        # After training, run the test in 200 rounds of 50 samples each
        # Report the test-set accuracy. Testing everything at once can run out of memory (OOM), so a small mini-batch is used
        for i in range(200):
            x_batch,y_batch = mnist.test.next_batch(batch_size = 50)
            # Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
            test_accuracy_list.append(test_accuracy)
            test_cost_list.append(test_cost)
            if (i+1)% 20 == 0:
                print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost))
        print('Test accuracy:',np.mean(test_accuracy_list))


if __name__ == '__main__':
    mnist_rnn_classfication(1)    # 1: single-layer static LSTM
    mnist_rnn_classfication(2)    # 2: single-layer static GRU
    mnist_rnn_classfication(3)    # 3: single-layer dynamic LSTM
    mnist_rnn_classfication(4)    # 4: single-layer dynamic GRU
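
One detail of the script above is worth a note: the cost applies a log to the softmax output, which becomes NaN if a predicted probability underflows to zero. The NumPy sketch below contrasts that form with a log-softmax computed from the logits. Both function names are made up for this example, and the extreme logits are chosen only to trigger the problem; TensorFlow 1.x offers `tf.nn.softmax_cross_entropy_with_logits` for the same purpose.

```python
import numpy as np

def naive_cross_entropy(logits, labels):
    """Softmax first, then log: mirrors input_y * log(output) in the script."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.mean(-(labels * np.log(probs)).sum(axis=1)))

def stable_cross_entropy(logits, labels):
    """Log-softmax computed directly from the logits, so no log(0) can occur."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(np.mean(-(labels * log_probs).sum(axis=1)))

logits = np.array([[1000.0, 0.0, 0.0]])   # deliberately extreme
labels = np.array([[0.0, 1.0, 0.0]])
naive = naive_cross_entropy(logits, labels)
stable = stable_cross_entropy(logits, labels)
print(naive, stable)
```

With ordinary logits the two functions agree; they differ only when a probability is too small to represent.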


VII. Implementing a Multi-Layer Unidirectional RNN in TensorFlow

# -*- coding: utf-8 -*-
"""
Created on Fri May 11 16:29:11 2018

@author: zy
"""

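'''
Multi-layer RNNs with the TensorFlow library, using LSTM cells, GRU cells, and the static_rnn and dynamic_rnn functions
'''

import tensorflow as tf
import numpy as np


'''
Build multi-layer unidirectional RNNs to classify the MNIST dataset
'''
'''
Each MNIST sample is 28 x 28.
We can split a sample into 28 time steps of 28 values each and feed them to an LSTM or GRU network.
The number of hidden nodes is set to 128.
'''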


def multi_layer_static_lstm(input_x,n_steps,n_hidden):
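    '''
    Return the outputs and the cell state of a static multi-layer LSTM.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of each LSTM cell, i.e. the number of hidden nodes
    '''

    # Split input_x along the time axis into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # This step is required by the static rnn functions; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.LSTMCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # The static rnn function takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=mcell,inputs=input_x1,dtype=tf.float32)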

    return hiddens,states


def multi_layer_static_gru(input_x,n_steps,n_hidden):
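    '''
    Return the outputs and the cell state of a static multi-layer GRU.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of each GRU cell, i.e. the number of hidden nodes
    '''

    # Split input_x along the time axis into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # This step is required by the static rnn functions; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.GRUCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # The static rnn function takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=mcell,inputs=input_x1,dtype=tf.float32)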
        
    return hiddens,states


def multi_layer_static_mix(input_x,n_steps,n_hidden):
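    '''
    Return the outputs and the cell state of a static multi-layer network that mixes LSTM and GRU cells.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell; the GRU cell uses n_hidden*2
    '''

    # Split input_x along the time axis into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # This step is required by the static rnn functions; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as 2 hidden layers
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden*2)
    lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_hidden)

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell,gru_cell])

    # The static rnn function takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=mcell,inputs=input_x1,dtype=tf.float32)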
    
    return hiddens,states


def multi_layer_dynamic_lstm(input_x,n_steps,n_hidden):
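    '''
    Return the outputs and the cell state of a dynamic multi-layer LSTM.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of each LSTM cell, i.e. the number of hidden nodes
    '''
    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.LSTMCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # The dynamic rnn function takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=mcell,inputs=input_x,dtype=tf.float32)

    # Note: the output must be transposed into time-major form
    hiddens = tf.transpose(hiddens,[1,0,2])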
    return hiddens,states


def multi_layer_dynamic_gru(input_x,n_steps,n_hidden):
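    '''
    Return the outputs and the cell state of a dynamic multi-layer GRU.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of each GRU cell, i.e. the number of hidden nodes
    '''
    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.GRUCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # The dynamic rnn function takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=mcell,inputs=input_x,dtype=tf.float32)

    # Note: the output must be transposed into time-major form
    hiddens = tf.transpose(hiddens,[1,0,2])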
    return hiddens,states   



def multi_layer_dynamic_mix(input_x,n_steps,n_hidden):
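    '''
    Return the outputs and the cell state of a dynamic multi-layer network that mixes LSTM and GRU cells.

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell; the GRU cell uses n_hidden*2
    '''

    # Can be viewed as 2 hidden layers
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden*2)
    lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_hidden)

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell,gru_cell])

    # The dynamic rnn function takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=mcell,inputs=input_x,dtype=tf.float32)

    # Note: the output must be transposed into time-major form
    hiddens = tf.transpose(hiddens,[1,0,2])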
    return hiddens,states



def  mnist_rnn_classfication(flag):
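    '''
    Classify MNIST.

    arg:
        flag: selects which RNN structure to build
            1: multi-layer static LSTM
            2: multi-layer static GRU
            3: multi-layer static LSTM and GRU mix
            4: multi-layer dynamic LSTM
            5: multi-layer dynamic GRU
            6: multi-layer dynamic LSTM and GRU mix
    '''

    '''
    1. Load the dataset
    '''
    tf.reset_default_graph()
    from tensorflow.examples.tutorials.mnist import input_data

    # mnist is a lightweight class that stores the training, validation and test sets as numpy arrays; one_hot=True encodes each label as a 10-dimensional one-hot vector
    mnist = input_data.read_data_sets('MNIST-data',one_hot=True)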
    
    print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>
    
    print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
    print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
    print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
    print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)
    
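    '''
    2 Define the parameters and the network structure
    '''
    n_input = 28             #number of input nodes of the LSTM cell
    n_steps = 28             #sequence length
    n_hidden = 128           #number of output nodes of the LSTM cell (i.e. hidden-layer nodes)
    n_classes = 10           #number of classes
    batch_size = 128         #mini-batch size
    training_step = 1000     #number of training iterations
    display_step  = 200      #report interval in steps
    learning_rate = 1e-4     #learning rate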
    
    
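    #Define the placeholders
    #The first dimension (None) is the batch size; n_steps is the number of time steps; n_input is the length of the data at one time step, i.e. 28 time steps in total, each feeding 28 values into the LSTM network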
    input_x = tf.placeholder(dtype=tf.float32,shape=[None,n_steps,n_input])
    input_y = tf.placeholder(dtype=tf.float32,shape=[None,n_classes])


    #Can be viewed as the hidden layers
    if  flag == 1:
        print('Multi-layer static LSTM network:')
        hiddens,states = multi_layer_static_lstm(input_x,n_steps,n_hidden)
    elif flag == 2:
        print('Multi-layer static GRU network:')
        hiddens,states = multi_layer_static_gru(input_x,n_steps,n_hidden)
    elif flag == 3:
        print('Multi-layer static LSTM and GRU mixed network:')
        hiddens,states = multi_layer_static_mix(input_x,n_steps,n_hidden)
    elif  flag == 4:
        print('Multi-layer dynamic LSTM network:')
        hiddens,states = multi_layer_dynamic_lstm(input_x,n_steps,n_hidden)
    elif flag == 5:
        print('Multi-layer dynamic GRU network:')
        hiddens,states = multi_layer_dynamic_gru(input_x,n_steps,n_hidden)
    elif flag == 6:
        print('Multi-layer dynamic LSTM and GRU mixed network:')
        hiddens,states = multi_layer_dynamic_mix(input_x,n_steps,n_hidden)

    print('hidden:',hiddens[-1].shape)      #(?, 128) when the last cell has n_hidden units; the batch dimension is unknown while the graph is built
    
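    #Take the output of the last time step and pass it through a fully connected layer to get the output values
    output = tf.contrib.layers.fully_connected(inputs=hiddens[-1],num_outputs=n_classes,activation_fn = tf.nn.softmax)

    '''
    3 Set up the log-likelihood loss function
    '''
    #Cost function J = -(Σ y.log(aL))/n    "." denotes element-wise multiplication
    cost = tf.reduce_mean(-tf.reduce_sum(input_y*tf.log(output),axis=1))

    '''
    4 Solve
    '''
    train = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    #Evaluate the predictions
    #tf.argmax(output,1) returns the index of the largest value in each row
    correct = tf.equal(tf.argmax(output,1),tf.argmax(input_y,1))       #boolean tensor marking each prediction as right or wrong
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))             #accuracy


    #Create lists to store the result of each test batch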
    test_accuracy_list = []
    test_cost_list=[]
    
    
    with tf.Session() as sess:
        #Run the graph in the session
        sess.run(tf.global_variables_initializer())   #initialize the variables

        #Start iterating: mini-batch training with the Adam optimizer
        for i in range(training_step):
            x_batch,y_batch = mnist.train.next_batch(batch_size = batch_size)
            #Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])

            #Run one training step
            train.run(feed_dict={input_x:x_batch,input_y:y_batch})
            if (i+1) % display_step == 0:
                #Print the accuracy and cost on the current training batch
                training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
                print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))


        #After training, run the test in 200 batches of 50 samples each
        #Print the test-set accuracy. Evaluating the whole test set at once may run out of memory (OOM), so a small mini-batch is used for testing
        for i in range(200):
            x_batch,y_batch = mnist.test.next_batch(batch_size = 50)
            #Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
            test_accuracy_list.append(test_accuracy)
            test_cost_list.append(test_cost)
            if (i+1)% 20 == 0:
                print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost))
        print('Test accuracy:',np.mean(test_accuracy_list))


if __name__ == '__main__':
    mnist_rnn_classfication(1)    #1: multi-layer static LSTM
    mnist_rnn_classfication(2)    #2: multi-layer static GRU
    mnist_rnn_classfication(3)    #3: multi-layer static LSTM and GRU mixed network
    mnist_rnn_classfication(4)    #4: multi-layer dynamic LSTM
    mnist_rnn_classfication(5)    #5: multi-layer dynamic GRU
    mnist_rnn_classfication(6)    #6: multi-layer dynamic LSTM and GRU mixed network

The above are partial screenshots of the output...
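
The dynamic helper functions above transpose the `dynamic_rnn` outputs to time-major order so that `hiddens[-1]` selects the last time step. A minimal NumPy sketch of that shape handling follows; the array is a made-up stand-in for the real RNN outputs, and the sizes and variable names are illustrative assumptions rather than part of the original script.

```python
import numpy as np

# Illustrative sizes (assumptions): 4 samples, 28 time steps, 128 hidden units.
batch_size, n_steps, n_hidden = 4, 28, 128

# Stand-in for the batch-major outputs of dynamic_rnn: [batch_size, n_steps, n_hidden].
hiddens = np.arange(batch_size * n_steps * n_hidden, dtype=np.float32)
hiddens = hiddens.reshape(batch_size, n_steps, n_hidden)

# Same permutation as tf.transpose(hiddens, [1, 0, 2]): time becomes the first axis.
time_major = np.transpose(hiddens, (1, 0, 2))

# Indexing with -1 now picks the last time step for every sample in the batch.
last_step = time_major[-1]

print(time_major.shape)  # shape is [n_steps, batch_size, n_hidden]
print(last_step.shape)   # shape is [batch_size, n_hidden]
```

Slicing the batch-major array directly with `hiddens[:, -1, :]` gives the same values, so the transpose is mainly a convenience that lets the static (list-based) and dynamic code paths share the `hiddens[-1]` expression.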

8 Implementing a bidirectional RNN with TensorFlow
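
As a rough illustration of what a bidirectional layer computes, the sketch below runs a plain tanh RNN in NumPy once over the sequence and once over the reversed sequence, then concatenates the two outputs on the feature axis. It is not the TensorFlow implementation: `simple_rnn`, the random weights and the small sizes are hypothetical and chosen only to keep the example short.

```python
import numpy as np

def simple_rnn(inputs, w_x, w_h):
    """Run a plain tanh RNN over inputs of shape [batch, steps, n_input]."""
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, w_h.shape[0]))   # zero initial state
    outputs = []
    for t in range(steps):
        h = np.tanh(inputs[:, t, :] @ w_x + h @ w_h)
        outputs.append(h)
    return np.stack(outputs, axis=1)       # [batch, steps, n_hidden]

rng = np.random.default_rng(0)
batch, steps, n_input, n_hidden = 2, 5, 3, 4
x = rng.normal(size=(batch, steps, n_input))

# Each direction has its own weights.
fw_wx, fw_wh = rng.normal(size=(n_input, n_hidden)), rng.normal(size=(n_hidden, n_hidden))
bw_wx, bw_wh = rng.normal(size=(n_input, n_hidden)), rng.normal(size=(n_hidden, n_hidden))

# Forward pass reads the sequence left to right.
fw = simple_rnn(x, fw_wx, fw_wh)
# Backward pass reads the reversed sequence; its outputs are flipped back
# so that index t of both halves refers to the same time step.
bw = simple_rnn(x[:, ::-1, :], bw_wx, bw_wh)[:, ::-1, :]

# Concatenate on the feature axis: [batch, steps, 2 * n_hidden].
merged = np.concatenate([fw, bw], axis=2)
print(merged.shape)
```

One consequence worth keeping in mind when reading the script below: at the last time step the backward half has only seen the final input, whereas the forward half has seen the whole sequence.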

# -*- coding: utf-8 -*-
"""
Created on Fri May 11 21:24:41 2018

@author: zy
"""


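'''
Implement bidirectional RNNs with the TensorFlow library, using LSTM cells, GRU cells, and the static_rnn and dynamic_rnn functions
'''

import tensorflow as tf
import numpy as np


'''
Build a bidirectional RNN network to classify the MNIST dataset
'''
'''
Each MNIST sample is 28 x 28
A sample can be split into 28 time steps of 28 values each and then fed into an LSTM or GRU network
The number of hidden-layer nodes is set to 128
'''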


def single_layer_static_bi_lstm(input_x,n_steps,n_hidden):
    '''
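    Return the outputs and cell states of a single-layer static bidirectional LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the number of hidden-layer nodes
    '''

    #Split input_x along the time axis (axis=1) into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    #This step is required when calling a static RNN function; it is equivalent to making the sequence the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)



    #Forward cell
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)
    #Backward cell
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)


    #The static RNN function takes a list of tensors, each of shape (batch_size,n_input). The returned hiddens is a list in which each element is the concatenation of the forward and backward outputs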
    hiddens,fw_state,bw_state = tf.contrib.rnn.static_bidirectional_rnn(cell_fw=lstm_fw_cell,cell_bw=lstm_bw_cell,inputs=input_x1,dtype=tf.float32)
        
    print('hiddens:\n',type(hiddens),len(hiddens),hiddens[0].shape,hiddens[1].shape)    #<class 'list'> 28 (?, 256) (?, 256)
    
    return hiddens,fw_state,bw_state


def single_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and cell states of a single-layer dynamic bidirectional LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the number of hidden-layer nodes
    '''

    #Forward cell
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)
    #Backward cell
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)


    #The dynamic RNN function takes a 3-D tensor [batch_size,n_steps,n_input]; the output is a tuple (forward, backward), each element of shape [batch_size,n_steps,n_hidden]
    hiddens,state = tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell,cell_bw=lstm_bw_cell,inputs=input_x,dtype=tf.float32)

    print('hiddens:\n',type(hiddens),len(hiddens),hiddens[0].shape,hiddens[1].shape)   #<class 'tuple'> 2 (?, 28, 128) (?, 28, 128)
    #Concatenate along axis=2: (?,28,128) and (?,28,128) are merged on the last dimension into (?,28,256)
    hiddens = tf.concat(hiddens,axis=2)

    #Note: the outputs are transposed to time-major order
    hiddens = tf.transpose(hiddens,[1,0,2])
        
    return hiddens,state


def multi_layer_static_bi_lstm(input_x,n_steps,n_hidden):
    '''
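    Return the outputs and cell states of a multi-layer static bidirectional LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the number of hidden-layer nodes
    '''

    #Split input_x along the time axis (axis=1) into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    #This step is required when calling a static RNN function; it is equivalent to making the sequence the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    stacked_fw_rnn = []
    stacked_bw_rnn = []
    for i in range(3):
        #Forward cell
        stacked_fw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))
        #Backward cell
        stacked_bw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))


    #The static RNN function takes a list of tensors, each of shape (batch_size,n_input). The returned hiddens is a list in which each element is the concatenation of the forward and backward outputs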
    hiddens,fw_state,bw_state = tf.contrib.rnn.stack_bidirectional_rnn(stacked_fw_rnn,stacked_bw_rnn,inputs=input_x1,dtype=tf.float32)
        
    print('hiddens:\n',type(hiddens),len(hiddens),hiddens[0].shape,hiddens[1].shape)    #<class 'list'> 28 (?, 256) (?, 256)

    return hiddens,fw_state,bw_state


def multi_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and cell states of a multi-layer dynamic bidirectional LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the number of hidden-layer nodes
    '''
    stacked_fw_rnn = []
    stacked_bw_rnn = []
    for i in range(3):
        #Forward cell
        stacked_fw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))
        #Backward cell
        stacked_bw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))

    #The dynamic RNN function takes a 3-D tensor [batch_size,n_steps,n_input]; the output is [batch_size,n_steps,n_hidden*2], i.e. the forward and backward outputs concatenated on the last dimension
    hiddens,fw_state,bw_state = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(stacked_fw_rnn,stacked_bw_rnn,inputs=input_x,dtype=tf.float32)

    print('hiddens:\n',type(hiddens),hiddens.shape)   # <class 'tensorflow.python.framework.ops.Tensor'> (?, 28, 256)

    #Note: the outputs are transposed to time-major order
    hiddens = tf.transpose(hiddens,[1,0,2])
    
    return hiddens,fw_state,bw_state





def  mnist_rnn_classfication(flag):
    '''
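    Classify MNIST

    arg:
        flag: selects which RNN structure to build
            1: single-layer static bidirectional LSTM
            2: single-layer dynamic bidirectional LSTM
            3: multi-layer static bidirectional LSTM
            4: multi-layer dynamic bidirectional LSTM

    '''
    '''
    1. Load the dataset
    '''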
    tf.reset_default_graph()
    from tensorflow.examples.tutorials.mnist import input_data
    
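    #mnist is a lightweight class that stores the training, validation and test sets as numpy arrays; one_hot=True returns the labels as 10-dimensional one-hot vectors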
    mnist = input_data.read_data_sets('MNIST-data',one_hot=True)
    
    print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>
    
    print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
    print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
    print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
    print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)
    
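    '''
    2 Define the parameters and the network structure
    '''
    n_input = 28             #number of input nodes of the LSTM cell
    n_steps = 28             #sequence length
    n_hidden = 128           #number of output nodes of the LSTM cell (i.e. hidden-layer nodes)
    n_classes = 10           #number of classes
    batch_size = 128         #mini-batch size
    training_step = 1000     #number of training iterations
    display_step  = 200      #report interval in steps
    learning_rate = 1e-4     #learning rate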
    
    
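    #Define the placeholders
    #The first dimension (None) is the batch size; n_steps is the number of time steps; n_input is the length of the data at one time step, i.e. 28 time steps in total, each feeding 28 values into the LSTM network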
    input_x = tf.placeholder(dtype=tf.float32,shape=[None,n_steps,n_input])
    input_y = tf.placeholder(dtype=tf.float32,shape=[None,n_classes])
    
    
    #Can be viewed as the hidden layers
    if  flag == 1:
        print('Single-layer static bidirectional LSTM network:')
        hiddens,fw_state,bw_state = single_layer_static_bi_lstm(input_x,n_steps,n_hidden)
    elif flag == 2:
        print('Single-layer dynamic bidirectional LSTM network:')
        hiddens,state = single_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden)
    elif flag == 3:
        print('Multi-layer static bidirectional LSTM network:')
        hiddens,fw_state,bw_state = multi_layer_static_bi_lstm(input_x,n_steps,n_hidden)
    elif  flag == 4:
        print('Multi-layer dynamic bidirectional LSTM network:')
        hiddens,fw_state,bw_state = multi_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden)


    
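    #Take the output of the last time step and pass it through a fully connected layer to get the output values
    output = tf.contrib.layers.fully_connected(inputs=hiddens[-1],num_outputs=n_classes,activation_fn = tf.nn.softmax)

    '''
    3 Set up the log-likelihood loss function
    '''
    #Cost function J = -(Σ y.log(aL))/n    "." denotes element-wise multiplication
    cost = tf.reduce_mean(-tf.reduce_sum(input_y*tf.log(output),axis=1))

    '''
    4 Solve
    '''
    train = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    #Evaluate the predictions
    #tf.argmax(output,1) returns the index of the largest value in each row
    correct = tf.equal(tf.argmax(output,1),tf.argmax(input_y,1))       #boolean tensor marking each prediction as right or wrong
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))             #accuracy


    #Create lists to store the result of each test batch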
    test_accuracy_list = []
    test_cost_list=[]
    
    
    with tf.Session() as sess:
        #Run the graph in the session
        sess.run(tf.global_variables_initializer())   #initialize the variables

        #Start iterating: mini-batch training with the Adam optimizer
        for i in range(training_step):
            x_batch,y_batch = mnist.train.next_batch(batch_size = batch_size)
            #Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])

            #Run one training step
            train.run(feed_dict={input_x:x_batch,input_y:y_batch})
            if (i+1) % display_step == 0:
                #Print the accuracy and cost on the current training batch
                training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
                print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))


        #After training, run the test in 200 batches of 50 samples each
        #Print the test-set accuracy. Evaluating the whole test set at once may run out of memory (OOM), so a small mini-batch is used for testing
        for i in range(200):
            x_batch,y_batch = mnist.test.next_batch(batch_size = 50)
            #Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
            test_accuracy_list.append(test_accuracy)
            test_cost_list.append(test_cost)
            if (i+1)% 20 == 0:
                print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost))
        print('Test accuracy:',np.mean(test_accuracy_list))
        

if __name__ == '__main__':
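    mnist_rnn_classfication(1)    #1: single-layer static bidirectional LSTM network
    mnist_rnn_classfication(2)    #2: single-layer dynamic bidirectional LSTM network
    mnist_rnn_classfication(3)    #3: multi-layer static bidirectional LSTM network
    mnist_rnn_classfication(4)    #4: multi-layer dynamic bidirectional LSTM network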
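
The cost and accuracy used in the scripts above, J = -(Σ y·log a)/n and the fraction of matching argmax indices, can be checked by hand on a few made-up values. The NumPy sketch below is an assumption-laden illustration and not the original code: the helper names, the toy logits and the `eps` clipping are all hypothetical. The clipping is one common way to keep `log` finite when a predicted probability is exactly 0; the TensorFlow scripts call `tf.log(output)` without it.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot, eps=1e-10):
    """Mean over samples of -sum(y * log(a)); probs are clipped away from 0."""
    clipped = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(one_hot * np.log(clipped), axis=1)))

def accuracy(probs, one_hot):
    """Fraction of rows whose predicted class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(one_hot, axis=1)))

# Toy batch: 4 samples, 3 classes (values are made up for illustration).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1.2, 0.2, 3.0],
                   [3.0, 0.1, 0.2]])
labels = np.eye(3)[[0, 1, 2, 1]]   # one-hot labels; the last sample is misclassified

probs = softmax(logits)
loss = cross_entropy(probs, labels)
acc = accuracy(probs, labels)
print(loss, acc)
```

Averaging the per-batch accuracies, as the test loop does with `np.mean(test_accuracy_list)`, equals the overall accuracy only when every batch has the same size, which holds for the 200 batches of 50 samples used there.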