Chapter 14: Recurrent Neural Networks

Date: 2019-12-13

Tags: Recurrent Neural Networks

Before we start

Reference book

Hands-On Machine Learning with Scikit-Learn and TensorFlow (Chinese edition)

Tools

Python 3.5.1, Jupyter Notebook, PyCharm

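Basic RNNs in TensorFlow

We build, by hand, a single layer of 5 recurrent neurons with the tanh activation function. Assume the RNN runs over only two time steps and takes an input vector of size 3 at each time step.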

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: simple_rnn.py
@time: 2019/6/15 16:53
@desc: Implement the simplest possible RNN: one layer of 5 recurrent neurons
        using the tanh activation function. The RNN runs over only two time
        steps and takes an input vector of size 3 at each time step.
"""
import tensorflow as tf
import numpy as np

n_inputs = 3
n_neurons = 5

x0 = tf.placeholder(tf.float32, [None, n_inputs])
x1 = tf.placeholder(tf.float32, [None, n_inputs])

Wx = tf.Variable(tf.random_normal(shape=[n_inputs, n_neurons], dtype=tf.float32))
Wy = tf.Variable(tf.random_normal(shape=[n_neurons, n_neurons], dtype=tf.float32))
b = tf.Variable(tf.zeros([1, n_neurons], dtype=tf.float32))

y0 = tf.tanh(tf.matmul(x0, Wx) + b)
y1 = tf.tanh(tf.matmul(y0, Wy) + tf.matmul(x1, Wx) + b)

init = tf.global_variables_initializer()

# Mini-batch containing 4 instances
x0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]])   # t=0
x1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]])   # t=1

with tf.Session() as sess:
    init.run()
    y0_val, y1_val = sess.run([y0, y1], feed_dict={x0: x0_batch, x1: x1_batch})
    print(y0_val)
    print('-'*50)
    print(y1_val)

[Figure: run output]

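Static unrolling through time

The static_rnn() function creates an unrolled RNN network by chaining cells: it calls the cell once per input, creating one copy of the cell per time step (here two copies, each containing a layer of five recurrent neurons) with shared weights and bias terms, and chains them just as we did by hand above.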

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: simple_rnn2.py
@time: 2019/6/15 17:06
@desc: Same network as the previous program, built with BasicRNNCell and static_rnn().
"""

import tensorflow as tf
from tensorflow.contrib.rnn import BasicRNNCell
from tensorflow.contrib.rnn import static_rnn
import numpy as np

n_inputs = 3
n_neurons = 5

x0 = tf.placeholder(tf.float32, [None, n_inputs])
x1 = tf.placeholder(tf.float32, [None, n_inputs])

basic_cell = BasicRNNCell(num_units=n_neurons)
output_seqs, states = static_rnn(basic_cell, [x0, x1], dtype=tf.float32)

y0, y1 = output_seqs

init = tf.global_variables_initializer()

# Mini-batch containing 4 instances
x0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]])   # t=0
x1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]])   # t=1

with tf.Session() as sess:
    init.run()
    y0_val, y1_val = sess.run([y0, y1], feed_dict={x0: x0_batch, x1: x1_batch})
    print(y0_val)
    print('-'*50)
    print(y1_val)

[Figure: run output]
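To see what the chaining amounts to without TensorFlow, here is a NumPy sketch: one cell function with a single set of weights is called once per element of a Python list of per-step inputs, and each call receives the previous step's state. `rnn_cell` and `static_unroll` are hypothetical helper names, and the weights are random, so the numbers will differ from the TensorFlow run.

```python
import numpy as np


def rnn_cell(x_t, h_prev, Wx, Wh, b):
    """One basic RNN step: h_t = tanh(x_t @ Wx + h_prev @ Wh + b)."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)


def static_unroll(inputs, Wx, Wh, b):
    """Chain the same cell over a Python list of [batch, n_inputs] arrays.

    Returns the list of per-step outputs and the final state, loosely
    mirroring the (outputs, state) pair returned by static_rnn().
    """
    batch_size = inputs[0].shape[0]
    h = np.zeros((batch_size, Wh.shape[0]))  # zero initial state
    outputs = []
    for x_t in inputs:  # one "copy" of the cell per time step, same weights
        h = rnn_cell(x_t, h, Wx, Wh, b)
        outputs.append(h)
    return outputs, h


rng = np.random.default_rng(42)
n_inputs, n_neurons = 3, 5
Wx = rng.normal(size=(n_inputs, n_neurons))
Wh = rng.normal(size=(n_neurons, n_neurons))
b = np.zeros((1, n_neurons))

x0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]], dtype=float)  # t=0
x1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]], dtype=float)  # t=1
(y0, y1), state = static_unroll([x0_batch, x1_batch], Wx, Wh, b)
print(y0.shape, y1.shape)  # (4, 5) (4, 5)
```

Because the initial state is zero, y0 reduces to tanh(x0 @ Wx + b), exactly the first formula of the hand-built version.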

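Dynamic unrolling through time

The dynamic_rnn() function uses a while_loop() operation to run over the cell the appropriate number of times. It takes a single tensor of shape [None, n_steps, n_inputs] covering all time steps and returns a single output tensor of shape [None, n_steps, n_neurons].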

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: dynamic_rnn1.py
@time: 2019/6/16 13:37
@desc: Dynamic unrolling through time with dynamic_rnn
"""

import tensorflow as tf
from tensorflow.contrib.rnn import BasicRNNCell
import numpy as np


n_steps = 2
n_inputs = 3
n_neurons = 5

x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
basic_cell = BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, x, dtype=tf.float32)

x_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
])

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    outputs_val = outputs.eval(feed_dict={x: x_batch})
    print(outputs_val)

[Figure: run output]
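The NumPy counterpart of the dynamic version takes one [batch, n_steps, n_inputs] array and returns one stacked [batch, n_steps, n_neurons] array plus the final state; the Python for loop stands in for the in-graph while_loop(). `dynamic_unroll` is a hypothetical helper using random weights. For a basic RNN cell, the final state is simply the output at the last time step.

```python
import numpy as np


def dynamic_unroll(X, Wx, Wh, b):
    """Run a basic RNN cell over X of shape [batch, n_steps, n_inputs].

    Loosely mirrors dynamic_rnn() with a BasicRNNCell: one input array in,
    one stacked output array [batch, n_steps, n_neurons] and the final
    state out. The Python loop stands in for TensorFlow's while_loop().
    """
    batch_size, n_steps, _ = X.shape
    n_neurons = Wh.shape[0]
    h = np.zeros((batch_size, n_neurons))
    outputs = np.empty((batch_size, n_steps, n_neurons))
    for t in range(n_steps):
        h = np.tanh(X[:, t, :] @ Wx + h @ Wh + b)
        outputs[:, t, :] = h
    return outputs, h


rng = np.random.default_rng(0)
Wx = rng.normal(size=(3, 5))
Wh = rng.normal(size=(5, 5))
b = np.zeros((1, 5))

X_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
], dtype=float)
outputs, state = dynamic_unroll(X_batch, Wx, Wh, b)
print(outputs.shape, state.shape)  # (4, 2, 5) (4, 5)
```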

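This raises a question: what exactly is the difference between the dynamic and the static versions?

Reference: "What is the difference between dynamic_rnn and rnn in TensorFlow?"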

Handling variable-length input sequences

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: dynamic_rnn2.py
@time: 2019/6/17 9:42
@desc: Handling variable-length input sequences
"""

import tensorflow as tf
from tensorflow.contrib.rnn import BasicRNNCell
import numpy as np


n_steps = 2
n_inputs = 3
n_neurons = 5

seq_length = tf.placeholder(tf.int32, [None])
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
basic_cell = BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, x, dtype=tf.float32, sequence_length=seq_length)

# Suppose the second input sequence contains only one input. To fit the input tensor X, it must be padded with a zero vector.
x_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
])

seq_length_batch = np.array([2, 1, 2, 2])

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run([outputs, states], feed_dict={x: x_batch, seq_length: seq_length_batch})
    print(outputs_val)

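[Figure: run output]

Analysis of the results

For each instance, the RNN outputs a zero vector at every time step past that instance's sequence length.

In addition, the states tensor contains the final state of each cell, excluding the zero vectors: for the second instance, it is the output at time step 0.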
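The behavior just described can be mimicked in NumPy: past each instance's length, the output is zeroed and the state is copied through unchanged, so the final state is the output at the last valid step. This is a sketch of dynamic_rnn's documented sequence_length semantics for a basic cell, not its implementation; the weights are random and `dynamic_unroll_masked` is a hypothetical name.

```python
import numpy as np


def dynamic_unroll_masked(X, seq_length, Wx, Wh, b):
    """Basic RNN over X [batch, n_steps, n_inputs] with per-instance lengths."""
    batch_size, n_steps, _ = X.shape
    n_neurons = Wh.shape[0]
    h = np.zeros((batch_size, n_neurons))
    outputs = np.zeros((batch_size, n_steps, n_neurons))
    for t in range(n_steps):
        h_new = np.tanh(X[:, t, :] @ Wx + h @ Wh + b)
        valid = (t < seq_length)[:, None]  # [batch, 1]: is step t inside the sequence?
        outputs[:, t, :] = np.where(valid, h_new, 0.0)  # zero vectors past the end
        h = np.where(valid, h_new, h)  # state frozen past the end
    return outputs, h


rng = np.random.default_rng(0)
Wx, Wh, b = rng.normal(size=(3, 5)), rng.normal(size=(5, 5)), np.zeros((1, 5))
X_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],  # one real step, then a zero-vector pad
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
], dtype=float)
seq_length_batch = np.array([2, 1, 2, 2])

outputs, state = dynamic_unroll_masked(X_batch, seq_length_batch, Wx, Wh, b)
print(outputs[1, 1])  # [0. 0. 0. 0. 0.]
```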

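Handling variable-length output sequences

When the length of the output sequence is not known in advance, the most common solution is to define a special output called an end-of-sequence token (EOS token); any output past the EOS token is ignored.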
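A plain-Python sketch of how an EOS token is typically used once the network emits fixed-size sequences: truncate each output sequence at the first EOS, and build a mask so that positions after it do not count in the cost. The token string "<eos>" and both helper names are assumptions made for this sketch.

```python
EOS = "<eos>"  # hypothetical end-of-sequence token


def truncate_at_eos(tokens, eos=EOS):
    """Keep everything before the first EOS token and drop the rest."""
    out = []
    for tok in tokens:
        if tok == eos:
            break
        out.append(tok)
    return out


def eos_loss_mask(tokens, eos=EOS):
    """1.0 for positions up to and including the first EOS, 0.0 afterwards."""
    mask, seen_eos = [], False
    for tok in tokens:
        mask.append(0.0 if seen_eos else 1.0)
        if tok == eos:
            seen_eos = True
    return mask


generated = ["I", "drink", "milk", EOS, "the", "the"]  # fixed-size output
print(truncate_at_eos(generated))  # ['I', 'drink', 'milk']
print(eos_loss_mask(generated))    # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```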

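Training RNNs

To train an RNN, the trick is to unroll it through time and then simply use regular backpropagation. This is called backpropagation through time (BPTT): the gradients flow backward through all the outputs used by the cost function, not just through the final output.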
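A minimal BPTT sketch on a scalar RNN, $h_{(t)} = \tanh(w\,h_{(t-1)} + u\,x_{(t)})$, in plain Python. The cost here uses only the last two outputs, yet the gradient still flows back through the earlier steps via the state; the hand-written gradients are checked against finite differences. All names and numbers are made up for the illustration.

```python
import math


def forward(w, u, xs):
    """Scalar RNN h_t = tanh(w * h_{t-1} + u * x_t), h_0 = 0; returns [h_0, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(w, u, xs, ys, used):
    """Half squared error, summed only over the time steps flagged in `used`."""
    hs = forward(w, u, xs)
    return sum(0.5 * (hs[t + 1] - ys[t]) ** 2 for t in range(len(xs)) if used[t])


def bptt(w, u, xs, ys, used):
    """Gradients of `loss` with respect to w and u, by backpropagation through time."""
    hs = forward(w, u, xs)
    dw = du = 0.0
    dh_next = 0.0  # gradient reaching h_t from step t + 1
    for t in reversed(range(len(xs))):
        h, h_prev = hs[t + 1], hs[t]
        dh = dh_next + ((h - ys[t]) if used[t] else 0.0)  # cost injects gradient only at used outputs
        dpre = dh * (1.0 - h * h)  # back through tanh
        dw += dpre * h_prev
        du += dpre * xs[t]
        dh_next = dpre * w  # keeps flowing to h_{t-1}, even through unused steps
    return dw, du


xs = [0.5, -1.0, 0.3, 0.8]
ys = [0.1, 0.2, -0.3, 0.4]
used = [False, False, True, True]  # the cost ignores the first two outputs
w, u = 0.7, -0.4
dw, du = bptt(w, u, xs, ys, used)

# Check against central finite differences
eps = 1e-6
num_dw = (loss(w + eps, u, xs, ys, used) - loss(w - eps, u, xs, ys, used)) / (2 * eps)
num_du = (loss(w, u + eps, xs, ys, used) - loss(w, u - eps, xs, ys, used)) / (2 * eps)
print(abs(dw - num_dw) < 1e-6, abs(du - num_du) < 1e-6)  # True True
```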

Training a sequence classifier

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: rnn_test1.py
@time: 2019/6/17 10:28
@desc: Train an RNN to classify MNIST images.
"""

import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicRNNCell

from tensorflow.examples.tutorials.mnist import input_data


n_steps = 28
n_inputs = 28
n_neurons = 150
n_outputs = 10

learning_rate = 0.001

n_epochs = 100
batch_size = 150

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.int32, [None])

basic_cell = BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

logits = fully_connected(states, n_outputs, activation_fn=None)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)

loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
init = tf.global_variables_initializer()

# Load the MNIST data and reshape the test data into the shape the network expects.
mnist = input_data.read_data_sets('D:/Python3Space/BookStudy/book2/MNIST_data/')
X_test = mnist.test.images.reshape((-1, n_steps, n_inputs))
y_test = mnist.test.labels

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            X_batch = X_batch.reshape((-1, n_steps, n_inputs))
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Train accuracy: ", acc_train, "Test accuracy: ", acc_test)

[Figure: run output]
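The classifier feeds each 28×28 MNIST image to the RNN as a sequence of 28 time steps with 28 inputs each, i.e. one pixel row per time step. A quick NumPy check of what reshape((-1, n_steps, n_inputs)) does to a batch of flattened images, with random numbers standing in for real pixels:

```python
import numpy as np

n_steps, n_inputs = 28, 28
rng = np.random.default_rng(0)
flat_batch = rng.random((5, n_steps * n_inputs))  # 5 fake flattened 784-pixel images
seq_batch = flat_batch.reshape((-1, n_steps, n_inputs))  # [batch, time steps, inputs per step]

image0 = flat_batch[0].reshape(28, 28)  # first image as a 28x28 grid
print(seq_batch.shape)  # (5, 28, 28)
print(np.array_equal(seq_batch[0, 3], image0[3]))  # True: time step 3 is pixel row 3
```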

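tf.nn.in_top_k checks whether the true targets are among the top predictions and returns a bool tensor. In tf.nn.in_top_k(predictions, targets, k): predictions holds the predicted scores, of shape [number of instances, number of classes], with a float type such as tf.float32; targets holds the true class indices, of shape [number of instances]; k means that, for each instance, the op checks whether the target is among the k largest predicted values. k is usually 1.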

Reference: usage of tf.nn.in_top_k
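To make the semantics concrete, here is a NumPy sketch of what tf.nn.in_top_k computes (not TensorFlow's implementation). It follows the documented tie rule that classes sharing the boundary value all count as being in the top k; `in_top_k` here is our own helper.

```python
import numpy as np


def in_top_k(predictions, targets, k):
    """Bool array [batch]: is each target among the k largest predictions?

    predictions: float array [batch_size, num_classes]
    targets:     int array [batch_size] of true class indices
    """
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets)
    target_scores = predictions[np.arange(len(targets)), targets]
    # Number of classes scoring strictly higher than the target class;
    # ties with the target do not push it out of the top k.
    n_higher = (predictions > target_scores[:, None]).sum(axis=1)
    return n_higher < k


preds = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.3, 0.3, 0.4]])
labels = np.array([1, 1, 0])
print(in_top_k(preds, labels, 1).tolist())  # [True, False, False]
print(in_top_k(preds, labels, 2).tolist())  # [True, True, True]
```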

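tf.cast converts x to the given dtype. For example, if x is of type bool, casting it to a float type turns it into a sequence of 0s and 1s, and the reverse conversion also works.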

cast(
    x,
    dtype,
    name=None
)
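In the MNIST code, accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) turns the bool results of in_top_k into 0/1 values and averages them. The same idea in NumPy, with astype standing in for tf.cast and mean for tf.reduce_mean:

```python
import numpy as np

correct = np.array([True, False, True, True])  # e.g. the output of in_top_k
as_float = correct.astype(np.float32)  # like tf.cast(correct, tf.float32)
accuracy = as_float.mean()  # like tf.reduce_mean(...)
back_to_bool = as_float.astype(bool)  # reverse cast: 0 -> False, nonzero -> True
print(as_float.tolist(), accuracy)  # [1.0, 0.0, 1.0, 1.0] 0.75
```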

Training to predict time series

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: rnn_test2.py
@time: 2019/6/18 10:11
@desc: Train an RNN to predict a time series
"""

import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

learning_rate = 0.001

n_iterations = 10000
batch_size = 50

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

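# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack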
cell = BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu)
rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.legend()
plt.show()

[Figure: run output]
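The "faster trick" in the code above folds the time dimension into the batch dimension so that a single fully connected layer projects every time step at once. Because the same weights are used at every step, the result should match projecting each step separately, which is what OutputProjectionWrapper does. A small NumPy check of that equivalence, with arbitrary small sizes and random values standing in for the RNN outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, n_steps, n_neurons, n_outputs = 2, 4, 6, 1
rnn_outputs = rng.normal(size=(batch_size, n_steps, n_neurons))
W = rng.normal(size=(n_neurons, n_outputs))  # one shared dense layer
b = rng.normal(size=(n_outputs,))

# Project each time step separately (the OutputProjectionWrapper way)
per_step = np.stack([rnn_outputs[:, t, :] @ W + b for t in range(n_steps)], axis=1)

# Trick: fold time into the batch dimension, project once, unfold
stacked = rnn_outputs.reshape(-1, n_neurons)  # [batch * n_steps, n_neurons]
projected = stacked @ W + b  # [batch * n_steps, n_outputs]
outputs = projected.reshape(-1, n_steps, n_outputs)  # [batch, n_steps, n_outputs]

print(outputs.shape, np.allclose(outputs, per_step))  # (2, 4, 1) True
```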

Creative RNN

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: rnn_test3.py
@time: 2019/6/19 8:47
@desc: Creative RNN
"""

import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


n_steps = 20
n_inputs = 1
n_neurous = 100
n_outputs = 1

learning_rate = 0.001

n_iterations = 10000
batch_size = 50

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

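# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack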
cell = BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu)
rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 19, 20)
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    # sequence = [0.] * n_steps  # Result 1: seed with zeros
    sequence = list(y_batch.flatten())  # Result 2: seed with a training instance
    for iteration in range(300):
        XX_batch = np.array(sequence[-n_steps:]).reshape(1, n_steps, 1)
        y_pred = sess.run(outputs, feed_dict={X: XX_batch})
        sequence.append(y_pred[0, -1, 0])

fig = plt.figure(dpi=150)
plt.plot(sequence, label='generated sequence')
plt.legend()
plt.show()

Result 1: using zeros as the seed sequence

Result 2: using a training instance as the seed sequence
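Both results come from the same loop: seed a window of n_steps values, predict, append the last predicted value, and slide the window forward. A framework-free sketch of that loop follows; `generate` and `mean_model` are hypothetical names, and the toy model (the mean of the window) merely stands in for the trained RNN, which in the code above returns its prediction as y_pred[0, -1, 0].

```python
def generate(seed, predict_next, n_new):
    """Autoregressively extend `seed` by n_new values.

    predict_next: hypothetical callable mapping the last len(seed) values
    to the next value (the trained RNN plays this role).
    """
    n_steps = len(seed)
    sequence = list(seed)
    for _ in range(n_new):
        window = sequence[-n_steps:]  # the most recent n_steps values
        sequence.append(predict_next(window))  # feed the prediction back in
    return sequence


def mean_model(window):
    """Toy stand-in model: predicts the mean of the window."""
    return sum(window) / len(window)


print(generate([0.0, 0.0, 3.0], mean_model, 2))  # [0.0, 0.0, 3.0, 1.0, 1.3333333333333333]
```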

Deep RNNs

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: rnn_test4.py
@time: 2019/6/19 10:10
@desc: Deep RNN
"""

import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicRNNCell
from tensorflow.contrib.rnn import MultiRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

n_layers = 10

learning_rate = 0.00001

n_iterations = 10000
batch_size = 50

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

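# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack.
# cell = BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu)
# multi_layer_cell = MultiRNNCell([cell] * n_layers)  # reuses one cell object for every layer; build a separate cell per layer instead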
layers = [BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu) for _ in range(n_layers)]
multi_layer_cell = MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

plt.rcParams['axes.unicode_minus'] = False  # draw minus signs with an ASCII hyphen
fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.title('Deep RNN prediction')
plt.legend()
plt.show()

[Figure: run output]

Distributing a deep RNN across multiple GPUs

I don't have multiple GPUs, so I only tidied up the code...

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: rnn_gpu.py
@time: 2019/6/19 12:11
@desc: Distribute a deep RNN across multiple GPUs
"""

import tensorflow as tf
import numpy as np
from tensorflow.contrib.rnn import RNNCell
from tensorflow.contrib.rnn import BasicRNNCell
from tensorflow.contrib.rnn import MultiRNNCell
from tensorflow.contrib.layers import fully_connected

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


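class DeviceCellWrapper(RNNCell):
    """Wraps an RNN cell so that all of its operations run on the given device."""
    def __init__(self, device, cell):
        super(DeviceCellWrapper, self).__init__()
        self._cell = cell
        self._device = device

    @property
    def state_size(self):
        return self._cell.state_size

    @property
    def output_size(self):  # must be named output_size (was `output`), as RNNCell requires
        return self._cell.output_size

    def __call__(self, inputs, state, scope=None):
        with tf.device(self._device):
            return self._cell(inputs, state, scope)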


n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

n_layers = 10

learning_rate = 0.00001

n_iterations = 10000
batch_size = 50

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])


# One layer per device: 3 layers here (n_layers above is not used in this example)
devices = ['/gpu:0', '/gpu:1', '/gpu:2']
cells = [DeviceCellWrapper(dev, BasicRNNCell(num_units=n_neurous)) for dev in devices]
multi_layer_cell = MultiRNNCell(cells)
rnn_outputs,  states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)

stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)
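# allow_soft_placement lets TensorFlow fall back to an available device (e.g. the CPU) when a requested GPU does not exist
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess: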
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

plt.rcParams['axes.unicode_minus'] = False  # draw minus signs with an ASCII hyphen
fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.title('Deep RNN prediction')
plt.legend()
plt.show()
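The code above creates exactly one layer per device. With more layers than GPUs (for example the unused n_layers = 10), one option is to give each device a contiguous block of layers. `assign_layers_to_devices` is a hypothetical helper sketching that split; keeping consecutive layers together is a design assumption, not something the original code prescribes.

```python
def assign_layers_to_devices(n_layers, devices):
    """Return one device name per layer, giving each device a contiguous block.

    Consecutive layers share a device, so activations only move between
    devices at block boundaries (an assumption of this sketch).
    """
    base, extra = divmod(n_layers, len(devices))
    placement = []
    for i, dev in enumerate(devices):
        count = base + (1 if i < extra else 0)  # the first `extra` devices take one more layer
        placement.extend([dev] * count)
    return placement


placement = assign_layers_to_devices(10, ['/gpu:0', '/gpu:1', '/gpu:2'])
print(placement.count('/gpu:0'), placement.count('/gpu:1'), placement.count('/gpu:2'))  # 4 3 3
```

The resulting list could then take the place of `devices` when building the wrapped cells, one DeviceCellWrapper per entry.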

Applying dropout

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: rnn_test5.py
@time: 2019/6/19 13:44
@desc: Apply dropout
"""


import tensorflow as tf
import sys
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicRNNCell
from tensorflow.contrib.rnn import MultiRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
from tensorflow.contrib.rnn import DropoutWrapper
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


is_training = (sys.argv[-1] == "train")
keep_prob = 0.5
n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

n_layers = 10

learning_rate = 0.00001

n_iterations = 10000
batch_size = 50


def make_rnn_cell():
    return BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu)


def make_drop_cell():
    return DropoutWrapper(make_rnn_cell(), input_keep_prob=keep_prob)


X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

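# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack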
layers = [make_rnn_cell() for _ in range(n_layers)]
if is_training:
    layers = [make_drop_cell() for _ in range(n_layers)]

multi_layer_cell = MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)

'''
# Applying dropout: train and save the model when run with "train", otherwise restore it (outline only)
saver = tf.train.Saver()

with tf.Session() as sess:
    if is_training:
        init.run()
        for iteration in range(n_iterations):
            # train the model
        save_path = saver.save(sess, "./my_model.ckpt")
    else:
        saver.restore(sess, "./my_model.ckpt")
        # use the model
'''

# Note: when the script is run with "train", dropout is also active for the predictions made below
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

plt.rcParams['axes.unicode_minus'] = False  # draw minus signs with an ASCII hyphen
fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.title('Deep RNN prediction')
plt.legend()
plt.show()

[Figure: run output]
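DropoutWrapper(input_keep_prob=keep_prob) randomly drops the inputs to each wrapped cell during training. As far as I know, TensorFlow uses inverted dropout here, scaling the kept values by 1/keep_prob so the expected input is unchanged. The NumPy sketch below shows that idea, with a `training` flag (a name chosen for this sketch) that disables dropout at inference time.

```python
import numpy as np


def input_dropout(x, keep_prob, rng, training=True):
    """Inverted dropout on a batch of cell inputs.

    Each element is kept with probability keep_prob and scaled by
    1/keep_prob; at inference time the inputs pass through untouched.
    """
    if not training or keep_prob >= 1.0:
        return x
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
x = np.ones((1000, 100))
dropped = input_dropout(x, keep_prob=0.5, rng=rng)
print(np.unique(dropped).tolist())  # [0.0, 2.0]
print(round(float(dropped.mean()), 1))  # approximately 1.0
```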

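LSTM cell

An LSTM cell contains four different fully connected layers:
1. The main layer (tanh) outputs $g_{(t)}$. In a basic cell this output would go straight out to $y_{(t)}$ and $h_{(t)}$; in an LSTM cell it is instead partially stored in the long-term state.
2. The forget gate (logistic) controls which parts of the long-term state should be erased.
3. The input gate (logistic) controls which parts of $g_{(t)}$ should be added to the long-term state (this is why it is only "partially stored").
4. The output gate (logistic) controls which parts of the long-term state should be read and output at this time step, both to $h_{(t)}$ and to $y_{(t)}$.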
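The four layers can be written as one NumPy step function. This is a minimal sketch of the standard LSTM equations, not TensorFlow's BasicLSTMCell: the weights of the four layers are packed into a single matrix W in an order chosen for this sketch, and details such as BasicLSTMCell's default forget bias are left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W: [n_inputs + n_neurons, 4 * n_neurons], b: [4 * n_neurons]. The four
    column blocks are the input gate, forget gate, output gate and main
    layer (an arbitrary packing order for this sketch).
    """
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, f, o, g = np.split(z, 4, axis=1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate controllers (logistic)
    g = np.tanh(g)  # main layer
    c_t = f * c_prev + i * g  # erase part of the long-term state, add part of g
    h_t = o * np.tanh(c_t)  # read part of the long-term state
    return h_t, c_t  # y_t equals h_t


rng = np.random.default_rng(0)
n_inputs, n_neurons, batch = 3, 4, 2
W = rng.normal(size=(n_inputs + n_neurons, 4 * n_neurons))
b = np.zeros(4 * n_neurons)
h = c = np.zeros((batch, n_neurons))
for x_t in rng.normal(size=(5, batch, n_inputs)):  # 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (2, 4) (2, 4)
```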

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: lstm_test1.py
@time: 2019/6/19 14:51
@desc: LSTM cell
"""

import tensorflow as tf
import sys
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import BasicLSTMCell
from tensorflow.contrib.rnn import MultiRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
from tensorflow.contrib.rnn import DropoutWrapper
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


is_training = (sys.argv[-1] == "train")
keep_prob = 0.5
n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

n_layers = 5

learning_rate = 0.00001

n_iterations = 10000
batch_size = 50


def make_rnn_cell():
    return BasicLSTMCell(num_units=n_neurous, activation=tf.nn.relu)


def make_drop_cell():
    return DropoutWrapper(make_rnn_cell(), input_keep_prob=keep_prob)


X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

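# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack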
layers = [make_rnn_cell() for _ in range(n_layers)]
# if is_training:
#     layers = [make_drop_cell() for _ in range(n_layers)]

multi_layer_cell = MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

plt.rcParams['axes.unicode_minus'] = False  # draw minus signs with an ASCII hyphen
fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.title('Deep RNN prediction')
plt.legend()
plt.show()

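[Figure: run output]

Peephole connections

Peephole connections are an LSTM variant in which the previous long-term state $c_{(t-1)}$ is added as an input to the controllers of the forget gate and the input gate, and the current long-term state $c_{(t)}$ is added as an input to the controller of the output gate.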
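Written out for one time step, a peephole LSTM looks like this. This is a sketch in the $c_{(t)}$/$h_{(t)}$ notation used here; the peephole weight vectors $p_i$, $p_f$, $p_o$ are names chosen for this sketch and are applied elementwise ($\otimes$), and TensorFlow's LSTMCell(use_peepholes=True) may differ in details such as an added forget bias.

```latex
\begin{aligned}
i_{(t)} &= \sigma\left(W_{xi}^T x_{(t)} + W_{hi}^T h_{(t-1)} + p_i \otimes c_{(t-1)} + b_i\right) \\
f_{(t)} &= \sigma\left(W_{xf}^T x_{(t)} + W_{hf}^T h_{(t-1)} + p_f \otimes c_{(t-1)} + b_f\right) \\
g_{(t)} &= \tanh\left(W_{xg}^T x_{(t)} + W_{hg}^T h_{(t-1)} + b_g\right) \\
c_{(t)} &= f_{(t)} \otimes c_{(t-1)} + i_{(t)} \otimes g_{(t)} \\
o_{(t)} &= \sigma\left(W_{xo}^T x_{(t)} + W_{ho}^T h_{(t-1)} + p_o \otimes c_{(t)} + b_o\right) \\
y_{(t)} &= h_{(t)} = o_{(t)} \otimes \tanh\left(c_{(t)}\right)
\end{aligned}
```

The forget and input gates see $c_{(t-1)}$, while the output gate sees the freshly updated $c_{(t)}$, matching the description above.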

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: lstm_test2.py
@time: 2019/6/19 16:36
@desc: Peephole connections
"""

import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import LSTMCell
from tensorflow.contrib.rnn import MultiRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

n_layers = 10

learning_rate = 0.00001

n_iterations = 10000
batch_size = 50

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

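# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack.
# cell = BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu)
# multi_layer_cell = MultiRNNCell([cell] * n_layers)  # reuses one cell object for every layer; build a separate cell per layer instead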
layers = [LSTMCell(num_units=n_neurous, activation=tf.nn.relu, use_peepholes=True) for _ in range(n_layers)]
multi_layer_cell = MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

plt.rcParams['axes.unicode_minus'] = False  # draw minus signs with an ASCII hyphen
fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.title('Deep RNN prediction')
plt.legend()
plt.show()

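[Figure: run output]

GRU cell

The GRU cell is a simplified version of the LSTM cell. Its main simplifications are:

1. Both state vectors are merged into a single vector $h_{(t)}$.
2. A single gate controller controls both the forget gate and the input gate. If the gate controller outputs 1, the input gate is open and the forget gate is closed; if it outputs 0, the opposite happens. In other words, whenever a memory must be stored, the location where it will be stored is erased first. This is actually a frequent variant of the LSTM cell in its own right.
3. There is no output gate: the full state vector is output at every time step. However, there is a new gate controller that controls which part of the previous state will be shown to the main layer.

The GRU is a newer type of recurrent neural network that is very similar to the LSTM. Compared with the LSTM, it drops the cell state and uses the hidden state alone to carry information. It has only two gates: an update gate and a reset gate.
Update gate: plays a role similar to the LSTM's forget and input gates; it decides which information to discard and which new information to add.
Reset gate: decides how much of the past information to forget.
The GRU performs fewer tensor operations, so it trains somewhat faster than the LSTM. It is hard to say which of the two is better; researchers usually try both and keep whichever works best.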
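A single GRU step in NumPy, as a minimal sketch that follows the convention in the description above: an update gate value of 1 writes the new candidate and erases the old state, 0 keeps the old state. Other sources and libraries may swap the roles of z and 1 - z, so treat the gate orientation as an assumption of this sketch; `gru_step` and its weight names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x_t, h_prev, Wz, Wr, Wg, bz, br, bg):
    """One GRU step; each W* has shape [n_inputs + n_neurons, n_neurons]."""
    xh = np.concatenate([x_t, h_prev], axis=1)
    z = sigmoid(xh @ Wz + bz)  # update gate: one controller for forget and input
    r = sigmoid(xh @ Wr + br)  # reset gate: how much of h_prev the main layer sees
    g = np.tanh(np.concatenate([x_t, r * h_prev], axis=1) @ Wg + bg)  # main layer
    return (1.0 - z) * h_prev + z * g  # erase where we write; no output gate


rng = np.random.default_rng(0)
n_inputs, n_neurons, batch = 3, 4, 2
Wz, Wr, Wg = (rng.normal(size=(n_inputs + n_neurons, n_neurons)) for _ in range(3))
bz = br = bg = np.zeros(n_neurons)
h = np.zeros((batch, n_neurons))
for x_t in rng.normal(size=(5, batch, n_inputs)):  # 5 time steps
    h = gru_step(x_t, h, Wz, Wr, Wg, bz, br, bg)
print(h.shape)  # (2, 4)
```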

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

"""
@author: Li Tian
@contact: 694317828@qq.com
@software: pycharm
@file: gru_test1.py
@time: 2019/6/19 17:07
@desc: GRU cell
"""

import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.rnn import GRUCell
from tensorflow.contrib.rnn import MultiRNNCell
from tensorflow.contrib.rnn import OutputProjectionWrapper
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()


n_steps = 100
n_inputs = 1
n_neurous = 100
n_outputs = 1

n_layers = 10

learning_rate = 0.00001

n_iterations = 10000
batch_size = 50

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

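# At each time step the cell now outputs a vector of size 100, but we actually need a single output value.
# The simplest solution is to wrap the cell in an OutputProjectionWrapper:
# cell = OutputProjectionWrapper(BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu), output_size=n_outputs)

# Faster trick: stack the outputs of all time steps, apply one fully connected layer, then unstack.
# cell = BasicRNNCell(num_units=n_neurous, activation=tf.nn.relu)
# multi_layer_cell = MultiRNNCell([cell] * n_layers)  # reuses one cell object for every layer; build a separate cell per layer instead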
layers = [GRUCell(num_units=n_neurous, activation=tf.nn.relu) for _ in range(n_layers)]
multi_layer_cell = MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurous])
stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

X_data = np.linspace(0, 15, 101)
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch = X_data[:-1][np.newaxis, :, np.newaxis]
        y_batch = X_batch * np.sin(X_batch) / 3 + 2 * np.sin(5 * X_batch)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

    X_new = X_data[1:][np.newaxis, :, np.newaxis]
    y_true = X_new * np.sin(X_new) / 3 + 2 * np.sin(5 * X_new)
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(X_new.flatten())
print('True values:', y_true.flatten())
print('Predictions:', y_pred.flatten())

plt.rcParams['axes.unicode_minus'] = False  # draw minus signs with an ASCII hyphen
fig = plt.figure(dpi=150)
plt.plot(X_new.flatten(), y_true.flatten(), 'r', label='y_true')
plt.plot(X_new.flatten(), y_pred.flatten(), 'b', label='y_pred')
plt.title('GRU prediction')
plt.legend()
plt.show()

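[Figure: run output]

Reference: An in-depth look at LSTM, peephole connections, and GRU

A better reference: A summary of GRU and LSTM

Summary of other LSTM variants

Reference: An intuitive understanding of LSTM (long short-term memory networks)

Peephole LSTM

A popular LSTM variant, introduced by Gers and Schmidhuber (2000), adds "peephole connections", which means that the gate layers also take the cell state as input.
LSTM with coupled forget and input gates

Here the forget and input gates are coupled. Instead of separately deciding what to forget and what new information to add, both decisions are made together: we only forget when we are going to input something in its place, and we only add new values to the state when we forget something older.
Gated Recurrent Unit (GRU)

It combines the forget and input gates into a single "update gate". It also merges the cell state and the hidden state and makes some other changes. The resulting model is simpler than standard LSTM models and has been growing increasingly popular.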

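Excerpts from some end-of-chapter exercises

What are the advantages of building an RNN using dynamic_rnn() rather than static_rnn()?
1. It is based on a while_loop() operation that can swap GPU memory to CPU memory during backpropagation (dynamic_rnn()'s swap_memory=True option), avoiding out-of-memory errors.
2. It is easier to use, since it takes a single tensor as input and output (covering all time steps) rather than a list of tensors (one per time step). There is no need to stack, unstack, or transpose.
3. It generates a smaller graph, which is easier to visualize in TensorBoard.
How can you handle variable-length input sequences? What about variable-length output sequences?
1. To handle variable-length input sequences, the simplest option is to set the sequence_length parameter when calling static_rnn() or dynamic_rnn(). Another option is to pad the shorter inputs (e.g., with zeros) so that they have the same length as the longest input (this may be faster than the first option if the input sequences all have very similar lengths).
2. To handle variable-length output sequences, if you know the length of each output sequence in advance, you can use the sequence_length parameter (for example, a sequence-to-sequence RNN that labels every frame of a video with a violence score: the output sequence has exactly the same length as the input sequence). If you don't know the output lengths in advance, you can use padding: always output sequences of the same size, but ignore any outputs that come after the end-of-sequence token (by leaving them out when computing the cost function).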
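The padding option can be sketched in plain NumPy: pad every sequence with zero vectors up to the longest one and record the true lengths, which can also be passed as sequence_length. `pad_batch` is a hypothetical helper; applied to the variable-length input example from earlier in this chapter, it rebuilds the same x_batch and seq_length_batch.

```python
import numpy as np


def pad_batch(sequences, n_inputs):
    """Zero-pad a list of [len_i, n_inputs] sequences to a common length.

    Returns X of shape [batch, max_len, n_inputs] and the array of true
    lengths, the form expected by dynamic_rnn(..., sequence_length=...).
    """
    seq_length = np.array([len(s) for s in sequences])
    X = np.zeros((len(sequences), seq_length.max(), n_inputs))
    for i, s in enumerate(sequences):
        X[i, :len(s)] = s  # real steps first, zero vectors after
    return X, seq_length


X_batch, seq_length_batch = pad_batch(
    [[[0, 1, 2], [9, 8, 7]], [[3, 4, 5]], [[6, 7, 8], [6, 5, 4]], [[9, 0, 1], [3, 2, 1]]],
    n_inputs=3)
print(seq_length_batch.tolist())  # [2, 1, 2, 2]
print(X_batch[1].tolist())  # [[3.0, 4.0, 5.0], [0.0, 0.0, 0.0]]
```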
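What is a common way to distribute training and execution of a deep RNN across multiple GPUs?

A common and simple technique for distributing the training and execution of a deep RNN across multiple GPUs is to place each layer on a different GPU.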