TensorFlow Fast-Food Tutorial (12) - Having a Machine Write Shakespeare's Plays

Abstract: How to use high-level frameworks such as TFLearn and Keras to learn to automatically generate Shakespeare's plays or Nietzsche's philosophical prose.

High-level frameworks: TFLearn and Keras

In the previous section we looked at TensorFlow's high-level API wrappers, which let us build a DNN classifier for the MNIST handwritten-digit problem in just a few simple steps.

TensorFlow keeps pushing the Estimator API forward, but that is not the whole toolbox. Beyond TensorFlow's official APIs we also have powerful tools such as TFLearn and Keras.

In this section we take a tour of the arsenal and see what powerful functionality is wrapped up for us by TFLearn, a high-level framework built specifically for TensorFlow, and by Keras, which runs on top of several backends including TensorFlow and CNTK.

Having a machine write Shakespeare's plays

Earlier we briefly introduced the RNN, a powerful network for handling sequence data. An important advantage of RNNs over other networks is that, once they have learned from sequences, they can generate new ones on their own.
For example, train one on the Three Hundred Tang Poems and it can write poetry; train one on the Linux kernel source and it can write C code (even if it basically won't compile).

Let's start with an example that automatically writes Shakespearean drama.
Before looking at the code, a few words of caution. Deep learning is fairly data-hungry: for generative training like this you generally need on the order of millions to tens of millions of training samples to get good results. Training on just a few novels will certainly produce incoherent fiction; even a human can't learn to write poetry after reading only a handful of poems.
The other point is that as the amount of training data grows, the demands on time and compute climb dramatically as well.
Take training on Shakespeare's plays: the corpus is not especially large, a little over 160,000 lines, yet on a CPU this is not something you finish in an hour or two; think in terms of days.
For the image and video examples later in this series, training on a CPU for months would not be surprising.

So how complicated is the code for this example that takes about a day to train? The answer: the core is only a dozen or so lines, and even with data handling and testing code the whole thing is only around 50 lines.

from __future__ import absolute_import, division, print_function

import os
import pickle
from six.moves import urllib

import tflearn
from tflearn.data_utils import *

path = "shakespeare_input.txt"
char_idx_file = 'char_idx.pickle'

if not os.path.isfile(path):
    urllib.request.urlretrieve("https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/shakespeare_input.txt", path)

maxlen = 25

char_idx = None
if os.path.isfile(char_idx_file):
    # Reuse the character index saved by a previous run
    print('Loading previous char_idx')
    char_idx = pickle.load(open(char_idx_file, 'rb'))

X, Y, char_idx = \
    textfile_to_semi_redundant_sequences(path, seq_maxlen=maxlen, redun_step=3,
                                         pre_defined_char_idx=char_idx)

pickle.dump(char_idx, open(char_idx_file,'wb'))

g = tflearn.input_data([None, maxlen, len(char_idx)])
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512)
g = tflearn.dropout(g, 0.5)
g = tflearn.fully_connected(g, len(char_idx), activation='softmax')
g = tflearn.regression(g, optimizer='adam', loss='categorical_crossentropy',
                       learning_rate=0.001)

m = tflearn.SequenceGenerator(g, dictionary=char_idx,
                              seq_maxlen=maxlen,
                              clip_gradients=5.0,
                              checkpoint_path='model_shakespeare')

for i in range(50):
    seed = random_sequence_from_textfile(path, maxlen)
    m.fit(X, Y, validation_set=0.1, batch_size=128,
          n_epoch=1, run_id='shakespeare')
    print("-- TESTING...")
    print("-- Test with temperature of 1.0 --")
    print(m.generate(600, temperature=1.0, seq_seed=seed))
    print("-- Test with temperature of 0.5 --")
    print(m.generate(600, temperature=0.5, seq_seed=seed))
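
A quick note on the data preparation: textfile_to_semi_redundant_sequences slides a window of maxlen characters over the corpus in steps of redun_step, one-hot encodes each window as an input sample in X, and uses the character immediately after the window as the target in Y, returning the character-to-index dictionary as char_idx. A minimal hand-rolled sketch of the idea (for illustration only, not TFLearn's actual implementation):

import numpy as np

def to_semi_redundant_sequences(text, maxlen=25, step=3):
    # Build the character-to-index dictionary used for one-hot encoding
    chars = sorted(set(text))
    char_idx = {c: i for i, c in enumerate(chars)}

    # Slide a window of maxlen characters over the text in steps of `step`
    windows, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])    # input: maxlen consecutive characters
        next_chars.append(text[i + maxlen])   # target: the character that follows

    # One-hot encode: X has shape (samples, maxlen, vocab), Y has shape (samples, vocab)
    X = np.zeros((len(windows), maxlen, len(chars)), dtype=bool)
    Y = np.zeros((len(windows), len(chars)), dtype=bool)
    for i, window in enumerate(windows):
        for t, c in enumerate(window):
            X[i, t, char_idx[c]] = True
        Y[i, char_idx[next_chars[i]]] = True
    return X, Y, char_idx

This is exactly the shape the input_data layer expects: [None, maxlen, len(char_idx)]. The Keras example at the end of this post performs the same windowing and one-hot encoding by hand in its "Vectorization" step.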

The example above requires the TFLearn framework, which can be installed with:

pip install tflearn

TFLearn is a high-level API framework developed specifically for TensorFlow.
The main benefit of using the TFLearn API is readability. Take the core code we just saw:

g = tflearn.input_data([None, maxlen, len(char_idx)])
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512)
g = tflearn.dropout(g, 0.5)
g = tflearn.fully_connected(g, len(char_idx), activation='softmax')
g = tflearn.regression(g, optimizer='adam', loss='categorical_crossentropy',
                       learning_rate=0.001)

m = tflearn.SequenceGenerator(g, dictionary=char_idx,
                              seq_maxlen=maxlen,
                              clip_gradients=5.0,
                              checkpoint_path='model_shakespeare')

From the input data, through three LSTM layers each followed by Dropout, to a final fully connected layer with softmax activation.

Let's also look at the structure of a network that predicts the probability of surviving the Titanic:

# Build neural network
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)

# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
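
The snippet above assumes that data and labels already exist. In the official TFLearn quickstart they come from the bundled Titanic CSV, roughly as follows (a sketch based on that quickstart; the column indices to drop and the 'sex' encoding follow its dataset layout and would need adjusting for other data):

import numpy as np
from tflearn.datasets import titanic
from tflearn.data_utils import load_csv

# Download the Titanic passenger CSV that ships with TFLearn and parse it;
# column 0 (survived) becomes the two-class label, the rest become features.
titanic.download_dataset('titanic_dataset.csv')
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

# Drop the free-text 'name' and 'ticket' columns and encode 'sex' as 0/1,
# leaving the 6 numeric features the [None, 6] input layer above expects.
def preprocess(passengers, columns_to_delete=(1, 6)):
    for column_to_delete in sorted(columns_to_delete, reverse=True):
        [passenger.pop(column_to_delete) for passenger in passengers]
    for passenger in passengers:
        passenger[1] = 1. if passenger[1] == 'female' else 0.
    return np.array(passengers, dtype=np.float32)

data = preprocess(data)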

Starting with generating city names

Your Shakespeare model is probably still training, so while we wait, let's watch the generation process in a simpler example.
Again we take an official TFLearn example, which reads a list of major US city names and generates some new ones.

Take the cities starting with Z as a sample of the data:

Zachary
Zafra
Zag
Zahl
Zaleski
Zalma
Zama
Zanesfield
Zanesville
Zap
Zapata
Zarah
Zavalla
Zearing
Zebina
Zebulon
Zeeland
Zeigler
Zela
Zelienople
Zell
Zellwood
Zemple
Zena
Zenda
Zenith
Zephyr
Zephyr Cove
Zephyrhills
Zia Pueblo
Zillah
Zilwaukee
Zim
Zimmerman
Zinc
Zion
Zionsville
Zita
Zoar
Zolfo Springs
Zona
Zumbro Falls
Zumbrota
Zuni
Zurich
Zwingle
Zwolle

There are 20,580 cities in total. Training is much faster here: on a pure CPU, one epoch takes roughly 5 to 6 minutes.

The code is as follows, almost identical to the Shakespeare example above:

from __future__ import absolute_import, division, print_function

import os
from six import moves
import ssl

import tflearn
from tflearn.data_utils import *

path = "US_Cities.txt"
if not os.path.isfile(path):
    # urlretrieve does not accept a context argument; patch the default
    # HTTPS context instead to work around certificate issues on some systems
    ssl._create_default_https_context = ssl._create_unverified_context
    moves.urllib.request.urlretrieve("https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/US_Cities.txt", path)

maxlen = 20

with open(path, "r", encoding="utf-8") as f:  # Python 3: read the file as unicode text
    string_utf8 = f.read()
X, Y, char_idx = \
    string_to_semi_redundant_sequences(string_utf8, seq_maxlen=maxlen, redun_step=3)

g = tflearn.input_data(shape=[None, maxlen, len(char_idx)])
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512)
g = tflearn.dropout(g, 0.5)
g = tflearn.fully_connected(g, len(char_idx), activation='softmax')
g = tflearn.regression(g, optimizer='adam', loss='categorical_crossentropy',
                       learning_rate=0.001)

m = tflearn.SequenceGenerator(g, dictionary=char_idx,
                              seq_maxlen=maxlen,
                              clip_gradients=5.0,
                              checkpoint_path='model_us_cities')

for i in range(40):
    seed = random_sequence_from_string(string_utf8, maxlen)
    m.fit(X, Y, validation_set=0.1, batch_size=128,
          n_epoch=1, run_id='us_cities')
    print("-- TESTING...")
    print("-- Test with temperature of 1.2 --")
    print(m.generate(30, temperature=1.2, seq_seed=seed).encode('utf-8'))
    print("-- Test with temperature of 1.0 --")
    print(m.generate(30, temperature=1.0, seq_seed=seed).encode('utf-8'))
    print("-- Test with temperature of 0.5 --")
    print(m.generate(30, temperature=0.5, seq_seed=seed).encode('utf-8'))
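
A word about the temperature argument passed to m.generate: before a character is sampled, the predicted probability distribution is rescaled, so low temperatures stay close to the most likely characters while high temperatures produce more varied (and more garbled) names. Conceptually it is the same trick as the sample() helper in the Keras example later in this post; here is a minimal sketch of the idea (not necessarily TFLearn's exact implementation):

import numpy as np

def sample_with_temperature(preds, temperature=1.0):
    # Rescale the predicted distribution: temperature < 1 sharpens it,
    # temperature > 1 flattens it, then draw one character index from it.
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    preds = np.exp(preds) / np.sum(np.exp(preds))
    return np.argmax(np.random.multinomial(1, preds, 1))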

Here are the city names generated after the first epoch of training:

t and Shoot
Cuthbertd
Lettfrecv
El
Ceoneel Sutd
Sa

After the second epoch:

stle
Finchford
Finch Dasthond
madloogd
Wlaycoyarfw

After the third epoch:

averal
Cape Carteret
Acbiropa Heowar Sor Dittoy
Do

After the tenth epoch:

hoenchen
Schofield
Stcojos
Schabell
StcaKnerum Cri

A cross-backend high-level API - Keras, generating Nietzsche's prose

Keras is an API that runs on top of multiple backends, including TensorFlow and Microsoft's CNTK.
It can be installed with:

pip install keras

Once TensorFlow is installed, Keras will pick TensorFlow as its backend.
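
You can confirm which backend Keras picked up from Python; the backend can also be switched via the ~/.keras/keras.json config file or the KERAS_BACKEND environment variable:

from keras import backend as K
print(K.backend())   # prints 'tensorflow' when TensorFlow is the active backend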

Let's also look at a text-generation example in Keras. The official example generates sentences in the style of Nietzsche.

The core is just six statements:

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Below is the complete code; give it a run. If Nietzsche doesn't interest you, you can swap in other text, but note, as the docstring says, that whatever corpus you choose should have at least ~100k characters, and ideally ~1M or more.

'''Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text
starts sounding coherent.
It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.
If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''

from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys
import io

path = get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1


# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)


def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, logs):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])
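
Sixty epochs take a while even on a GPU, so it is worth persisting the result. Below is a small sketch of saving the trained model and generating from it afterwards, using standard Keras save/load; this is not part of the official example, the file name is arbitrary, and it assumes the char_indices, indices_char and sample() definitions from the script above are still available:

from keras.models import load_model

# Persist architecture, weights and optimizer state once training is done
model.save('nietzsche_lstm.h5')

# Later: reload and reuse the sampling helpers defined in the script above
model = load_model('nietzsche_lstm.h5')
seed = text[:maxlen]                      # any maxlen-character excerpt from the corpus
x_pred = np.zeros((1, maxlen, len(chars)))
for t, char in enumerate(seed):
    x_pred[0, t, char_indices[char]] = 1.
preds = model.predict(x_pred, verbose=0)[0]
print(indices_char[sample(preds, 0.5)])   # the next character, sampled at temperature 0.5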

Author: lusing


This article is original content from the Yunqi Community and may not be reproduced without permission.
