Natural language processing covers speech processing and text processing. Speech recognition lets a computer "understand" human speech and extract the text information it carries.
Japan's Fukoku Mutual Life Insurance spent US$1.7 million installing an artificial intelligence system that converts customer speech to text and analyzes whether the words are positive or negative. Intelligent customer service is a research focus of AI companies. The model used is the recurrent neural network (RNN).
Model selection. In the usual diagram, each rectangle is a vector and each arrow is a function; the bottom row holds input vectors, the top row output vectors, and the middle row the RNN state. One-to-one: no RNN needed, e.g. the vanilla model mapping a fixed-size input to a fixed-size output (image classification). One-to-many: sequence output, e.g. image captioning, where one input image yields a sequence of words by combining a CNN with an RNN (image plus language). Many-to-one: sequence input, e.g. sentiment analysis, where a passage of text is classified as positive or negative, such as classifying Taobao product reviews with an LSTM (a minimal sketch follows below). Many-to-many (asynchronous): sequence input and sequence output, e.g. machine translation, where an RNN reads an English sentence and outputs it in French. Many-to-many (synchronous): sequence input and sequence output, e.g. video classification, labeling every frame. The middle RNN state part is fixed and can be applied repeatedly, so no prior constraint on sequence length is needed. See Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks", http://karpathy.github.io/201... . Natural language processing includes speech synthesis (text to speech), speech recognition, voiceprint recognition (voiceprint authentication), and text processing (word segmentation, sentiment analysis, text mining).
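To make the many-to-one case concrete, here is a minimal sketch of an LSTM sentiment classifier in TFLearn (the library used later in this article). The vocabulary size, sequence length, and the synthetic "review" data are illustrative assumptions, not part of the original example.

# Many-to-one sketch: a sequence of word ids in, a single class (positive/negative) out.
import numpy as np
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences

# Synthetic reviews: each review is a variable-length sequence of word ids, label 0/1.
reviews = [list(np.random.randint(1, 1000, size=np.random.randint(5, 60))) for _ in range(128)]
labels = np.random.randint(0, 2, size=128)

trainX = pad_sequences(reviews, maxlen=100, value=0.)  # pad every sequence to length 100
trainY = to_categorical(labels, nb_classes=2)          # one-hot labels

# embedding -> LSTM -> softmax over 2 classes
net = tflearn.input_data([None, 100])
net = tflearn.embedding(net, input_dim=1000, output_dim=128)
net = tflearn.lstm(net, 128, dropout=0.8)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                         loss='categorical_crossentropy')

model = tflearn.DNN(net, tensorboard_verbose=0)
model.fit(trainX, trainY, n_epoch=1, batch_size=32, show_metric=True)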
English spoken-digit recognition. https://github.com/pannous/te... . Build a very simple speech recognizer in about 20 lines of Python: an LSTM recurrent neural network trained with TFLearn on a spoken English digit dataset. The spoken numbers PCM dataset, http://pannous.net/spoken_num... , contains multiple speakers (male and female) reading the digits 0-9 in English; each audio clip (wav file) holds the spoken sound of a single digit. Files are labeled as {digit}_name_xxx.
Define the input data and preprocess it. Speech is turned into matrix form as Mel frequency cepstral coefficient (MFCC) feature vectors: the audio is split into frames, the log of the spectral (filterbank) energies is taken, and an inverse (cosine) transform yields the MFCCs that represent the speech features.
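As a concrete illustration of this step, the sketch below computes a fixed-size MFCC matrix for a single wav file. It assumes librosa as the feature extractor; the project itself wraps this preprocessing inside its speech_data helper, and the file name here is made up.

# Turn one wav file into a 20 x 80 MFCC matrix (librosa is an assumed substitute
# for the repo's speech_data helper; "5_Alice_120.wav" is a hypothetical file name).
import librosa
import numpy as np

signal, sr = librosa.load("5_Alice_120.wav", sr=None)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20)  # shape: (20 coefficients, n_frames)

# Pad/clip the time axis to 80 frames so every utterance matches width=20, height=80 below.
fixed = np.zeros((20, 80), dtype=np.float32)
n = min(80, mfcc.shape[1])
fixed[:, :n] = mfcc[:, :n]
print(fixed.shape)  # (20, 80)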
Define the network model: an LSTM model.
Train the model and save it.
Predict with the model: feed in any speech file and predict its digit.
Speech recognition can be applied to smart input methods, rapid meeting transcription, voice-control systems, and smart homes.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import

import tflearn
import speech_data

learning_rate = 0.0001
training_iters = 300000  # steps (number of iterations)
batch_size = 64

width = 20    # mfcc features (MFCC feature dimension)
height = 80   # (max) length of utterance
classes = 10  # digits 0-9

batch = word_batch = speech_data.mfcc_batch_generator(batch_size)  # generate batches of MFCC features
X, Y = next(batch)
trainX, trainY = X, Y
testX, testY = X, Y  # overfit for now

# Data preprocessing (kept from the TFLearn template; not needed here)
# Sequence padding
# trainX = pad_sequences(trainX, maxlen=100, value=0.)
# testX = pad_sequences(testX, maxlen=100, value=0.)
# Converting labels to binary vectors
# trainY = to_categorical(trainY, nb_classes=2)
# testY = to_categorical(testY, nb_classes=2)

# Network building: LSTM model
net = tflearn.input_data([None, width, height])
# net = tflearn.embedding(net, input_dim=10000, output_dim=128)
net = tflearn.lstm(net, 128, dropout=0.8)
net = tflearn.fully_connected(net, classes, activation='softmax')
net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate,
                         loss='categorical_crossentropy')

# Training
model = tflearn.DNN(net, tensorboard_verbose=0)
# model.load("tflearn.lstm.model")  # uncomment to resume from a previously saved model
while 1:  # training_iters
    model.fit(trainX, trainY, n_epoch=100, validation_set=(testX, testY),
              show_metric=True, batch_size=batch_size)
    _y = model.predict(X)
    model.save("tflearn.lstm.model")
    print(_y)
    print(Y)
Intelligent chatbots. The future direction is natural-language human-computer interaction: Apple's Siri, Microsoft's Cortana and XiaoIce, Google Now, Baidu's Duer, the Alexa voice assistant built into Amazon's Echo speaker, and Facebook's voice assistant M. Through dialog with a "voice bot", the user is guided to the corresponding service. It will later be embedded in smart hardware and smart-home applications.
Chatbot technology spans three generations. First generation: feature engineering plus large amounts of hand-coded logic. Second generation: retrieval, where for a given question or chat message the best-matching existing answer is found in a retrieval corpus. Third generation: deep learning, a seq2seq + Attention model that, after extensive training, generates an output from the input.
Principle and construction of the seq2seq + Attention model. A translation model turns one sequence into another sequence: two RNN language models, one acting as encoder and one as decoder, form an RNN encoder-decoder. The encoder-decoder framework is widely used in text processing: input -> encoder -> semantic encoding C -> decoder -> output. It is a general model for generating a target from a context. Given a sentence pair <X, Y>, the input sentence X passes through the encoder-decoder framework to generate the target sentence Y. X and Y may be in different languages (machine translation); they may be a dialog question and answer (chatbot); they may be an image and its description (image captioning).
X is the word sequence x1, x2, ..., and Y the word sequence y1, y2, .... The encoder encodes the input X into an intermediate semantic encoding C; the decoder decodes C and, at each step i, combines the already generated history y1, y2, ..., y(i-1) to produce yi. Every word of the generated sentence uses the same intermediate encoding C, which fits short sentences well but loses the meaning of long ones.
In a practical chat system, the encoder and decoder are RNN/LSTM models. Once sentences exceed about 30 words, the LSTM model's quality drops sharply, so an Attention model is introduced to improve performance on long sentences. The Attention mechanism mimics how a person focuses on the task at hand and ignores everything else: the weights of the source words that matter for the word being generated are raised, producing a more accurate response. The encoder-decoder framework with Attention becomes: input -> encoder -> semantic encodings C1, C2, C3 -> decoder -> outputs Y1, Y2, Y3. The intermediate semantic encoding Ci changes at every step, producing a more accurate Yi.
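The toy numpy sketch below illustrates the attention idea: score the encoder states against the current decoder state, softmax the scores into weights, and form a per-step context vector Ci instead of a single fixed C. The shapes and the dot-product scorer are illustrative assumptions, not the exact mechanism of the library function used later.

# Attention as a weighted sum of encoder states, recomputed at each decoder step.
import numpy as np

encoder_states = np.random.randn(6, 128)  # 6 source words, 128-dim encoder states
decoder_state = np.random.randn(128)      # decoder hidden state at step i

scores = encoder_states @ decoder_state   # alignment score per source word
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax -> attention weights over source words
context_i = weights @ encoder_states      # Ci: weighted sum of encoder states for step i

print(weights.round(3), context_i.shape)  # weights over 6 words, (128,)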
Best practice: https://github.com/suriyadeep... , which depends on TensorFlow 0.12.1. The data is Cornell University's Cornell Movie Dialogs Corpus, http://www.cs.cornell.edu/~cr... , with dialog from 600 films.
Processing the chat data.
First organize the dataset into "question" and "answer" files, generating .enc (question) and .dec (answer) files: test.enc (test-set questions), test.dec (test-set answers), train.enc (training-set questions), train.dec (training-set answers).
Create the vocabularies and convert questions and answers into id form. Each vocabulary file holds 20,000 words: vocab20000.enc (question vocabulary), vocab20000.dec (answer vocabulary). _GO, _EOS, _UNK, and _PAD are special seq2seq tokens used to pad and mark the dialog: _GO marks the start of a decoder sequence, _EOS marks its end, _UNK stands in for words not in the vocabulary (rare words), and _PAD pads sequences so that all sequences in a batch have the same length. Conversion produces the id files test.enc.ids20000, train.dec.ids20000, and train.enc.ids20000; each line is one question or answer, and each id on a line stands for the word at that position in the question or answer.
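A small illustrative sketch of this conversion (not the repository's data_utils; the vocabulary, sentences, and bucket length are made up):

# Map words to ids, using the special tokens described above.
_PAD, _GO, _EOS, _UNK = 0, 1, 2, 3
vocab = {"_PAD": _PAD, "_GO": _GO, "_EOS": _EOS, "_UNK": _UNK,
         "how": 4, "are": 5, "you": 6, "fine": 7, "thanks": 8}

def sentence_to_ids(sentence, vocab):
    # words missing from the vocabulary become _UNK
    return [vocab.get(w, _UNK) for w in sentence.lower().split()]

question = sentence_to_ids("How are you today", vocab)            # encoder input
answer = [_GO] + sentence_to_ids("fine thanks", vocab) + [_EOS]   # decoder input

# pad both to a fixed bucket length so every sequence in a batch has the same length
bucket_len = 10
question += [_PAD] * (bucket_len - len(question))
answer += [_PAD] * (bucket_len - len(answer))
print(question)  # [4, 5, 6, 3, 0, 0, 0, 0, 0, 0]
print(answer)    # [1, 7, 8, 2, 0, 0, 0, 0, 0, 0]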
Train with the encoder-decoder framework.
Define the training parameters in seq2seq.ini.
[strings]
# Mode : train, test, serve
mode = train
train_enc = data/train.enc
train_dec = data/train.dec
test_enc = data/test.enc
test_dec = data/test.dec
# folder where checkpoints, vocabulary, temporary data will be stored
working_directory = working_dir/

[ints]
# vocabulary size
# 20,000 is a reasonable size
enc_vocab_size = 20000
dec_vocab_size = 20000
# number of LSTM layers : 1/2/3
num_layers = 3
# units per layer; typical options : 128, 256, 512, 1024
layer_size = 256
# dataset size limit; typically none : no limit
max_train_data_size = 0
batch_size = 64
# steps per checkpoint (save the model every this many iterations)
# Note : At a checkpoint, model parameters are saved, the model is evaluated
# and results are printed
steps_per_checkpoint = 300

[floats]
# learning rate and its decay factor
learning_rate = 0.5
learning_rate_decay_factor = 0.99
max_gradient_norm = 5.0
Define the network model, seq2seq, in seq2seq_model.py (TensorFlow 0.12). The seq2seq + Attention model class follows the approach of "Grammar as a Foreign Language", http://arxiv.org/abs/1412.7499 , and has three functions: the model constructor (__init__), the training-step function (step), and the function that fetches the next training batch (get_batch).
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import random

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

from tensorflow.models.rnn.translate import data_utils


class Seq2SeqModel(object):

  def __init__(self, source_vocab_size, target_vocab_size, buckets, size,
               num_layers, max_gradient_norm, batch_size, learning_rate,
               learning_rate_decay_factor, use_lstm=False,
               num_samples=512, forward_only=False):
    """Create the model.

    Args:
      source_vocab_size: size of the source (question) vocabulary.
      target_vocab_size: size of the target (answer) vocabulary.
      buckets: a list of pairs (I, O), where I specifies maximum input length
        that will be processed in that bucket, and O specifies maximum output
        length. Training instances that have inputs longer than I or outputs
        longer than O will be pushed to the next bucket and padded accordingly.
        We assume that the list is sorted, e.g., [(2, 4), (8, 16)].
      size: number of units in each layer of the model.
      num_layers: number of layers in the model.
      max_gradient_norm: gradients will be clipped to maximally this norm.
      batch_size: the size of the batches used during training;
        the model construction is independent of batch_size, so it can be
        changed after initialization if this is convenient, e.g., for decoding.
      learning_rate: learning rate to start with.
      learning_rate_decay_factor: decay learning rate by this much when needed.
      use_lstm: if true, we use LSTM cells instead of GRU cells.
      num_samples: number of samples for sampled softmax.
      forward_only: if set, we do not construct the backward pass in the model.
    """
    self.source_vocab_size = source_vocab_size
    self.target_vocab_size = target_vocab_size
    self.buckets = buckets
    self.batch_size = batch_size
    self.learning_rate = tf.Variable(float(learning_rate), trainable=False)
    self.learning_rate_decay_op = self.learning_rate.assign(
        self.learning_rate * learning_rate_decay_factor)
    self.global_step = tf.Variable(0, trainable=False)

    # If we use sampled softmax, we need an output projection.
    output_projection = None
    softmax_loss_function = None
    # Sampled softmax only makes sense if we sample less than vocabulary size.
    if num_samples > 0 and num_samples < self.target_vocab_size:
      w = tf.get_variable("proj_w", [size, self.target_vocab_size])
      w_t = tf.transpose(w)
      b = tf.get_variable("proj_b", [self.target_vocab_size])
      output_projection = (w, b)

      def sampled_loss(inputs, labels):
        labels = tf.reshape(labels, [-1, 1])
        return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
                                          self.target_vocab_size)
      softmax_loss_function = sampled_loss

    # Create the internal multi-layer cell for our RNN.
    single_cell = tf.nn.rnn_cell.GRUCell(size)
    if use_lstm:
      single_cell = tf.nn.rnn_cell.BasicLSTMCell(size)
    cell = single_cell
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
    if num_layers > 1:
      cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * num_layers)

    # The seq2seq function: we use embedding for the input and attention.
    def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
      return tf.nn.seq2seq.embedding_attention_seq2seq(
          encoder_inputs, decoder_inputs, cell,
          num_encoder_symbols=source_vocab_size,
          num_decoder_symbols=target_vocab_size,
          embedding_size=size,
          output_projection=output_projection,
          feed_previous=do_decode)

    # Feeds for inputs.
    self.encoder_inputs = []
    self.decoder_inputs = []
    self.target_weights = []
    for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
      self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                                name="encoder{0}".format(i)))
    for i in xrange(buckets[-1][1] + 1):
      self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                                name="decoder{0}".format(i)))
      self.target_weights.append(tf.placeholder(tf.float32, shape=[None],
                                                name="weight{0}".format(i)))

    # Our targets are decoder inputs shifted by one.
    targets = [self.decoder_inputs[i + 1]
               for i in xrange(len(self.decoder_inputs) - 1)]

    # Training outputs and losses.
    if forward_only:
      self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
          self.encoder_inputs, self.decoder_inputs, targets,
          self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
          softmax_loss_function=softmax_loss_function)
      # If we use output projection, we need to project outputs for decoding.
      if output_projection is not None:
        for b in xrange(len(buckets)):
          self.outputs[b] = [
              tf.matmul(output, output_projection[0]) + output_projection[1]
              for output in self.outputs[b]
          ]
    else:
      self.outputs, self.losses = tf.nn.seq2seq.model_with_buckets(
          self.encoder_inputs, self.decoder_inputs, targets,
          self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, False),
          softmax_loss_function=softmax_loss_function)

    # Gradients and SGD update operation for training the model.
    params = tf.trainable_variables()
    if not forward_only:
      self.gradient_norms = []
      self.updates = []
      opt = tf.train.AdamOptimizer()
      for b in xrange(len(buckets)):
        gradients = tf.gradients(self.losses[b], params)
        clipped_gradients, norm = tf.clip_by_global_norm(gradients,
                                                         max_gradient_norm)
        self.gradient_norms.append(norm)
        self.updates.append(opt.apply_gradients(
            zip(clipped_gradients, params), global_step=self.global_step))

    self.saver = tf.train.Saver(tf.global_variables())

  def step(self, session, encoder_inputs, decoder_inputs, target_weights,
           bucket_id, forward_only):
    """Run a step of the model feeding the given inputs.

    Args:
      session: tensorflow session to use.
      encoder_inputs: list of numpy int vectors to feed as encoder inputs (the question).
      decoder_inputs: list of numpy int vectors to feed as decoder inputs (the answer).
      target_weights: list of numpy float vectors to feed as target weights.
      bucket_id: which bucket of the model to use.
      forward_only: whether to do the backward step or only forward.

    Returns:
      A triple consisting of gradient norm (or None if we did not do backward),
      average perplexity, and the outputs.

    Raises:
      ValueError: if length of encoder_inputs, decoder_inputs, or
        target_weights disagrees with bucket size for the specified bucket_id.
    """
    # Check if the sizes match.
    encoder_size, decoder_size = self.buckets[bucket_id]
    if len(encoder_inputs) != encoder_size:
      raise ValueError("Encoder length must be equal to the one in bucket,"
                       " %d != %d." % (len(encoder_inputs), encoder_size))
    if len(decoder_inputs) != decoder_size:
      raise ValueError("Decoder length must be equal to the one in bucket,"
                       " %d != %d." % (len(decoder_inputs), decoder_size))
    if len(target_weights) != decoder_size:
      raise ValueError("Weights length must be equal to the one in bucket,"
                       " %d != %d." % (len(target_weights), decoder_size))

    # Input feed: encoder inputs, decoder inputs, target_weights, as provided.
    input_feed = {}
    for l in xrange(encoder_size):
      input_feed[self.encoder_inputs[l].name] = encoder_inputs[l]
    for l in xrange(decoder_size):
      input_feed[self.decoder_inputs[l].name] = decoder_inputs[l]
      input_feed[self.target_weights[l].name] = target_weights[l]
    # Since our targets are decoder inputs shifted by one, we need one more.
    last_target = self.decoder_inputs[decoder_size].name
    input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)

    # Output feed: depends on whether we do a backward step or not.
    if not forward_only:
      output_feed = [self.updates[bucket_id],         # Update Op that does SGD.
                     self.gradient_norms[bucket_id],  # Gradient norm.
                     self.losses[bucket_id]]          # Loss for this batch.
    else:
      output_feed = [self.losses[bucket_id]]          # Loss for this batch.
      for l in xrange(decoder_size):                  # Output logits.
        output_feed.append(self.outputs[bucket_id][l])

    outputs = session.run(output_feed, input_feed)
    if not forward_only:
      return outputs[1], outputs[2], None   # Gradient norm, loss, no outputs.
    else:
      return None, outputs[0], outputs[1:]  # No gradient norm, loss, outputs.

  def get_batch(self, data, bucket_id):
    """Get a random batch of data from the specified bucket, for use in step().

    Args:
      data: a tuple of size len(self.buckets) in which each element contains
        lists of pairs of input and output data that we use to create a batch.
      bucket_id: integer, which bucket to get the batch for.

    Returns:
      The triple (encoder_inputs, decoder_inputs, target_weights) for
      the constructed batch that has the proper format to call step(...) later.
    """
    encoder_size, decoder_size = self.buckets[bucket_id]
    encoder_inputs, decoder_inputs = [], []

    # Get a random batch of encoder and decoder inputs from data,
    # pad them if needed, reverse encoder inputs and add GO to decoder.
    for _ in xrange(self.batch_size):
      encoder_input, decoder_input = random.choice(data[bucket_id])

      # Encoder inputs are padded and then reversed.
      encoder_pad = [data_utils.PAD_ID] * (encoder_size - len(encoder_input))
      encoder_inputs.append(list(reversed(encoder_input + encoder_pad)))

      # Decoder inputs get an extra "GO" symbol, and are padded then.
      decoder_pad_size = decoder_size - len(decoder_input) - 1
      decoder_inputs.append([data_utils.GO_ID] + decoder_input +
                            [data_utils.PAD_ID] * decoder_pad_size)

    # Now we create batch-major vectors from the data selected above.
    batch_encoder_inputs, batch_decoder_inputs, batch_weights = [], [], []

    # Batch encoder inputs are just re-indexed encoder_inputs.
    for length_idx in xrange(encoder_size):
      batch_encoder_inputs.append(
          np.array([encoder_inputs[batch_idx][length_idx]
                    for batch_idx in xrange(self.batch_size)], dtype=np.int32))

    # Batch decoder inputs are re-indexed decoder_inputs, we create weights.
    for length_idx in xrange(decoder_size):
      batch_decoder_inputs.append(
          np.array([decoder_inputs[batch_idx][length_idx]
                    for batch_idx in xrange(self.batch_size)], dtype=np.int32))

      # Create target_weights to be 0 for targets that are padding.
      batch_weight = np.ones(self.batch_size, dtype=np.float32)
      for batch_idx in xrange(self.batch_size):
        # We set weight to 0 if the corresponding target is a PAD symbol.
        # The corresponding target is decoder_input shifted by 1 forward.
        if length_idx < decoder_size - 1:
          target = decoder_inputs[batch_idx][length_idx + 1]
        if length_idx == decoder_size - 1 or target == data_utils.PAD_ID:
          batch_weight[batch_idx] = 0.0
      batch_weights.append(batch_weight)
    return batch_encoder_inputs, batch_decoder_inputs, batch_weights
Train the model: set the mode value in seq2seq.ini to "train" and run execute.py.
Validate the model: set the mode value in seq2seq.ini to "test" and run execute.py.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math
import os
import random
import sys
import time

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

import data_utils
import seq2seq_model

try:
  from ConfigParser import SafeConfigParser
except:
  # In Python 3, ConfigParser has been renamed to configparser for PEP 8 compliance.
  from configparser import SafeConfigParser

gConfig = {}


def get_config(config_file='seq2seq.ini'):
  parser = SafeConfigParser()
  parser.read(config_file)
  # get the ints, floats and strings
  _conf_ints = [(key, int(value)) for key, value in parser.items('ints')]
  _conf_floats = [(key, float(value)) for key, value in parser.items('floats')]
  _conf_strings = [(key, str(value)) for key, value in parser.items('strings')]
  return dict(_conf_ints + _conf_floats + _conf_strings)

# We use a number of buckets and pad to the closest one for efficiency.
# See seq2seq_model.Seq2SeqModel for details of how they work.
_buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]


def read_data(source_path, target_path, max_size=None):
  """Read data from source and target files and put into buckets.

  Args:
    source_path: path to the files with token-ids for the source language.
    target_path: path to the file with token-ids for the target language;
      it must be aligned with the source file: n-th line contains the desired
      output for n-th line from the source_path.
    max_size: maximum number of lines to read, all other will be ignored;
      if 0 or None, data files will be read completely (no limit).

  Returns:
    data_set: a list of length len(_buckets); data_set[n] contains a list of
      (source, target) pairs read from the provided data files that fit
      into the n-th bucket, i.e., such that len(source) < _buckets[n][0] and
      len(target) < _buckets[n][1]; source and target are lists of token-ids.
  """
  data_set = [[] for _ in _buckets]
  with tf.gfile.GFile(source_path, mode="r") as source_file:
    with tf.gfile.GFile(target_path, mode="r") as target_file:
      source, target = source_file.readline(), target_file.readline()
      counter = 0
      while source and target and (not max_size or counter < max_size):
        counter += 1
        if counter % 100000 == 0:
          print("  reading data line %d" % counter)
          sys.stdout.flush()
        source_ids = [int(x) for x in source.split()]
        target_ids = [int(x) for x in target.split()]
        target_ids.append(data_utils.EOS_ID)
        for bucket_id, (source_size, target_size) in enumerate(_buckets):
          if len(source_ids) < source_size and len(target_ids) < target_size:
            data_set[bucket_id].append([source_ids, target_ids])
            break
        source, target = source_file.readline(), target_file.readline()
  return data_set


def create_model(session, forward_only):
  """Create model and initialize or load parameters."""
  model = seq2seq_model.Seq2SeqModel(
      gConfig['enc_vocab_size'], gConfig['dec_vocab_size'], _buckets,
      gConfig['layer_size'], gConfig['num_layers'],
      gConfig['max_gradient_norm'], gConfig['batch_size'],
      gConfig['learning_rate'], gConfig['learning_rate_decay_factor'],
      forward_only=forward_only)

  if 'pretrained_model' in gConfig:
    model.saver.restore(session, gConfig['pretrained_model'])
    return model

  ckpt = tf.train.get_checkpoint_state(gConfig['working_directory'])
  if ckpt and ckpt.model_checkpoint_path:
    print("Reading model parameters from %s" % ckpt.model_checkpoint_path)
    model.saver.restore(session, ckpt.model_checkpoint_path)
  else:
    print("Created model with fresh parameters.")
    session.run(tf.global_variables_initializer())
  return model


def train():
  # prepare dataset
  print("Preparing data in %s" % gConfig['working_directory'])
  enc_train, dec_train, enc_dev, dec_dev, _, _ = data_utils.prepare_custom_data(
      gConfig['working_directory'], gConfig['train_enc'], gConfig['train_dec'],
      gConfig['test_enc'], gConfig['test_dec'],
      gConfig['enc_vocab_size'], gConfig['dec_vocab_size'])

  # setup config to use BFC allocator
  config = tf.ConfigProto()
  config.gpu_options.allocator_type = 'BFC'

  with tf.Session(config=config) as sess:
    # Create model.
    print("Creating %d layers of %d units." % (gConfig['num_layers'], gConfig['layer_size']))
    model = create_model(sess, False)

    # Read data into buckets and compute their sizes.
    print("Reading development and training data (limit: %d)."
          % gConfig['max_train_data_size'])
    dev_set = read_data(enc_dev, dec_dev)
    train_set = read_data(enc_train, dec_train, gConfig['max_train_data_size'])
    train_bucket_sizes = [len(train_set[b]) for b in xrange(len(_buckets))]
    train_total_size = float(sum(train_bucket_sizes))

    # A bucket scale is a list of increasing numbers from 0 to 1 that we'll use
    # to select a bucket. Length of [scale[i], scale[i+1]] is proportional to
    # the size of i-th training bucket, as used later.
    train_buckets_scale = [sum(train_bucket_sizes[:i + 1]) / train_total_size
                           for i in xrange(len(train_bucket_sizes))]

    # This is the training loop.
    step_time, loss = 0.0, 0.0
    current_step = 0
    previous_losses = []
    while True:
      # Choose a bucket according to data distribution. We pick a random number
      # in [0, 1] and use the corresponding interval in train_buckets_scale.
      random_number_01 = np.random.random_sample()
      bucket_id = min([i for i in xrange(len(train_buckets_scale))
                       if train_buckets_scale[i] > random_number_01])

      # Get a batch and make a step.
      start_time = time.time()
      encoder_inputs, decoder_inputs, target_weights = model.get_batch(
          train_set, bucket_id)
      _, step_loss, _ = model.step(sess, encoder_inputs, decoder_inputs,
                                   target_weights, bucket_id, False)
      step_time += (time.time() - start_time) / gConfig['steps_per_checkpoint']
      loss += step_loss / gConfig['steps_per_checkpoint']
      current_step += 1

      # Once in a while, we save checkpoint, print statistics, and run evals.
      if current_step % gConfig['steps_per_checkpoint'] == 0:
        # Print statistics for the previous epoch.
        perplexity = math.exp(loss) if loss < 300 else float('inf')
        print("global step %d learning rate %.4f step-time %.2f perplexity "
              "%.2f" % (model.global_step.eval(), model.learning_rate.eval(),
                        step_time, perplexity))
        # Decrease learning rate if no improvement was seen over last 3 times.
        if len(previous_losses) > 2 and loss > max(previous_losses[-3:]):
          sess.run(model.learning_rate_decay_op)
        previous_losses.append(loss)
        # Save checkpoint and zero timer and loss.
        checkpoint_path = os.path.join(gConfig['working_directory'], "seq2seq.ckpt")
        model.saver.save(sess, checkpoint_path, global_step=model.global_step)
        step_time, loss = 0.0, 0.0
        # Run evals on development set and print their perplexity.
        for bucket_id in xrange(len(_buckets)):
          if len(dev_set[bucket_id]) == 0:
            print("  eval: empty bucket %d" % (bucket_id))
            continue
          encoder_inputs, decoder_inputs, target_weights = model.get_batch(
              dev_set, bucket_id)
          _, eval_loss, _ = model.step(sess, encoder_inputs, decoder_inputs,
                                       target_weights, bucket_id, True)
          eval_ppx = math.exp(eval_loss) if eval_loss < 300 else float('inf')
          print("  eval: bucket %d perplexity %.2f" % (bucket_id, eval_ppx))
        sys.stdout.flush()


def decode():
  with tf.Session() as sess:
    # Create model and load parameters.
    model = create_model(sess, True)
    model.batch_size = 1  # We decode one sentence at a time.

    # Load vocabularies.
    enc_vocab_path = os.path.join(gConfig['working_directory'],
                                  "vocab%d.enc" % gConfig['enc_vocab_size'])
    dec_vocab_path = os.path.join(gConfig['working_directory'],
                                  "vocab%d.dec" % gConfig['dec_vocab_size'])
    enc_vocab, _ = data_utils.initialize_vocabulary(enc_vocab_path)
    _, rev_dec_vocab = data_utils.initialize_vocabulary(dec_vocab_path)

    # Decode from standard input.
    sys.stdout.write("> ")
    sys.stdout.flush()
    sentence = sys.stdin.readline()
    while sentence:
      # Get token-ids for the input sentence.
      token_ids = data_utils.sentence_to_token_ids(tf.compat.as_bytes(sentence), enc_vocab)
      # Which bucket does it belong to?
      bucket_id = min([b for b in xrange(len(_buckets))
                       if _buckets[b][0] > len(token_ids)])
      # Get a 1-element batch to feed the sentence to the model.
      encoder_inputs, decoder_inputs, target_weights = model.get_batch(
          {bucket_id: [(token_ids, [])]}, bucket_id)
      # Get output logits for the sentence.
      _, _, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                       target_weights, bucket_id, True)
      # This is a greedy decoder - outputs are just argmaxes of output_logits.
      outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits]
      # If there is an EOS symbol in outputs, cut them at that point.
      if data_utils.EOS_ID in outputs:
        outputs = outputs[:outputs.index(data_utils.EOS_ID)]
      # Print out the response sentence corresponding to outputs.
      print(" ".join([tf.compat.as_str(rev_dec_vocab[output]) for output in outputs]))
      print("> ", end="")
      sys.stdout.flush()
      sentence = sys.stdin.readline()


def self_test():
  """Test the translation model."""
  with tf.Session() as sess:
    print("Self-test for neural translation model.")
    # Create model with vocabularies of 10, 2 small buckets, 2 layers of 32.
    model = seq2seq_model.Seq2SeqModel(10, 10, [(3, 3), (6, 6)], 32, 2,
                                       5.0, 32, 0.3, 0.99, num_samples=8)
    sess.run(tf.initialize_all_variables())

    # Fake data set for both the (3, 3) and (6, 6) bucket.
    data_set = ([([1, 1], [2, 2]), ([3, 3], [4]), ([5], [6])],
                [([1, 1, 1, 1, 1], [2, 2, 2, 2, 2]), ([3, 3, 3], [5, 6])])
    for _ in xrange(5):  # Train the fake model for 5 steps.
      bucket_id = random.choice([0, 1])
      encoder_inputs, decoder_inputs, target_weights = model.get_batch(
          data_set, bucket_id)
      model.step(sess, encoder_inputs, decoder_inputs, target_weights,
                 bucket_id, False)


def init_session(sess, conf='seq2seq.ini'):
  global gConfig
  gConfig = get_config(conf)

  # Create model and load parameters.
  model = create_model(sess, True)
  model.batch_size = 1  # We decode one sentence at a time.

  # Load vocabularies.
  enc_vocab_path = os.path.join(gConfig['working_directory'],
                                "vocab%d.enc" % gConfig['enc_vocab_size'])
  dec_vocab_path = os.path.join(gConfig['working_directory'],
                                "vocab%d.dec" % gConfig['dec_vocab_size'])
  enc_vocab, _ = data_utils.initialize_vocabulary(enc_vocab_path)
  _, rev_dec_vocab = data_utils.initialize_vocabulary(dec_vocab_path)

  return sess, model, enc_vocab, rev_dec_vocab


def decode_line(sess, model, enc_vocab, rev_dec_vocab, sentence):
  # Get token-ids for the input sentence.
  token_ids = data_utils.sentence_to_token_ids(tf.compat.as_bytes(sentence), enc_vocab)

  # Which bucket does it belong to?
  bucket_id = min([b for b in xrange(len(_buckets)) if _buckets[b][0] > len(token_ids)])

  # Get a 1-element batch to feed the sentence to the model.
  encoder_inputs, decoder_inputs, target_weights = model.get_batch(
      {bucket_id: [(token_ids, [])]}, bucket_id)

  # Get output logits for the sentence.
  _, _, output_logits = model.step(sess, encoder_inputs, decoder_inputs,
                                   target_weights, bucket_id, True)

  # This is a greedy decoder - outputs are just argmaxes of output_logits.
  outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits]

  # If there is an EOS symbol in outputs, cut them at that point.
  if data_utils.EOS_ID in outputs:
    outputs = outputs[:outputs.index(data_utils.EOS_ID)]

  return " ".join([tf.compat.as_str(rev_dec_vocab[output]) for output in outputs])


if __name__ == '__main__':
  if len(sys.argv) - 1:
    gConfig = get_config(sys.argv[1])
  else:
    # get configuration from seq2seq.ini
    gConfig = get_config()

  print('\n>> Mode : %s\n' % (gConfig['mode']))

  if gConfig['mode'] == 'train':
    # start training
    train()
  elif gConfig['mode'] == 'test':
    # interactive decode
    decode()
  else:
    # wrong way to execute "serve"
    # Use : >> python ui/app.py
    # uses seq2seq_serve.ini as conf file
    print('Serve Usage : >> python ui/app.py')
    print('# uses seq2seq_serve.ini as conf file')
Combining the text-based intelligent bot with speech recognition yields a bot you can talk to directly. System architecture:
Human -> speech recognition (ASR) -> natural language understanding (NLU) -> dialog management -> natural language generation (NLG) -> speech synthesis (TTS) -> human. See 《中國人工智能學會通信》 (Communications of the Chinese Association for Artificial Intelligence), 2016, Vol. 6, No. 1.
圖靈機器人 (Turing Robot) works on improving dialog and semantic accuracy and raising the level of intelligence in Chinese-language contexts. 竹間智能科技 researches emotional robots with memory and self-learning, so that a robot can truly understand multi-modal, multi-channel information and respond in a highly human-like way, approaching the ideal of communicating purely in natural language. Tencent holds social conversation data: WeChat is the largest corpus of natural-language exchanges, and by combining this huge volume of real data with Mini Programs it could become the entry point for all services.
References:
《TensorFlow技術解析與實戰》
Recommendations for machine learning positions in Shanghai are welcome; my WeChat: qingxingfengzi
AI job opportunity below the divider -----------------------------------------
Hangzhou, Alibaba, New Retail Taobao infrastructure platform: Senior Expert in Mobile AI