新代碼在 contrib/seq2seq/python/ops/attention_decoder_fn.py
和以前代碼相比,再也不採用 conv 的方式來計算乘,直接使用乘法和 linear。
給出了兩種 attention 的實現:傳統的 "bahdanau": additive (Bahdanau et al., ICLR'2015) Neural Machine Translation by Jointly Learning to Align and Translate
以及 "luong": multiplicative (Luong et al., EMNLP'2015) Effective Approaches to Attention-based Neural Machine Translation。
這裏以 bahdanau 爲例,
仍是按照 Grammar as a Foreign Language 的公式,
對應代碼裏面,
將 input encoder outputs(也就是輸入的 attention states)做爲 attention values,
也就是在 prepare_attention 中
attention_values = attention_states
那麼attention keys 對應 W_1h_i的部分,採用linear來實現
attention_keys = layers.linear(
attention_states, num_units, biases_initializer=None, scope=scope)
在建立score function的
_create_attention_score_fn 中完整定義了計算過程
這裏去掉luong的實現部分 僅僅看bahdanau部分
with variable_scope.variable_scope(name, reuse=reuse):
    if attention_option == "bahdanau":
        # W_2 in the additive score: projects the decoder query d_t
        # into the same space as the keys.
        query_w = variable_scope.get_variable(
            "attnW", [num_units, num_units], dtype=dtype)
        # v in the additive score v^T * tanh(W_1 h_i + W_2 d_t).
        score_v = variable_scope.get_variable(
            "attnV", [num_units], dtype=dtype)

    def attention_score_fn(query, keys, values):
        """Compute the attention context vector for a single decoder step.

        Args:
            query: A Tensor of shape [batch_size, num_units].
            keys: A Tensor of shape [batch_size, attention_length, num_units].
            values: A Tensor of shape [batch_size, attention_length, num_units].

        Returns:
            context_vector: A Tensor of shape [batch_size, num_units].

        Raises:
            ValueError: if attention_option is neither "luong" or "bahdanau".
        """
        if attention_option == "bahdanau":
            # W_2 * d_t: transform the decoder state.
            projected_query = math_ops.matmul(query, query_w)
            # Reshape to [batch_size, 1, num_units] so it broadcasts
            # against keys along the attention_length axis.
            projected_query = array_ops.reshape(
                projected_query, [-1, 1, num_units])
            # Additive score: reduce_sum(v * tanh(keys + query), [2]);
            # the elementwise multiply + reduce_sum is a dot product
            # with v along the feature axis.
            scores = _attn_add_fun(score_v, keys, projected_query)

        # scores / alignments: [batch_size, attention_length].
        # TODO(thangluong): not normalize over padding positions.
        alignments = nn_ops.softmax(scores)
        # Weighted sum of the values with the softmax weights.
        alignments = array_ops.expand_dims(alignments, 2)
        context_vector = math_ops.reduce_sum(alignments * values, [1])
        context_vector.set_shape([None, num_units])
        return context_vector
再看下計算出 context_vector 以後的使用,這個方法正如論文中所說,也和以前舊代碼基本一致,
也就是說將 context 和 query 進行 concat 以後,經過 linear 映射依然獲得 num_units 的長度,做爲 attention。
def _create_attention_construct_fn(name, num_units, attention_score_fn, reuse):
    """Function to compute attention vectors.

    Args:
        name: to label variables.
        num_units: hidden state dimension.
        attention_score_fn: to compute similarity between key and target states.
        reuse: whether to reuse variable scope.

    Returns:
        attention_construct_fn: to build attention states.
    """
    with variable_scope.variable_scope(name, reuse=reuse) as scope:

        def construct_fn(attention_query, attention_keys, attention_values):
            # Score the query against the keys and take the weighted
            # sum over the values.
            context = attention_score_fn(attention_query, attention_keys,
                                         attention_values)
            # Concatenate [query; context] and project back down to
            # num_units — this projection is the attention output.
            attention = layers.linear(
                array_ops.concat([attention_query, context], 1),
                num_units, biases_initializer=None, scope=scope)
            return attention

        return construct_fn
最終的使用,cell_output就是attention,而next_input是cell_input和attention的concat
# Greedy-decoding step of the inference decoder: the attention vector
# replaces the raw cell output, and the next step's input is the
# predicted token's embedding concatenated with that attention vector.
# construct attention
attention = attention_construct_fn(cell_output, attention_keys,
attention_values)
# The attention vector is used as this step's cell output.
cell_output = attention
# argmax decoder
cell_output = output_fn(cell_output) # logits
# Pick the most likely token id. NOTE(review): dtype here must be an
# integer type for the gather/equal below — confirm at the caller.
next_input_id = math_ops.cast(
math_ops.argmax(cell_output, 1), dtype=dtype)
# True for batch entries that just emitted the end-of-sequence token.
done = math_ops.equal(next_input_id, end_of_sequence_id)
# Look up the embedding of the predicted token for the next step.
cell_input = array_ops.gather(embeddings, next_input_id)
# combine cell_input and attention
next_input = array_ops.concat([cell_input, attention], 1)