Dynamic Memory Networks for Text Classification

Date: 2019-12-02

Tags: dynamic memory networks, text classification



Model

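The model consists of four modules: question, answer, episodic memory, and input. It first computes vector representations of the question and the input text, then computes attention with respect to the question and uses it to select the inputs relevant to the question. The episodic memory module then iteratively updates its memory from the question and the input, and passes the final memory vector to the answer module, which combines that vector with the question to produce the answer.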

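In the example from the figure, the input is eight sentences and the question is "Where is the football?". The sections below walk through how each module contributes to the answer.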

Input module

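The input is encoded with a GRU. There are two cases:
1. When the input is a single sentence, it is fed directly into the GRU, which outputs one vector per word; the subsequent attention then selects the most relevant words.
2. When the input is several sentences, the sentences are joined with a special marker token and the hidden state at each marker position is output; the attention then selects the sentences most relevant to the question.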
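The second case can be sketched in plain Python. This is only an illustration: the marker string `<eos>`, the function name `sentence_states`, and the hidden-state values are assumptions made for the example, not taken from the original implementation, and the states are made-up numbers rather than real GRU outputs.

```python
# Hypothetical names throughout; the states are made-up numbers, not GRU outputs.
EOS = "<eos>"  # assumed end-of-sentence marker token


def sentence_states(tokens, hidden_states, marker=EOS):
    """Return the hidden state at every position where the marker token occurs."""
    if len(tokens) != len(hidden_states):
        raise ValueError("one hidden state per token is required")
    return [h for tok, h in zip(tokens, hidden_states) if tok == marker]


# Two sentences joined by the marker; one toy hidden state per token.
tokens = ["John", "moved", EOS, "Mary", "left", EOS]
hidden_states = [[0.1 * i, -0.1 * i] for i in range(len(tokens))]

facts = sentence_states(tokens, hidden_states)
print(len(facts))  # one fact vector per sentence
```

Each element of `facts` then plays the role of one sentence vector that the attention step scores against the question.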

Question module

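The question is also encoded with a GRU. The difference is that only the final hidden state is needed here, whereas the input module needs all of the hidden states. Readers familiar with TensorFlow will know tf.nn.dynamic_rnn: it returns a tuple of two values, one holding the outputs at every time step and the other holding the final hidden state of the RNN.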
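To make the "all states versus final state" distinction concrete, here is a toy scalar RNN in plain Python that returns the same kind of pair. It is a sketch only: `toy_rnn` and its weights `w_in` and `w_rec` are hypothetical, and a plain tanh recurrence stands in for the GRU.

```python
import math


def toy_rnn(inputs, state=0.0, w_in=0.5, w_rec=0.8):
    """Run a scalar tanh RNN and return (states at every step, final state)."""
    outputs = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        outputs.append(state)
    return outputs, state


outputs, final_state = toy_rnn([1.0, -0.5, 2.0])
# The input module would keep `outputs`; the question module keeps only `final_state`.
print(len(outputs), final_state == outputs[-1])
```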

Episodic memory module

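This module has two parts: attention and memory update.

1. Attention: in iteration i at time step t, the attention gate takes three arguments: the input vector (the vector representations of the eight sentences, which stay fixed across iterations), the previous memory, and the question vector.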

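The gate combines several vector similarity measures between these arguments, concatenates them, and feeds the result into a two-layer neural network to compute the attention value.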
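For reference, the gate can be written out as follows. This is a reconstruction based on the formulation in the original DMN paper and on the feature list used in the code later in this post, not a copy of the missing figure; the weight names W^{(b)}, W^{(1)}, W^{(2)} and biases b^{(1)}, b^{(2)} are notation assumed here.

```latex
\begin{aligned}
g_t^i &= G\left(c_t,\, m^{i-1},\, q\right) \\
z(c, m, q) &= \left[\, c,\; m,\; q,\; c \circ q,\; c \circ m,\; |c - q|,\; |c - m|,\; c^{\top} W^{(b)} q,\; c^{\top} W^{(b)} m \,\right] \\
G(c, m, q) &= \sigma\!\left( W^{(2)} \tanh\!\left( W^{(1)} z(c, m, q) + b^{(1)} \right) + b^{(2)} \right)
\end{aligned}
```

Here \circ is the element-wise product and \sigma is the sigmoid, so each gate value lies between 0 and 1.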

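2. Memory update: in iteration 0 the memory m is initialized to q. c~t~ is the input (the eight sentence vectors) and stays the same in every iteration, and h~t~ is the hidden vector. In each iteration, the hidden vector at each time step and the previous memory vector are fed into a GRU to update the memory vector m.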
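Written as equations, one plausible reading of this update, following the original DMN paper, is shown below. The episode symbol e^i and the sequence length T_C are notation assumed here; they do not appear in the text above.

```latex
\begin{aligned}
m^{0} &= q \\
h_t^i &= g_t^i \,\mathrm{GRU}\!\left(c_t,\, h_{t-1}^i\right) + \left(1 - g_t^i\right) h_{t-1}^i \\
e^i &= h_{T_C}^i \\
m^i &= \mathrm{GRU}\!\left(e^i,\, m^{i-1}\right)
\end{aligned}
```

The gate g_t^i decides how much sentence t contributes to the episode, and the episode summary e^i is what actually updates the memory.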

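In the example figure for this module, the eight sentences are the input and "Where is the football?" is the question.

First iteration: attention is computed from q, the input, and m (initially q). The sentence most relevant to q is s~7~, "John put down the football", so it receives a high weight g, and the memory vector m is updated accordingly, for example to remember "John".

Second iteration: attention is computed from q, the input, and m. Now m holds "John" rather than q, so the most relevant sentences are s~2~ and s~6~, both of which mention "John". The memory vector is updated again and finally passed to the answer module.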

Answer module

The answer is computed starting from a~0~ = m, where m is the final memory produced by the memory module.
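The original DMN paper describes a GRU decoder for this step; a sketch of that formulation is given below, with W^{(a)} and y_t as assumed notation. For text classification, a single softmax over the final memory may be all that is used, so treat this as context rather than as the exact computation in this post.

```latex
\begin{aligned}
a_0 &= m \\
y_t &= \mathrm{softmax}\!\left(W^{(a)} a_t\right) \\
a_t &= \mathrm{GRU}\!\left(\left[\, y_{t-1},\, q \,\right],\; a_{t-1}\right)
\end{aligned}
```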

A simple DMN implementation

Gated memory-update unit: this uses the attention-based GRU, which brings the gate g inside the GRU.

def gated_gru(self,c_current,h_previous,g_current):
        """Gated GRU step that returns the updated hidden state.

        :param c_current: [batch_size,embedding_size]
        :param h_previous: [batch_size,hidden_size]
        :param g_current: [batch_size,1]
        :return h_current: [batch_size,hidden_size]
        """
        # 1.compute candidate hidden state using GRU.
        h_candidate=self.gru_cell(c_current, h_previous,"gru_candidate_sentence") #[batch_size,hidden_size]
        # 2.combine candidate hidden state and previous hidden state using weight(a gate) to get updated hidden state.
        h_current=tf.multiply(g_current,h_candidate)+tf.multiply(1-g_current,h_previous) #[batch_size,hidden_size]
        return h_current

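This part follows the implementation in DMN+. The last step differs from a conventional GRU: the attention gate g takes the place of the update gate, so the new hidden state is h_current = g * h_candidate + (1 - g) * h_previous, as in the code above.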
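The gated update in gated_gru can be checked by hand on plain Python lists. In this sketch `gated_update` is a hypothetical helper that reproduces only the final blending line, and the candidate state is given directly instead of being produced by a GRU cell.

```python
def gated_update(g, h_candidate, h_previous):
    """Blend candidate and previous hidden states with a scalar gate g in [0, 1]."""
    return [g * hc + (1.0 - g) * hp for hc, hp in zip(h_candidate, h_previous)]


h_previous = [0.2, -0.4]
h_candidate = [1.0, 0.6]  # stand-in for the GRU cell output

kept = gated_update(0.0, h_candidate, h_previous)      # gate closed: fact ignored
replaced = gated_update(1.0, h_candidate, h_previous)  # gate open: fact fully used
mixed = gated_update(0.5, h_candidate, h_previous)     # halfway between the two
print(kept, replaced, mixed)
```

With g close to 0 an irrelevant sentence leaves the hidden state untouched, which is how attention filters sentences while the GRU scans the story.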

Attention implementation

def attention_mechanism_parallel(self,c_full,m,q,i):
        """Parallel implementation of the gate function given a list of candidate sentences, a query, and the previous memory.

        Input:
            c_full: candidate facts. shape:[batch_size,story_length,hidden_size]
            m: previous memory. shape:[batch_size,hidden_size]
            q: question. shape:[batch_size,hidden_size]
        Output:
            a scalar score per fact (in batch). shape:[batch_size,story_length]
        """
        q=tf.expand_dims(q,axis=1) #[batch_size,1,hidden_size]
        m=tf.expand_dims(m,axis=1) #[batch_size,1,hidden_size]

        # 1.define a large feature vector that captures a variety of similarities between input,memory and question vector: z(c,m,q)
        c_q_elementwise=tf.multiply(c_full,q)          #[batch_size,story_length,hidden_size]
        c_m_elementwise=tf.multiply(c_full,m)          #[batch_size,story_length,hidden_size]
        c_q_minus=tf.abs(tf.subtract(c_full,q))        #[batch_size,story_length,hidden_size]
        c_m_minus=tf.abs(tf.subtract(c_full,m))        #[batch_size,story_length,hidden_size]
        # bilinear terms: c_transpose W q and c_transpose W m
        c_w_q=self.x1Wx2_parallel(c_full,q,"c_w_q"+str(i))   #[batch_size,story_length,hidden_size]
        c_w_m=self.x1Wx2_parallel(c_full,m,"c_w_m"+str(i))   #[batch_size,story_length,hidden_size]
        # tile q and m along the story dimension so they can be concatenated with c_full
        q_tile=tf.tile(q,[1,self.story_length,1])     #[batch_size,story_length,hidden_size]
        m_tile=tf.tile(m,[1,self.story_length,1])     #[batch_size,story_length,hidden_size]
        z=tf.concat([c_full,m_tile,q_tile,c_q_elementwise,c_m_elementwise,c_q_minus,c_m_minus,c_w_q,c_w_m],2) #[batch_size,story_length,hidden_size*9]
        # 2. two layer feed forward
        g=tf.layers.dense(z,self.hidden_size*3,activation=tf.nn.tanh)  #[batch_size,story_length,hidden_size*3]
        g=tf.layers.dense(g,1,activation=tf.nn.sigmoid)                #[batch_size,story_length,1]
        g=tf.squeeze(g,axis=2)                                         #[batch_size,story_length]
        return g
    def x1Wx2_parallel(self,x1,x2,scope):
        """
        :param x1: [batch_size,story_length,hidden_size]
        :param x2: [batch_size,1,hidden_size]
        :param scope: a string
        :return: [batch_size,story_length,hidden_size]
        """
        with tf.variable_scope(scope):
            x1=tf.reshape(x1,shape=(self.batch_size,-1)) #[batch_size,story_length*hidden_size]
            x1_w=tf.layers.dense(x1,self.story_length*self.hidden_size,use_bias=False) #[batch_size,story_length*hidden_size]
            x1_w_expand=tf.expand_dims(x1_w,axis=2)     #[batch_size,story_length*self.hidden_size,1]
            x1_w_x2=tf.matmul(x1_w_expand,x2)           #[batch_size,story_length*self.hidden_size,hidden_size]
            x1_w_x2=tf.reshape(x1_w_x2,shape=(self.batch_size,self.story_length,self.hidden_size,self.hidden_size))
            x1_w_x2=tf.reduce_sum(x1_w_x2,axis=3)      #[batch_size,story_length,hidden_size]
            return x1_w_x2
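The element-wise part of the feature vector z is easy to verify for a single sentence without TensorFlow. The sketch below is an assumption-laden simplification: `interaction_features` is a hypothetical helper, it handles one fact rather than a batch, and it leaves out the two bilinear terms computed by x1Wx2_parallel.

```python
def interaction_features(c, m, q):
    """Concatenate c, m, q, c*q, c*m, |c-q|, |c-m| for one fact vector."""
    if not (len(c) == len(m) == len(q)):
        raise ValueError("c, m and q must have the same length")
    c_q = [ci * qi for ci, qi in zip(c, q)]           # element-wise product with q
    c_m = [ci * mi for ci, mi in zip(c, m)]           # element-wise product with m
    c_q_abs = [abs(ci - qi) for ci, qi in zip(c, q)]  # absolute difference to q
    c_m_abs = [abs(ci - mi) for ci, mi in zip(c, m)]  # absolute difference to m
    return c + m + q + c_q + c_m + c_q_abs + c_m_abs


c = [1.0, 2.0, 3.0]  # one sentence vector
m = [0.0, 1.0, 0.0]  # previous memory
q = [2.0, 2.0, 2.0]  # question vector

z = interaction_features(c, m, q)
print(len(z))  # 7 blocks of hidden_size values each
```

In the TensorFlow code the same blocks are built for every sentence at once by broadcasting q and m over the story dimension, and the result goes through the two dense layers to produce g.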