《A Knowledge-Grounded Neural Conversation Model》

Date: 2020-06-09

Tags: knowledge grounded neural conversation model


abstract

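Most current models work well for chit-chat, but there is not yet evidence that they can be applied to more useful conversational settings. This paper proposes a knowledge-grounded neural conversation model that draws on background knowledge, with the aim of producing more contentful responses. It builds on the seq2seq model (a conventional seq2seq model learns only the skeleton of a sentence and carries no useful information) and conditions responses on both the conversation history and external facts. The model is general and can be applied in open-domain settings.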

introduction

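Unlike a traditional dialogue system, this model has no explicit task goal and is not trained on a small amount of data to fill predefined slots within a fixed response skeleton. Nor is it conventional chit-chat with little useful information. Its goal is to carry on, together with the user, a conversation whose objective is not sharply defined but which is still informative (a combination of the two settings above). The external knowledge comes from web text rather than structured data in a database, so extending it is easier. According to the paper, this is the first large-scale, fully data-driven neural model that makes effective use of external knowledge.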

background

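The main challenge in building fully data-driven conversation models is that most of the world's knowledge is not represented in any existing conversational dataset. Although these datasets (Serban et al. 2015) have grown substantially thanks to the rapid growth of social media, they are still far from matching Wikipedia, Foursquare, Goodreads, or IMDB. This severely limits existing data-driven conversation models, because they are bound to respond evasively or deflectively, as in Figure 1, especially for entities that are poorly represented in the conversational training data.

On the other hand, even if conversational data covering most entities did exist, we would still face challenges: such a large dataset would be difficult to use for model training, and many of the conversational patterns in the data (for example, for similar entities) would be redundant. The paper's approach aims to avoid this redundancy and to generalize better from existing conversational data, as shown in Figure 2. Although the conversations in the figure involve specific venues, products, and services, the conversational patterns are general and apply equally to other entities. For a new scenario, we therefore only need to extend the facts store rather than retrain the whole model. (A traditional dialogue system would use predefined slots to fill in the response skeleton, shown as bold text in the figure.)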

What the model learns is conversational behaviour; what varies is the facts.

model

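Facts relevant to the current conversation are retrieved from the facts store using keywords in the sentence. The model is trained with multi-task learning, split into two parts:

Task 1: responses with external information, trained on examples of the form ({f1, f2, ..., S}, R)
Task 2: responses without facts, i.e. chit-chat replies such as "hi" or "how are you" that carry no useful information, trained on examples of the form (S, R)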

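Splitting training into two parts has three benefits:

Once the encoder and decoder that use only the conversation history have been trained, training of the encoder and decoder that include facts can use a warm start.
It adapts flexibly to different datasets.
If the response in Task 1 is replaced with one of the facts (R = fi), Task 1 becomes similar to an autoencoder, which makes the generated responses more informative.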
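The keyword-based retrieval step mentioned at the start of this section can be sketched roughly as follows. This is only a minimal illustration: the notes do not say how the facts store is indexed, so the dictionary layout, the `retrieve_facts` helper, and the sample entity and facts are all hypothetical.

```python
import re

# Hypothetical facts store: entity name -> free-text facts about it.
# The entity and the facts are made up for illustration only.
FACT_STORE = {
    "cafe luna": [
        "The iced coffee is worth the wait.",
        "Gets crowded after 9am on weekends.",
    ],
    "book nook": ["Great selection of used sci-fi."],
}

def retrieve_facts(history, fact_store, max_facts=5):
    """Return facts whose entity name occurs in the conversation history."""
    text = history.lower()
    facts = []
    for entity, entity_facts in fact_store.items():
        # Match the entity as a whole phrase, not as part of a longer word.
        if re.search(r"\b" + re.escape(entity) + r"\b", text):
            facts.extend(entity_facts)
    return facts[:max_facts]

history = "Heading to Cafe Luna this morning."
facts = retrieve_facts(history, FACT_STORE)
```

Because the facts live outside the model, covering a new venue in this sketch only means adding an entry to `FACT_STORE`, which mirrors the point that the facts store can grow without retraining.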

dialog encoder and decoder

Both the encoder and the decoder are RNNs with GRU cells.

facts encoder

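Derived from the memory network model and end-to-end memory networks: once an entity is mentioned in the conversation, facts are retrieved based on the user input and the conversation history, and the answer is then generated.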
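To make the memory-network idea concrete, here is a small sketch of attention over facts in the general style of end-to-end memory networks. The notes give no equations, so the symbols are assumptions: `u` stands for the dialog encoder's summary of the conversation, each fact is a bag-of-words vector `r_i`, and `A` and `C` are hypothetical embedding matrices. Pure Python lists are used so the example needs no extra libraries.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_facts(u, fact_vectors, A, C):
    """Assumed form: m_i = A r_i, c_i = C r_i, p = softmax(u . m_i),
    o = sum_i p_i c_i, and the decoder is seeded with u_hat = o + u."""
    memory = [matvec(A, r) for r in fact_vectors]
    outputs = [matvec(C, r) for r in fact_vectors]
    scores = [sum(a * b for a, b in zip(u, m)) for m in memory]
    p = softmax(scores)
    o = [sum(p_i * c[k] for p_i, c in zip(p, outputs)) for k in range(len(u))]
    u_hat = [o_k + u_k for o_k, u_k in zip(o, u)]
    return p, u_hat

# Toy sizes: hidden dimension 2, vocabulary size 3 (values are arbitrary).
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
C = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
u = [1.0, 0.0]
fact_vectors = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
p, u_hat = attend_over_facts(u, fact_vectors, A, C)
```

In this toy setup the first fact shares a word with the conversation summary, so it receives the larger attention weight and dominates the vector handed to the decoder.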

datasets

twitter: conversational
(no facts: key to learning the conversational structure or backbone.)
foursquare: non-conversational
(tip data: comments left by customers about restaurants and other, usually commercial, establishments.)
1M grounded dataset

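A tip is a Foursquare term; a handle is the "slot" in a Twitter conversation to which multiple tips can correspond. For example, in a tweet such as "@handle is cheap and good value", the handle can correspond to many tips, e.g. tip = 'clothes' or tip = 'food'.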

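The conversations have two turns. Based on the Foursquare tip data, relevant handles are located on Twitter, together with the conversations that contain the corresponding Foursquare entities. In addition, the first turn of each conversation contains either a handle for a business name (marked with @) or a hashtag that matches the handle. Because the goal is to model conversations between real users, conversations whose response was generated by a user using a handle found in Foursquare are removed (that is, the reply was produced by software from a few words the user supplied, rather than said by the user).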

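grounded conversation dataset: for each handle there are two scoring functions:
- the perplexity under a 1-gram LM trained on all tips that contain the handle
- a chi-square score, which measures how much handle-related content each token carries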
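The two scoring functions can be sketched as below. The notes do not specify smoothing or how the chi-square contingency counts are collected, so add-one smoothing and the 2x2 table layout are assumptions, and the sample tips are made up.

```python
import math
from collections import Counter

def train_unigram(tips):
    """Count tokens over all tips for one handle."""
    counts = Counter(tok for tip in tips for tok in tip.lower().split())
    return counts, sum(counts.values())

def unigram_perplexity(tokens, counts, total):
    """Perplexity under a unigram LM with add-one smoothing (an assumption)."""
    if not tokens:
        return float("inf")
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))

def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table of token vs. handle counts.
    a: token with handle, b: token without handle,
    c: other tokens with handle, d: other tokens without handle."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

tips = ["great coffee and pastries", "coffee is great"]
counts, total = train_unigram(tips)
on_topic = unigram_perplexity("great coffee".split(), counts, total)
off_topic = unigram_perplexity("traffic jam".split(), counts, total)
```

A response that reuses the handle's tip vocabulary gets a lower perplexity than an unrelated one, and a token that only ever appears with the handle gets a high chi-square score, which is the intuition behind using both as filters.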

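Using these two scores together with human selection, 4k conversations were finally chosen as the validation and test sets; these conversations must be held out from the training data.
15k (way1) + 15k (way2) + 15k (random) -> 10k (sampled) -> 4k (human-judge reranking)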

experiments

Multi-task training:

FACTS task: We expose the full model to ({f1, ..., fn, S}, R) training examples.
NOFACTS task: We expose the model without fact encoder to (S, R) examples.
AUTOENCODER task: This is similar to the FACTS task, except that we replace the response with each of the facts, i.e., this model is trained on ({f1, ..., fn, S}, fi) examples. There are n times as many samples for this task as for the FACTS task.
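As a rough illustration, the three tasks differ only in how training pairs are built from the same (facts, source, response) triple. The `build_examples` helper and the sample strings are hypothetical.

```python
def build_examples(facts, source, response):
    """Build (input, target) training pairs for the three tasks."""
    grounded_input = (tuple(facts), source)
    return {
        # Full model: facts plus conversation history -> response.
        "FACTS": [(grounded_input, response)],
        # No fact encoder: conversation history -> response.
        "NOFACTS": [(source, response)],
        # Same input as FACTS, but each fact in turn is the target.
        "AUTOENCODER": [(grounded_input, fact) for fact in facts],
    }

facts = ["fact one", "fact two", "fact three"]
examples = build_examples(facts, "source turn", "response turn")
```

With n = 3 facts, the AUTOENCODER task yields three pairs for every single FACTS pair, which is the n-fold increase in samples noted above.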

decoding and reranking
Validation set: This yields the following reranking score:
log P(R|S,F) + λ log P(S|R) + γ|R|

the log-likelihood log P(R|S,F) according to the decoder;
word count;
the log-likelihood log P(S|R) of the source given the response.
λ and γ are free parameters, which we tune on our development N-best lists using MERT (Och 2003) by optimizing BLEU.
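A minimal sketch of applying the reranking score to an N-best list follows. The log-likelihood values, the weights, and the dictionary field names are made up for illustration; in the paper the weights are tuned with MERT rather than set by hand.

```python
def rerank_score(log_p_r_given_sf, log_p_s_given_r, length, lam, gamma):
    """log P(R|S,F) + lambda * log P(S|R) + gamma * |R|"""
    return log_p_r_given_sf + lam * log_p_s_given_r + gamma * length

def rerank(nbest, lam, gamma):
    """Pick the hypothesis with the highest reranking score."""
    return max(
        nbest,
        key=lambda h: rerank_score(
            h["log_p_r"], h["log_p_s"], len(h["tokens"]), lam, gamma
        ),
    )

# Hypothetical N-best list with invented scores.
nbest = [
    {"tokens": ["i", "don't", "know"], "log_p_r": -2.0, "log_p_s": -9.0},
    {"tokens": ["try", "the", "ramen", "it", "is", "great"],
     "log_p_r": -4.0, "log_p_s": -3.0},
]
best_plain = rerank(nbest, lam=0.0, gamma=0.0)
best_reranked = rerank(nbest, lam=0.5, gamma=0.1)
```

With both weights at zero the decoder's favourite bland reply wins; adding the reverse likelihood and the length bonus moves the more specific reply to the top, which is the behaviour the extra features are meant to encourage.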

evaluation metrics
BLEU automatic evaluation, perplexity, lexical diversity
Automatic evaluation is augmented with human judgments of appropriateness and informativeness
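One common way to quantify lexical diversity is the distinct-n ratio: the number of unique n-grams divided by the total number of n-grams across all generated responses. Whether this matches the exact diversity measure used in the paper is an assumption; the sketch below only illustrates the idea, and the sample responses are invented.

```python
def distinct_n(responses, n):
    """Unique n-grams divided by total n-grams over all responses."""
    ngrams = []
    for response in responses:
        tokens = response.split()
        ngrams.extend(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

responses = ["i do not know", "i do not know", "try the ramen"]
d1 = distinct_n(responses, 1)  # 7 unique unigrams out of 11
d2 = distinct_n(responses, 2)  # 5 unique bigrams out of 8
```

A system that keeps repeating the same bland reply scores low on this ratio, while one that brings in entity-specific words from the facts scores higher.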

results

==SEQ2SEQ==: Trained on task NOFACTS with the 23M general conversation dataset. Since there is only one task, it is not per se a multi-task setting.
==SEQ2SEQ-S==: SEQ2SEQ model that is trained on the NOFACTS task with the 1M grounded dataset (without the facts).
==MTASK==: Trained on two instances of the NOFACTS task, respectively with the 23M general dataset and the 1M grounded dataset (but without the facts). While not an interesting system in itself, we include it to assess the effect of multi-task learning separately from facts.
==MTASK-R==: Trained on the NOFACTS task with the 23M dataset, and the FACTS task with the 1M grounded dataset.
==MTASK-F==: Trained on the NOFACTS task with the 23M dataset, and the AUTOENCODER task with the 1M dataset.
==MTASK-RF==: Blends MTASK-F and MTASK-R, as it incorporates 3 tasks: NOFACTS with the 23M general dataset, FACTS with the 1M grounded dataset, and AUTOENCODER again with the 1M dataset.
