The goal is to find articles related to representation learning for characters, words, documents, and other entities.
Search keyword: word embedding algorithms
In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings.
Our approach is inspired by the Vector of Locally-Aggregated Descriptors used for image representation, and it works as follows.
First, the word embeddings gathered from a collection of documents are clustered by k-means in order to learn a codebook of semantically-related word embeddings.
Each word embedding is then associated with its nearest cluster centroid (codeword).
The Vector of Locally-Aggregated Word Embeddings (VLAWE) representation of a document is then computed by accumulating the differences between each codeword vector and each word vector (from the document) associated with the respective codeword.
We plug the VLAWE representation, which is learned in an unsupervised manner, into a classifier and show that it is useful for a diverse set of text classification tasks.
We compare our approach with a broad range of recent state-of-the-art methods, demonstrating the effectiveness of our approach.
Furthermore, we obtain a considerable improvement on the Movie Review data set, reporting an accuracy of 93.3%, which represents an absolute gain of 10% over the state-of-the-art approach.
《Vector of Locally-Aggregated Word Embeddings: A Novel Document-level Representation》
In this paper, we propose a new text representation method that aggregates word embedding vectors into document embeddings.
Our approach is inspired by the Vector of Locally-Aggregated Descriptors used for image representation, and it works as follows.
First, the word embeddings collected from a set of documents are clustered with k-means in order to learn a codebook of semantically related word embeddings.
Each word embedding is then associated with its nearest cluster centroid (codeword).
The Vector of Locally-Aggregated Word Embeddings (VLAWE) representation of a document is then computed by accumulating the differences between each codeword vector and each word vector (from the document) associated with that codeword.
We plug the VLAWE representation, which is learned in an unsupervised manner, into a classifier and show that it is useful for a diverse set of text classification tasks.
We compare our approach with a broad range of recent state-of-the-art methods and demonstrate its effectiveness.
Furthermore, we obtain a considerable improvement on the Movie Review data set, reporting an accuracy of 93.3%, an absolute gain of 10% over the state-of-the-art approach.
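To make the aggregation concrete, here is a minimal sketch of a VLAWE-style document representation, assuming pretrained word vectors are available as a Python dict mapping tokens to numpy arrays; the helper names and the choice of k are illustrative, not the authors' released code.

```python
# Minimal VLAWE-style aggregation sketch (assumes pretrained vectors in a
# dict `word_vectors` and documents given as lists of tokens).
import numpy as np
from sklearn.cluster import KMeans

def fit_codebook(word_vectors, k=10, seed=0):
    """Cluster all word embeddings into k codewords with k-means."""
    matrix = np.stack(list(word_vectors.values()))
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(matrix)

def vlawe(document, word_vectors, kmeans):
    """Accumulate residuals (word vector minus its nearest codeword) per cluster."""
    k, dim = kmeans.cluster_centers_.shape
    residuals = np.zeros((k, dim))
    vectors = [word_vectors[w] for w in document if w in word_vectors]
    if not vectors:
        return residuals.ravel()
    assignments = kmeans.predict(np.stack(vectors))
    for vec, cluster in zip(vectors, assignments):
        residuals[cluster] += vec - kmeans.cluster_centers_[cluster]
    return residuals.ravel()  # (k * dim)-dimensional document representation
```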
Word embeddings learned in two languages can be mapped to a common space to produce Bilingual Word Embeddings (BWE).
Unsupervised BWE methods learn such a mapping without any parallel data.
However, these methods are mainly evaluated on tasks of word translation or word similarity.
We show that these methods fail to capture the sentiment information and do not perform well enough on cross-lingual sentiment analysis.
In this work, we propose UBiSE (Unsupervised Bilingual Sentiment Embeddings), which learns sentiment-specific word representations for two languages in a common space without any cross-lingual supervision.
Our method only requires a sentiment corpus in the source language and pretrained monolingual word embeddings of both languages.
We evaluate our method on three language pairs for cross-lingual sentiment analysis.
Experimental results show that our method outperforms previous unsupervised BWE methods and even supervised BWE methods.
Our method succeeds for a distant language pair English-Basque.
《Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision》
Word embeddings learned in two languages can be mapped into a common space to produce bilingual word embeddings (BWE).
Unsupervised BWE methods learn such a mapping without any parallel data.
However, these methods are mainly evaluated on word translation or word similarity tasks.
We find that these methods fail to capture sentiment information and do not perform well enough on cross-lingual sentiment analysis.
In this work, we propose UBiSE (Unsupervised Bilingual Sentiment Embeddings), which learns sentiment-specific word representations for two languages in a common space without any cross-lingual supervision.
Our method only requires a sentiment corpus in the source language and pretrained monolingual word embeddings for both languages.
We evaluate our method on cross-lingual sentiment analysis for three language pairs.
Experimental results show that our method outperforms previous unsupervised BWE methods and even supervised BWE methods.
Our method also succeeds on the distant language pair English-Basque.
Regularization of neural machine translation is still a significant problem, especially in low-resource settings.
To mollify this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value).
Such a joint training allows the proposed system to learn the distributional properties represented by the word embeddings, empirically improving the generalization to unseen sentences.
Experiments over three translation datasets have shown a consistent improvement over a strong baseline, ranging between 0.91 and 2.4 BLEU points, and also a marked improvement over a state-of-the-art system.
《ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems》
Regularization of neural machine translation remains a significant problem, especially in low-resource settings.
To alleviate this problem, we propose regressing word embeddings (ReWE) as a new regularization technique, in a system jointly trained to predict the next word in the translation (a categorical value) and its word embedding (a continuous value).
Such joint training allows the proposed system to learn the distributional properties captured by the word embeddings, empirically improving generalization to unseen sentences.
Experiments on three translation datasets show a consistent improvement over a strong baseline, ranging between 0.91 and 2.4 BLEU points, as well as a marked improvement over a state-of-the-art system.
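A rough sketch of what such a joint objective could look like in PyTorch is given below; the tensor names, the cosine-distance regression term, and the weighting factor `lam` are assumptions for illustration rather than the paper's exact formulation.

```python
# ReWE-style joint loss sketch: cross-entropy over the next token plus a
# regression term pulling a predicted vector toward the gold word's embedding.
# `logits`, `predicted_vec`, `gold_ids`, `embedding_table` are placeholders.
import torch
import torch.nn.functional as F

def rewe_loss(logits, predicted_vec, gold_ids, embedding_table, lam=0.1):
    # logits: (batch, vocab); predicted_vec: (batch, dim); gold_ids: (batch,)
    ce = F.cross_entropy(logits, gold_ids)
    gold_vec = embedding_table[gold_ids]                       # (batch, dim)
    reg = 1.0 - F.cosine_similarity(predicted_vec, gold_vec, dim=-1).mean()
    return ce + lam * reg
```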
Learning high-quality embeddings for rare words is a hard problem because of sparse context information.
Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce embeddings of frequent words from their surface form and then used to compute embeddings for rare words.
In this paper, we introduce attentive mimicking: the mimicking model is given access not only to a word’s surface form, but also to all available contexts and learns to attend to the most informative and reliable contexts for computing an embedding.
In an evaluation on four tasks, we show that attentive mimicking outperforms previous work for both rare and medium-frequency words.
Thus, compared to previous work, attentive mimicking improves embeddings for a much larger part of the vocabulary, including the medium-frequency range.
《Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts》
Because context information is sparse, learning high-quality embeddings for rare words is a hard problem.
Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce the embeddings of frequent words from their surface form, and is then used to compute embeddings for rare words.
In this paper, we introduce attentive mimicking: the mimicking model has access not only to a word's surface form but also to all available contexts, and learns to attend to the most informative and reliable contexts when computing an embedding.
In an evaluation on four tasks, we show that attentive mimicking outperforms previous work for both rare and medium-frequency words.
Thus, compared to previous work, attentive mimicking improves embeddings for a much larger part of the vocabulary, including the medium-frequency range.
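The sketch below illustrates the general idea of attending over a word's available contexts, assuming a surface-form vector and a matrix of context vectors are already computed; the dot-product scoring is one simple choice and not necessarily the paper's architecture.

```python
# Context attention sketch for a rare word: score each available context
# vector against a query derived from the word's surface form, then return
# the attention-weighted average. All names are placeholders.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_embedding(form_vector, context_vectors):
    # form_vector: (dim,), e.g. composed from character n-grams
    # context_vectors: (n_contexts, dim)
    scores = context_vectors @ form_vector      # simple dot-product scoring
    weights = softmax(scores)                   # attend to informative contexts
    return weights @ context_vectors            # (dim,) mimicked embedding
```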
Pre-trained word vectors are ubiquitous in Natural Language Processing applications.
In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings.
We claim that training word embeddings along with higher n-gram embeddings helps in the removal of the contextual information from the unigrams, resulting in better stand-alone word embeddings.
We empirically show the validity of our hypothesis by outperforming other competing word representation models by a significant margin on a wide variety of tasks.
We make our models publicly available.
《Better Word Embeddings by Disentangling Contextual n-Gram Information》
Pre-trained word vectors are ubiquitous in natural language processing applications.
In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings improves the resulting unigram embeddings.
We claim that training word embeddings alongside higher-order n-gram embeddings helps remove contextual information from the unigrams, resulting in better stand-alone word embeddings.
We empirically validate our hypothesis by outperforming competing word representation models by a significant margin on a wide variety of tasks.
We make our models publicly available.
A standard word embedding algorithm, such as word2vec and glove, makes a strong assumption that words are likely to be semantically related only if they co-occur locally within a window of fixed size.
However, this strong assumption may not capture the semantic association between words that co-occur frequently but non-locally within documents.
In this paper, we propose a graph-based word embedding method, named ‘word-node2vec’.
By relaxing the strong constraint of locality, our method is able to capture both the local and non-local co-occurrences.
Word-node2vec constructs a graph where every node represents a word and an edge between two nodes represents a combination of both local (e.g. word2vec) and document-level co-occurrences.
Our experiments show that word-node2vec outperforms word2vec and glove on a range of different tasks, such as predicting word-pair similarity, word analogy and concept categorization.
《word-node2vec: Improving Word Embeddings with Document-level Non-local Word Co-occurrences》
Standard word embedding algorithms, such as word2vec and GloVe, make the strong assumption that words are likely to be semantically related only if they co-occur locally within a window of fixed size.
However, this strong assumption may fail to capture the semantic association between words that co-occur frequently but non-locally within documents.
In this paper, we propose a graph-based word embedding method named 'word-node2vec'.
By relaxing the strong locality constraint, our method is able to capture both local and non-local co-occurrences.
word-node2vec constructs a graph where every node represents a word and an edge between two nodes represents a combination of both local (e.g., word2vec-style) and document-level co-occurrences.
Our experiments show that word-node2vec outperforms word2vec and GloVe on a range of tasks, such as predicting word-pair similarity, word analogy, and concept categorization.
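A minimal sketch of the kind of graph construction described above is shown below; the mixing weight `alpha` and the raw co-occurrence counts are assumptions, and the resulting weighted edges would then feed a node2vec-style embedding learner.

```python
# Word-graph construction sketch: edge weight mixes local (window) and
# document-level co-occurrence. The weighting scheme is an assumption,
# not the paper's exact formulation.
from collections import Counter
from itertools import combinations

def build_word_graph(documents, window=5, alpha=0.5):
    local, doc_level = Counter(), Counter()
    for doc in documents:                               # doc: list of tokens
        for i, w in enumerate(doc):                     # local: fixed-size window
            for v in doc[i + 1:i + 1 + window]:
                if w != v:
                    local[tuple(sorted((w, v)))] += 1
        for w, v in combinations(sorted(set(doc)), 2):  # non-local: same document
            doc_level[(w, v)] += 1
    edges = {pair: alpha * local[pair] + (1 - alpha) * doc_level[pair]
             for pair in set(local) | set(doc_level)}
    return edges    # weighted edges for node2vec-style random walks
```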
Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.
In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model, and matches the two densities using a method called normalizing flow.
The method requires no explicit supervision, and can be learned with only a seed dictionary of words that have identical strings.
We argue that this formulation has several intuitively attractive properties, particularly with respect to improving robustness and generalization to mappings between difficult language pairs or word pairs.
On a benchmark data set of bilingual lexicon induction and cross-lingual word similarity, our approach can achieve competitive or superior performance compared to state-of-the-art published results, with particularly strong results being found on etymologically distant and/or morphologically rich languages.
《Density Matching for Bilingual Word Embedding》
Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.
In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model and matches the two densities using a method called normalizing flow.
The method requires no explicit supervision and can be learned with only a seed dictionary of words that have identical strings.
We argue that this formulation has several intuitively attractive properties, particularly with respect to improving robustness and generalization to mappings between difficult language pairs or word pairs.
On benchmark data sets for bilingual lexicon induction and cross-lingual word similarity, our approach achieves competitive or superior performance compared with published state-of-the-art results, with particularly strong results on etymologically distant and/or morphologically rich languages.
Recent research has discovered that a shared bilingual word embedding space can be induced by projecting monolingual word embedding spaces from two languages using a self-learning paradigm without any bilingual supervision.
However, it has also been shown that for distant language pairs such fully unsupervised self-learning methods are unstable and often get stuck in poor local optima due to reduced isomorphism between starting monolingual spaces.
In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates the instability issues.
We learn a shared multilingual embedding space for a variable number of languages by incrementally adding new languages one by one to the current multilingual space.
Through the gradual language addition the method can leverage the interdependencies between the new language and all other languages in the current multilingual space.
We find that it is beneficial to project more distant languages later in the iterative process.
Our fully unsupervised multilingual embedding spaces yield results that are on par with the state-of-the-art methods in the bilingual lexicon induction (BLI) task, and simultaneously obtain state-of-the-art scores on two downstream tasks: multilingual document classification and multilingual dependency parsing, outperforming even supervised baselines.
This finding also accentuates the need to establish evaluation protocols for cross-lingual word embeddings beyond the omnipresent intrinsic BLI task in future work.
《Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs》
Recent research has found that a shared bilingual word embedding space can be induced by projecting monolingual word embedding spaces from two languages using a self-learning paradigm, without any bilingual supervision.
However, it has also been shown that for distant language pairs such fully unsupervised self-learning methods are unstable and often get stuck in poor local optima, because the starting monolingual spaces are less isomorphic.
In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates these instability issues.
We learn a shared multilingual embedding space for a variable number of languages by incrementally adding new languages, one by one, to the current multilingual space.
Through this gradual language addition, the method can leverage the interdependencies between the new language and all other languages in the current multilingual space.
We find that it is beneficial to project more distant languages later in the iterative process.
Our fully unsupervised multilingual embedding spaces yield results on par with state-of-the-art methods on the bilingual lexicon induction (BLI) task, and simultaneously obtain state-of-the-art scores on two downstream tasks, multilingual document classification and multilingual dependency parsing, outperforming even supervised baselines.
This finding also accentuates the need to establish evaluation protocols for cross-lingual word embeddings beyond the ubiquitous intrinsic BLI task in future work.
Chinese is a logographic writing system, and the shapes of Chinese characters contain rich syntactic and semantic information.
In this paper, we propose a model to learn Chinese word embeddings via three-level composition:
(1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character;
(2) a recurrent neural network with self-attention to compose character representation into word embeddings;
(3) the Skip-Gram framework to capture non-compositionality directly from the contextual information.
Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition and part-of-speech tagging.
《Visual Character-Enhanced Word Embeddings》
Chinese is a logographic writing system, and the shapes of Chinese characters contain rich syntactic and semantic information.
In this paper, we propose a model that learns Chinese word embeddings via a three-level composition:
(1) a convolutional neural network that extracts intra-character compositionality from the visual shape of a character;
(2) a recurrent neural network with self-attention that composes character representations into word embeddings;
(3) the Skip-Gram framework, which captures non-compositionality directly from contextual information.
Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition, and part-of-speech tagging.
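A compact PyTorch sketch of the three-level composition is given below, assuming character glyphs are available as small bitmaps; the layer sizes, the GRU, and the attention pooling are illustrative choices, and the skip-gram objective over the resulting word vectors is not shown.

```python
# Three-level composition sketch (sizes and layers are assumptions):
# (1) CNN over a character glyph bitmap -> character vector,
# (2) bi-GRU with simple self-attention pooling over characters -> word vector,
# (3) the word vector would then feed a skip-gram-style objective (omitted).
import torch
import torch.nn as nn

class GlyphCNN(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(32, dim)

    def forward(self, glyphs):                     # (n_chars, 1, H, W) bitmaps
        return self.proj(self.conv(glyphs).flatten(1))   # (n_chars, dim)

class CharToWord(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * dim, 1)

    def forward(self, char_vecs):                  # (1, n_chars, dim)
        states, _ = self.rnn(char_vecs)            # (1, n_chars, 2*dim)
        weights = torch.softmax(self.att(states), dim=1)
        return (weights * states).sum(dim=1)       # (1, 2*dim) word embedding
```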
Cross-domain Chinese Word Segmentation (CWS) remains a challenge despite recent progress in neural-based CWS.
The limited amount of annotated data in the target domain has been the key obstacle to a satisfactory performance.
In this paper, we propose a semi-supervised word-based approach to improving cross-domain CWS given a baseline segmenter.
Particularly, our model only deploys word embeddings trained on raw text in the target domain, discarding complex hand-crafted features and domain-specific dictionaries.
Innovative subsampling and negative sampling methods are proposed to derive word embeddings optimized for CWS.
We conduct experiments on five datasets in special domains, covering novels, medicine, and patents.
Results show that our model can obviously improve cross-domain CWS, especially in the segmentation of domain-specific noun entities.
The word F-measure increases by over 3.0% on four datasets, outperforming state-of-the-art semi-supervised and unsupervised cross-domain CWS approaches by a large margin.
We make our data and code available on Github.
《Improving Cross-domain Chinese Word Segmentation with Word Embeddings》
Despite recent progress in neural approaches, cross-domain Chinese Word Segmentation (CWS) remains a challenge.
The limited amount of annotated data in the target domain is the key obstacle to satisfactory performance.
In this paper, we propose a semi-supervised word-based approach to improving cross-domain CWS given a baseline segmenter.
In particular, our model only deploys word embeddings trained on raw text in the target domain, discarding complex hand-crafted features and domain-specific dictionaries.
Novel subsampling and negative sampling methods are proposed to derive word embeddings optimized for CWS.
We conduct experiments on five datasets in special domains, covering novels, medicine, and patents.
The word F-measure increases by over 3.0% on four datasets, outperforming state-of-the-art semi-supervised and unsupervised cross-domain CWS approaches by a large margin.
We make our data and code available on Github.
In this paper we present a method to learn word embeddings that are resilient to misspellings.
Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words.
We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns.
In our method, misspellings of each word are embedded close to their correct variants.
We train these embeddings on a new dataset we are releasing publicly.
Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
《Misspelling Oblivious Word Embeddings》
In this paper, we present a method for learning word embeddings that are resilient to misspellings.
Existing word embeddings have limited applicability to malformed text, which contains a non-negligible number of out-of-vocabulary words.
We propose a method that combines FastText with subwords and a supervised task of learning misspelling patterns.
In our method, misspellings of each word are embedded close to their correct variants.
We train these embeddings on a new dataset that we are releasing publicly.
Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
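The sketch below illustrates one way the supervised misspelling signal could be expressed, assuming FastText-style character n-gram vectors: the misspelling's composed vector is measured against the correct word's vector, and a term like this would be added to the usual skip-gram loss. The exact loss used in the paper may differ.

```python
# Illustrative misspelling term (an assumption about the setup, not the
# released implementation): compose the misspelling from its character
# n-grams and penalize its distance to the correct word's vector.
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    padded = f"<{word}>"
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def subword_vector(word, ngram_vectors):
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean([ngram_vectors[g] for g in grams], axis=0)

def misspelling_loss(misspelling, correct_word, ngram_vectors, word_vectors):
    """Cosine distance between the misspelling's composed vector and the
    correct word's vector; smaller means the two are embedded close together."""
    m = subword_vector(misspelling, ngram_vectors)
    w = word_vectors[correct_word]
    return 1.0 - float(m @ w / (np.linalg.norm(m) * np.linalg.norm(w)))
```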
The idea of subword-based word embeddings has been proposed in the literature, mainly for solving the out-of-vocabulary (OOV) word problem observed in standard word-based word embeddings.
In this paper, we propose a method of reconstructing pre-trained word embeddings using subword information that can effectively represent a large number of subword embeddings in a considerably small fixed space.
The key techniques of our method are twofold: memory-shared embeddings and a variant of the key-value-query self-attention mechanism.
Our experiments show that our reconstructed subword-based embeddings can successfully imitate well-trained word embeddings in a small fixed space while preventing quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings of OOV words.
We also demonstrate the effectiveness of our reconstruction method when we apply them to downstream tasks.
《Subword-based Compact Reconstruction of Word Embeddings》
The idea of subword-based word embeddings has been proposed in the literature, mainly to solve the out-of-vocabulary (OOV) word problem observed in standard word-based embeddings.
In this paper, we propose a method for reconstructing pre-trained word embeddings using subword information that can effectively represent a large number of subword embeddings in a considerably small, fixed space.
The key techniques of our method are twofold: memory-shared embeddings and a variant of the key-value-query self-attention mechanism.
Our experiments show that the reconstructed subword-based embeddings can successfully imitate well-trained word embeddings in a small fixed space while preventing quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings for OOV words.
We also demonstrate the effectiveness of our reconstruction method when applied to downstream tasks.
We present a method for learning bilingual word embeddings in order to support second language (L2) learners in finding recurring phrases and example sentences that match mixed-code queries (e.g., 「接 受 sentence」) composed of words in both target language and native language (L1).
In our approach, mixed-code queries are transformed into target language queries aimed at maximizing the probability of retrieving relevant target language phrases and sentences.
The method involves converting a given parallel corpus into mixed-code data, generating word embeddings from mixed-code data, and expanding queries in target languages based on bilingual word embeddings.
We present a prototype search engine, x.Linggle, that applies the method to a linguistic search engine for a parallel corpus. A preliminary evaluation on a list of common word translations shows that the method performs reasonably well.
《Learning to Respond to Mixed-code Queries Using Bilingual Word Embeddings》
We present a method for learning bilingual word embeddings in order to support second-language (L2) learners in finding recurring phrases and example sentences that match mixed-code queries (e.g., "接受 sentence") composed of words in both the target language and the native language (L1).
In our approach, mixed-code queries are transformed into target-language queries aimed at maximizing the probability of retrieving relevant target-language phrases and sentences.
The method involves converting a given parallel corpus into mixed-code data, generating word embeddings from the mixed-code data, and expanding queries in the target language based on bilingual word embeddings.
We present a prototype search engine, x.Linggle, that applies the method to a linguistic search engine for a parallel corpus. A preliminary evaluation on a list of common word translations shows that the method performs reasonably well.
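The sketch below shows one plausible way a mixed-code query could be expanded with bilingual embeddings: native-language tokens are replaced by their nearest target-language neighbours in the shared space. The data layout, the cosine scoring, and `top_k` are assumptions for illustration, not the x.Linggle implementation.

```python
# Mixed-code query expansion sketch: L1 tokens are mapped to their nearest
# target-language (L2) neighbours in a shared bilingual space; L2 tokens
# (and unknown tokens) pass through unchanged.
import numpy as np

def expand_query(tokens, l1_vectors, l2_vectors, top_k=3):
    l2_words = list(l2_vectors)
    l2_matrix = np.stack([l2_vectors[w] for w in l2_words])
    l2_matrix = l2_matrix / np.linalg.norm(l2_matrix, axis=1, keepdims=True)
    expanded = []
    for tok in tokens:
        if tok in l2_vectors or tok not in l1_vectors:
            expanded.append([tok])                 # already target language, or unknown
        else:
            q = l1_vectors[tok]
            sims = l2_matrix @ (q / np.linalg.norm(q))
            expanded.append([l2_words[i] for i in np.argsort(-sims)[:top_k]])
    return expanded                                # candidate terms per query slot
```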
Downstream evaluation of pretrained word embeddings is expensive, more so for tasks where current state of the art models are very large architectures.
Intrinsic evaluation using word similarity or analogy datasets, on the other hand, suffers from several disadvantages.
We propose a novel intrinsic evaluation task employing large word association datasets (particularly the Small World of Words dataset).
We observe correlations not just between performance on SWOW-8500 and previously proposed intrinsic tasks of word similarity prediction, but also with downstream tasks (e.g., Text Classification and Natural Language Inference).
Most importantly, we report better confidence intervals for scores on our word association task, with no fall in correlation with downstream performance.
《SWOW-8500: A Word Association Task for Intrinsic Evaluation of Word Embeddings》
Downstream evaluation of pretrained word embeddings is expensive, more so for tasks where the current state-of-the-art models are very large architectures.
Intrinsic evaluation using word similarity or analogy datasets, on the other hand, suffers from several disadvantages.
We propose a novel intrinsic evaluation task employing large word association datasets (particularly the Small World of Words dataset).
We observe correlations not only between performance on SWOW-8500 and previously proposed intrinsic tasks of word similarity prediction, but also with downstream tasks (such as text classification and natural language inference).
Most importantly, we report better confidence intervals for scores on our word association task, with no drop in correlation with downstream performance.
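As a rough illustration of how a word-association dataset can serve as an intrinsic benchmark, the sketch below correlates human association strength with embedding cosine similarity; the triple format and the use of Spearman correlation are assumptions about the setup.

```python
# Score embeddings against word-association data: correlate human association
# strength for (cue, response) pairs with embedding cosine similarity.
import numpy as np
from scipy.stats import spearmanr

def association_score(triples, word_vectors):
    """triples: iterable of (cue, response, human_strength)."""
    human, model = [], []
    for cue, response, strength in triples:
        if cue in word_vectors and response in word_vectors:
            u, v = word_vectors[cue], word_vectors[response]
            human.append(strength)
            model.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    rho, _ = spearmanr(human, model)
    return rho
```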
Search keyword: Word Representation
The use of subword-level information (e.g., characters, character n-grams, morphemes) has become ubiquitous in modern word representation learning.
Its importance is attested especially for morphologically rich languages which generate a large number of rare words.
Despite a steadily increasing interest in such subword-informed word representations, their systematic comparative analysis across typologically diverse languages and different tasks is still missing.
In this work, we deliver such a study focusing on the variation of two crucial components required for subword-level integration into word representation models:
1) segmentation of words into subword units, and 2) subword composition functions to obtain final word representations.
We propose a general framework for learning subword-informed word representations that allows for easy experimentation with different segmentation and composition components, also including more advanced techniques based on position embeddings and self-attention.
Using the unified framework, we run experiments over a large number of subword-informed word representation configurations (60 in total) on 3 tasks (general and rare word similarity, dependency parsing, fine-grained entity typing) for 5 languages representing 3 language types.
Our main results clearly indicate that there is no 「one-size-fits-all」 configuration, as performance is both language- and task-dependent.
We also show that configurations based on unsupervised segmentation (e.g., BPE, Morfessor) are sometimes comparable to or even outperform the ones based on supervised word segmentation.
《A Systematic Study of Leveraging Subword Information for Learning Word Representations》
The use of subword-level information (e.g., characters, character n-grams, morphemes) has become ubiquitous in modern word representation learning.
Its importance is attested especially for morphologically rich languages, which generate a large number of rare words.
Despite a steadily increasing interest in such subword-informed word representations, a systematic comparative analysis across typologically diverse languages and different tasks is still missing.
In this work, we deliver such a study, focusing on the variation of two crucial components required for subword-level integration into word representation models:
1) segmentation of words into subword units, and 2) subword composition functions used to obtain the final word representations.
We propose a general framework for learning subword-informed word representations that allows easy experimentation with different segmentation and composition components, including more advanced techniques based on position embeddings and self-attention.
Using this unified framework, we run experiments over a large number of subword-informed word representation configurations (60 in total) on 3 tasks (general and rare word similarity, dependency parsing, fine-grained entity typing) for 5 languages representing 3 language types.
Our main results clearly indicate that there is no 'one-size-fits-all' configuration, as performance is both language- and task-dependent.
We also show that configurations based on unsupervised segmentation (e.g., BPE, Morfessor) are sometimes comparable to, or even outperform, those based on supervised word segmentation.
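The sketch below separates the two components the study varies: a segmentation function (a plain character n-gram segmenter stands in for BPE or Morfessor) and a composition function (simple addition stands in for the richer position-embedding or self-attention compositions).

```python
# Two interchangeable components: segment a word into subword units, then
# compose the subword vectors into a single word vector. The n-gram
# segmenter and additive composition are stand-ins, not the studied systems.
import numpy as np

def segment(word, n=3):
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def compose_additive(subwords, subword_vectors, dim):
    vecs = [subword_vectors[s] for s in subwords if s in subword_vectors]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

def word_representation(word, subword_vectors, dim=300):
    return compose_additive(segment(word), subword_vectors, dim)
```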
In this paper we study how different ways of combining character and word-level representations affect the quality of both final word and sentence representations.
We provide strong empirical evidence that modeling characters improves the learned representations at the word and sentence levels, and that doing so is particularly useful when representing less frequent words.
We further show that a feature-wise sigmoid gating mechanism is a robust method for creating representations that encode semantic similarity, as it performed reasonably well in several word similarity datasets.
Finally, our findings suggest that properly capturing semantic similarity at the word level does not consistently yield improved performance in downstream sentence-level tasks.
《Gating Mechanisms for Combining Character- and Word-level Word Representations: An Empirical Study》
In this paper, we study how different ways of combining character- and word-level representations affect the quality of both final word and sentence representations.
We provide strong empirical evidence that modeling characters improves the learned representations at both the word and sentence levels, and that doing so is particularly useful when representing less frequent words.
We further show that a feature-wise sigmoid gating mechanism is a robust method for creating representations that encode semantic similarity, as it performed reasonably well on several word similarity datasets.
Finally, our findings suggest that properly capturing semantic similarity at the word level does not consistently yield improved performance on downstream sentence-level tasks.
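A minimal sketch of a feature-wise sigmoid gate combining a word-level vector with a character-derived vector is shown below; computing the gate from the concatenation of the two vectors is one common parameterization and an assumption here.

```python
# Feature-wise sigmoid gate: interpolate, dimension by dimension, between a
# word-level vector `w` and a character-derived vector `c` of the same size.
import torch
import torch.nn as nn

class GatedCombination(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, w, c):                       # both: (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([w, c], dim=-1)))
        return g * w + (1.0 - g) * c               # feature-wise interpolation
```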
Search keyword: 2vec
Reasoning about implied relationships (e.g. paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems.
This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships.
Our pairwise embeddings are computed as a compositional function of each word's representation, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur.
We add these representations to the cross-sentence attention layer of existing inference models (e.g. BiDAF for QA, ESIM for NLI), instead of extending or replacing existing word embeddings.
Experiments show a gain of 2.7% on the recently released SQuAD 2.0 and 1.3% on MultiNLI.
Our representations also aid in better generalization with gains of around 6-7% on adversarial SQuAD datasets, and 8.8% on the adversarial entailment test set by Glockner et al. (2018).
《pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference》
Reasoning about implied relationships (e.g., paraphrastic, common-sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems.
This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships.
Our pairwise embeddings are computed as a compositional function of each word's representation, learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur.
We add these representations to the cross-sentence attention layer of existing inference models (e.g., BiDAF for QA, ESIM for NLI) instead of extending or replacing existing word embeddings.
Experiments show a gain of 2.7% on the recently released SQuAD 2.0 and 1.3% on MultiNLI.
Our representations also aid in better generalization, with gains of around 6-7% on adversarial SQuAD datasets and 8.8% on the adversarial entailment test set of Glockner et al. (2018).
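The sketch below shows one plausible form of a compositional pair representation, an MLP over the two word vectors and their elementwise product, scored against a context vector; the layer sizes and this exact composition are assumptions, and the PMI-style training objective is only hinted at in a comment.

```python
# Illustrative compositional pair encoder and pair-context scoring function;
# the architecture and objective in the paper differ in detail.
import torch
import torch.nn as nn

class PairEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, y):                       # (batch, dim) each
        return self.mlp(torch.cat([x, y, x * y], dim=-1))

def pair_context_score(pair_vec, context_vec):
    # Higher when the pair representation matches contexts the two words
    # share; a PMI-style objective pushes co-occurring pairs and contexts
    # together and non-co-occurring ones apart.
    return (pair_vec * context_vec).sum(dim=-1)
```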
This paper presents three hybrid models that directly combine latent Dirichlet allocation and word embedding for distinguishing between speakers with and without Alzheimer’s disease from transcripts of picture descriptions.
Two of our models get F-scores over the current state-of-the-art using automatic methods on the DementiaBank dataset.
《Augmenting word2vec with LDA within a Clinical Application》
This paper presents three hybrid models that directly combine latent Dirichlet allocation and word embeddings for distinguishing between speakers with and without Alzheimer's disease from transcripts of picture descriptions.
Two of our models achieve F-scores above the current state of the art obtained with automatic methods on the DementiaBank dataset.
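One straightforward way to combine the two signals, shown below as an illustration rather than the paper's three specific hybrids, is to concatenate each transcript's LDA topic distribution with its mean word vector and train a linear classifier on top.

```python
# Simple LDA + word-embedding hybrid sketch: joint features = LDA topic
# distribution concatenated with the mean pretrained word vector.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

def mean_vector(text, word_vectors, dim):
    vecs = [word_vectors[w] for w in text.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def hybrid_features(texts, word_vectors, dim=300, n_topics=20):
    counts = CountVectorizer().fit_transform(texts)
    topics = LatentDirichletAllocation(
        n_components=n_topics, random_state=0).fit_transform(counts)
    means = np.stack([mean_vector(t, word_vectors, dim) for t in texts])
    return np.hstack([topics, means])

# Example usage (texts, word_vectors, and labels are assumed to exist):
# clf = LogisticRegression(max_iter=1000).fit(hybrid_features(texts, word_vectors), labels)
```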
Search keyword: vector
We present Vector of Locally Aggregated Embeddings (VLAE) for effective and, ultimately, lossless representation of textual content.
Our model encodes each input text by effectively identifying and integrating the representations of its semantically-relevant parts.
The proposed model generates high quality representation of textual content and improves the classification performance of current state-of-the-art deep averaging networks across several text classification tasks.
《Vector of Locally Aggregated Embeddings for Text Representation》
We present the Vector of Locally Aggregated Embeddings (VLAE) for effective and, ultimately, lossless representation of textual content.
Our model encodes each input text by effectively identifying and integrating the representations of its semantically relevant parts.
The proposed model generates high-quality representations of textual content and improves the classification performance of current state-of-the-art deep averaging networks across several text classification tasks.