The AMiner platform, developed by the Department of Computer Science at Tsinghua University and built on fully independent intellectual property, hosts a scientific knowledge graph of more than 230 million academic papers/patents and 136 million scholars, and provides professional research-intelligence services such as scholar evaluation, expert finding, intelligent reviewer assignment, and academic maps. Since going online in 2006, the system has attracted more than 10 million unique IP visitors from 220 countries/regions, with 2.3 million data downloads and over 11 million annual visits, making it an important data and experimental platform for research on academic search and social network mining.
AMiner platform: https://www.aminer.cn
Introduction: EMNLP, the Conference on Empirical Methods in Natural Language Processing, is a top international conference in natural language processing organized by SIGDAT, a special interest group of the Association for Computational Linguistics (ACL), and a Class A venue for natural language processing research.
According to official EMNLP 2020 statistics, 3,359 papers were reviewed this year and 754 were accepted, for an acceptance rate of 22.4%.
On the homepage of the EMNLP 2020 section of the AMiner platform, AMiner generated a word cloud of hot topics from this year's paper data; Language Model is one of the most popular topics of the year.
EMNLP 2020 topic page: https://www.aminer.cn/conf/emnlp2020/homepage
Today we bring you 7 must-read papers related to Language Models. For more EMNLP 2020 papers organized by topic, see the topic page above.
1. Paper: How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Link: https://www.aminer.cn/pub/5e4faa9f3a55ac969512bc33?conf=emnlp2020
Authors: Adam Roberts, Colin Raffel, Noam Shazeer
Summary:
Deep neural language models that have been pre-trained on unlabeled text have proven to be extremely performant when fine-tuned on downstream Natural Language Processing (NLP) tasks (Devlin et al, 2018; Yang et al, 2019; Liu et al, 2019; Lan et al, 2019; Raffel et al, 2019).
The authors take a different approach by evaluating the capability of language models on the practical task of open-domain question answering: they fine-tune the model to answer questions without access to any external knowledge or context.
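To make this closed-book setting concrete, here is a minimal sketch (our illustration, not the authors' released code) of querying a T5-style model with a question and nothing else, so any answer must come from the model's parameters; the checkpoint name below is a generic placeholder rather than the fine-tuned models evaluated in the paper.

# Closed-book question answering sketch with Hugging Face Transformers (illustrative only).
# The model sees only the question string; no retrieved passage or context is provided.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # placeholder checkpoint; the paper fine-tunes larger T5 models
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

question = "question: Who developed the theory of general relativity?"
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))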
2. Paper: On Extractive, Abstractive Neural Document Summarization with Transformer Language Models
Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689732a0?conf=emnlp2020
Authors: Jonathan Pilault, Raymond Li, Sandeep Subramanian, Chris Pal
Summary:
Language models (LMs) are trained to estimate the joint probability of an arbitrary sequence of words or characters using a large corpus of text.
Markovian assumptions and the curse of dimensionality make it harder for n-gram LMs to model long-range dependencies and learn smooth functions that capture similarities between words in the vocabulary (the factorization and the n-gram truncation are written out below).
This has led to a preference for recurrent or feed-forward neural language models (Bengio et al, 2003; Mikolov et al, 2010) in recent years due to their ability to learn expressive conditional probability distributions (Merity et al, 2017; Radford et al, 2019).
RNNs are limited by their sequential nature, making them 1) difficult to optimize and learn for long sequences with long range dependencies (Hochreiter, 1998; Pascanu et al, 2013), and 2) hard to parallelize on modern hardware like GPUs, limiting their scalability.
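For reference, the joint probability a language model estimates and the Markovian truncation used by n-gram models (as discussed above) can be written as:

P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
\approx \prod_{t=1}^{T} P(w_t \mid w_{t-n+1}, \dots, w_{t-1}) \quad \text{(n-gram Markov assumption)}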
3. Paper: Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Link: https://www.aminer.cn/pub/5f02e52b9e795e22854aeb37?conf=emnlp2020
Authors: Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
Summary:
Notable works include ELMo (Peters et al, 2018), BERT (Devlin et al, 2019), Transformer-XL (Dai et al, 2019), ALBERT (Lan et al, 2019), StructBERT (Wang et al, 2019b) and many others.
These models revolutionize the learning paradigms of various NLP tasks.
State-of-the-art language models mostly utilize self-supervised tasks during pre-training (for instance, masked language modeling and sentence prediction in BERT (Devlin et al, 2019)).
This unavoidably creates a learning gap between pre-training and fine-tuning.
For a group of similar tasks, conventional practice requires the parameters of all task-specific models to be initialized from the same pre-trained language model, ignoring how the learning process in different domains is correlated and mutually reinforced.
4. Paper: Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
Link: https://www.aminer.cn/pub/5e4129b13a55ac9f8f89e019?conf=emnlp2020
Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho
Summary:
Neural sequence models trained with maximum likelihood estimation (MLE) have become a standard approach to modeling sequences in a variety of natural language applications such as machine translation (Bahdanau et al, 2015), dialogue modeling (Vinyals et al, 2015), and language modeling (Radford et al, 2018).
Despite this success, MLE-trained neural sequence models have been shown to exhibit issues such as length bias (Sountsov & Sarawagi, 2016; Stahlberg & Byrne, 2019) and degenerate repetition (Holtzman et al, 2019).
These issues are suspected to be related to the maximum likelihood objective’s local normalization, which results in a discrepancy between the learned model’s distribution and the distribution induced by the decoding algorithm used to generate sequences (Lafferty et al, 2001; Andor et al, 2016).
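"Incomplete decoding" in the title refers to decoding algorithms such as greedy or top-k search that consider only a subset of the vocabulary at each step; the toy sketch below (our illustration, not the paper's code) shows a single top-k step in which the end-of-sequence token is dropped entirely, which is where the consistency question arises.

# Toy illustration of one "incomplete" top-k decoding step.
# Tokens outside the k most probable ones -- possibly including <eos> -- receive zero probability.
import numpy as np

def top_k_step(next_token_probs, k):
    """Keep only the k most probable tokens and renormalize."""
    keep = np.argsort(next_token_probs)[::-1][:k]
    truncated = np.zeros_like(next_token_probs)
    truncated[keep] = next_token_probs[keep]
    return truncated / truncated.sum()

vocab = ["the", "cat", "sat", "<eos>"]
probs = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical next-token distribution; <eos> is least likely
print(dict(zip(vocab, top_k_step(probs, k=2))))
# <eos> now has probability 0, so a decoder that relies on it may never terminate.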
5. Paper: DagoBERT: Generating Derivational Morphology with a Pretrained Language Model
Link: https://www.aminer.cn/pub/5f7fe6d80205f07f689731a2?conf=emnlp2020
Authors: Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
Summary:
The question of what pretrained language models (PLMs) learn about language has attracted a lot of attention in NLP recently, with a focus on syntax (e.g., Goldberg, 2019) and semantics (e.g., Ethayarajh, 2019).
It is much less clear what PLMs learn about other aspects of language.
This paper examines what PLMs learn about derivational morphology, taking BERT as the example PLM.
6. Paper: Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
Link: https://www.aminer.cn/pub/5eb78919da5629cf244303f4?conf=emnlp2020
Authors: Bill Yuchen Lin, Seyeon Lee, Rahul Khanna, Xiang Ren
Summary:
Pre-trained language models (PTLMs), such as BERT (Devlin et al, 2019), have yielded state-of-the-art performance on many natural language processing tasks.
Given PTLMs' cited ability to create general yet useful text representations, an investigation into their ability to encode commonsense knowledge into representations is warranted: commonsense knowledge is often required to have a full understanding of language.
Motivated by this and similar inquiries, researchers have created probing tasks for analyzing PTLMs' behaviors.
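As a concrete illustration of such a probing task (a sketch of ours, not the paper's evaluation code), a masked-language-model query in the spirit of the paper's title can be issued with a standard fill-mask pipeline and the predicted tokens inspected:

# Probing numerical commonsense with a masked-LM query (illustrative sketch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("Birds have [MASK] legs."):
    # Each candidate comes with the filled-in token and the model's probability for it.
    print(prediction["token_str"], round(prediction["score"], 3))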
7. Paper: Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models
Link: https://www.aminer.cn/pub/5f7fe6d80205f07f68973254?conf=emnlp2020
Authors: Isabel Papadimitriou, Dan Jurafsky
Summary:
The authors train LSTM models on data with varying degrees of language-like structure, and evaluate their performance on natural language.
The authors freeze the LSTM parameters and fine-tune the word embeddings on the evaluation language.
This lets them see whether the training data induces language-like structure in the recurrent parameters of the LSTMs, despite removing vocabulary-level confounders.
By assessing whether representations are useful across languages, the authors examine the attributes of grammar.
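The freeze-and-fine-tune setup described above can be sketched in a few lines of PyTorch (an illustrative reconstruction under assumed sizes, not the authors' code): the recurrent weights are frozen and only the word embeddings receive gradient updates on the evaluation language.

# Freeze the LSTM's recurrent parameters; fine-tune only the word embeddings (illustrative sketch).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 128, 256  # hypothetical sizes
embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

for param in lstm.parameters():
    param.requires_grad = False  # the recurrent structure stays fixed

optimizer = torch.optim.Adam(embedding.parameters(), lr=1e-3)  # embeddings only

tokens = torch.randint(0, vocab_size, (4, 20))  # dummy batch of token ids
outputs, _ = lstm(embedding(tokens))            # forward pass through the frozen LSTM
loss = outputs.pow(2).mean()                    # placeholder loss for the sketch
loss.backward()
optimizer.step()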
For more details on EMNLP 2020 papers, follow our official account or use the link above to go directly to the EMNLP 2020 topic page, where the latest research directions and the most complete paper data are waiting for you.
This article is also published on the CSDN blog "AMiner科技".
In case of infringement, please contact support@oschina.cn for removal.
This article is part of the "OSC源創計劃" (OSC Original Content Program); readers are welcome to join and share as well.