Analysis of the WMT15 Sentence-Level Quality Estimation Task

Date: 2019-12-10

Tags: wmt15, wmt, sentence-level quality estimation task analysis


About the baseline

The baseline is SVM regression with an RBF kernel, with hyperparameters set by grid search. It uses 17 features:

<http://www.quest.dcs.shef.ac.uk/quest_files/features_blackbox_baseline_17>
number of tokens in the source sentence
number of tokens in the target sentence
average source token length
LM probability of source sentence
LM probability of target sentence
number of occurrences of the target word within the target hypothesis (averaged for all words in the hypothesis - type/token ratio)
average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.2)
average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.01) weighted by the inverse frequency of each word in the source corpus
percentage of unigrams in quartile 1 of frequency (lower frequency words) in a corpus of the source language (SMT training corpus)
percentage of unigrams in quartile 4 of frequency (higher frequency words) in a corpus of the source language
percentage of bigrams in quartile 1 of frequency of source words in a corpus of the source language
percentage of bigrams in quartile 4 of frequency of source words in a corpus of the source language
percentage of trigrams in quartile 1 of frequency of source words in a corpus of the source language
percentage of trigrams in quartile 4 of frequency of source words in a corpus of the source language
percentage of unigrams in the source sentence seen in a corpus (SMT training corpus)
number of punctuation marks in the source sentence
number of punctuation marks in the target sentence
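A few of the features listed above are simple surface counts and can be sketched in a few lines. The following is a minimal illustration, not the QuEst implementation: the function name `surface_features`, the dictionary keys, whitespace tokenization, and the use of `string.punctuation` to define punctuation marks are all assumptions made for this example. The language-model, IBM Model 1, and n-gram quartile features need external resources and are left out.

```python
import string


def surface_features(source: str, target: str) -> dict:
    """Compute five of the 17 baseline features that need no external resources.

    Assumption: tokens are separated by whitespace, and a punctuation mark is
    any character in string.punctuation (ASCII only).
    """
    src_tokens = source.split()
    tgt_tokens = target.split()

    # Average source token length in characters; 0.0 for an empty sentence.
    if src_tokens:
        avg_src_len = sum(len(tok) for tok in src_tokens) / len(src_tokens)
    else:
        avg_src_len = 0.0

    return {
        "src_num_tokens": len(src_tokens),
        "tgt_num_tokens": len(tgt_tokens),
        "src_avg_token_length": avg_src_len,
        "src_num_punct": sum(ch in string.punctuation for ch in source),
        "tgt_num_punct": sum(ch in string.punctuation for ch in target),
    }


# Toy sentence pair, invented for this example.
features = surface_features(
    "The cat sat , quietly .",
    "El gato se sentó , en silencio .",
)
print(features)
```

In a full pipeline, these values would be concatenated with the remaining features into a 17-dimensional vector per sentence pair and passed to the regressor.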

About the task background

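There are three translation quality estimation tasks: Task 1 is sentence-level, Task 2 is word-level, and Task 3 is document-level.
The table below lists all teams that participated in the evaluation tasks; this post focuses only on the sentence-level task (Task 1).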

ID	Tasks	Participating team	Paper
DCU-SHEFF	2	Dublin City University, Ireland and University of Sheffield, UK	Logacheva et al., 2015
HDCL	2	Heidelberg University, Germany	Kreutzer et al., 2015
LORIA	1	Lorraine Laboratory of Research in Computer Science and its Applications, France	Langlois, 2015
RTM-DCU	1,2,3	Dublin City University, Ireland	Bicici et al., 2015
SAU-KERC	2	Shenyang Aerospace University, China	Shang et al., 2015
SHEFF-NN	1,2	University of Sheffield Team 1, UK	Shah et al., 2015
UAlacant	2	Alicant University, Spain	Esplà-Gomis et al., 2015a
UGENT	1,2	Ghent University, Belgium	Tezcan et al., 2015
USAAR-USHEF	3	University of Sheffield, UK and Saarland University, Germany	Scarton et al., 2015a
USHEF	3	University of Sheffield, UK	Scarton et al., 2015a
HIDDEN	3	Undisclosed

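Results are evaluated in two ways: HTER prediction and ranking. For HTER (Human-targeted Translation Error Rate), lower is better. The evaluation metrics for HTER prediction are MAE and RMSE. (Ranking orders the translated sentences from best to worst; it is not considered here.)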

ID	System	MAE↓	RMSE↓
RTM-DCU	RTM-FS+PLS-SVR	13.25	17.48
LORIA	17+LSI+MT+FILTRE	13.34	17.35
RTM-DCU	RTM-FS-SVR	13.35	17.68
LORIA	17+LSI+MT	13.42	17.45
UGENT-LT3	SCATE-SVM	13.71	17.45
UGENT-LT3	SCATE-SVM-single	13.76	17.79
SHEF	SVM	13.83	18.01
Baseline	SVM	14.82	19.13
SHEF	GP	15.16	18.97
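For reference, the two metrics reported above can be computed as in the following sketch. The function names `mae` and `rmse` and the toy score lists are made up for illustration; they are not taken from the shared task data. The scores are assumed to be on the same scale for predictions and gold labels.

```python
import math


def mae(pred, gold):
    """Mean absolute error between predicted and gold scores."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def rmse(pred, gold):
    """Root mean squared error between predicted and gold scores."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


# Toy HTER values, invented for this example.
gold = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 35.0]

mae_value = mae(pred, gold)    # absolute errors 2, 2, 3, 5 -> mean 3.0
rmse_value = rmse(pred, gold)  # squared errors 4, 4, 9, 25 -> sqrt(10.5)
print(mae_value, rmse_value)
```

RMSE penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions, which matches the pattern in every row of the table.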