Machine Translation Evaluation Metrics: the BLEU Algorithm

Date: 2019-12-13

Tags: machine translation evaluation metrics, bleu, algorithm

1. Overview

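The automatic evaluation metric most commonly used in machine translation is $BLEU$. Beyond machine translation, it is also used in other $seq2seq$ tasks, such as dialogue systems.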

2. $BLEU$ in Detail

Assume the human translation is the $reference$ and the machine translation is the $candidate$.

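1) The original $BLEU$

The original $BLEU$ simply counts how many words of the $candidate$ appear in the $reference$:

$BLEU = \frac{\text{number of words in the candidate that appear in the reference}}{\text{total number of words in the candidate}}$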

Take the following example:

$candidate:$ the the the the the the the

$reference:$ the cat is on the mat

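Every word in the $candidate$ occurs in the $reference$, so:

$BLEU = \frac{7}{7} = 1$

This result is clearly wrong, and the fault lies in how the numerator is counted: $the$ is matched seven times even though the $reference$ contains it only twice. The numerator is therefore modified.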
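The naive count is easy to reproduce. The following minimal Python sketch assumes whitespace tokenization and uses a hypothetical helper name, `naive_precision`; every occurrence of a candidate word that exists anywhere in the reference counts as a hit, which is what inflates the score to 1:

```python
def naive_precision(candidate, reference):
    """Naive (unclipped) unigram precision: hits / candidate length."""
    ref_words = set(reference)
    # Each occurrence counts, no matter how often the word appears in the reference
    hits = sum(1 for word in candidate if word in ref_words)
    return hits / len(candidate)

candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
print(naive_precision(candidate, reference))  # 1.0
```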

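2) Improved $BLEU$: clipped counts in the numerator

To fix this, the numerator is computed with clipped counts:

$Count_{w_i}^{clip} = \min(Count_{w_i}, Ref\_Count_{w_i})$

where:

$Count_{w_i}$ is the number of times word $w_i$ occurs in the $candidate$;

$Ref\_Count_{w_i}$ is the number of times $w_i$ occurs in the $reference$.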

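In general there can be several $reference$s, so:

$Count_{w_i}^{clip} = \max_j Count_{w_i,j}^{clip}, \quad j = 1, 2, 3, \ldots$

where $j$ indexes the $j$-th $reference$ and $Count_{w_i,j}^{clip}$ is the clipped count against that reference.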
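This agrees with the common formulation that clips against the largest count of $w_i$ in any single reference. Writing $Ref\_Count_{w_i,j}$ for the count of $w_i$ in the $j$-th reference (notation assumed for this step), the map $x \mapsto \min(Count_{w_i}, x)$ is non-decreasing, so it commutes with $\max_j$:

$$Count_{w_i}^{clip} = \min\left(Count_{w_i},\ \max_j Ref\_Count_{w_i,j}\right) = \max_j \min\left(Count_{w_i},\ Ref\_Count_{w_i,j}\right) = \max_j Count_{w_i,j}^{clip}$$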

Back to the example: the $candidate$ contains only the word $the$, so only one $Count^{clip}$ has to be computed. $the$ appears only twice in the $reference$, so:

$BLEU = \frac{2}{7}$
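Clipping, including the maximum over several references, can be sketched the same way (Python, whitespace tokenization assumed; `clipped_unigram_counts` is a hypothetical helper and expects at least one reference). It returns the numerator and denominator separately so the 2/7 above is visible:

```python
from collections import Counter

def clipped_unigram_counts(candidate, references):
    """Return (clipped matches, candidate length) for unigrams.

    For each word, Count^clip is the max over references j of
    min(Count_{w_i}, Ref_Count_{w_i,j}).
    """
    cand_counts = Counter(candidate)
    ref_counts = [Counter(ref) for ref in references]
    clipped = sum(
        max(min(count, rc[word]) for rc in ref_counts)  # missing words count as 0
        for word, count in cand_counts.items()
    )
    return clipped, len(candidate)

candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split()]
print(clipped_unigram_counts(candidate, references))  # (2, 7), i.e. 2/7
```

Adding a second, hypothetical reference such as "the the the cat" would raise the clipped count to 3, because the maximum over references is taken.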

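3) Introducing $n$-grams

So far everything has been computed on single words, which can be viewed as 1-grams. 1-grams capture adequacy, i.e. how well the words are translated one by one, but say nothing about fluency, so $n$-grams are introduced, usually with $n$ no greater than 4. With $n$-grams the precision becomes: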

$p_{n}=\frac{\sum_{c \in candidates}\sum_{n\text{-}gram \in c}Count_{clip}(n\text{-}gram)}{\sum_{c' \in candidates}\sum_{n\text{-}gram' \in c'}Count(n\text{-}gram')}$

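A system is often evaluated on many $candidate$s, which is why the formula sums over a candidate set $candidates$. The $n$ in $p_n$ is the $n$-gram order and $p_n$ is the $n$-gram precision; for 1-grams, for example, $n = 1$.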

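To read the formula, start with the numerator:

1) the first $\sum$ runs over all $candidate$s;

2) the second $\sum$ runs over all $n$-grams in one $candidate$;

3) $Count_{clip}(n\text{-}gram)$ is the clipped count of a particular $n$-gram.

In the denominator the two $\sum$s mean the same as in the numerator, and $Count(n\text{-}gram')$ is the unclipped count of $n\text{-}gram'$ in the $candidate$.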

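Put simply, the denominator is the total number of $n$-grams in the $candidate$s, and the numerator is how many of those $n$-grams also occur in the $reference$, with each $n$-gram's count clipped as above.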
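Because the sums run over the whole candidate set, $p_n$ pools counts across sentences rather than averaging per-sentence precisions. A minimal sketch (Python; `ngrams` and `corpus_precision` are hypothetical helpers, each candidate is paired with its own list of references, and tokenization is by whitespace):

```python
from collections import Counter
from fractions import Fraction

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_precision(candidates, references_list, n):
    """p_n over a candidate set: clipped n-gram matches summed over all
    candidates, divided by the total number of candidate n-grams."""
    numerator = denominator = 0
    for cand, refs in zip(candidates, references_list):
        cand_counts = Counter(ngrams(cand, n))
        max_ref = Counter()
        for ref in refs:
            max_ref |= Counter(ngrams(ref, n))  # | keeps the larger count per n-gram
        numerator += sum(min(count, max_ref[g]) for g, count in cand_counts.items())
        denominator += sum(cand_counts.values())
    return Fraction(numerator, denominator) if denominator else Fraction(0)

ref = "the cat is on the mat".split()
candidates = ["the cat sat on the mat".split(), "the the the the the the the".split()]
references_list = [[ref], [ref]]
print(corpus_precision(candidates, references_list, 1))  # 7/13
```

For these two candidates $p_1 = \frac{5+2}{6+7} = \frac{7}{13}$, which differs from the plain average of $\frac{5}{6}$ and $\frac{2}{7}$.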

Here is a worked example:

$candidate:$ the cat sat on the mat

$reference:$ the cat is on the mat

The $n$-gram precisions are:

$p_1 = \frac{5}{6} = 0.8333$ (every word except $sat$ matches)

$p_2 = \frac{3}{5} = 0.6$ (the cat, on the, the mat)

$p_3 = \frac{1}{4} = 0.25$ (on the mat)

$p_4 = \frac{0}{3} = 0$
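The four precisions can be checked with a short Python sketch (hypothetical helper `modified_precision`, one reference, whitespace tokenization). `Counter`'s `&` operator takes the element-wise minimum, which is exactly the clipped count; `Fraction` reduces $\frac{0}{3}$ to 0:

```python
from collections import Counter
from fractions import Fraction

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision p_n of one candidate against one reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum((cand & ref).values())  # & keeps min(count in cand, count in ref)
    return Fraction(clipped, sum(cand.values()))

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
precisions = [modified_precision(candidate, reference, n) for n in range(1, 5)]
print([str(p) for p in precisions])  # ['5/6', '3/5', '1/4', '0']
```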

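4) A multiplicative factor for sentence length

Because $p_n$ only measures precision, very short translations tend to receive high $BLEU$ scores. A multiplicative brevity penalty $BP$ is therefore applied:

$BP = \begin{cases} 1, & c > r \\ e^{1 - r/c}, & c \le r \end{cases}$

Here $c$ is the length of the $candidate$ and $r$ is the length of the $reference$.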
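With the standard BLEU brevity penalty, $BP = 1$ when $c > r$ and $BP = e^{1 - r/c}$ otherwise. A minimal Python sketch, assuming a single reference (`brevity_penalty` is a hypothetical name):

```python
import math

def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r / c).

    c: candidate length, r: reference length (single reference assumed).
    """
    if c > r:
        return 1.0
    if c == 0:
        return 0.0  # empty candidate: avoid dividing by zero, give no credit
    return math.exp(1 - r / c)

print(brevity_penalty(6, 6))  # 1.0
print(brevity_penalty(2, 6))  # exp(-2), about 0.135
```

For instance, the candidate "the cat" against the six-word reference "the cat is on the mat" has $p_1 = p_2 = 1$, yet $BP = e^{1 - 6/2} = e^{-2} \approx 0.135$ pulls its score down.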

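Putting it all together gives the final expression:

$BLEU = BP \cdot \exp\left(\sum_{n=1}^N w_n \log p_n\right)$

Here $\sum_{n=1}^N w_n \log p_n$ is the weighted sum of the logarithms of the $n$-gram precisions, so the exponential is their weighted geometric mean. Commonly $N = 4$ with uniform weights $w_n = 1/N$.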
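The final formula can be sketched in Python from per-order counts (a hypothetical helper `bleu_from_counts`; the counts are the (matches, total) pairs from the worked example above, single reference, no smoothing):

```python
import math

def bleu_from_counts(counts, c, r, weights):
    """BLEU = BP * exp(sum_n w_n * log p_n).

    counts: list of (clipped matches, total candidate n-grams) for n = 1..N
    c, r: candidate and reference lengths; weights: w_1..w_N.
    """
    log_sum = 0.0
    for w, (matched, total) in zip(weights, counts):
        if matched == 0:
            return 0.0  # p_n = 0, so log p_n = -inf and the unsmoothed score is 0
        log_sum += w * math.log(matched / total)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)

# (matches, total) for p_1..p_4 of "the cat sat on the mat" vs "the cat is on the mat"
counts = [(5, 6), (3, 5), (1, 4), (0, 3)]
print(bleu_from_counts(counts, 6, 6, (0.25, 0.25, 0.25, 0.25)))  # 0.0, because p_4 = 0
print(bleu_from_counts(counts[:3], 6, 6, (1/3, 1/3, 1/3)))  # about 0.5
```

With only 1- to 3-grams and $w_n = 1/3$, the score is $\left(\frac{5}{6} \cdot \frac{3}{5} \cdot \frac{1}{4}\right)^{1/3} = \left(\frac{1}{8}\right)^{1/3} = 0.5$, while the default four orders give 0 because $p_4 = 0$.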

3. $NLTK$ Implementation

$BLEU$ can be computed directly with the NLTK toolkit:

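from nltk.translate.bleu_score import sentence_bleu, corpus_bleu
from nltk.translate.bleu_score import SmoothingFunction

# sentence_bleu takes a list of references (each a token list) and one candidate
reference = [['The', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['The', 'cat', 'sat', 'on', 'the', 'mat']
smooth = SmoothingFunction()  # object that holds the smoothing methods
# the keyword is weights (not weight); 4 entries -> 1- to 4-grams
score = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth.method1)
# corpus_bleu takes one list of references per candidate, plus a list of candidates
corpus_score = corpus_bleu([reference], [candidate], smoothing_function=smooth.method1)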

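$NLTK$ provides two functions for computing $BLEU$, sentence_bleu and corpus_bleu; sentence_bleu actually calls corpus_bleu internally. Take care with the list nesting of the $reference$ and $candidate$ arguments. The weights parameter sets the weight of each $n$-gram order, and the number of entries in the weights tuple determines how many $n$-gram orders are used; in the example above, 1-grams through 4-grams. SmoothingFunction smooths the $n$-gram precisions so that when some $p_n = 0$, its logarithm does not become negative infinity.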
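To see what smoothing changes, here is a sketch of add-epsilon smoothing: a precision with zero matches is replaced by $\epsilon / total$ before taking logs. This follows the idea behind NLTK's SmoothingFunction.method1 (add epsilon counts to zero-count precisions), but the function below is a hypothetical stand-alone helper, epsilon = 0.1 is an assumed default, and NLTK's internals may differ between versions:

```python
import math

def smoothed_bleu_from_counts(counts, c, r, weights, epsilon=0.1):
    """BLEU with add-epsilon smoothing of zero-match precisions.

    counts: list of (clipped matches, total candidate n-grams) for n = 1..N;
    assumes every total is positive (candidate has at least N tokens).
    """
    log_sum = 0.0
    for w, (matched, total) in zip(weights, counts):
        # Replace a zero numerator by epsilon so log p_n stays finite
        p_n = (matched if matched > 0 else epsilon) / total
        log_sum += w * math.log(p_n)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)

counts = [(5, 6), (3, 5), (1, 4), (0, 3)]
print(smoothed_bleu_from_counts(counts, 6, 6, (0.25, 0.25, 0.25, 0.25)))  # about 0.254
```

Here $p_4$ becomes $\frac{0.1}{3}$ instead of 0, so the score is $\left(\frac{1}{8} \cdot \frac{1}{30}\right)^{1/4} = 240^{-1/4} \approx 0.254$ rather than 0.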
