A Detailed Walkthrough of Computing BLEU, the Machine Translation Evaluation Metric


2023-12-25 17:21 | Source: compiled from the web

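1. Introduction

BLEU (Bilingual Evaluation Understudy) is a widely used metric for evaluating machine translation output, and introductions to the concept are easy to find online. It was proposed by IBM in the paper "BLEU: a Method for Automatic Evaluation of Machine Translation".

This article works through one example to show exactly how BLEU is computed, following the source code of NLTK's `nltk.align.bleu_score` module (newer NLTK releases ship this code as `nltk.translate.bleu_score`).

First, the formula: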

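$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log P_n\right)$$

where

$$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$$

Here $P_n$ is the modified n-gram precision, $w_n$ is its weight, $c$ is the length of the candidate translation and $r$ is the reference length.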

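Note that the BLEU score here is computed for a single translation (one sample).

NLTK's `nltk.align.bleu_score` module implements this formula with three functions: two private functions that compute $P_n$ and BP, and one function that combines them into the BLEU score.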

    # Compute the BLEU score
    def bleu(candidate, references, weights)

    # (1) Private function: modified n-gram precision
    def _modified_precision(candidate, references, n)

    # (2) Private function: brevity penalty (BP)
    def _brevity_penalty(candidate, references)

Example:

Candidate translation (predicted):
It is a guide to action which ensures that the military always obeys the commands of the party

Reference translations (gold standard):
1. It is a guide to action that ensures that the military will forever heed Party commands
2. It is the guiding principle which guarantees the military forces always being under the command of the Party
3. It is the practical guide for the army always to heed the directions of the party

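2. Computing the Modified n-gram Precision ($P_n$)

    from collections import Counter

    from nltk.util import ngrams


    def _modified_precision(candidate, references, n):
        # Count every n-gram in the candidate translation.
        counts = Counter(ngrams(candidate, n))
        if not counts:
            return 0

        # For each candidate n-gram, keep its highest count in any single reference.
        max_counts = {}
        for reference in references:
            reference_counts = Counter(ngrams(reference, n))
            for ngram in counts:
                max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])

        # Clip each candidate count at that maximum.
        clipped_counts = dict((ngram, min(count, max_counts[ngram])) for ngram, count in counts.items())

        # Needs true division: Python 3, or `from __future__ import division` on Python 2.
        return sum(clipped_counts.values()) / sum(counts.values())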

Here we take $N = 4$, that is, we compute the precisions from 1-gram through 4-gram.

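Modified 1-gram precision:

First count how many times each word occurs in the candidate translation, then count how many times it occurs in each reference translation. Max is the largest of the three reference counts, and Min is the smaller of the candidate count and Max.

| Word | Candidate | Ref 1 | Ref 2 | Ref 3 | Max | Min |
|---|---|---|---|---|---|---|
| the | 3 | 1 | 4 | 4 | 4 | 3 |
| obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| a | 1 | 1 | 0 | 0 | 1 | 1 |
| which | 1 | 0 | 1 | 0 | 1 | 1 |
| ensures | 1 | 1 | 0 | 0 | 1 | 1 |
| guide | 1 | 1 | 0 | 1 | 1 | 1 |
| always | 1 | 0 | 1 | 1 | 1 | 1 |
| is | 1 | 1 | 1 | 1 | 1 | 1 |
| of | 1 | 0 | 1 | 1 | 1 | 1 |
| to | 1 | 1 | 0 | 1 | 1 | 1 |
| commands | 1 | 1 | 0 | 0 | 1 | 1 |
| that | 1 | 2 | 0 | 0 | 2 | 1 |
| It | 1 | 1 | 1 | 1 | 1 | 1 |
| action | 1 | 1 | 0 | 0 | 1 | 1 |
| party | 1 | 0 | 0 | 1 | 1 | 1 |
| military | 1 | 1 | 1 | 0 | 1 | 1 |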

Then add up the Min values of all words, add up the candidate counts of all words, and divide the first sum by the second:

$$P_1 = \frac{3 + 0 + 14 \times 1}{3 + 15 \times 1} = \frac{17}{18} \approx 0.944$$
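The Max/Min clipping for single words can be checked with a few lines of Python. This is a minimal sketch, not NLTK's code: it assumes whitespace tokenization and case-sensitive matching (which is what the table implies, since "Party" in references 1 and 2 is not counted for "party"), and the variable names are illustrative.

```python
from collections import Counter

# Assumption: tokens are obtained by splitting on whitespace, case-sensitive.
candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    "It is a guide to action that ensures that the military will forever heed Party commands".split(),
    "It is the guiding principle which guarantees the military forces always being under the command of the Party".split(),
    "It is the practical guide for the army always to heed the directions of the party".split(),
]

cand_counts = Counter(candidate)
ref_counts = [Counter(ref) for ref in references]

clipped = {}
for word, count in cand_counts.items():
    max_ref = max(rc[word] for rc in ref_counts)  # the "Max" column
    clipped[word] = min(count, max_ref)           # the "Min" column

numerator = sum(clipped.values())        # sum of the Min column
denominator = sum(cand_counts.values())  # number of candidate words
p1 = numerator / denominator
print(numerator, denominator, round(p1, 4))  # 17 18 0.9444
```

The word "the" shows why clipping matters: it occurs 3 times in the candidate and at most 4 times in a single reference, so all 3 occurrences count, whereas "obeys" never occurs in a reference and contributes 0.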

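The higher-order precisions are obtained in the same way.

Modified 2-gram precision:

| 2-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max | Min |
|---|---|---|---|---|---|---|
| ensures that | 1 | 1 | 0 | 0 | 1 | 1 |
| guide to | 1 | 1 | 0 | 0 | 1 | 1 |
| which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
| obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
| commands of | 1 | 0 | 0 | 0 | 0 | 0 |
| that the | 1 | 1 | 0 | 0 | 1 | 1 |
| a guide | 1 | 1 | 0 | 0 | 1 | 1 |
| of the | 1 | 0 | 1 | 1 | 1 | 1 |
| always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| the commands | 1 | 0 | 0 | 0 | 0 | 0 |
| to action | 1 | 1 | 0 | 0 | 1 | 1 |
| the party | 1 | 0 | 0 | 1 | 1 | 1 |
| is a | 1 | 1 | 0 | 0 | 1 | 1 |
| action which | 1 | 0 | 0 | 0 | 0 | 0 |
| It is | 1 | 1 | 1 | 1 | 1 | 1 |
| military always | 1 | 0 | 0 | 0 | 0 | 0 |
| the military | 1 | 1 | 1 | 0 | 1 | 1 |

$$P_2 = \frac{10}{17} = 0.588235294$$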

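Modified 3-gram precision:

| 3-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max | Min |
|---|---|---|---|---|---|---|
| ensures that the | 1 | 1 | 0 | 0 | 1 | 1 |
| which ensures that | 1 | 0 | 0 | 0 | 0 | 0 |
| action which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
| a guide to | 1 | 1 | 0 | 0 | 1 | 1 |
| military always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| the commands of | 1 | 0 | 0 | 0 | 0 | 0 |
| commands of the | 1 | 0 | 0 | 0 | 0 | 0 |
| to action which | 1 | 0 | 0 | 0 | 0 | 0 |
| the military always | 1 | 0 | 0 | 0 | 0 | 0 |
| obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 |
| It is a | 1 | 1 | 0 | 0 | 1 | 1 |
| of the party | 1 | 0 | 0 | 1 | 1 | 1 |
| is a guide | 1 | 1 | 0 | 0 | 1 | 1 |
| that the military | 1 | 1 | 0 | 0 | 1 | 1 |
| always obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
| guide to action | 1 | 1 | 0 | 0 | 1 | 1 |

$$P_3 = \frac{7}{16} = 0.4375$$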

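Modified 4-gram precision:

| 4-gram | Candidate | Ref 1 | Ref 2 | Ref 3 | Max | Min |
|---|---|---|---|---|---|---|
| to action which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
| action which ensures that | 1 | 0 | 0 | 0 | 0 | 0 |
| guide to action which | 1 | 0 | 0 | 0 | 0 | 0 |
| obeys the commands of | 1 | 0 | 0 | 0 | 0 | 0 |
| which ensures that the | 1 | 0 | 0 | 0 | 0 | 0 |
| commands of the party | 1 | 0 | 0 | 0 | 0 | 0 |
| ensures that the military | 1 | 1 | 0 | 0 | 1 | 1 |
| a guide to action | 1 | 1 | 0 | 0 | 1 | 1 |
| always obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 |
| that the military always | 1 | 0 | 0 | 0 | 0 | 0 |
| the commands of the | 1 | 0 | 0 | 0 | 0 | 0 |
| the military always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
| military always obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
| is a guide to | 1 | 1 | 0 | 0 | 1 | 1 |
| It is a guide | 1 | 1 | 0 | 0 | 1 | 1 |

$$P_4 = \frac{4}{15} = 0.266666667$$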
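The four precisions can be reproduced with one generic helper. The sketch below uses only the standard library; `ngrams` and `modified_precision` are illustrative stand-ins written for this example, not the NLTK functions themselves, and whitespace tokenization is again an assumption. Exact fractions are used so the results can be compared directly with the tables.

```python
from collections import Counter
from fractions import Fraction

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram matches divided by the number of candidate n-grams."""
    counts = Counter(ngrams(candidate, n))
    if not counts:
        return Fraction(0)
    ref_counts = [Counter(ngrams(ref, n)) for ref in references]
    clipped_total = 0
    for gram, count in counts.items():
        max_ref = max(rc[gram] for rc in ref_counts)  # best single reference
        clipped_total += min(count, max_ref)          # clip the candidate count
    return Fraction(clipped_total, sum(counts.values()))

# Assumption: whitespace tokenization, case-sensitive matching.
candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    "It is a guide to action that ensures that the military will forever heed Party commands".split(),
    "It is the guiding principle which guarantees the military forces always being under the command of the Party".split(),
    "It is the practical guide for the army always to heed the directions of the party".split(),
]

precisions = [modified_precision(candidate, references, n) for n in range(1, 5)]
print([str(p) for p in precisions])  # ['17/18', '10/17', '7/16', '4/15']
```

The denominators shrink by one at each order because an 18-word sentence contains 18 unigrams, 17 bigrams, 16 trigrams and 15 4-grams.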

We then take $w_1 = w_2 = w_3 = w_4 = 0.25$, i.e. uniform weights.

Therefore, using the natural logarithm:

$$\sum_{n=1}^{N} w_n \log P_n = 0.25 \log P_1 + 0.25 \log P_2 + 0.25 \log P_3 + 0.25 \log P_4 = -0.684055269517$$
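The weighted log sum is easy to verify numerically. The following sketch plugs in the four precisions from the tables; it also shows that, with uniform weights, exponentiating the sum gives the geometric mean of the four precisions. The variable names are illustrative.

```python
import math
from fractions import Fraction

# Modified n-gram precisions taken from the tables (n = 1..4).
precisions = [Fraction(17, 18), Fraction(10, 17), Fraction(7, 16), Fraction(4, 15)]
weights = [0.25, 0.25, 0.25, 0.25]  # uniform weights

# math.log is the natural logarithm, which is what reproduces the value in the text.
log_sum = sum(w * math.log(float(p)) for w, p in zip(weights, precisions))
print(round(log_sum, 6))  # -0.684055

# With uniform weights, exp(log_sum) is the geometric mean of the precisions.
geo_mean = math.exp(log_sum)
product = float(precisions[0] * precisions[1] * precisions[2] * precisions[3])
print(round(geo_mean, 6), round(product ** 0.25, 6))  # 0.504567 0.504567
```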

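3. Computing the Brevity Penalty

    import math


    def _brevity_penalty(candidate, references):
        c = len(candidate)
        ref_lens = (len(reference) for reference in references)
        # Python compares tuples element by element, e.g. (0, 1) > (1, 0) is False.
        # The key below uses this to pick the reference length closest to the
        # candidate length and, when several are equally close, the shortest one.
        # Example: candidate length 10, reference lengths 9 and 11 -> r = 9.
        r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
        print('r:', r)  # debug output

        if c > r:
            return 1
        else:
            # Needs true division: Python 3, or `from __future__ import division` on Python 2.
            return math.exp(1 - r / c)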

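Next we compute BP (Brevity Penalty), a penalty on translations that are too short. From the formula, BP lies in (0, 1]: it equals 1 when the candidate is at least as long as the reference, and approaches 0 as the candidate gets shorter.

The candidate has 18 words and the references have 16, 18 and 16 words respectively, so $c = 18$ and $r = 18$ (the reference length closest to the candidate length is used as $r$).

Since $c \le r$, we get $\mathrm{BP} = e^{1 - 18/18} = e^{0} = 1$.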
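The choice of $r$ and the resulting penalty can be tried out on lengths alone. This is a small sketch with hypothetical helper names (`closest_ref_length`, `brevity_penalty`) that take lengths rather than token lists; the tuple key is the same trick used in the NLTK code.

```python
import math

def closest_ref_length(cand_len, ref_lens):
    """Pick the reference length closest to cand_len; ties go to the shorter one."""
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - cand_len), ref_len))

def brevity_penalty(cand_len, ref_len):
    """BP = 1 if c > r, otherwise exp(1 - r / c)."""
    if cand_len > ref_len:
        return 1.0
    return math.exp(1 - ref_len / cand_len)

# The worked example: candidate of 18 words, references of 16, 18 and 16 words.
r = closest_ref_length(18, [16, 18, 16])
bp = brevity_penalty(18, r)
print(r, bp)  # 18 1.0

# Tie-breaking: 9 and 11 are equally far from 10, so the shorter length wins.
tie = closest_ref_length(10, [9, 11])
print(tie)  # 9

# A candidate half as long as its reference is penalised by exp(1 - 2) = 1/e.
short_bp = brevity_penalty(9, 18)
print(round(short_bp, 4))  # 0.3679
```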

4. Putting It Together

Finally:

$$\mathrm{BLEU} = 1 \cdot \exp(-0.684055269517) = 0.504566684006$$
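For reference, the whole calculation fits in a short standard-library script. This is a sketch under stated assumptions rather than NLTK's implementation: the function names are made up for this example, tokens come from whitespace splitting, and returning 0 when any precision is 0 is a choice made here because $\log 0$ is undefined.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram matches divided by the number of candidate n-grams."""
    counts = Counter(ngrams(candidate, n))
    if not counts:
        return 0.0
    ref_counts = [Counter(ngrams(ref, n)) for ref in references]
    clipped = sum(min(count, max(rc[gram] for rc in ref_counts))
                  for gram, count in counts.items())
    return clipped / sum(counts.values())

def brevity_penalty(candidate, references):
    """BP with r chosen as the closest reference length (shortest on ties)."""
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    return 1.0 if c > r else math.exp(1 - r / c)

def sentence_bleu_sketch(candidate, references, weights=(0.25, 0.25, 0.25, 0.25)):
    """BLEU = BP * exp(sum_n w_n * log P_n) for a single candidate sentence."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, len(weights) + 1)]
    if min(precisions) == 0:
        return 0.0  # assumption of this sketch: log(0) is treated as a score of 0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(candidate, references) * math.exp(log_sum)

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    "It is a guide to action that ensures that the military will forever heed Party commands".split(),
    "It is the guiding principle which guarantees the military forces always being under the command of the Party".split(),
    "It is the practical guide for the army always to heed the directions of the party".split(),
]

score = sentence_bleu_sketch(candidate, references)
print(round(score, 6))  # 0.504567
```

Scoring a reference against the full reference set gives 1.0, since every n-gram matches and the closest reference length equals the candidate length.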

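BLEU ranges over [0, 1]; 0 is the worst score and 1 is the best.

As the calculation shows, BLEU is essentially the weighted geometric mean of the modified n-gram precisions multiplied by the brevity penalty.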
