Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new metrics devised every year. Evaluation metrics are generally benchmarked against manual assessments of translation quality, with performance measured in terms of overall correlation with human scores. Much work has been dedicated to improving evaluation metrics so that they correlate more strongly with human judgments. However, little insight has been provided into the strengths and weaknesses of existing approaches and their behavior in different settings. In this work, we conduct a broad meta-evaluation study of a wide range of evaluation metrics, focusing on three major aspects. First, we analyze the performance of...
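As a concrete illustration of the benchmarking procedure described above, the sketch below computes segment-level Pearson and Spearman correlations between automatic metric scores and human judgments. It is a minimal example under the assumption that both score lists are already available and aligned; the numbers are placeholders, not results from this study.

    # Correlation-based meta-evaluation sketch: given parallel lists of automatic
    # metric scores and human quality judgments for the same segments, report how
    # strongly the metric agrees with the human scores. Values are illustrative only.
    from scipy.stats import pearsonr, spearmanr

    metric_scores = [0.42, 0.67, 0.31, 0.80, 0.55]   # hypothetical automatic metric scores
    human_scores = [55.0, 78.0, 40.0, 90.0, 62.0]    # hypothetical human quality judgments

    pearson_r, _ = pearsonr(metric_scores, human_scores)
    spearman_rho, _ = spearmanr(metric_scores, human_scores)
    print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")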
Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the corre...
Human-targeted metrics provide a compromise between human evaluation of machine translation, where h...
Translations generated by current statistical systems often have a large variance, in terms of their...
Automatic evaluation metrics are fast and cost-effective measurements of the quality of a Machine Tr...
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a s...
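To make the role of BLEU concrete, here is a minimal sketch of how a corpus-level BLEU score can be computed; the use of the sacrebleu library and the example sentences are assumptions made for illustration, not details taken from the work described above.

    # Corpus-level BLEU sketch using sacrebleu (library choice and sentences are
    # illustrative assumptions). Each reference stream holds one reference per hypothesis.
    import sacrebleu

    hypotheses = ["the cat sat on the mat", "there is a book on the table"]
    references = [["the cat sat on the mat", "a book is on the table"]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")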
Automatic metrics are fundamental for the development and evaluation of machine translation systems....
We describe a large-scale investigation of the correlation between human judgments of machine transl...