Automatic metrics are fundamental for the development and evaluation of machine translation systems. Judging whether, and to what extent, automatic metrics concur with the gold standard of human evaluation is not a straightforward problem. We show that current methods for judging metrics are highly sensitive to the translations used for assessment, particularly the presence of outliers, which often leads to falsely confident conclusions about a metric’s efficacy. Finally, we turn to pairwise system ranking, developing a method for thresholding performance improvement under an automatic metric against human judgements, which allows quantification of type I versus type II errors incurred, i.e., insignificant human differences in system qualit...
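The abstract above makes two technical points: correlation-based judgements of a metric are sensitive to outlier systems, and pairwise ranking can be analysed by thresholding metric improvements and counting type I versus type II errors. The sketch below illustrates both on made-up toy numbers. It uses a plain Pearson correlation. The helper `count_errors`, its parameters, and all scores are hypothetical illustrations, not the authors' method or data. Treating a metric delta at or above the threshold as "accepted" is also an assumption.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def count_errors(metric_deltas, human_significant, threshold):
    """Count (type I, type II) errors for one metric-delta threshold.

    Hypothetical helper. A pair counts as "accepted" when its metric delta is
    >= threshold (an assumption of this sketch).
    Type I: accepted, but the human difference is not significant.
    Type II: rejected, but the human difference is significant.
    """
    type1 = type2 = 0
    for delta, significant in zip(metric_deltas, human_significant):
        accepted = delta >= threshold
        if accepted and not significant:
            type1 += 1
        elif not accepted and significant:
            type2 += 1
    return type1, type2


# Toy, made-up system-level scores (hypothetical, not from any real evaluation).
metric_scores = [30.1, 30.5, 31.0, 31.2, 31.8]
human_scores = [0.10, 0.05, 0.12, 0.02, 0.08]

# Correlation over closely matched systems only.
r_without = pearson(metric_scores, human_scores)
# Add one clearly broken system that both the metric and humans rate far lower.
r_with = pearson(metric_scores + [15.0], human_scores + [-1.5])
print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")

# Toy metric deltas for system pairs, plus whether humans found a significant difference.
deltas = [0.2, 0.8, 1.5, 2.5, 3.0]
significant = [False, False, True, False, True]
for t in (0.5, 1.0, 2.0):
    type1, type2 = count_errors(deltas, significant, t)
    print(f"threshold {t}: type I = {type1}, type II = {type2}")
```

On these toy numbers, the five closely matched systems give a weakly negative correlation. Adding a single outlier system pushes Pearson's r close to 1, although the ranking among the original five systems is unchanged. Raising the threshold in the sweep trades type I errors for type II errors.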
Machine translation evaluation is a very important activity in machine translation development. Auto...
Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no sing...
The success of Transformer architecture has seen increased interest in machine translation (MT). The...
We describe a large-scale investigation of the correlation between human judgments of machine transl...
Includes bibliographical references (pages 45-46). Statistical Machine Translation became the dominan...
Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new ...
Automatic evaluation metrics are fast and cost-effective measurements of the quality of a Machine Tr...
We present a comparison of automatic metrics against human evaluations of translation quality in sev...
Human-targeted metrics provide a compromise between human evaluation of machine translation, where h...
Any scientific endeavour must be evaluated in order to assess its correctness. In many applied scien...
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a su...