Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new metrics devised every year. Evaluation metrics are generally benchmarked against manual assessments of translation quality, with performance measured in terms of overall correlation with human scores. Much work has been dedicated to improving evaluation metrics so that they correlate more strongly with human judgments. However, little insight has been provided into the strengths and weaknesses of existing approaches and their behavior in different settings. In this work, we conduct a broad meta-evaluation study of a wide range of evaluation metrics, focusing on three major aspects. First, we analyze the performance of...
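As a concrete illustration of the benchmarking procedure described above, the sketch below computes segment-level Pearson and Spearman correlations between automatic metric scores and human judgments. It is a minimal example under the assumption that both score lists are already available and aligned; the numbers are placeholders, not results from this study.

    # Correlation-based meta-evaluation sketch: given parallel lists of automatic
    # metric scores and human quality judgments for the same segments, report how
    # strongly the metric agrees with the human scores. Values are illustrative only.
    from scipy.stats import pearsonr, spearmanr

    metric_scores = [0.42, 0.67, 0.31, 0.80, 0.55]   # hypothetical automatic metric scores
    human_scores = [55.0, 78.0, 40.0, 90.0, 62.0]    # hypothetical human quality judgments

    pearson_r, _ = pearsonr(metric_scores, human_scores)
    spearman_rho, _ = spearmanr(metric_scores, human_scores)
    print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")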
Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the corre...
Human-targeted metrics provide a compromise between human evaluation of machine translation, where h...
Translations generated by current statistical systems often have a large variance, in terms of their...
Automatic evaluation metrics are fast and cost-effective measurements of the quality of a Machine Tr...
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a s...
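To make the role of BLEU concrete, here is a minimal sketch of how a corpus-level BLEU score can be computed; the use of the sacrebleu library and the example sentences are assumptions made for illustration, not details taken from the work described above.

    # Corpus-level BLEU sketch using sacrebleu (library choice and sentences are
    # illustrative assumptions). Each reference stream holds one reference per hypothesis.
    import sacrebleu

    hypotheses = ["the cat sat on the mat", "there is a book on the table"]
    references = [["the cat sat on the mat", "a book is on the table"]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")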
Automatic metrics are fundamental for the development and evaluation of machine translation systems....
We describe a large-scale investigation of the correlation between human judgments of machine transl...