This paper aims to provide a reliable method for measuring the correlations between the scores of different evaluation metrics applied to machine-translated texts. A series of examples from recent MT evaluation experiments is first discussed, including results and data from the recent French MT evaluation campaign, CESTA, which is used here. To compute correlation, a set of 1,500 samples for each system and each evaluation metric is created using bootstrapping. Correlations between metrics, both automatic and applied by human judges, are then computed over these samples. The results confirm the previously observed correlations between some automatic metrics, but also indicate a lack of correlation between human and automatic metrics on the ...
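The bootstrapping step described above can be sketched briefly. The snippet below is a minimal illustration, not the paper's implementation: it assumes per-segment scores are available for each metric, resamples the test set with replacement 1,500 times, scores the same resampled segments under both metrics, and correlates the resulting score pairs. All names and data (`paired_bootstrap`, `auto_metric`, `human_judge`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def paired_bootstrap(scores_a, scores_b, n_samples=1500):
    """Draw n_samples bootstrap resamples of the test set and score the
    SAME resampled segments under both metrics, so the resulting pairs
    of corpus-level scores can be correlated."""
    scores_a = np.asarray(scores_a)
    scores_b = np.asarray(scores_b)
    n = len(scores_a)
    idx = rng.integers(0, n, size=(n_samples, n))  # same segments for both
    return scores_a[idx].mean(axis=1), scores_b[idx].mean(axis=1)

# Hypothetical per-segment scores for one system: an automatic metric and
# a human judgment, loosely coupled through a shared quality signal.
quality = rng.uniform(0.0, 1.0, size=400)
auto_metric = quality + rng.normal(0.0, 0.15, size=400)
human_judge = quality + rng.normal(0.0, 0.30, size=400)

a, b = paired_bootstrap(auto_metric, human_judge)
r = np.corrcoef(a, b)[0, 1]  # Pearson correlation over the 1,500 samples
print(f"bootstrap correlation: {r:.3f}")
```

Resampling indices once and applying them to both metrics is what makes the correlation meaningful; independently resampling each metric would yield score vectors that are uncorrelated by construction.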
This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task...
Evaluation measures for machine translation depend on several common methods, such as preprocessin...
We propose three new features for MT evaluation: source-sentence constrained n-gram precision, sourc...
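The constrained variant named in this truncated abstract cannot be recovered from the snippet, but the plain n-gram precision it restricts is standard BLEU machinery. As context only, a minimal sketch (function name and example sentences are hypothetical):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision, the BLEU building block that a
    'source-sentence constrained' variant would restrict further:
    the fraction of candidate n-grams also found in the reference,
    with each n-gram's count clipped at its reference count."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

# 3 of 5 candidate bigrams appear in the reference -> 0.6
print(ngram_precision("the cat sat on the mat", "the cat is on the mat"))
```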
Automatic evaluation metrics are fast and cost-effective measurements of the quality of a Machine Tr...
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a su...
Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new ...
This paper applies nonparametric statistical techniques to Machine Translation (MT) Evaluation usin...
Evaluating the output quality of a machine translation system requires test data and quality metrics t...
We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexi...
State-of-the-art MT systems use a so-called log-linear model, which combines several components to pre...
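For reference, the standard log-linear formulation this abstract alludes to (as in Och and Ney, 2002) scores a candidate translation as a weighted sum of feature functions; the notation below ($e$ for the translation, $f$ for the source sentence, $h_i$ and $\lambda_i$ for features and weights) is conventional, not taken from the truncated abstract:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Log-linear model: the decoder picks the translation $\hat{e}$ of source
% sentence $f$ that maximizes a weighted sum of $M$ feature functions
% $h_i$ (translation model, language model, ...) with weights $\lambda_i$
% tuned on development data.
\[
  \hat{e} \;=\; \operatorname*{arg\,max}_{e} \sum_{i=1}^{M} \lambda_i \, h_i(e, f)
\]
\end{document}
```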
Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST...