Evaluating the output quality of a machine translation system requires test data and quality metrics. Based on the results of the French MT evaluation campaign CESTA, this paper studies the statistical reliability of scores as a function of the amount of test data used to obtain them. Bootstrapping is used to compute the standard deviation of scores assigned by human judges (mainly adequacy) as well as of five automatic metrics. The reliability of the scores is measured using two formal criteria, and the minimal number of documents or segments needed to reach reliable scores is estimated. This number does not depend on the exact subset of documents that is used.
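The bootstrap procedure the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the number of resamples, and the per-document adequacy values are all hypothetical assumptions.

```python
import random
import statistics

def bootstrap_std(scores, n_resamples=1000, seed=0):
    """Estimate the standard deviation of the mean score by
    resampling documents with replacement (the bootstrap)."""
    rng = random.Random(seed)
    n = len(scores)
    resampled_means = []
    for _ in range(n_resamples):
        # Draw n documents with replacement and record the mean score.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(sum(sample) / n)
    # The spread of the resampled means estimates the score's reliability.
    return statistics.stdev(resampled_means)

# Hypothetical per-document adequacy judgments (illustrative only).
adequacy = [3.2, 4.1, 2.8, 3.9, 3.5, 4.0, 2.9, 3.3, 3.7, 3.1]
print(round(bootstrap_std(adequacy), 3))
```

The same resampling can be run at the document or the segment level, which is how the paper varies the amount of test data when estimating how many units are needed for a reliable score.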
This article outlines the evaluation protocol and provides the main results of the French Evaluation...
Evaluation of machine translation (MT) is a difficult task, both for humans, and using automatic met...
This paper aims at providing a reliable method for measuring the correlations between different scor...
Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST...
Any scientific endeavour must be evaluated in order to assess its correctness. In many applied scien...
MT systems are traditionally evaluated with different criteria, such as adequacy and fluency. Automa...
Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no sing...
Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new ...
Most evaluation metrics for machine translation (MT) require reference translations for each sentenc...
Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, ma...