We propose three new features for MT evaluation: source-sentence constrained n-gram precision, source-sentence reordering metrics, and discriminative unigram precision, as well as a method of learning linear feature weights to directly maximize correlation with human judgments. By aligning both the hypothesis and the reference with the source-language sentence, we achieve better correlation with human judgments than previously proposed metrics. We further improve performance by combining individual evaluation metrics using maximum correlation training, which is shown to be better than the classification-based framework.
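The maximum correlation training mentioned above can be sketched as direct gradient ascent on Pearson's r between a weighted combination of metric features and human scores. The following Python sketch is an illustration under assumptions made here, not the paper's actual implementation: the function name max_correlation_train, the feature-matrix layout, and the fixed-step gradient-ascent optimizer are all hypothetical.

    import numpy as np

    def max_correlation_train(X, y, lr=0.01, steps=2000):
        # X: (n_sentences, n_features) matrix of per-sentence metric scores
        #    (hypothetical layout); y: human judgments for the same sentences.
        # Gradient ascent on Pearson's r between X @ w and y.
        n, d = X.shape
        w = np.ones(d) / d                    # uniform initial weights
        yc = y - y.mean()                     # centered human scores
        ny = np.linalg.norm(yc)
        for _ in range(steps):
            sc = X @ w
            sc = sc - sc.mean()               # center the combined metric scores
            ns = np.linalg.norm(sc)
            r = (sc @ yc) / (ns * ny + 1e-12) # current Pearson correlation
            # dr/ds = yc/(ns*ny) - r*sc/ns^2, then chain through s = X @ w
            grad_s = yc / (ns * ny + 1e-12) - r * sc / (ns ** 2 + 1e-12)
            w += lr * (X.T @ grad_s)
        return w

    # Example: three features, 100 sentences (synthetic data for illustration)
    rng = np.random.default_rng(0)
    X = rng.random((100, 3))
    y = X @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.standard_normal(100)
    w = max_correlation_train(X, y)

Because Pearson's r is invariant to positive rescaling of the weights, the optimizer only needs to find a direction, which is part of what makes direct correlation maximization tractable for linear metric combinations.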
As the performance of machine translation has improved, the need for a human-like automatic evaluati...
Automatic Machine Translation metrics, such as BLEU, are widely used in empirical evaluation as a su...
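For reference, BLEU (Papineni et al., 2002) combines modified n-gram precisions $p_n$ with a brevity penalty:

$$ \mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big( \sum_{n=1}^{N} w_n \log p_n \Big), \qquad \mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases} $$

where $c$ is the total candidate length, $r$ the effective reference length, and the weights $w_n$ are typically uniform with $N = 4$.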
Recent studies suggest that machine learning can be applied to develop good automatic evaluation m...
State-of-the-art MT systems use a so-called log-linear model, which combines several components to pre...
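The log-linear model referred to here is standardly written (Och and Ney, 2002) as choosing the translation that maximizes a weighted sum of feature functions:

$$ \hat{e} = \arg\max_{e} \sum_{i=1}^{M} \lambda_i \, h_i(e, f) $$

where each $h_i(e, f)$ is a component score (e.g., translation model, language model, reordering model) for candidate $e$ given source sentence $f$, and the weights $\lambda_i$ are tuned on held-out data.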
Most evaluation metrics for machine translation (MT) require reference translations for each sentenc...
The problem of evaluating machine translation (MT) systems is more challenging than it may first app...
Discriminative training, a.k.a. tuning, is an important part of Statistical Machine Translation. Thi...
Machine Translation (MT) systems are more complex to test than they appear to be at first, since man...
Recently, novel MT evaluation metrics have been presented that go beyond pure string matching, and w...
MT evaluation metrics are tested for correlation with human judgments either at the sentence- or the...
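The correlation statistics involved are typically Pearson's r for absolute scores, and Spearman's rho (Pearson's r computed on ranks) for rankings:

$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} $$

where $x_i$ are metric scores and $y_i$ the corresponding human judgments, computed per sentence for sentence-level evaluation or over system or document aggregates for corpus-level evaluation.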
We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexi...
Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new ...
This paper aims at providing a reliable method for measuring the correlations between different scor...