This paper proposes a new automatic machine translation evaluation metric: AMBER, which is based on the metric BLEU but incorporates recall, extra penalties, and some text-processing variants. AMBER uses very little linguistic information. We evaluate its system-level correlation and sentence-level consistency scores against the human rankings from the WMT shared evaluation task; AMBER achieves state-of-the-art performance.
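The abstract names AMBER's ingredients (a BLEU-style precision core, recall, and extra penalties) without spelling out how they combine. The sketch below is only an illustration of how such a score can be assembled: the weighted harmonic mean, the single brevity-style penalty, and the function names are assumptions for this example, not the published AMBER formula, whose exact weights, penalty set, and text-processing variants are defined in the paper.

```python
# Illustrative sketch only: a BLEU-like precision score extended with recall
# and a simple penalty, in the spirit of the components named in the abstract.
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_matches(cand, ref, n):
    # Clipped n-gram matches plus the candidate and reference n-gram totals.
    c, r = ngrams(cand, n), ngrams(ref, n)
    return (sum(min(count, r[g]) for g, count in c.items()),
            sum(c.values()), sum(r.values()))

def amber_like_score(candidate, reference, max_n=4, alpha=0.8):
    """Combine n-gram precision and recall via a weighted harmonic mean,
    then apply a brevity-style penalty. Assumed combination, not AMBER's."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        matches, cand_total, ref_total = clipped_matches(cand, ref, n)
        precisions.append(matches / cand_total if cand_total else 0.0)
        recalls.append(matches / ref_total if ref_total else 0.0)

    # Geometric mean over n-gram orders, smoothed to avoid log(0).
    def geo_mean(xs):
        return exp(sum(log(max(x, 1e-9)) for x in xs) / len(xs))

    p, r = geo_mean(precisions), geo_mean(recalls)
    f = p * r / (alpha * p + (1 - alpha) * r)
    # Penalize candidates that are shorter than the reference.
    penalty = min(1.0, exp(1 - len(ref) / max(len(cand), 1)))
    return f * penalty

print(amber_like_score("the cat sat on the mat", "the cat is on the mat"))
```

In this sketch, `alpha` plays the role of the precision/recall balance and the penalty term stands in for AMBER's larger set of extra penalties; both are placeholders for the parameters tuned in the paper.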