This work describes an analysis of inter-annotator disagreements in human evaluation of machine translation output. The errors in the analysed texts were marked by multiple annotators under the guidance of different quality criteria: adequacy, comprehension, and an unspecified generic mixture of adequacy and fluency. Our results show that different criteria result in different disagreements, and indicate that a clear definition of the quality criterion can improve inter-annotator agreement. Furthermore, our results show that for certain linguistic phenomena that are not limited to one or two words (as word ambiguity or gender are) but span several words or even entire phrases (such as negation or relative clauses), disagreements do not ne...
Document-level human evaluation of machine translation (MT) has been raising interest in the communi...
In this paper we report our reproduction study of the Croatian part of an annotation-based human eva...
When evaluating machine translation outputs, linguistics is usually taken into account implicitly. A...
Document-level evaluation of machine translation has raised interest in the community, especially sin...
Error analysis is a means to assess machine translation output in qualitative terms, which can be us...
We propose facilitating the error annotation task of translation quality assessment by introducing a...
This work proposes a new method for manual evaluation of Machine Translation (MT) output based on ma...
Recently, document-level (doc-level) human evaluation of machine translation (MT) has raised intere...
Computing inter-annotator agreement measures on a manually annotated corpus is...
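As an illustration of what an inter-annotator agreement measure computes, the following is a minimal sketch of Cohen's kappa for two annotators. It assumes a simple word-level binary labelling (1 = word marked as an error, 0 = not marked). The function name `cohens_kappa` and the labelling scheme are illustrative assumptions, not the measure or setup used in any of the works listed here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    Illustrative sketch: each position is one item (e.g. one MT output word),
    and each label is whatever category the annotator assigned to it.
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined here, and we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical word-level error marks for one 8-word sentence.
annotator_1 = [0, 0, 1, 1, 0, 0, 0, 1]
annotator_2 = [0, 1, 1, 1, 0, 0, 0, 0]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # 0.4667
```

Here the annotators agree on 6 of 8 words (observed agreement 0.75), but because both mark about a third of the words, chance agreement is already 0.53, so kappa is only about 0.47. Correcting for chance in this way is why kappa-style measures are usually preferred over raw percentage agreement.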
This work examines different ways of aggregating scores for error annotation in MT outputs: raw erro...
In the translation industry, human translations are assessed by comparison with the source texts. In...
This work describes an analysis of the nature and causes of MT errors observed by different evaluators unde...
Researchers who make use of multimodal annotated corpora are always presented with something of a di...
Quality Estimation (QE) and error analysis of Machine Translation (MT) output remain active areas in...