Evaluating the quality of generated text is difficult, since traditional NLG evaluation metrics, focusing more on surface form than meaning, often fail to assign appropriate scores. This is especially problematic for AMR-to-text evaluation, given the abstract nature of AMR. Our work aims to support the development and improvement of NLG evaluation metrics that focus on meaning, by developing a dynamic CheckList for NLG metrics that is interpreted by being organized around meaning-relevant linguistic phenomena. Each test instance consists of a pair of sentences with their AMR graphs and a human-produced textual semantic similarity or relatedness score. Our CheckList facilitates comparative evaluation of metrics and reveals their strengths and weaknesses.
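To make the evaluation protocol concrete, the following is a minimal Python sketch, not taken from the paper: the TestInstance class, the toy surface_overlap metric, and profile_metric are hypothetical illustrations of the described setup. Each CheckList item pairs two sentences with their AMR graphs and a human rating, and a metric is profiled by correlating its scores with the human ratings per linguistic phenomenon.

from dataclasses import dataclass
from collections import defaultdict
from statistics import correlation  # Pearson's r; requires Python 3.10+

@dataclass
class TestInstance:
    # One CheckList item: a sentence pair with AMR graphs and a human rating.
    phenomenon: str     # meaning-relevant linguistic phenomenon under test
    sentence_a: str
    sentence_b: str
    amr_a: str          # AMR graph, e.g. in PENMAN notation
    amr_b: str
    human_score: float  # human semantic similarity/relatedness rating

def surface_overlap(a: str, b: str) -> float:
    # Toy surface-form metric (token Jaccard), a stand-in for a real NLG metric.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def profile_metric(metric, instances):
    # Correlate metric scores with human ratings, grouped by phenomenon,
    # to expose where the metric agrees or disagrees with human judgment.
    # (statistics.correlation raises StatisticsError on constant inputs.)
    by_phenomenon = defaultdict(list)
    for inst in instances:
        by_phenomenon[inst.phenomenon].append(
            (metric(inst.sentence_a, inst.sentence_b), inst.human_score))
    return {ph: correlation([m for m, _ in pairs], [h for _, h in pairs])
            for ph, pairs in by_phenomenon.items() if len(pairs) > 1}

A surface-form metric like the toy one above would be expected to correlate poorly with human ratings on phenomena where meaning is preserved while wording changes (e.g., synonym substitution), which is exactly the kind of weakness a per-phenomenon profile is meant to reveal.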