NLG researchers often train and evaluate their systems on uncontrolled corpora, scoring outputs with textual similarity metrics such as BLEU. This position paper argues in favour of two alternative evaluation strategies, based on grammars or rule-based systems. These strategies are particularly useful for identifying the strengths and weaknesses of different systems. We contrast our proposals with the (extended) WebNLG dataset, which turns out to have a skewed distribution of predicates. We predict that this distribution affects the quality of the predictions of systems trained on this data. However, this hypothesis can only be thoroughly tested (without any confounds) once we are able to systematically manipulate the skewness of the data, using a rule-b...
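The abstract's central proposal is to vary predicate skewness deliberately and then measure its effect. Below is a minimal Python sketch of that idea under our own assumptions. Predicates are drawn from a Zipf-like distribution whose exponent `s` sets the degree of skew, and skew is quantified with normalized entropy and with the share of the top-k predicates. The predicate names, the Zipf law, the parameter `s`, and both metrics are illustrative choices, not the method described in the paper.

```python
import math
import random
from collections import Counter

# Placeholder predicate inventory (hypothetical names, not the WebNLG predicates).
PREDICATES = [f"pred_{i}" for i in range(20)]


def zipf_weights(n: int, s: float) -> list[float]:
    """Weight for rank r is 1 / r**s; s = 0 is uniform, larger s is more skewed."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]


def sample_predicates(n_items: int, s: float, seed: int = 0) -> list[str]:
    """Draw n_items predicates with a Zipf-like skew controlled by s (assumed sampler)."""
    rng = random.Random(seed)
    weights = zipf_weights(len(PREDICATES), s)
    return rng.choices(PREDICATES, weights=weights, k=n_items)


def normalized_entropy(items: list[str], n_types: int) -> float:
    """Shannon entropy of the empirical distribution divided by log(n_types).

    1.0 means all n_types predicates are equally frequent;
    values close to 0 mean a few predicates dominate.
    """
    counts = Counter(items)
    if n_types < 2 or len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_types)


def top_k_share(items: list[str], k: int) -> float:
    """Fraction of all items covered by the k most frequent predicates."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


if __name__ == "__main__":
    # Compare a uniform dataset (s = 0) with increasingly skewed ones.
    for s in (0.0, 1.0, 2.0):
        data = sample_predicates(5000, s)
        print(f"s={s}: entropy={normalized_entropy(data, len(PREDICATES)):.3f}, "
              f"top-3 share={top_k_share(data, 3):.3f}")
```

As `s` grows, the normalized entropy should fall and the top-3 share should rise. In an actual experiment, the placeholder inventory and sampler would be replaced by the inputs of a rule-based generator, and the same system would be trained on datasets that differ only in `s`, so that any change in output quality can be attributed to skew alone.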
In this article we investigate how (computational) grammar inference systems are evaluated and how t...
The evaluation of grammar inference systems is clearly a non-trivial task, as it is possible to have...
We consider the evaluation problem in Natural Language Generation (NLG) and present results for eval...
In this position paper, we argue that a common task and corpus are not the only ways to evaluate Nat...
There is growing interest in using automatically computed corpus-based evaluation metrics to evaluat...
In recent years, a concern has grown within the NLG community about the comparability of systems and...
The availability of a huge mass of textual data in electronic format has increased the need for fast...
Shared Task Evaluation Challenges (STECs) have only recently begun in the field of NLG. The TUNA STE...
Automatic methods and metrics that assess various quality criteria of automatically generated texts ...
Automated evaluation of open domain natural language generation (NLG) models remains a challenge and...
The concept of automated grammar evaluation of natural language texts is one that has attracted sign...