We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses exhibited by 'state-of-the-art' research. In addition to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting.
Peer reviewed.
One of the advantages of deep grammars, such as those based on HPSG, is that they can be used for ge...
With the fast-growing popularity of current large pre-trained language models (LLMs), it is necessar...
In this paper, we present the results of two reproduction studies for the human evaluation origina...
Earlier research has shown that few studies in Natural Language Generation (NLG) evaluate their syst...
Many evaluation issues for grammatical error detection have previously been overlooked, mak...
While automatically computing numerical scores remains the dominant paradigm in NLP system evaluatio...
Recently, there has been an increased interest in demographically grounded bias in natural language ...
Natural language generation (NLG) systems are computer software systems that produce texts in Engli...
This paper presents a "didactic triangulation" strategy to cope with the probl...
This paper explores the issue of automatically generated ungrammatical data and its use in error det...
Context: Conducting experiments is central to machine learning research to benchmark, evalu...
Deep neural networks that dominate NLP rely on an immense amount of parameters and require large tex...
This study evaluates a natural language generation system that creates literacy assessment reports i...
Error Correction has applications in a variety of domains given the prevalence of errors of various ...