Grounded text generation systems often generate text that contains factual inconsistencies, hindering their real-world applicability. Automatic factual consistency evaluation may help alleviate this limitation by accelerating evaluation cycles, filtering inconsistent outputs and augmenting training data. While attracting increasing attention, such evaluation metrics are usually developed and evaluated in silos for a single task or dataset, slowing their adoption. Moreover, previous meta-evaluation protocols focused on system-level correlations with human annotations, which leave the example-level accuracy of such metrics unclear. In this work, we introduce TRUE: a comprehensive survey and assessment of factual consistency metrics on a standa...
A major challenge in the field of Text Generation is evaluation: Human evaluations are cost-intensiv...
Causal inference studies using textual social media data can provide actionable insights on human be...
Despite the recent advances in abstractive text summarization, current summarization models still su...
Semantic consistency of a language model is broadly defined as the model's ability to produce semant...
Recently neural response generation models have leveraged large pre-trained transformer models and k...
Automatic text summarization has achieved remarkable success with the development of deep neural net...
Despite the recent progress in language generation models, their outputs may not always meet user ex...
We propose a benchmark to measure whether a language model is truthful in generating answers to ques...
The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread ...
Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is l...
Pretrained language models (PLMs) based knowledge-grounded dialogue systems are prone to generate re...
Recent progress in pre-trained language models led to systems that are able to generate text of an i...
Maintaining factual consistency is a critical issue in abstractive text summarisation, however, it c...
LLMs (large language models) such as ChatGPT have shown remarkable language understanding and genera...
Large Language Models (LLMs), such as ChatGPT/GPT-4, have garnered widespread attention owing to the...