We investigate how disagreement arises in natural language inference (NLI) annotation. We develop a taxonomy of disagreement sources with 10 categories spanning 3 high-level classes, and find that some disagreements stem from uncertainty in sentence meaning while others stem from annotator biases and task artifacts, leading to different interpretations of the label distribution. We explore two modeling approaches for detecting items with potential disagreement: a 4-way classification that adds a "Complicated" label to the three standard NLI labels, and a multilabel classification approach. We find that multilabel classification is more expressive and gives better recall of the possible interpretations in the data.
Comment: accepted at...
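The contrast between the two approaches in the abstract above can be illustrated with a minimal sketch (not taken from the paper): a 4-way head forces exactly one label via softmax, while a multilabel head scores each standard NLI label independently via sigmoids, so several labels can fire at once and signal disagreement. The function names, logit values, and 0.5 threshold below are illustrative assumptions.

```python
import numpy as np

NLI_LABELS = ["entailment", "neutral", "contradiction"]


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def four_way_predict(logits):
    """4-way scheme: one mutually exclusive label, with an extra
    'complicated' class flagging items with potential disagreement."""
    labels = NLI_LABELS + ["complicated"]
    return labels[int(np.argmax(softmax(logits)))]


def multilabel_predict(logits, threshold=0.5):
    """Multilabel scheme: each standard label gets an independent
    sigmoid score; any subset above the threshold is predicted."""
    probs = sigmoid(np.asarray(logits))
    return [lab for lab, p in zip(NLI_LABELS, probs) if p >= threshold]


# An item scored high on both entailment and neutral comes out as
# {entailment, neutral} under the multilabel head — the two plausible
# interpretations are both recovered rather than collapsed to one label.
print(multilabel_predict(np.array([2.0, 1.0, -3.0])))
print(four_way_predict(np.array([0.1, 0.2, 0.0, 2.0])))
```

This is why the multilabel formulation is the more expressive of the two: it can represent "both entailment and neutral are defensible readings" directly, whereas the 4-way head can only replace both with a single opaque "Complicated" label.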
| openaire: EC/H2020/101016775/EU//INTERVENE
Experts and crowds can work together to generate high-qu...
We commonly use agreement measures to assess the utility of judgements made by human annotators in N...
This paper describes a methodology for supporting the task of annotating sentiment in natural langua...
Natural language inference (NLI) is the task of determining whether a piece of text is entailed, con...
In NLP annotation, it is common to have multiple annotators label the text and then obtain the groun...
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans ...
Supervised learning assumes that a ground truth label exists. However, the reliability of this groun...
For a highly subjective task such as recognising speaker intention and argumentation, the traditiona...
Many believe human-level natural language inference (NLI) has already been achieved. In reality, mod...
Linguistic annotation underlies many successful approaches in Natural Language...
This work describes an analysis of inter-annotator disagreements in human evaluation of machine tran...
This article presents a new benchmark for natural language inference in which negation plays a critical r...
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often rely on rep...
As deep learning models become increasingly complex, practitioners are relying more on post hoc expl...