Computing inter-annotator agreement measures on a manually annotated corpus is necessary to evaluate the reliability of its annotation. However, the interpretation of the obtained values is widely recognized as arbitrary. In this article, we describe a method and a tool we developed that "shuffles" a reference annotation according to different error paradigms, thereby creating artificial annotations with controlled errors. Agreement measures are computed on these corpora, and the results are used to model the behavior of these measures and to understand what they actually mean.
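To make the principle concrete, here is a minimal sketch of the idea under illustrative assumptions: the function names, the single error paradigm (uniformly re-labelling a controlled fraction of units), and the choice of Cohen's kappa are all assumptions for the example, not the authors' actual tool. It corrupts a reference annotation at increasing error rates and reports how the agreement measure degrades, which is the kind of controlled-error curve the abstract describes.

```python
import random


def shuffle_annotation(reference, categories, error_rate, rng):
    """Corrupt a reference annotation: re-label a controlled fraction
    of units with a uniformly drawn wrong category (one simple,
    hypothetical 'error paradigm')."""
    corrupted = []
    for label in reference:
        if rng.random() < error_rate:
            corrupted.append(rng.choice([c for c in categories if c != label]))
        else:
            corrupted.append(label)
    return corrupted


def cohen_kappa(ann_a, ann_b, categories):
    """Cohen's kappa between two annotations of the same units."""
    n = len(ann_a)
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # chance agreement from each annotation's category marginals
    expected = sum((ann_a.count(c) / n) * (ann_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    rng = random.Random(42)
    categories = ["PER", "LOC", "ORG", "O"]  # hypothetical label set
    reference = [rng.choice(categories) for _ in range(10_000)]
    for rate in (0.0, 0.1, 0.2, 0.4):
        corrupted = shuffle_annotation(reference, categories, rate, rng)
        kappa = cohen_kappa(reference, corrupted, categories)
        print(f"error rate {rate:.1f} -> kappa {kappa:.3f}")
```

Running this for several error rates yields a curve relating a known amount of injected error to the observed agreement value, which is how such simulated annotations can help interpret what a given agreement score actually means.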