This is the accompanying data for the paper "Analyzing Dataset Annotation Quality Management in the Wild". Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models and their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, bias or annotation artifacts. There exist best practices and guidelines regarding annotation projects. But to the best of our knowledge, no large-scale analysis has been performed as of yet on how quality management is actually conducted when creating natural language datasets and whether these recommendations are followed. Therefore, we fi...
Deep Learning, a growing sub-field of machine learning, has been applied with tremendous success in ...
Supervised approaches to NLP tasks rely on high-quality data annotations, which typically result fro...
The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scie...
This is the accompanying data for our paper "Annotation Error Detection: Analyzing the Past and Pres...
Abstract Advanced computer-assisted translation (CAT) tools include automatic quality estimation (QE...
International audienceReference annotated (or gold-standard) datasets are required for various commo...
Labelling data is one of the most fundamental activities in science, and has underpinned practice, p...
Context: - Machine learning is a part of artificial intelligence, this area is now continuously grow...
The analysis of crowdsourced annotations in natural language processing is concerned with identifyin...
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge...
The usual practice in assessing whether a multimodal annotated corpus is fit for purpose is to calcu...
Computing inter-annotator agreement measures on a manually annotated corpus is necessary to evaluate...
Researchers who make use of multimodal annotated corpora are always presented with something of a di...
In the current post-truth era, online information is consistently under scrutiny with respect to its...
Prepared domain specific datasets plays an important role to supervised learning approaches. In this...
Deep Learning, a growing sub-field of machine learning, has been applied with tremendous success in ...
Supervised approaches to NLP tasks rely on high-quality data annotations, which typically result fro...
The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scie...
This is the accompanying data for our paper "Annotation Error Detection: Analyzing the Past and Pres...
Abstract Advanced computer-assisted translation (CAT) tools include automatic quality estimation (QE...
International audienceReference annotated (or gold-standard) datasets are required for various commo...
Labelling data is one of the most fundamental activities in science, and has underpinned practice, p...
Context: - Machine learning is a part of artificial intelligence, this area is now continuously grow...
The analysis of crowdsourced annotations in natural language processing is concerned with identifyin...
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge...
The usual practice in assessing whether a multimodal annotated corpus is fit for purpose is to calcu...
Computing inter-annotator agreement measures on a manually annotated corpus is necessary to evaluate...
Researchers who make use of multimodal annotated corpora are always presented with something of a di...
In the current post-truth era, online information is consistently under scrutiny with respect to its...
Prepared domain specific datasets plays an important role to supervised learning approaches. In this...
Deep Learning, a growing sub-field of machine learning, has been applied with tremendous success in ...
Supervised approaches to NLP tasks rely on high-quality data annotations, which typically result fro...
The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scie...