Thesis (Master's)--University of Washington, 2021This thesis compares three methods for identifying mislabeled examples in datasets: Dataset Cartography (Swayamdipta et al. [2020]), Cleanlab, (Northcutt et al. [2021b]), and Ensem- bling (Brodley and Friedl [1999], Reiss et al. [2020]). Mislabeled examples in the training data of a dataset deteriorate the learning signal that models can use for the task, and mis- labeled data in the test split prevent accurate assessment of a model’s performance, so it is useful to have methods to identify and correct those labels. In order to compare the methods as directly as possible, we use the Multi-Genre Natural Language Inference corpus (MNLI) as the dataset that all methods will inspect for mislabele...
In multi-label classification, each example in a dataset may be annotated as belonging to one or mor...
Sample mislabeling or incorrect annotation has been a long-standing problem in biomedical research a...
Mislabeled examples affect the performance of supervised learning algorithms. Two novel approaches t...
A key requirement for supervised machine learning is labeled training data, which is created by anno...
The amount of data keeps growing thus making the handling of all data to anextensive task. Most data...
A lot of data sets, gathered for instance during user experiments, are contaminated with noise. Some...
This paper presents a new approach to identifying and eliminating mislabeled training instances for ...
Purpose: The authors aim at testing the performance of a set of machine learning algorithms that cou...
Data quality affects machine learning (ML) model performances, and data scientists spend considerabl...
International audienceThis paper focuses on the detection of likely mislabeled instances in a learni...
Purpose: The authors aim at testing the performance of a set of machine learning algorithms that cou...
This paper presents a new approach to identifying and eliminating mislabeled training samples. The g...
Deep neural networks that dominate NLP rely on an immense amount of parameters and require large tex...
When applying supervised machine learning algorithms to classification, the classical goal is to rec...
Label quality is an important and common problem in contemporary supervised machine learning researc...
In multi-label classification, each example in a dataset may be annotated as belonging to one or mor...
Sample mislabeling or incorrect annotation has been a long-standing problem in biomedical research a...
Mislabeled examples affect the performance of supervised learning algorithms. Two novel approaches t...
A key requirement for supervised machine learning is labeled training data, which is created by anno...
The amount of data keeps growing thus making the handling of all data to anextensive task. Most data...
A lot of data sets, gathered for instance during user experiments, are contaminated with noise. Some...
This paper presents a new approach to identifying and eliminating mislabeled training instances for ...
Purpose: The authors aim at testing the performance of a set of machine learning algorithms that cou...
Data quality affects machine learning (ML) model performances, and data scientists spend considerabl...
International audienceThis paper focuses on the detection of likely mislabeled instances in a learni...
Purpose: The authors aim at testing the performance of a set of machine learning algorithms that cou...
This paper presents a new approach to identifying and eliminating mislabeled training samples. The g...
Deep neural networks that dominate NLP rely on an immense amount of parameters and require large tex...
When applying supervised machine learning algorithms to classification, the classical goal is to rec...
Label quality is an important and common problem in contemporary supervised machine learning researc...
In multi-label classification, each example in a dataset may be annotated as belonging to one or mor...
Sample mislabeling or incorrect annotation has been a long-standing problem in biomedical research a...
Mislabeled examples affect the performance of supervised learning algorithms. Two novel approaches t...