Noisy labels in training data are a persistent problem in classification tasks because they mislead a model towards incorrect decisions during training. In this paper, we propose augmenting pre-trained language models with a linear noise model that accounts for label noise during fine-tuning. We test our approach on a paraphrase detection task with various levels of noise and in five different languages. Our experiments demonstrate that the additional noise model makes the training procedure more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence or the reliability of individual training examples, and we analyse our results in light of data selection and sampli...
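The abstract does not spell out the form of the linear noise model, so the following is only a minimal sketch under stated assumptions: the classifier's clean class probabilities are multiplied by a fixed noise transition matrix before the loss is computed against the observed (possibly noisy) label. The function names, the two-class setup, and the 20% symmetric noise matrix are all hypothetical illustrations, not the authors' implementation, and in practice such a matrix may be learned rather than fixed.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_noise_model(clean_probs, transition):
    """Linear noise layer (assumed form).

    transition[i][j] is the assumed probability of observing label j
    when the true label is i; each row sums to 1.
    """
    n_observed = len(transition[0])
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(len(clean_probs)))
        for j in range(n_observed)
    ]

def noisy_nll(logits, observed_label, transition):
    """Negative log-likelihood of the observed label under the noise model."""
    noisy_probs = apply_noise_model(softmax(logits), transition)
    return -math.log(noisy_probs[observed_label])

# Hypothetical binary paraphrase setup: class 0 = not paraphrase, 1 = paraphrase.
identity = [[1.0, 0.0], [0.0, 1.0]]      # no noise model (plain cross-entropy)
symmetric_20 = [[0.8, 0.2], [0.2, 0.8]]  # assumed 20% symmetric label noise

# The classifier is confident in class 0, but the observed label is 1.
logits = [2.0, -1.0]
loss_plain = noisy_nll(logits, 1, identity)
loss_noise_aware = noisy_nll(logits, 1, symmetric_20)
```

With the identity matrix the loss reduces to ordinary cross-entropy. With the assumed noise matrix, an example whose observed label disagrees with a confident prediction is penalised less, which is one way such a layer can dampen the effect of mislabeled examples during fine-tuning.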
Natural languages are known for their expressive richness. Many sentences can be used to represent t...
Word alignment is the task of estimating a lexical translation probability p(e|f), or of estimating the correspon...
We show that label noise exists in adversarial training. Such label noise is due to the mismatch bet...
We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus comprising six...
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
This thesis focuses on the aspect of label noise for real-life datasets. Due to the upcoming growing...
For high-resource languages like English, text classification is a well-studied task. The performanc...
Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and chea...
Expressive richness in natural languages presents a significant challenge for statistical language m...
Noisy labels are commonly present in data sets automatically collected from the internet, mislabeled...
© 2018 Curran Associates Inc. All rights reserved. It is important to learn various types of classifi...
Paraphrase Generation is one of the most important and challenging tasks in the field of Natural Lan...
Noise is inherent in real world datasets and modeling noise is critical during training, as it is ef...
Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model pre...