Data Augmentation approaches often use Language Models, pretrained on large quantities of unlabeled generic data, to conditionally generate examples. However, the generated data can be of subpar quality and may fail to preserve the characteristics of the original dataset. To address this, we propose a Data Augmentation method for low-resource and imbalanced datasets that aligns Language Models to in-domain data before generating synthetic examples. In particular, we align existing generic models on task-specific unlabeled data in order to create better synthetic examples and boost performance in Text Classification tasks. We evaluate our approach on three diverse and well-known Language Models, four datasets, and two...
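The pipeline described above (start from a generic model, align it on unlabeled in-domain text, then generate synthetic examples for under-represented classes) can be illustrated with a minimal sketch. This is not the authors' implementation: a toy bigram model stands in for a pretrained neural Language Model, and the names `BigramLM`, `augment_minority`, the `weight` parameter, and the idea of conditioning on a class by prompting with the first words of a labeled example are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


class BigramLM:
    """Toy stand-in for a pretrained language model (illustrative assumption)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, texts, weight=1):
        # Accumulate bigram counts; calling fit again continues training,
        # which is how this sketch mimics alignment to in-domain data.
        for text in texts:
            tokens = ["<s>"] + text.lower().split() + ["</s>"]
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += weight
        return self

    def generate(self, prompt, rng, max_len=12):
        # Sample a continuation of the prompt, one token at a time.
        tokens = prompt.lower().split()
        prev = tokens[-1] if tokens else "<s>"
        while len(tokens) < max_len:
            options = self.counts.get(prev)
            if not options:
                break
            words = list(options.keys())
            weights = list(options.values())
            nxt = rng.choices(words, weights=weights, k=1)[0]
            if nxt == "</s>":
                break
            tokens.append(nxt)
            prev = nxt
        return " ".join(tokens)


def augment_minority(lm, labeled, rng, prompt_len=2):
    """Generate synthetic examples until every class matches the largest one."""
    by_label = defaultdict(list)
    for text, label in labeled:
        by_label[label].append(text)
    target = max(len(texts) for texts in by_label.values())
    synthetic = []
    for label, texts in by_label.items():
        for i in range(target - len(texts)):
            # Assumption: a short prefix of a real example conditions the class.
            seed = texts[i % len(texts)]
            prompt = " ".join(seed.split()[:prompt_len])
            synthetic.append((lm.generate(prompt, rng), label))
    return synthetic


# Hypothetical data, invented for this example only.
generic = ["the cat sat on the mat", "the weather is nice today"]
in_domain_unlabeled = [
    "the battery drains too fast",
    "the screen cracked after a week",
    "battery life is great",
]
labeled = [
    ("battery life is great", "pos"),
    ("love the screen", "pos"),
    ("great phone overall", "pos"),
    ("the battery drains too fast", "neg"),
]

rng = random.Random(0)
lm = BigramLM().fit(generic)           # step 1: generic "pretraining"
lm.fit(in_domain_unlabeled, weight=3)  # step 2: alignment on in-domain text
synthetic = augment_minority(lm, labeled, rng)  # step 3: rebalance classes
```

In a realistic setting the bigram model would be replaced by a pretrained neural Language Model that is further trained on the unlabeled task data before generation; the sketch only shows the order of the steps and how the synthetic examples fill the gap between minority and majority classes.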
Data annotation is the process of labeling text, images, or other types of content for machine learn...
Many machine learning classification algorithms assume that the target classes share similar prior p...
Data augmentation techniques are widely used for enhancing the performance of machine learning model...
Based on recent advances in natural language modeling and those in text generation capabilities, we ...
In many cases of machine learning, research suggests that the development of training data might hav...
This paper focuses on the insensitivity of existing word alignment models to domain differences, whi...
Data augmentation is widely used in text classification, especially in the low-resource regime where...
Within a situation where Semi-Supervised Learning (SSL) is available to exploit unlabeled data, this...
Data augmentation, the artificial creation of training data for machine learning by transformations,...
Learning from imbalanced data has emerged as a new challenge to the machine learning (ML), data mini...
Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to i...
Text has traditionally been used to train automated classifiers for a multitude of purposes, such as...
In recent years, the community of natural language processing (NLP) has seen amazing progress in the...
The problem of imbalanced data has a heavy impact on the performance of learning models. In the case...
Recent advances in the field of natural language processing were achieved with deep learning models....