Based on recent advances in natural language modeling and text generation, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning, focusing mainly on cases where labeled data is scarce. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Given a class label, the fine-tuned model then generates new sentences for that class. Finally, our process filters these new sentences by using a classifier trained on...
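The generate-then-filter loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LAMBADA implementation: the `generate` and `classify` callables are hypothetical stand-ins for the fine-tuned language generator and the filtering classifier, and because the abstract is cut off before stating the filtering criterion, the rule used here (keep only candidates the classifier assigns to the conditioning label, then keep the top `keep_per_label` by confidence) is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not taken from the source):
#   generate(label, k) -> k candidate sentences conditioned on `label`
#   classify(sentence) -> (predicted_label, confidence)
Generator = Callable[[str, int], List[str]]
Classifier = Callable[[str], Tuple[str, float]]


def lambada_style_augment(
    labels: Sequence[str],
    generate: Generator,
    classify: Classifier,
    candidates_per_label: int = 10,
    keep_per_label: int = 3,
) -> Dict[str, List[str]]:
    """Generate candidates for each label, then keep the most confident agreeing ones."""
    kept: Dict[str, List[str]] = {}
    for label in labels:
        scored: List[Tuple[float, str]] = []
        for sentence in generate(label, candidates_per_label):
            predicted, confidence = classify(sentence)
            # Assumed filter: drop candidates the classifier assigns to another class.
            if predicted == label:
                scored.append((confidence, sentence))
        # Rank surviving candidates by classifier confidence, highest first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept[label] = [sentence for _, sentence in scored[:keep_per_label]]
    return kept


def toy_generate(label: str, k: int) -> List[str]:
    # Placeholder for a fine-tuned generator: emits numbered template sentences.
    return [f"{label} example {i}" for i in range(k)]


def toy_classify(sentence: str) -> Tuple[str, float]:
    # Placeholder classifier: odd-numbered examples are judged off-label ("other");
    # even-numbered ones get the label of their first word, with confidence
    # decreasing as the example number grows.
    word, _, idx = sentence.split()
    i = int(idx)
    if i % 2 == 1:
        return ("other", 0.9)
    return (word, 1.0 - i / 100)


if __name__ == "__main__":
    augmented = lambada_style_augment(["sports", "politics"], toy_generate, toy_classify)
    for label, sentences in augmented.items():
        print(label, sentences)
```

With the toy stand-ins, each label keeps its three highest-confidence even-numbered examples (0, 2 and 4). In practice the generator would be a pretrained language model fine-tuned on the labeled data and the classifier would be trained on that same data; both are kept as plain callables here so the filtering logic can be read and tested on its own.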
This study discusses the effect of semi-supervised learning in combination with pretrained language ...
Learning from texts has been widely adopted throughout industry and science. While state-of-the-art ...
Data augmentation is widely used in text classification, especially in the low-resource regime where...
In many cases of machine learning, research suggests that the development of training data might hav...
Data Augmentation approaches often use Language Models, pretrained on large quantities of unlabeled ...
Data augmentation is one of the ways to deal with labeled data scarcity and overfitting. Both of the...
In settings where Semi-Supervised Learning (SSL) can be used to exploit unlabeled data, this...
Despite much success, the effectiveness of deep learning models largely relies on the availability o...
As the web evolves even faster than expected, the exponential growth of data becomes overwhelming. T...
The current state of the art in automatic data-to-text generation, a major task in natural lang...
Pretraining deep neural networks to perform language modeling - that is, to reconstruct missing word...
In Natural Language Processing (NLP), applications trained on downstream tasks for text classificati...
In recent years, the exponential growth of digital documents has been met by rapid progress in text ...
User feedback is essential for understanding user needs. In this paper, we use free-text obtained fr...
We introduce the problems of data-to-text generation and the current state of the art, i.e. pretrain...