We present three large-scale experiments on a binary text matching classification task, in both Chinese and English, to evaluate the effectiveness and generalizability of random text perturbations as a data augmentation approach for NLP. We find that the augmentation can have both negative and positive effects on the test-set performance of three neural classification models, depending on whether the models are trained on enough original examples. This holds regardless of whether the five random text editing operations used to augment the text are applied together or separately. Our study strongly suggests that the effectiveness of random text perturbations is task specific and not generally positive.
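The abstract above does not enumerate its five editing operations, but random text perturbation typically means label-preserving token-level edits applied with a small probability. A minimal sketch of two such operations (random deletion and random swap), assuming whitespace-tokenized input; the function name and parameters are illustrative, not the paper's implementation:

```python
import random

def random_perturb(tokens, p=0.1, seed=None):
    """Illustrative random text perturbation: random deletion + random swap.

    tokens: list of word tokens; p: per-token deletion probability.
    These two operations stand in for the paper's (unspecified) five.
    """
    rng = random.Random(seed)
    # Random deletion: drop each token with probability p,
    # but never return an empty sequence.
    kept = [t for t in tokens if rng.random() > p] or [rng.choice(tokens)]
    # Random swap: exchange one randomly chosen pair of positions.
    if len(kept) > 1:
        i, j = rng.sample(range(len(kept)), 2)
        kept[i], kept[j] = kept[j], kept[i]
    return kept

print(random_perturb("the cat sat on the mat".split(), p=0.1, seed=0))
```

Because the edits are random rather than linguistically informed, the perturbed sentence may or may not preserve the original meaning, which is one plausible source of the mixed effects the abstract reports.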
Text classification is a fundamental Natural Language Processing task that has a wide variety of app...
Text simplification is a common task where the text is adapted to make it easier to understand. Simi...
We describe the work carried out by the DCU-ADAPT team on the Lexical Normalisation shared task at W...
Data augmentation is a widely used technique in machine learning to improve model performance. Howev...
To investigate the role of linguistic knowledge in data augmentation (DA) for Natural Language Proce...
In recent years, language models (LMs) have made remarkable progress in advancing the field of natu...
In low resource settings, data augmentation strategies are commonly leveraged to improve performance...
Data augmentation is widely used in text classification, especially in the low-resource regime where...
A line of work has shown that natural text processing models are vulnerable to adversarial examples....
This paper introduces a new data augmentation method for neural machine translation that can enforce...
Thanks to increases in computing power and the growing availability of large datasets, neural netwo...
This study discusses the effect of semi-supervised learning in combination with pretrained language ...
Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to i...
In many cases of machine learning, research suggests that the development of training data might hav...
Unsupervised Machine Learning techniques have been applied to Natural Language Processing tasks and ...