Data augmentation is widely used in text classification, especially in the low-resource regime where only a few examples per class are available during training. Despite this success, generating augmentations as hard positive examples, which may increase their effectiveness, remains under-explored. This paper proposes an Adversarial Word Dilution (AWD) method that generates hard positive examples as text data augmentations to efficiently train low-resource text classification models. Our idea for augmenting the text data is to dilute the embeddings of strong positive words by weighted mixing with the unknown-word embedding, making the augmented inputs hard to recognize as positive by the classification model. We adversarially learn the dilu...
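The dilution operation this abstract describes can be sketched as a convex mix of each token's embedding with the unknown-word embedding, controlled by a per-token weight. The sketch below is illustrative only: the function name, data layout, and fixed weights are assumptions, and in AWD the weights would be learned adversarially (to maximize classifier loss under a constraint) rather than hand-set as here.

```python
def dilute(word_embs, unk_emb, weights):
    """Mix each word embedding with the unknown-word embedding.

    word_embs: list of embedding vectors (one per token)
    unk_emb:   embedding vector of the unknown token
    weights:   per-token dilution weights in [0, 1]; a larger weight
               replaces more of the original word with UNK, making
               the augmented example a harder positive
    """
    diluted = []
    for emb, w in zip(word_embs, weights):
        w = min(max(w, 0.0), 1.0)  # keep the mix convex
        diluted.append([(1.0 - w) * x + w * u for x, u in zip(emb, unk_emb)])
    return diluted

# Toy usage: weight 0.0 keeps the word, 1.0 replaces it with UNK,
# 0.5 yields the midpoint of the two embeddings.
embs = [[1.0, 2.0], [4.0, 0.0]]
unk = [0.0, 0.0]
print(dilute(embs, unk, [0.5, 1.0]))  # [[0.5, 1.0], [0.0, 0.0]]
```

Because the diluted input stays on the segment between the original embedding and the UNK embedding, it remains a valid point in embedding space, which is what lets the classifier be trained on it directly.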
In this paper, we explore how to utilize pre-trained language models to perform few-shot text classif...
High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collec...
Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a wide range of N...
Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to i...
In many cases of machine learning, research suggests that the development of training data might hav...
Data augmentation, the artificial creation of training data for machine learning by transformations,...
In low resource settings, data augmentation strategies are commonly leveraged to improve performance...
We study the effect of different approaches to text augmentation. To do this we use three datasets t...
We present three large-scale experiments on binary text matching classification task both in Chinese...
Data augmentation techniques are widely used for enhancing the performance of machine learning model...
Thanks to increases in computing power and the growing availability of large datasets, neural netwo...
In recent years, language models (LMs) have made remarkable progress in advancing the field of natu...
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learn...
Modern text classification models are susceptible to adversarial examples, perturbed versions of the...