We study the effect of different approaches to text augmentation, using three datasets that cover both social media text and formal text in the form of news articles. Our goal is to give practitioners and researchers insight into choosing an augmentation method for classification use cases. We observe that Word2Vec-based augmentation is a viable option when a formal synonym model (such as the one behind WordNet-based augmentation) is not available. Mixup further improves the performance of all text-based augmentations and reduces the effects of overfitting on the deep learning model we tested. Round-trip translation through a translation service proves harder to use because of cost, and as such is less accessible for both normal and low ...
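As a rough illustration of the mixup step mentioned in the abstract, the sketch below interpolates pairs of already-vectorised examples and their one-hot labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which representation is mixed, so fixed-length feature vectors (for example averaged word embeddings) are assumed here, and the names `mixup_pair`, `mixup_batch` and the default `alpha=0.2` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mixup_pair(
    x_a: Sequence[float],
    y_a: Sequence[float],
    x_b: Sequence[float],
    y_b: Sequence[float],
    lam: float,
) -> Tuple[Vector, Vector]:
    """Return the convex combination of two examples and of their one-hot labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def mixup_batch(
    features: Sequence[Sequence[float]],
    labels: Sequence[Sequence[float]],
    alpha: float = 0.2,  # assumed value, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector], float]:
    """Mix every example in a batch with a randomly chosen partner from the same batch."""
    rng = rng or random.Random()
    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # A random permutation decides which example each row is mixed with.
    partners = list(range(len(features)))
    rng.shuffle(partners)
    mixed = [
        mixup_pair(features[i], labels[i], features[j], labels[j], lam)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam
```

The mixed labels are soft targets, so a classifier trained on them would need a loss that accepts probability vectors rather than class indices.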
Data augmentation is widely used in text classification, especially in the low-resource regime where...
Clustering short texts (like news titles) by their context is a challenging task. The syntactic disfig...
Only humans can understand and comprehend the actual meaning that underlies natural written language...
In many cases of machine learning, research suggests that the development of training data might hav...
Data augmentation, the artificial creation of training data for machine learning by transformations,...
Text classification typically performs best with large training sets, but short texts are very commo...
Data augmentation is one of the ways of dealing with labeled data scarcity and overfitting. Both the...
Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to i...
Thanks to increases in computing power and the growing availability of large datasets, neural netwo...
Recent increases in the use and availability of short messages have created opportunities to harvest...
With the rapid development of Internet technology, text data on the Internet is growing significantl...
The often observed unavailability of large amounts of training data typically required by deep learn...
The proliferation of textual data in the form of online news articles and social media feeds has had...