In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across seven diverse NLP tasks, spanning classification and regression and covering both single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the do...
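As a concrete illustration of the back-translation step mentioned above, the sketch below augments English training sentences by translating them to a pivot language and back, keeping the round-trip paraphrase as an extra example. The MarianMT checkpoints and the choice of German as the pivot are illustrative assumptions; the abstract does not state which translation models were used.

```python
# Hedged sketch of back-translation data augmentation (assumed setup, not the
# authors' exact pipeline): English -> German -> English round-trip paraphrasing.
from transformers import MarianMTModel, MarianTokenizer

def load(model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    mod = MarianMTModel.from_pretrained(model_name)
    return tok, mod

en_de_tok, en_de = load("Helsinki-NLP/opus-mt-en-de")
de_en_tok, de_en = load("Helsinki-NLP/opus-mt-de-en")

def translate(texts, tok, mod):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = mod.generate(**batch)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

def back_translate(texts):
    # Round-trip: source -> pivot language -> source.
    return translate(translate(texts, en_de_tok, en_de), de_en_tok, de_en)

print(back_translate(["The movie was surprisingly good."]))
```

The augmented sentences would then simply be appended to the original training set (with the original labels) before fine-tuning or continued pre-training.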
Recently, the development of pre-trained language models has brought natural language processing (NL...
We present three large-scale experiments on a binary text matching classification task, both in Chinese...
Neural Machine Translation (NMT) models tend to achieve the best performance when larger sets of paral...
Language model fine-tuning is essential for modern natural language processing, but is computational...
In many cases of machine learning, research suggests that the development of training data might hav...
Thanks to increases in computing power and the growing availability of large datasets, neural netwo...
In recent years, there has been significant progress in developing pre-trained language models for N...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
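A minimal sketch of the bias-only fine-tuning idea described in the BitFit abstract: freeze every parameter except those identified as bias terms. The backbone checkpoint, the sequence-classification head, and the decision to train strictly bias terms (rather than also unfreezing the task head) are illustrative assumptions, not details taken from the abstract.

```python
# Hedged sketch of BitFit-style sparse fine-tuning: only bias parameters are trainable.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # placeholder backbone and label count
)

for name, param in model.named_parameters():
    # Keep a parameter trainable only if its name marks it as a bias term.
    param.requires_grad = "bias" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

The model can then be passed to any standard training loop or trainer; only the selected bias terms receive gradient updates.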
In low resource settings, data augmentation strategies are commonly leveraged to improve performance...
Data augmentation is a technique to generate new training data based on existing data. We evaluate t...
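For concreteness, one simple way to "generate new training data based on existing data" is a word-level perturbation such as random swapping. The specific operation and swap count below are illustrative assumptions; the truncated abstract does not say which augmentation methods were evaluated.

```python
# A minimal, assumed example of word-level text augmentation: random word swapping.
import random

def random_swap(sentence, n_swaps=1, seed=None):
    """Return a copy of the sentence with n_swaps random pairs of words exchanged."""
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(random_swap("the quick brown fox jumps over the lazy dog", n_swaps=2, seed=0))
```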
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learn...
In the context of neural machine translation, data augmentation (DA) techniques may be used for gene...
Pre-training language models (LMs) on large-scale unlabeled text data makes the model much easier to...
This study discusses the effect of semi-supervised learning in combination with pretrained language ...