The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their generalization problem: performance drops sharply when they are evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD) or unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which work well for frequent example types but not for general examples. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs'...
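Below is a minimal sketch of the kind of joint objective this abstract describes: an MLM loss on randomly masked fine-tuning inputs combined with the downstream task loss on the same batch. The masking rate, the `alpha` weighting, the binary task head, and the use of the [CLS] representation of the masked input are illustrative assumptions, not the paper's exact Mask-tuning recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
# Downstream task head on top of the shared encoder (binary task assumed for illustration).
classifier = torch.nn.Linear(mlm_model.config.hidden_size, 2)

def mask_tokens(input_ids, mask_prob=0.15):
    """Randomly mask a fraction of tokens; unmasked positions are ignored by the MLM loss."""
    special = torch.tensor([tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.pad_token_id])
    mask = (torch.rand(input_ids.shape) < mask_prob) & ~torch.isin(input_ids, special)
    labels = input_ids.clone()
    labels[~mask] = -100                              # -100 = ignored by the MLM cross-entropy
    masked_ids = input_ids.clone()
    masked_ids[mask] = tokenizer.mask_token_id
    return masked_ids, labels

def joint_loss(texts, task_labels, alpha=0.5):
    """Weighted sum of the MLM loss and the downstream fine-tuning loss on the same batch."""
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    masked_ids, mlm_labels = mask_tokens(enc["input_ids"])
    out = mlm_model(input_ids=masked_ids,
                    attention_mask=enc["attention_mask"],
                    labels=mlm_labels,
                    output_hidden_states=True)
    cls_repr = out.hidden_states[-1][:, 0]            # [CLS] vector of the masked input
    task_loss = torch.nn.functional.cross_entropy(classifier(cls_repr), task_labels)
    return alpha * out.loss + (1 - alpha) * task_loss  # the weighting scheme is an assumption
```

For example, `joint_loss(["great movie", "terrible plot"], torch.tensor([1, 0]))` returns a single scalar that back-propagates through both the MLM head and the task head of the shared encoder.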
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...
A fundamental challenge of over-parameterized deep learning models is learning meaningful data repre...
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of...
Masked language models conventionally use a masking rate of 15% due to the belief that more masking ...
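As a concrete reference point for the conventional 15% rate mentioned above, the sketch below uses Hugging Face's `DataCollatorForLanguageModeling` (an assumed setup, not this paper's code) to show where that rate is configured and how it would be raised for a higher-masking experiment.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # the conventional rate; e.g. 0.40 would mask far more aggressively
)

# The collator masks tokens on the fly and sets labels to -100 at unmasked positions.
batch = collator([tokenizer("a sample sentence for masking")])
print(batch["input_ids"], batch["labels"])
```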
Language model fine-tuning is essential for modern natural language processing, but is computational...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Recent advances in NLP are brought by a range of large-scale pretrained language models (PLMs). Thes...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
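A minimal sketch of bias-only fine-tuning in the spirit of BitFit: freeze every pre-trained weight and leave only the bias vectors (plus, here, the task head) trainable. The model choice, learning rate, and inclusion of the classifier head are illustrative assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for name, param in model.named_parameters():
    # Train only bias terms; the task head is also unfrozen here so the classifier can adapt.
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total} ({100 * trainable / total:.2f}%)")

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Because only a small fraction of parameters receives gradients, optimizer state and per-task checkpoints shrink accordingly, which is the practical appeal of this family of sparse-finetuning methods.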
High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collec...
The advent of large-scale pre-trained language models has contributed greatly to the recent progress...
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fin...
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, lea...
The core of self-supervised learning for pre-training language models includes pre-training task des...