Pre-training a language model and then fine-tuning it on downstream tasks has produced state-of-the-art results across a wide range of NLP tasks. However, pre-training is usually independent of the downstream task, and previous work has shown that generic pre-training alone may not capture task-specific nuances. We propose a way to tailor a pre-trained BERT model to the downstream task via task-specific masking before the standard supervised fine-tuning. First, a small word list specific to the task is collected; for example, for sentiment classification we collect a small sample of words expressing positive and negative sentiment. Next, each word's importance for the task, called the word's task score, is measur...
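The truncated abstract leaves the exact scoring and masking procedure unspecified, so the following Python sketch only illustrates the general recipe under stated assumptions: a small seed word list stands in for the collected task words, task_score is a toy importance measure, and task_specific_mask corrupts task-relevant tokens for an extra masked-LM pass before supervised fine-tuning. All names here (SEED_WORDS, task_score, task_specific_mask) are hypothetical.

```python
import random

# Hypothetical seed list for a sentiment task; the paper's actual word list
# and task-score definition are not visible in the truncated abstract.
SEED_WORDS = {"good", "great", "excellent", "bad", "terrible", "awful"}

def task_score(word):
    """Toy task score: 1.0 for seed words, 0.0 otherwise. A real score
    would presumably be continuous (e.g. embedding similarity to the
    seed words); that detail is an assumption here."""
    return 1.0 if word.lower() in SEED_WORDS else 0.0

def task_specific_mask(tokens, threshold=0.5, base_rate=0.15,
                       mask_token="[MASK]", seed=0):
    """Mask task-relevant tokens (plus a small random fraction, as in
    standard MLM) so an extra masked-LM pass before supervised
    fine-tuning focuses on words that matter for the task."""
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if task_score(tok) > threshold or rng.random() < base_rate:
            corrupted.append(mask_token)
            targets.append(tok)      # token to be predicted
        else:
            corrupted.append(tok)
            targets.append(None)     # ignored by the MLM loss
    return corrupted, targets

print(task_specific_mask("the film was great but the ending was terrible".split()))
```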
Pre-trained Language Models are widely used in many important real-world applications. However, rece...
Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous ...
Word order, an essential property of natural languages, is injected in Transformer-based neural lang...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
Masked language models conventionally use a masking rate of 15% due to the belief that more masking ...
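To make the notion of a masking rate concrete, here is a minimal sketch of random token corruption with a configurable rate, where 0.15 reflects the conventional default mentioned above. The helper mask_tokens is illustrative only, not code from any of the papers, and it omits BERT's 80/10/10 replacement scheme.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Randomly pick a fraction of positions (the masking rate) and replace
    them with the mask token; the model is trained to recover the original
    tokens at exactly those positions."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    corrupted = [mask_token if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return corrupted, targets

tokens = "masked language models learn by predicting hidden words".split()
print(mask_tokens(tokens, mask_rate=0.15, seed=0))
```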
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
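A minimal PyTorch sketch of the bias-only idea, assuming a HuggingFace-style BERT classifier: every parameter whose name does not contain "bias" is frozen. Keeping the newly initialised classification head trainable is an assumption here, since the snippet is cut off before the details.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Bias-only ("sparse") fine-tuning sketch: only bias parameters (and, as an
# assumption, the fresh task head) receive gradient updates.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

trainable = []
for name, param in model.named_parameters():
    if "bias" in name or name.startswith("classifier"):
        param.requires_grad = True
        trainable.append(name)
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
print(f"{len(trainable)} trainable parameter tensors (biases + head)")
```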
Pre-trained language models (PTMs) have been shown to yield powerful text representations for dense pas...
The current era of natural language processing (NLP) has been defined by the prominence of pre-train...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Though achieving impressive results on many NLP tasks, BERT-like masked language models (MLMs) en...
A fundamental challenge of over-parameterized deep learning models is learning meaningful data repre...
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance ...
Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretr...