Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pre...
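The adaptation recipe described above lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration (module and parameter names are ours, not the authors' code): a shallow mini-model reuses a few frozen lower layers of the large encoder, and only the new language-specific embeddings (tied with the MLM output head) are trained, so they can later be plugged into the aligned large model.

```python
import torch
import torch.nn as nn

class MiniModel(nn.Module):
    """Shallow mini-model that shares its lower layers with a larger frozen encoder."""

    def __init__(self, shared_layers, vocab_size, hidden_size):
        super().__init__()
        # New, trainable embeddings for the target language.
        self.new_embeddings = nn.Embedding(vocab_size, hidden_size)
        # Reuse only the first few transformer layers of the large model and freeze them.
        self.layers = nn.ModuleList(shared_layers)
        for p in self.layers.parameters():
            p.requires_grad = False
        # MLM prediction head; weights tied to the new embeddings.
        self.mlm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.mlm_head.weight = self.new_embeddings.weight

    def forward(self, token_ids):
        h = self.new_embeddings(token_ids)
        for layer in self.layers:
            h = layer(h)
        return self.mlm_head(h)  # masked-token logits

# Toy usage: take the first 4 layers of a 12-layer encoder as the mini-model body.
full_layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True) for _ in range(12)
)
mini = MiniModel(list(full_layers[:4]), vocab_size=32000, hidden_size=256)
# Only the new embeddings (and the tied head) receive gradients.
optimizer = torch.optim.AdamW([p for p in mini.parameters() if p.requires_grad], lr=1e-4)
```

Because the mini-model's layers are shared with (and aligned to) the large model, the embeddings learned this way can be inserted into the full model without touching its transformer body.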
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...
In this paper we address the issue of building language models for very small training sets by adapt...
Pre-trained language models received extensive attention in recent years. However, it is still chall...
Deploying large language models (LLMs) is challenging because they are memory inefficient and comput...
Pretrained language models (PLMs) are today the primary model for natural language processing. Despi...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
Pre-trained models have revolutionized the natural language processing field by leveraging large-sca...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
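Since the bias-only idea above is simple to state in code, here is a minimal sketch of the parameter selection (an illustrative helper of our own, assuming a standard PyTorch module; not the BitFit reference implementation):

```python
import torch
from torch import nn

def select_bias_parameters(model: nn.Module):
    """Freeze all parameters except bias terms and return the trainable ones."""
    trainable = []
    for name, param in model.named_parameters():
        is_bias = name.endswith("bias")  # name-based heuristic; real models may need a finer filter
        param.requires_grad = is_bias
        if is_bias:
            trainable.append(param)
    return trainable

# Toy usage with a small transformer encoder: only biases receive gradient updates.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
optimizer = torch.optim.AdamW(select_bias_parameters(encoder), lr=1e-4)
```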
It is today acknowledged that neural network language models outperform backoff language models in a...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
This paper considers continual learning of a large-scale pretrained neural machine translation model w...