Pre-trained language models have received extensive attention in recent years. However, it remains challenging to incorporate a pre-trained model such as BERT into natural language generation tasks. This work investigates a recent method, adapters, as an alternative to fine-tuning the whole model for machine translation. Adapters are a promising approach that allows fine-tuning only a tiny fraction of a pre-trained network. We show that, with proper initialization, adapters can achieve better performance than models trained from scratch while training substantially fewer weights than the original model. We further show that even with randomly initialized weights used as the base model for fine-tuning, we can achieve similar performance to one...
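The abstract does not include implementation details; as a rough illustration of the adapter idea it describes (training only a small inserted module while the pre-trained weights stay frozen), here is a minimal sketch of a bottleneck adapter in PyTorch. The class name, bottleneck size, and near-identity initialization are assumptions chosen for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection. Only these few parameters are trained; the
    surrounding pre-trained layers stay frozen."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()
        # Near-identity initialization (an assumed choice illustrating the
        # "proper initialization" the abstract mentions): the up-projection
        # starts at zero, so the adapted model initially behaves exactly
        # like the frozen pre-trained model.
        nn.init.normal_(self.down.weight, std=1e-3)
        nn.init.zeros_(self.down.bias)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

In a typical setup, one such adapter is inserted after each Transformer sub-layer, all pre-trained parameters are frozen with `requires_grad_(False)`, and the optimizer is given only the adapter parameters, so the trained fraction of the network stays small.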