Self-supervised pre-training of language models usually consists of predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Contrastive Weight Tying (CWT). We apply this approach to pretrain Headless Language Models in both monolingual and multilingual contexts. Our method offers practical advantages, substantially reducing training computational requirements by up to 20 times, while simultaneously enhancing downstream performance and data efficiency. We observe a significant +1.6 GLUE score increase and a notable +2.7 LAMBADA accuracy improvement com...
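As a rough illustration, the objective described above can be read as an InfoNCE-style loss in which the model's output representation at each position must retrieve the input embedding of its target token among in-batch negatives, rather than producing a softmax over the vocabulary. The sketch below is a minimal PyTorch interpretation; the function name, the cosine normalisation, and the temperature are assumptions for illustration, not the authors' exact formulation.

import torch
import torch.nn.functional as F

def contrastive_weight_tying_loss(hidden_states, target_ids, embedding, temperature=0.1):
    # hidden_states: (N, d) output vectors at the positions being predicted
    # target_ids:    (N,)   ids of the tokens to reconstruct
    # embedding:     the model's input embedding table (torch.nn.Embedding), reused as the target space
    targets = embedding(target_ids)                     # (N, d) positive input embeddings
    h = F.normalize(hidden_states, dim=-1)
    t = F.normalize(targets, dim=-1)
    logits = h @ t.T / temperature                      # (N, N); off-diagonal entries act as in-batch negatives
    labels = torch.arange(h.size(0), device=h.device)   # each position's positive is its own target embedding
    return F.cross_entropy(logits, labels)

Because the targets are the tied input embeddings rather than a full-vocabulary softmax, no output projection over the vocabulary is needed, which is presumably where much of the compute saving reported above comes from.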
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
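For context on the objective this snippet names, here is a minimal sketch of BERT-style masking in PyTorch, assuming the standard 15% corruption rate with the usual 80/10/10 replacement split; the function and argument names are placeholders, not taken from the paper above.

import torch

def mask_for_mlm(input_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    # Corrupt ~15% of tokens and predict them; of those, 80% become [MASK],
    # 10% a random token, and 10% are left unchanged.
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_probability)).bool()
    labels[~masked] = -100                              # loss is computed only on corrupted positions
    corrupted = input_ids.clone()
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    corrupted[to_mask] = mask_token_id
    to_random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~to_mask
    corrupted[to_random] = torch.randint(vocab_size, input_ids.shape)[to_random]
    return corrupted, labels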
Pre-trained sequence-to-sequence models have significantly improved Neural Machine Translation (NMT)...
Thesis (Ph.D.)--University of Washington, 2022. A robust language processing machine should be able to...
Neural Machine Translation (NMT) typically leverages monolingual data in training through backtransl...
Neural language models do not scale well when the vocabulary is large. Noise contrastive estimation ...
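Since the snippet above names noise-contrastive estimation as a remedy for large vocabularies, here is a hedged sketch of the NCE loss as it is commonly written: the model learns to tell each observed token apart from k samples drawn from a noise distribution q, avoiding the full-vocabulary softmax. The function signature and tensor shapes are assumptions for illustration.

import math
import torch
import torch.nn.functional as F

def nce_loss(scores_data, scores_noise, log_q_data, log_q_noise, k):
    # scores_*: unnormalised model scores s(w, context); log_q_*: log-probabilities
    # under the noise distribution q. Shapes: (N,) for data, (N, k) for noise.
    logit_data = scores_data - (math.log(k) + log_q_data)      # logit of P(data | token)
    logit_noise = scores_noise - (math.log(k) + log_q_noise)
    loss_data = F.binary_cross_entropy_with_logits(logit_data, torch.ones_like(logit_data))
    loss_noise = F.binary_cross_entropy_with_logits(
        logit_noise, torch.zeros_like(logit_noise), reduction="none").sum(dim=1).mean()
    return loss_data + loss_noise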
Self-supervised learning from raw speech has been proven beneficial to improve...
Neural language models do not scale well when the vocabulary is large. Noise-contrastive estimation ...
Fine-tuning pretrained language models (LMs) without making any architectural changes has become a n...
Although deep pre-trained language models have shown promising benefit in a la...
In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less...
Tying the weights of the target word embeddings with the target word classifiers of neural machine t...
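The tying described above amounts to sharing one matrix between the target-side embedding and the output classifier, so the model learns a single word representation space and saves a vocabulary-sized projection. A minimal PyTorch sketch, assuming a hypothetical module name and a bias-free projection:

import torch.nn as nn

class TiedOutputLayer(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embedding.weight     # single shared parameter tensor

    def forward(self, hidden_states):
        return self.lm_head(hidden_states)              # logits over the vocabulary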
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NM...
Machine-learning models can reach very high performance with supervised training, where they learn f...
The core of self-supervised learning for pre-training language models includes pre-training task des...
Pre-training language models (LMs) on large-scale unlabeled text data makes the model much easier to...