Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real-world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalizat...
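For concreteness, the sparsification-during-fine-tuning setup described above can be sketched as iterative magnitude pruning applied while training proceeds. The snippet below is a minimal illustration only: it assumes PyTorch's torch.nn.utils.prune utilities, a generic model returning logits, a dataloader yielding (inputs, labels) batches, and an epoch-wise pruning schedule; the concrete framework, schedule, and mBERT/NER specifics are assumptions, not details taken from the abstract.

```python
# Minimal sketch of sparsification during fine-tuning via iterative
# global magnitude pruning. Assumptions (not from the abstract):
# torch.nn.utils.prune, a model returning logits, and a dataloader
# yielding (inputs, labels) batches.
import torch.nn as nn
import torch.nn.utils.prune as prune


def prune_linear_weights(model: nn.Module, amount: float) -> None:
    """Globally prune the lowest-magnitude fraction of remaining Linear weights."""
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)


def finetune_sparse(model, dataloader, optimizer, loss_fn,
                    epochs=3, prune_per_epoch=0.3):
    for _ in range(epochs):
        # Each call zeroes a fraction of the weights that are still unpruned;
        # the masks are kept as forward pre-hooks, so pruned positions stay
        # at zero throughout the gradient updates below.
        prune_linear_weights(model, prune_per_epoch)
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
    # Fold the masks into the weights so the saved checkpoint is sparse.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            prune.remove(m, "weight")
    return model
```

The final prune.remove calls make the masks permanent, so the checkpoint carries genuinely zeroed weights rather than a dense tensor plus a mask; whether pruning is applied per epoch, per step, or once before fine-tuning is a design choice this sketch does not settle.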
Real-world business applications require a trade-off between language model performance and size. We...
Pretrained multilingual contextual representations have shown great success, but due to the limits o...
While pretrained language models (PLMs) primarily serve as general purpose text encoders that can be...
Multilingual language models are widely used to extend NLP systems to low-resource languages. Howeve...
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major ...
Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translati...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Pretrained multilingual language models have become a common tool in transferring NLP capabilities t...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
For a language model (LM) to faithfully model human language, it must compress vast, potentially in...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
Transfer learning has led to large gains in performance for nearly all NLP tasks while making downst...
Modern language models leverage increasingly large numbers of parameters to achieve performance on n...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...