Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translation models are often able to adapt to the paucity of data for low-resource languages. However, this performance comes at the cost of significantly bloated models which are not practically deployable. Knowledge Distillation is one popular technique for developing competitive, lightweight models. In this work, we first evaluate its use to compress MT models, focusing on languages with extremely limited training data. Through our analysis across 8 languages, we find that the variance in the performance of the distilled models, due to their dependence on priors including the amount of synthetic data used for distillation, the student architecture, train...
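For context on the compression technique being evaluated, the sketch below shows one common formulation of knowledge distillation for an NMT student: the student is trained on a blend of the hard cross-entropy loss and a soft loss that matches the teacher's output distribution. The "synthetic data" mentioned above corresponds to the sequence-level variant, where the target tokens would be teacher-generated translations rather than gold references. This is a minimal sketch under assumed settings; the function name, `alpha`, `temperature`, and tensor shapes are illustrative and not the paper's exact configuration.

```python
# Minimal sketch of a token-level knowledge-distillation objective for an NMT
# student (PyTorch). `alpha` and `temperature` are illustrative assumptions,
# not the setup studied in the paper.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      target_ids: torch.Tensor,
                      pad_id: int,
                      alpha: float = 0.5,
                      temperature: float = 2.0) -> torch.Tensor:
    """Blend hard-label cross-entropy with a soft-label term from the teacher.

    student_logits, teacher_logits: (batch, tgt_len, vocab)
    target_ids: (batch, tgt_len) reference (or teacher-generated) target tokens
    """
    vocab = student_logits.size(-1)

    # Hard-label term: cross-entropy against the target tokens, ignoring padding.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         target_ids.view(-1),
                         ignore_index=pad_id)

    # Soft-label term: KL divergence between the student's and the teacher's
    # temperature-smoothed token distributions (padding not masked here, for brevity).
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * (temperature ** 2)

    return alpha * ce + (1.0 - alpha) * kd
```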