Real-world business applications require a trade-off between language model performance and size. We propose a new method for model compression that relies on vocabulary transfer. We evaluate the method on various vertical domains and downstream tasks. Our results indicate that vocabulary transfer can be effectively combined with other compression techniques, yielding a significant reduction in model size and inference time while only marginally compromising performance.
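To make the idea concrete, the sketch below shows one common way a vocabulary-transfer step can be instantiated: embeddings for an in-domain tokenizer are initialized from a general-domain model by copying embeddings of tokens shared between the two vocabularies and averaging the embeddings of the old sub-tokens for tokens that are new. This is a minimal illustration under those assumptions, not the exact procedure used in the paper; the function and argument names (transfer_embeddings, old_tokenize, etc.) are hypothetical.

```python
import numpy as np

def transfer_embeddings(old_vocab, old_embeddings, new_vocab, old_tokenize):
    """Initialize an embedding matrix for `new_vocab` from `old_embeddings`.

    old_vocab:      dict mapping old token -> row index in old_embeddings
    old_embeddings: np.ndarray of shape (len(old_vocab), hidden_dim)
    new_vocab:      dict mapping new token -> row index in the returned matrix
    old_tokenize:   callable splitting a string into old-vocabulary tokens
    """
    hidden = old_embeddings.shape[1]
    new_embeddings = np.zeros((len(new_vocab), hidden), dtype=old_embeddings.dtype)
    for token, new_idx in new_vocab.items():
        if token in old_vocab:
            # Token exists in both vocabularies: reuse its embedding directly.
            new_embeddings[new_idx] = old_embeddings[old_vocab[token]]
        else:
            # New token: average the embeddings of its old sub-tokens.
            pieces = [t for t in old_tokenize(token) if t in old_vocab]
            if pieces:
                rows = [old_embeddings[old_vocab[t]] for t in pieces]
                new_embeddings[new_idx] = np.mean(rows, axis=0)
            else:
                # No usable sub-tokens: fall back to the mean embedding.
                new_embeddings[new_idx] = old_embeddings.mean(axis=0)
    return new_embeddings
```

After such an initialization, the model would typically be fine-tuned on in-domain data so that the transferred embeddings adapt to the new tokenization.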