Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long inference delay of Transformer-based pretrained language models (PLMs) prevent their broader adoption, including for edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time, and carbon emissions across the entire life-cycle of NLP, from data preparation through model training and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics, and methodology.
Large language models (LLMs), while transformative for NLP, come with significant computational dema...
The possibility of dynamically modifying the computational load of neural models at inference time i...
Neural machine translation (NMT) systems have greatly improved the quality available from machine tr...
With the recent advances in deep learning, different approaches to improving pre-trained language mo...
Natural language processing (NLP) techniques have significantly improved with the introduction of pre-trained l...
In today's world, where data plays a very important role, we have various sources of pre-data like ...
Real-world business applications require a trade-off between language model performance and size. We...
N-gram language models are an essential component in statistical natural language processing systems...
Recent work in natural language processing (NLP) has yielded appealing results from scaling model pa...
Natural Language Processing (NLP) has seen tremendous improvements over the last few years. Transfor...
This paper describes two techniques for reducing the size of statistical back-off N-gram language mode...
Large Language Models (LLMs) are machine learning models used to understand and genera...
This work addresses the topic of neural language model acceleration. The aim of this work is to optim...