Although Transformer-based pretrained language models (PLMs) achieve state-of-the-art performance on many NLP tasks, their high energy cost and long inference latency prevent broader adoption, including in edge and mobile computing. Efficient NLP research aims to comprehensively account for computation, time and carbon emissions across the entire NLP life-cycle, including data preparation, model training and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, covering benchmarks, metrics and methodology.
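The survey catalogues compression and acceleration techniques rather than prescribing a single one. As a concrete illustration of the kind of inference-time method it covers, the sketch below applies post-training dynamic INT8 quantization to a BERT classifier and roughly times CPU inference before and after. This is a minimal example, not a method from the paper; the checkpoint name ("bert-base-uncased") and the latency harness are assumptions for illustration only.

```python
# Minimal sketch: post-training dynamic INT8 quantization of a BERT classifier
# plus a rough CPU latency comparison. Illustrative only; not taken from the survey.
import time

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Compress: replace Linear layers with dynamically quantized INT8 versions.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def latency_ms(m, text, runs=20):
    """Average CPU inference time in milliseconds over `runs` forward passes."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        m(**inputs)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            m(**inputs)
    return (time.perf_counter() - start) / runs * 1000

sample = "Efficient inference matters for edge and mobile deployment."
print(f"fp32: {latency_ms(model, sample):.1f} ms")
print(f"int8: {latency_ms(quantized, sample):.1f} ms")
```

Dynamic quantization rewrites only the linear layers at load time, trading a small accuracy drop for a lower memory footprint and faster CPU inference without any retraining.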
Natural language processing (NLP) techniques have improved significantly with the introduction of pre-trained l...
Effectively scaling large Transformer models is a main driver of recent advances in natural language...
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major ...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Pre-trained models learn informative representations on large-scale training data through a self-sup...
Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resourc...
This report documents the program and the outcomes of Dagstuhl Seminar 22232 “Efficient and Equitabl...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 10...
With the recent advances in deep learning, different approaches to improving pre-trained language mo...
This paper addresses the challenges of training large neural network models under federated learning...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Multilingual models are often particularly dependent on scaling to generalize to a growing number of...
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural langua...