Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise. However, parameter sharing does not alleviate the computational burden of inference, which limits its practicality in settings with stringent latency requirements or limited computational resources. Building upon neural ordinary differential equations (ODEs), we introduce a straightforward technique to enhance the inference efficiency of parameter-shared PLMs. Additionally, we propose a simple pre-training technique that leads to fully or partially share...
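The abstract above only names the neural-ODE connection, so the sketch below is an illustrative reading rather than the paper's actual method: the residual update of a parameter-shared transformer, h_{k+1} = h_k + f(h_k), is treated as a forward-Euler step of dh/dt = f(h), and inference is accelerated by integrating the same trajectory with fewer, larger steps. The names `SharedBlock` and `ode_forward` are hypothetical, not the paper's API.

```python
# A minimal sketch of the neural-ODE view of a parameter-shared transformer.
# Assumption (not from the abstract): the standard L-iteration forward pass
# is forward Euler with step size 1, and inference is accelerated by
# re-discretizing the same trajectory with K < L larger Euler steps.
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    """One transformer-style block whose weights are reused at every depth."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Return only the residual increment: the caller adds it, so this
        # module plays the role of the ODE dynamics f(h).
        q = self.norm1(h)
        a, _ = self.attn(q, q, q)
        return a + self.ffn(self.norm2(h + a))

def ode_forward(block: SharedBlock, h: torch.Tensor,
                total_depth: int = 12, num_steps: int = 4) -> torch.Tensor:
    """Integrate dh/dt = f(h) from t=0 to t=total_depth with forward Euler.

    num_steps == total_depth recovers the ordinary parameter-shared forward
    pass (step size 1); num_steps < total_depth trades accuracy for a
    proportional reduction in compute at inference time.
    """
    dt = total_depth / num_steps
    for _ in range(num_steps):
        h = h + dt * block(h)
    return h

if __name__ == "__main__":
    torch.manual_seed(0)
    block = SharedBlock(d_model=64)
    x = torch.randn(2, 16, 64)            # (batch, seq, d_model)
    full = ode_forward(block, x, 12, 12)  # standard 12-iteration pass
    fast = ode_forward(block, x, 12, 4)   # 3x fewer block evaluations
    print((full - fast).abs().mean())     # discrepancy from coarser steps
```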
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One ...
Transformer-based pre-trained language models (PLMs) mostly suffer from excessive overhead despite t...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
Large Language Models (LLMs) are machine learning models used to understand and genera...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long infe...
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples...
Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
As language models increase in size by the day, methods for efficient inference are critical to leve...