Fine-tuning large pre-trained models on downstream tasks has recently been adopted across a variety of domains. However, updating the entire parameter set of a large pre-trained model is costly. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating only a small subset of parameters (e.g., 2% of the parameters) inside a pre-trained backbone network for a new task, they reduce the training memory requirement by no more than 30%. This is because computing gradients for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more subst...
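A minimal PyTorch sketch may make the memory argument concrete: if the trainable parameters live in a small side network that reads only detached backbone activations, the backward pass never enters the frozen backbone, so the backbone's activations need not be stored for gradient computation. The abstract above is truncated before any architectural details, so the SideBlock module, the learned gating scalar, the layer sizes, and the LadderSideTuner wrapper below are illustrative assumptions rather than the paper's exact design.

```python
# Sketch of a ladder-style side network over a frozen backbone (assumed design).
import torch
import torch.nn as nn

class SideBlock(nn.Module):
    """A small trainable block in the side network (illustrative assumption)."""
    def __init__(self, backbone_dim: int, side_dim: int):
        super().__init__()
        self.down = nn.Linear(backbone_dim, side_dim)  # project backbone activation down
        self.mix = nn.Linear(side_dim, side_dim)       # cheap side computation
        self.gate = nn.Parameter(torch.zeros(1))       # learned mixing weight

    def forward(self, side_state, backbone_activation):
        # Fuse the detached backbone activation into the side stream.
        g = torch.sigmoid(self.gate)
        fused = g * side_state + (1 - g) * self.down(backbone_activation)
        return torch.relu(self.mix(fused))

class LadderSideTuner(nn.Module):
    def __init__(self, backbone_layers, backbone_dim=768, side_dim=96, num_classes=2):
        super().__init__()
        self.backbone_layers = backbone_layers         # frozen pre-trained layers
        for p in self.backbone_layers.parameters():
            p.requires_grad_(False)                    # no gradients for the backbone
        self.input_proj = nn.Linear(backbone_dim, side_dim)
        self.side_blocks = nn.ModuleList(
            SideBlock(backbone_dim, side_dim) for _ in backbone_layers
        )
        self.head = nn.Linear(side_dim, num_classes)

    def forward(self, x):
        side = self.input_proj(x.detach())
        h = x
        for layer, block in zip(self.backbone_layers, self.side_blocks):
            with torch.no_grad():                      # backbone runs purely as a feature extractor
                h = layer(h)
            # h carries no autograd history (no_grad + detach), so backprop never
            # enters the backbone; only the small side network stores activations.
            side = block(side, h.detach())
        return self.head(side.mean(dim=1))             # pool over sequence, classify

# Usage: a toy 4-layer frozen backbone adapted with a tiny side network.
backbone = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
    for _ in range(4)
)
model = LadderSideTuner(backbone)
logits = model(torch.randn(2, 16, 768))                # (batch, seq_len, hidden)
loss = logits.sum()
loss.backward()                                        # gradients touch only side-network parameters
```

The key choice in this sketch is cutting the autograd graph at every ladder connection: training memory then scales with the small side network rather than with the large backbone, which is the effect the abstract contrasts with conventional PETL methods.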
Backpropagation learning algorithms typically collapse the network's structure into a single ve...
The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream ...
The recent success of large and deep neural network models has motivated the training of even larger...
Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained mode...
Transformer-based pre-trained models with millions of parameters require large storage. Recent appro...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updat...
Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pr...
In this work, we suggest Kernel Filtering Linear Overparameterization (KFLO), where a linear cascade...
Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating o...
Parameter fine-tuning is a transfer learning approach whereby learned parameters from pre-trained so...
Transfer learning on edge is challenging due to on-device limited resources. Existing work addresses...
We analyze the learning dynamics of neural language and translation models using Loss Change Allocat...
Deep learning training consumes ever-increasing time and resources, and that is due to the complexity...
Recently, the pretrain-finetuning paradigm has attracted considerable attention in the graph learning communi...