Deployment of Transformer models on edge devices is becoming increasingly challenging due to their high inference cost, which scales quadratically with the number of tokens in the input sequence. Token pruning is an emerging solution to this challenge because it is easy to deploy on various Transformer backbones. However, most token pruning methods require computationally expensive fine-tuning, which is undesirable in many edge deployment scenarios. In this work, we propose Zero-TPrune, the first zero-shot method that considers both the importance and similarity of tokens when performing token pruning. It leverages the attention graph of pre-trained Transformer models to produce an importance distribution for tokens via our ...
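The idea of deriving token importance from the attention graph can be sketched as follows. This is a simplified, hypothetical illustration (not the exact Zero-TPrune procedure, whose name is cut off above): the attention matrix is treated as a transition graph over tokens and a distribution is iterated over it, in the spirit of PageRank-style ranking, after which the lowest-scoring tokens are pruned. The function names and the `keep_ratio` parameter are assumptions for this sketch.

```python
import numpy as np

def token_importance_from_attention(attn, n_iter=10):
    """Estimate token importance from a row-stochastic attention matrix.

    Hypothetical sketch: treat attn (shape (N, N), rows sum to 1) as a
    transition graph and propagate a uniform distribution along its edges.
    """
    n = attn.shape[0]
    s = np.full(n, 1.0 / n)          # start from a uniform distribution
    for _ in range(n_iter):
        s = s @ attn                 # push importance along attention edges
    return s

def prune_tokens(tokens, attn, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of tokens, ranked by importance."""
    scores = token_importance_from_attention(attn)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # preserve original token order
    return tokens[keep], keep

# Toy example: 4 tokens with a random row-normalized attention matrix.
rng = np.random.default_rng(0)
raw = rng.random((4, 4))
attn = raw / raw.sum(axis=1, keepdims=True)
tokens = np.arange(4)
kept, idx = prune_tokens(tokens, attn, keep_ratio=0.5)
```

Because each row of the attention matrix sums to one, the propagated scores remain a valid probability distribution over tokens, so the ranking is directly comparable across iterations.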
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
Large-scale transformer models have become the de-facto architectures for various machine learning a...
Vision transformers have achieved leading performance on various visual tasks yet still suffer from ...
Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior ...
Recently, Vision Transformer (ViT) has continuously established new milestones in the computer visio...
Despite the recent success in many applications, the high computational requirements of vision trans...
Vision transformers have achieved significant improvements on various vision tasks but their quadrat...
Most existing pruning works are resource-intensive, requiring retraining or fine-tuning of the prune...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Multi-head attention, a collection of several attention mechanisms that independently attend to diff...
This thesis addresses the crucial issue of deploying large Transformer models on resource-constraine...
While state-of-the-art vision transformer models achieve promising results in image classification, ...
As the third-generation neural network, the Spiking Neural Network (SNN) has the advantages of low p...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
Recent advances in deep learning optimization showed that just a subset of parameters are really nec...