Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and the multi-layer perceptrons (MLPs) in ViTs rely on dense multiplications, making training and inference costly. To this end, we propose to reparameterize the pre-trained ViT with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed $\textbf{ShiftAddViT}$, which aims for end-to-end inference speedups on GPUs without the need to train from scratch. Specifically, all $\texttt{MatMuls}$ among queries, keys, and values are reparameterized by additive kernels, after mapping queries and keys to ...
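The core idea behind shift-based reparameterization can be illustrated in a few lines: round each weight to its nearest signed power of two, so every product $w_i x_i$ reduces to a sign flip plus a bit shift of $x_i$, and a dot product becomes shifts and additions only. The sketch below is a minimal NumPy illustration of this general technique, not the ShiftAddViT implementation; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)  # toy activations
w = rng.normal(size=8)  # toy dense weights

# Shift reparameterization: round each weight to the nearest signed power of
# two, w_i ≈ sign(w_i) * 2**p_i, so each product w_i * x_i becomes a bit
# shift of x_i (multiplication by 2**p_i) plus a sign flip.
sign = np.sign(w)
p = np.round(np.log2(np.abs(w))).astype(int)

exact = x @ w                               # dense multiply-accumulate
shift_add = np.sum(sign * np.ldexp(x, p))   # ldexp(x, p) = x * 2**p, i.e. a shift

# The shift-add result approximates the dense dot product: each quantized
# weight is within a factor of sqrt(2) of the original.
print(exact, shift_add)
```

Since rounding in log2 space keeps every quantized weight within a factor of $\sqrt{2}$ of the original, the accumulated error of the shift-add dot product is bounded by roughly 41% of $\sum_i |w_i x_i|$; the actual papers recover the remaining accuracy gap with fine-tuning.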
Prior works have proposed several strategies to reduce the computational cost of self-attention mech...
Self-attention based models such as vision transformers (ViTs) have emerged as a very competitive ar...
Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating o...
Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision tasks. Howev...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
Vision transformers (ViTs) have become the popular structures and outperformed convolutional neural ...
Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision app...
We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratic...
With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to optimize...
Network quantization significantly reduces model inference complexity and has been widely used in re...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs)...
Vision transformers (ViTs) are usually considered to be less light-weight than convolutional neural ...
Recently, Vision Transformer (ViT) has continuously established new milestones in the computer visio...