Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

Hou, Qibin
Lu, Cheng-Ze
Cheng, Ming-Ming
Feng, Jiashi

Publication date

November 2022

Language

English

Abstract

This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features. By comparing the design principles of recent convolutional neural networks (ConvNets) and Vision Transformers, we propose to simplify self-attention by leveraging a convolutional modulation operation. We show that such a simple approach can better take advantage of the large kernels (≥7×7) nested in convolutional layers. We build a family of hierarchical ConvNets using the proposed convolutional modulation, termed Conv2Former. Our network is simple and easy to follow. Experiments show that our Conv2Former outperforms existing popular ConvNets and vision T...
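The abstract describes convolutional modulation only at a high level. Below is a minimal NumPy sketch of the idea. It assumes a formulation in which a depthwise k×k convolution, applied to a 1×1-projected input, produces weights A. These weights modulate a value branch V = W2·x through an element-wise (Hadamard) product, and an output projection follows: out = W3(A ⊙ V). The function names, the (C, H, W) layout, and the omission of normalization and activation layers are illustrative simplifications, not the authors' implementation.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution with zero 'same' padding (illustrative helper).

    x: feature map of shape (C, H, W).
    kernels: shape (C, k, k) with k odd; channel c uses only kernels[c].
    Computes a cross-correlation, as most deep learning frameworks do.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    # Zero-pad the spatial axes only, so the output keeps the input size.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Add the shifted window weighted by kernel tap (i, j), per channel.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def conv_modulation(x, w1, w2, w3, dw_kernels):
    """Simplified convolutional modulation (hypothetical sketch).

    x: (C, H, W); w1, w2, w3: (C, C) pointwise (1x1) projection matrices.
    A = DWConv_kxk(W1 x)   plays the role of the attention map
    V = W2 x               is the 'value' branch
    out = W3 (A * V)       Hadamard product followed by an output projection
    Normalization and activation layers are deliberately omitted.
    """
    def pointwise(w, t):
        # A 1x1 convolution is a matrix multiply over the channel axis.
        return np.einsum("oc,chw->ohw", w, t)

    a = depthwise_conv2d(pointwise(w1, x), dw_kernels)
    v = pointwise(w2, x)
    return pointwise(w3, a * v)
```

Unlike self-attention, whose cost grows quadratically with the number of pixels H·W, each modulation weight here depends only on a fixed k×k neighbourhood. The cost is therefore linear in H·W, and enlarging k (for example from 3 to 7 or more) widens the receptive field at a modest extra cost.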
