Transformers have quickly shined in the computer vision world since the emergence of Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) seems to be challenged by increasingly effective transformer-based models. Very recently, a couple of advanced convolutional models strike back with large kernels motivated by the local-window attention mechanism, showing appealing performance and efficiency. While one of them, i.e. RepLKNet, impressively manages to scale the kernel size to 31x31 with improved performance, the performance starts to saturate as the kernel size continues growing, compared to the scaling trend of advanced ViTs such as Swin Transformer. In this paper, we explore the possibility of training ext...
Vision transformers have shown excellent performance in computer vision tasks. As the computation co...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution im...
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent ad...
Recent years have witnessed two seemingly opposite developments of deep convolutional neural network...
The successful application of ConvNets and other neural architectures to computer vision is central ...
Sparse representation plays a critical role in vision problems, including generation and understandi...
DNNs have been finding a growing number of applications including image classification, speech recog...
We propose a new method for creating computationally efficient and compact convolutional neural netw...
Dense pixel matching problems such as optical flow and disparity estimation are among the most chall...
In this book, I perform an experimental review on twelve similar types of Convolutional Neural Netwo...
Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of ...
Convolutional neural networks (CNNs) are able to attain better visual recognition performance than f...
Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision. Ini...
Medical image segmentation has seen significant improvements with transformer models, which excel in...
Vision transformers have shown excellent performance in computer vision tasks. As the computation co...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution im...
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent ad...
Recent years have witnessed two seemingly opposite developments of deep convolutional neural network...
The successful application of ConvNets and other neural architectures to computer vision is central ...
Sparse representation plays a critical role in vision problems, including generation and understandi...
DNNs have been finding a growing number of applications including image classification, speech recog...
We propose a new method for creating computationally efficient and compact convolutional neural netw...
Dense pixel matching problems such as optical flow and disparity estimation are among the most chall...
In this book, I perform an experimental review on twelve similar types of Convolutional Neural Netwo...
Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of ...
Convolutional neural networks (CNNs) are able to attain better visual recognition performance than f...
Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision. Ini...
Medical image segmentation has seen significant improvements with transformer models, which excel in...
Vision transformers have shown excellent performance in computer vision tasks. As the computation co...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution im...