Much of the recent progress on visual processing has been driven by deep learning and its bicameral heart of composition and end-to-end optimization. The machinery of convolutional networks is now ubiquitous. Its diffusion however was neither instantaneous nor effortless. To advance across the frontiers of vision, deep learning had to be equipped with the right structures: the true, intrinsic structures of the visual world.This thesis incorporates locality and scale structure into end-to-end learning for visual recognition. Locality structure is key for addressing image-to-image tasks that take image inputs and return image outputs. Scale structure is ubiquitous, and optimizing over it learns the degree of locality for the task and data. Al...
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. ...
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. ...
Data representation is the core of all machine learning algorithms, and their performance depends mo...
In recent years, convolutional networks have dramatically (re)emerged as the dominant paradigm for s...
The main success stories of deep learning, starting with ImageNet, depend on convolutional networks,...
Locality is taken as the surrounding or nearby region of something in the world. Without the notion ...
Vision Transformer (ViT) attains state-of-the-art performance in visual recognition, and the variant...
Deep learning based visual recognition and localization is one of the pillars of computer vision and...
Knowing where to look in an image can significantly improve performance in computer vision tasks by ...
This manuscript is about a journey. The journey of computer vision and machine learning research fro...
This manuscript is about a journey. The journey of computer vision and machine learning research fro...
This manuscript is about a journey. The journey of computer vision and machine learning research fro...
Evidence is mounting that ConvNets are the best representation learning method for recognition. In t...
Evidence is mounting that ConvNets are the best representation learning method for recognition. In t...
Understanding visual scenes is a crucial piece in many artificial intelligence applications ranging ...
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. ...
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. ...
Data representation is the core of all machine learning algorithms, and their performance depends mo...
In recent years, convolutional networks have dramatically (re)emerged as the dominant paradigm for s...
The main success stories of deep learning, starting with ImageNet, depend on convolutional networks,...
Locality is taken as the surrounding or nearby region of something in the world. Without the notion ...
Vision Transformer (ViT) attains state-of-the-art performance in visual recognition, and the variant...
Deep learning based visual recognition and localization is one of the pillars of computer vision and...
Knowing where to look in an image can significantly improve performance in computer vision tasks by ...
This manuscript is about a journey. The journey of computer vision and machine learning research fro...
This manuscript is about a journey. The journey of computer vision and machine learning research fro...
This manuscript is about a journey. The journey of computer vision and machine learning research fro...
Evidence is mounting that ConvNets are the best representation learning method for recognition. In t...
Evidence is mounting that ConvNets are the best representation learning method for recognition. In t...
Understanding visual scenes is a crucial piece in many artificial intelligence applications ranging ...
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. ...
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. ...
Data representation is the core of all machine learning algorithms, and their performance depends mo...