The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object representations that are useful for control and reinforcement learning (RL). To this end, we introduce Transporter, a neural network architecture for discovering concise geometric object representations in terms of keypoints or image-space coordinates. Our method learns from raw video frames in a fully unsupervised manner, by transporting learnt image features between video frames using a keypoint bottleneck. The discovered keypoints track objects and object parts across long time-horizons more a...
The ability to detect and track objects in the visual world is a crucial skill for any intelligent a...
This is the accepted version of the paper to appear at Pattern Recognition Letters (PRL). The final ...
In this paper we present a general, flexible framework for learning mappings from images to actions ...
In many control problems that include vision, optimal controls can be inferred from the location of ...
The power of deep neural networks comes mainly from huge labeled datasets. Even though it shines on ...
We discuss vision as a sensory modality for systems that interact flexibly with uncontrolled environ...
We present a novel learned keypoint detection method designed to maximize the ...
We propose a new method for recognizing the pose of objects from a single image that for learning us...
Object recognition is a challenging problem for artificial systems. This is especially true for obje...
Learning automatically the structure of object categories remains an important open problem in compu...
Despite significant recent progress, machine vision systems lag considerably behind their biological...
Computer vision problems, such as tracking and robot navigation, tend to be solved using models of t...
Perception in natural systems is a highly active process. In this paper, we adopt the strategy of na...
Modern Computer Vision systems learn visual concepts through examples (i.e. images) which have been ...