Unsupervised object-centric learning aims to represent the modular, compositional, and causal structure of a scene as a set of object representations and thereby promises to resolve many critical limitations of traditional single-vector representations such as poor systematic generalization. Although there have been many remarkable advances in recent years, one of the most critical problems in this direction has been that previous methods work only with simple and synthetic scenes but not with complex and naturalistic images or videos. In this paper, we propose STEVE, an unsupervised model for object-centric learning in videos. Our proposed model makes a significant advancement by demonstrating its effectiveness on various complex and natur...
This thesis tackles the challenge of learning the abstract structure of object categories without ma...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
The objective of this paper is a model that is able to discover, track and segment multiple moving o...
Object-centric learning has gained significant attention over the last years as it can serve as a po...
The objective of this work is to learn an object-centric video representation, with the aim of impro...
Machine learning models have led to remarkable progress in visual recognition. A key driving factor ...
Modern Computer Vision systems learn visual concepts through examples (i.e. images) which have been ...
Our goal is to learn a deep network that, given a small number of images of an object of a given cat...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
Learning how to model complex scenes in a modular way with recombinable components is a pre-requisit...
Despite their irresistible success, deep learning algorithms still heavily rely on annotated data, a...
We propose a novel framework for the task of object-centric video prediction, i.e., extracting the c...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
The study of object representations in computer vision has primarily focused on developing represent...
Visual recognition of semantically meaningful entities like objects, actions, and poses in images an...
This thesis tackles the challenge of learning the abstract structure of object categories without ma...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
The objective of this paper is a model that is able to discover, track and segment multiple moving o...
Object-centric learning has gained significant attention over the last years as it can serve as a po...
The objective of this work is to learn an object-centric video representation, with the aim of impro...
Machine learning models have led to remarkable progress in visual recognition. A key driving factor ...
Modern Computer Vision systems learn visual concepts through examples (i.e. images) which have been ...
Our goal is to learn a deep network that, given a small number of images of an object of a given cat...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
Learning how to model complex scenes in a modular way with recombinable components is a pre-requisit...
Despite their irresistible success, deep learning algorithms still heavily rely on annotated data, a...
We propose a novel framework for the task of object-centric video prediction, i.e., extracting the c...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
The study of object representations in computer vision has primarily focused on developing represent...
Visual recognition of semantically meaningful entities like objects, actions, and poses in images an...
This thesis tackles the challenge of learning the abstract structure of object categories without ma...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
The objective of this paper is a model that is able to discover, track and segment multiple moving o...