Object-centric learning has gained significant attention over the last years as it can serve as a powerful tool to analyze complex scenes as a composition of simpler entities. Well-established tasks in computer vision, such as object detection or instance segmentation, are generally posed in supervised settings. The recent surge of fully-unsupervised approaches for entity abstraction, which often tackle the problem with generative modeling or self-supervised learning, indicates the rising interest in structured representations in the form of objects or object parts. Indeed, these can provide benefits to many challenging tasks in visual analysis, reasoning, forecasting, and planning, and provide a path for combinatorial generalization. In th...
Learning how to model complex scenes in a modular way with recombinable components is a pre-requisit...
We propose a novel framework for the task of object-centric video prediction, i.e., extracting the c...
We address the problem of recognizing the pose of an object category from video sequences capturing ...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
We address the task of inferring the 3D structure underlying an image, in particular focusing on two...
Despite their irresistible success, deep learning algorithms still heavily rely on annotated data, a...
This thesis tackles the challenge of learning the abstract structure of object categories without ma...
We study unsupervised learning of occluding objects in images of visual scenes. The derived learning...
Unsupervised object-centric learning aims to represent the modular, compositional, and causal struct...
Visual recognition of semantically meaningful entities like objects, actions, and poses in images an...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
Developing computer vision algorithms able to learn from unsegmented images containing multiple obje...
Visual scenes are extremely rich in diversity, not only because there are infinite combinations of o...
Advances in unsupervised learning of object-representations have culminated in the development of a ...
Learning how to model complex scenes in a modular way with recombinable components is a pre-requisit...
We propose a novel framework for the task of object-centric video prediction, i.e., extracting the c...
We address the problem of recognizing the pose of an object category from video sequences capturing ...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
A natural approach to generative modeling of videos is to represent them as a composition of moving ...
We address the task of inferring the 3D structure underlying an image, in particular focusing on two...
Despite their irresistible success, deep learning algorithms still heavily rely on annotated data, a...
This thesis tackles the challenge of learning the abstract structure of object categories without ma...
We study unsupervised learning of occluding objects in images of visual scenes. The derived learning...
Unsupervised object-centric learning aims to represent the modular, compositional, and causal struct...
Visual recognition of semantically meaningful entities like objects, actions, and poses in images an...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
Developing computer vision algorithms able to learn from unsegmented images containing multiple obje...
Visual scenes are extremely rich in diversity, not only because there are infinite combinations of o...
Advances in unsupervised learning of object-representations have culminated in the development of a ...
Learning how to model complex scenes in a modular way with recombinable components is a pre-requisit...
We propose a novel framework for the task of object-centric video prediction, i.e., extracting the c...
We address the problem of recognizing the pose of an object category from video sequences capturing ...