Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all t...
We describe a hierarchical probabilistic model for the detection and recognition of objects in clutt...
International audienceThe dynamic content of physical scenes is largely compositional, that is, the ...
We present a novel method for learning human motion models from unsegmented videos. We propose a uni...
The appearance of dynamic scenes is often largely governed by a latent low-dimensional dynamic proce...
Object-centric learning has gained significant attention over the last years as it can serve as a po...
Scenes filled with moving objects are often hierarchically organized: the motion of a migrating goos...
We propose a new unsupervised learning method to obtain a layered pictorial structure (LPS) represen...
AbstractThis paper presents a method to extract a part-based model of an observed scene from a video...
Structure in a visual scene can be described at many levels of granular-ity. At a coarse level, the ...
While great strides have been made in using deep learning algorithms to solve supervised learning ta...
A fundamental goal of computer vision is the ability to analyze motion. This can range from the sim...
This paper addresses the fundamental problem of auto-matically discovering an unknown moving deforma...
Common-sense physical reasoning in the real world requires learning about the interactions of object...
Visual understanding of human behavior in video sequences is one of the fundamental topics in comput...
Visual recognition is a fundamental problem in computer vision. It is significant to many applicatio...
We describe a hierarchical probabilistic model for the detection and recognition of objects in clutt...
International audienceThe dynamic content of physical scenes is largely compositional, that is, the ...
We present a novel method for learning human motion models from unsegmented videos. We propose a uni...
The appearance of dynamic scenes is often largely governed by a latent low-dimensional dynamic proce...
Object-centric learning has gained significant attention over the last years as it can serve as a po...
Scenes filled with moving objects are often hierarchically organized: the motion of a migrating goos...
We propose a new unsupervised learning method to obtain a layered pictorial structure (LPS) represen...
AbstractThis paper presents a method to extract a part-based model of an observed scene from a video...
Structure in a visual scene can be described at many levels of granular-ity. At a coarse level, the ...
While great strides have been made in using deep learning algorithms to solve supervised learning ta...
A fundamental goal of computer vision is the ability to analyze motion. This can range from the sim...
This paper addresses the fundamental problem of auto-matically discovering an unknown moving deforma...
Common-sense physical reasoning in the real world requires learning about the interactions of object...
Visual understanding of human behavior in video sequences is one of the fundamental topics in comput...
Visual recognition is a fundamental problem in computer vision. It is significant to many applicatio...
We describe a hierarchical probabilistic model for the detection and recognition of objects in clutt...
International audienceThe dynamic content of physical scenes is largely compositional, that is, the ...
We present a novel method for learning human motion models from unsegmented videos. We propose a uni...