Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole. Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images to discover objects, parts, and their relations. It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-...
Computer vision has made significant progress in locating and recognizing objects in recent decades....
Humans possess rich knowledge of the structure of the world, including co-occurrences among entities...
Human scene understanding involves not just localizing objects,but also inferring latent attributes ...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
Human visual reasoning is characterized by an ability to identify abstract patterns from only a smal...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
Deep neural networks learn representations of data to facilitate problem-solving in their respective...
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Comput...
Many tasks that are easy for humans are difficult for machines. Particularly, while humans excel at ...
Abstract—In order for robots to function in unstructured environments in interaction with humans, th...
Structure in a visual scene can be described at many levels of granular-ity. At a coarse level, the ...
Humans acquire their most basic physical concepts early in development, and continue to enrich and e...
Object-centric learning has gained significant attention over the last years as it can serve as a po...
Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Dev...
Reasoning about commonsense from visual input remains an important and challenging problem in the fi...
Computer vision has made significant progress in locating and recognizing objects in recent decades....
Humans possess rich knowledge of the structure of the world, including co-occurrences among entities...
Human scene understanding involves not just localizing objects,but also inferring latent attributes ...
Humans easily recognize object parts and their hierarchical structure by watching how they move; the...
Human visual reasoning is characterized by an ability to identify abstract patterns from only a smal...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
Deep neural networks learn representations of data to facilitate problem-solving in their respective...
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Comput...
Many tasks that are easy for humans are difficult for machines. Particularly, while humans excel at ...
Abstract—In order for robots to function in unstructured environments in interaction with humans, th...
Structure in a visual scene can be described at many levels of granular-ity. At a coarse level, the ...
Humans acquire their most basic physical concepts early in development, and continue to enrich and e...
Object-centric learning has gained significant attention over the last years as it can serve as a po...
Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Dev...
Reasoning about commonsense from visual input remains an important and challenging problem in the fi...
Computer vision has made significant progress in locating and recognizing objects in recent decades....
Humans possess rich knowledge of the structure of the world, including co-occurrences among entities...
Human scene understanding involves not just localizing objects,but also inferring latent attributes ...