Holistic 3D scene understanding entails estimation of both layout configuration and object geometry in a 3D environment. Recent works have shown advances in 3D scene estimation from various input modalities (e.g., images, 3D scans), by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), for which collection at scale is expensive and often intractable. To address this shortcoming, we propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth. Instead, we rely on 2D supervision from multi-view RGB images. Our method represents a 3D scene as a latent vector, from which we can progressively decode to a sequence of objects characterized by their class categories, 3D bounding boxes, and ...
The goal of this study is to determine the effectiveness of different 3D shape representations in le...
The term 3D parsing refers to the process of segmenting and labeling the 3D space into expressive ca...
We address the task of inferring the 3D structure underlying an image, in particular focusing on two...
A 3D scene consists of a set of objects, each with a shape and a layout giving their position in spa...
Representing scenes at the granularity of objects is a prerequisite for scene understanding and deci...
Human perception reliably identifies movable and immovable parts of 3D scenes, and completes the 3D ...
Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as for r...
We present a new approach to instill 4D dynamic object priors into learned 3D representations by uns...
Neural 3D scene reconstruction methods have achieved impressive performance when reconstructing comp...
To endow machines with the ability to perceive the real-world in a three dimensional representation ...
We introduce 3inGAN, an unconditional 3D generative model trained from 2D images of a single self-si...
A promising direction for pre-training 3D point clouds is to leverage the massive amount of data in ...
We present ObPose, an unsupervised object-centric inference and generation model which learns 3D-str...
Recent progress in 3D scene understanding enables scalable learning of representations across large ...
We present a dense reconstruction approach that overcomes the drawbacks of traditional multiview ste...
The goal of this study is to determine the effectiveness of different 3D shape representations in le...
The term 3D parsing refers to the process of segmenting and labeling the 3D space into expressive ca...
We address the task of inferring the 3D structure underlying an image, in particular focusing on two...
A 3D scene consists of a set of objects, each with a shape and a layout giving their position in spa...
Representing scenes at the granularity of objects is a prerequisite for scene understanding and deci...
Human perception reliably identifies movable and immovable parts of 3D scenes, and completes the 3D ...
Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as for r...
We present a new approach to instill 4D dynamic object priors into learned 3D representations by uns...
Neural 3D scene reconstruction methods have achieved impressive performance when reconstructing comp...
To endow machines with the ability to perceive the real-world in a three dimensional representation ...
We introduce 3inGAN, an unconditional 3D generative model trained from 2D images of a single self-si...
A promising direction for pre-training 3D point clouds is to leverage the massive amount of data in ...
We present ObPose, an unsupervised object-centric inference and generation model which learns 3D-str...
Recent progress in 3D scene understanding enables scalable learning of representations across large ...
We present a dense reconstruction approach that overcomes the drawbacks of traditional multiview ste...
The goal of this study is to determine the effectiveness of different 3D shape representations in le...
The term 3D parsing refers to the process of segmenting and labeling the 3D space into expressive ca...
We address the task of inferring the 3D structure underlying an image, in particular focusing on two...