We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene. Given an image feature extractor, for example pre-trained using self-supervision, N3F uses it as a teacher to learn a student network defined in 3D space. The 3D student network is similar to a neural radiance field that distills said features and can be trained with the usual differentiable rendering machinery. As a consequence, N3F is readily applicable to most neural rendering formulations, including vanilla NeRF and its extensions to complex dynamic scenes. We show that our method not only enables semantic understanding in the context of scen...
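The teacher-student distillation described in the abstract above can be illustrated with a minimal NumPy sketch. It assumes the standard NeRF quadrature weights and a squared-error loss between the rendered feature and the teacher's 2D feature for one pixel; the function names, the loss choice, and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def render_weights(sigma, delta):
    """Standard NeRF quadrature weights w_i = T_i * alpha_i along one ray.

    sigma: (N,) non-negative densities predicted by the 3D student (assumed).
    delta: (N,) distances between consecutive samples on the ray.
    """
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j); the first sample sees T = 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return trans * alpha

def render_features(sigma, delta, feats):
    """Accumulate per-sample student features (N, D) into one pixel feature (D,)."""
    return render_weights(sigma, delta) @ feats

def distillation_loss(rendered, teacher):
    """Squared error between rendered student feature and teacher feature (assumed loss)."""
    return float(np.sum((rendered - teacher) ** 2))

# Toy ray: an empty sample, then a fully opaque one, then an occluded one.
sigma = np.array([0.0, 1e9, 5.0])
delta = np.ones(3)
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical 2-D features
pixel_feature = render_features(sigma, delta, feats)
loss = distillation_loss(pixel_feature, teacher=np.array([0.0, 1.0]))
```

In this toy ray the opaque second sample receives all the weight, so the rendered feature matches that sample's feature. In a real pipeline the densities and features would come from the student network and the loss would be minimized by gradient descent through the renderer.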
Deep Convolutional Neural Networks, which are a family of biologically inspired machine vision algor...
This thesis comprises a body of work that investigates the use of deep learning for 2D and 3D scene ...
The task of object segmentation in videos is usually accomplished by processing appearance and motio...
Emerging neural radiance fields (NeRF) are a promising scene representation for computer graphics, e...
We introduce 3inGAN, an unconditional 3D generative model trained from 2D images of a single self-si...
Recent progress in 3D scene understanding enables scalable learning of representations across large ...
Human perception reliably identifies movable and immovable parts of 3D scenes, and completes the 3D ...
Scene representation is the process of converting sensory observations of an environment into compac...
High-fidelity 3D scene reconstruction from monocular videos continues to be challenging, especially ...
Semantic labelling is highly correlated with geometry and radiance reconstruction, as scene entities...
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D...
In contrast to numerous NLP and 2D computer vision foundational models, the learning of a robust and...
Neural 3D scene reconstruction methods have achieved impressive performance when reconstructing comp...
Perceiving 3D structure and recognizing objects and their properties around us is central to our und...
It is a long-standing problem in robotics to develop agents capable of executing diverse manipulatio...