Most models tasked with grounding referential utterances in 2D and 3D scenes learn to select the referred object from a pool of object proposals provided by a pre-trained detector. This is limiting because an utterance may refer to visual entities at various levels of granularity, such as the chair, the leg of the chair, or the tip of the front leg of the chair, which the detector may miss. We propose a language grounding model that attends to the referential utterance and to the object proposal pool computed by a pre-trained detector, and decodes the referenced objects with a detection head rather than selecting them from the pool. In this way, it benefits from powerful pre-trained object detectors without being restricted by their misses. We ca...
Visual Grounding (VG) is the task of locating a specific object in an image semantically matching a gi...
In recent years, phrase (or more generally language) grounding has emerged as a fundamental task in ...
Existing language grounding models often use object proposal bottlenecks: a pre-trained detector pro...
In this paper, we introduce a contextual grounding approach that captures the context in correspondi...
Recent progress on 3D scene understanding has explored visual grounding (3DVG) to localize a target ...
© 2019 Association for Computational Linguistics. Grounding referring expressions to objects in an e...
This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, l...
Object grounding tasks aim to locate the target object in an image through verbal communications. Un...
Code and models are publicly available at https://github.com/cshizhe/vil3dref. ...
Localizing objects in 3D scenes according to the semantics of a given natural language is a fundamen...