Grounding language in the physical world enables humans to use words and sentences in context and to link them to actions. Several recent computer vision studies have addressed the task of expression grounding: learning to select the part of an image that depicts the referent of a multi-word expression. The task is approached by jointly processing the language expression, visual information about individual candidate referents, and in some cases the general visual context, using neural models that combine recurrent and convolutional components (Rohrbach et al., 2016; Hu et al., 2016b,a). However, more than the intended referent alone determines how a referring expression is phrased. When referring to an element of a sc...
Visual Grounding (VG) is a task of locating a specific object in an image semantically matching a gi...
In this paper we present a novel approach to generating referring expressions (GRE) that is tailored...
Object grounding tasks aim to locate the target object in an image through verbal communications. Un...
© 2019 Association for Computational Linguistics. Grounding referring expressions to objects in an e...
We introduce GroundNet, a neural network for referring expression recognition: the task of localizi...
An idealized, though simplistic, view of the referring expression production and grounding process i...
We present a visually-grounded language understanding model based on a study of how people verbally ...
The problem of how abstract symbols, such as those in systems of natural language, may be grounded i...
In recent years, phrase (or more generally language) grounding has emerged as a fundamental task in ...
Humans naturally use referring expressions with verbal utterances and nonverbal gestures to refer to...
Can language models learn grounded representations from text distribution alone? This question is bo...
Referring expression comprehension aims at grounding the object in an image referred to by the expre...
We propose a computational model of visually-grounded spatial language understanding, based on a st...
In this paper we introduce a new game to crowd-source natural language referring expressions. By des...