In recent years, phrase (or more generally language) grounding has emerged as a fundamental task in computer vision. Phrase grounding is a generalization of more traditional computer vision tasks with the goal of localizing a natural language phrase spatially in a given image. Most recent work uses state-of-the-art deep learning techniques to achieve good performance on this task. However, these approaches do not capture the complex dependencies among proposal regions and phrases that are crucial for superior performance on the task. In this work we try to overcome this limitation through a model that makes no assumptions regarding the underlying dependencies in either modality. We present an end-to-end framework for grounding of the phrases in i...
We introduce GroundNet, a neural network for referring expression recognition---the task of localizi...
Recent progress on 3D scene understanding has explored visual grounding (3DVG) to localize a target ...
Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding task, which locates the targ...
Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challe...
In this paper, we propose a novel graph learning framework for phrase grounding in the image. Develo...
Visual Grounding (VG) is a task of locating a specific object in an image semantically matching a gi...
In this paper, we introduce a contextual grounding approach that captures the context in correspondi...
Grounding language in images has been shown to help improve performance on many image-language tasks....
Grounding language in the physical world enables humans to use words and sentences in context and to...
The problem of how abstract symbols, such as those in systems of natural language, may be grounded ...