We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system’s successes and failures we reveal how visual context influences the semantics of utter...
Distributional semantic models capture word-level meaning that is useful in many natural language pr...
Artificial Intelligence (AI) technologies affect many facets of our daily lives. AI systems help us ...
In this paper we explore the use of visual common-sense knowledge and other kinds of knowledge (such...
We propose a computational model of visually-grounded spatial language understanding, based on a st...
Grounding language in the physical world enables humans to use words and sentences in context and to...
Zarrieß S, Schlangen D. Deriving continuous grounded meaning representations from referentially struc...
We present a visually grounded model of speech perception which projects spoken utterances and image...
The fundamental claim of this paper is that salience—both visual and linguistic—is an import...
Humans naturally use referring expressions with verbal utterances and nonverbal gestures to refer to...
The problem of how abstract symbols, such as those in systems of natural language, may be grounded ...
Vorweg C, Wachsmuth S, Socher G. Visually grounded language processing in object reference. In: Rick...
© 2016 John Benjamins Publishing Company. This is the accepted manuscript of a chapter published in ...
Referring expression comprehension aims at grounding the object in an image referred to by the expre...