Language and vision provide complementary information. Integrating both modalities in a single multimodal representation is an unsolved problem with wide-reaching applications to both natural language processing and computer vision. In this paper, we present a simple and effective method that learns a language-to-vision mapping and uses its output visual predictions to build multimodal representations. In this sense, our method provides a cognitively plausible way of building representations, consistent with the inherently reconstructive and associative nature of human memory. Using seven benchmark concept similarity tests, we show that the mapped (or imagined) vectors not only help to fuse multimodal information, but also outperform strong...
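The pipeline this abstract describes can be sketched in a few lines: learn a mapping from text embeddings to paired visual features, then "imagine" a visual vector for a word and concatenate it with the text vector. The sketch below is illustrative only; the dimensions, the random stand-in data, and the choice of a ridge-regression linear mapping are assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real embeddings (assumption: sizes and data are illustrative).
n_words, d_text, d_vis = 100, 50, 20
T = rng.normal(size=(n_words, d_text))  # text embeddings, one row per word
V = rng.normal(size=(n_words, d_vis))   # paired visual feature vectors

# Learn a linear language-to-vision mapping via ridge regression:
#   W = argmin_W ||T W - V||^2 + lam * ||W||^2
lam = 1.0
W = np.linalg.solve(T.T @ T + lam * np.eye(d_text), T.T @ V)

def imagine(text_vec):
    """Predict ('imagine') a visual vector from a word's text embedding."""
    return text_vec @ W

def multimodal(text_vec):
    """Fuse the text vector with its imagined visual vector
    by L2-normalized concatenation."""
    t = text_vec / np.linalg.norm(text_vec)
    v = imagine(text_vec)
    v = v / np.linalg.norm(v)
    return np.concatenate([t, v])

m = multimodal(T[0])
print(m.shape)  # (70,) — d_text + d_vis
```

A learned nonlinear mapping (e.g. a small MLP) could replace the ridge solution without changing the surrounding fusion logic.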
This electronic version was submitted by the student author. The certified thesis is available in th...
Contrastive learning is a form of distance learning that aims to learn invariant features from two r...
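As a minimal illustration of the contrastive idea mentioned above, the sketch below implements an InfoNCE-style loss over two batches of paired views, where matching rows are positives and all other rows serve as negatives. This is a common contrastive objective, assumed here for illustration; the cited work's exact loss, encoders, and augmentations may differ.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss for paired views: row i of z1 and row i of z2 are positives,
    all other rows act as in-batch negatives. tau is the temperature."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy, positives on the diagonal

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
# Loss is low when views are aligned, high when the pairing is broken.
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
misaligned = info_nce(z, np.roll(z, 1, axis=0))
print(aligned < misaligned)  # True
```

The temperature `tau` controls how sharply the loss concentrates on the hardest negatives; values around 0.05–0.5 are typical in practice.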
Existing vision-language methods typically support two languages at a time at most. In this paper, w...
Integrating visual and linguistic information into a single multimodal representation is an unsolved...
In recent years, joint text-image embeddings have significantly improved thanks to the development o...
Grounding natural language onto real-world perception is a fundamental challenge to empower various ...
We propose Imaginet, a model of learning visually grounded representations of language from coupled ...
Large language models are known to suffer from the hallucination problem in that they are prone to o...
This paper introduces BD2BB, a novel language and vision benchmark that requires multimodal models c...