Multimodal and Crossmodal Representation Learning from Textual and Visual Features with Bidirectional Deep Neural Networks for Video Hyperlinking

Vukotić, Vedran
Raymond, Christian
Gravier, Guillaume

Publication date

October 2016

Publisher

HAL CCSD

Abstract

Video hyperlinking is a classical example of a multimodal problem. Common approaches to such problems are early fusion of the initial modalities and crossmodal translation from one modality to the other. Recently, deep neural networks, especially deep autoencoders, have proven promising both for crossmodal translation and for early fusion via multimodal embedding. A particular architecture, the bidirectional symmetrical deep neural network, has been shown to yield better multimodal embeddings than classical autoencoders, while also being able to perform crossmodal translation. In this work, we focus first on evaluating good single-modal continuous representations for both textual and visual information. W...
