Architectures neuronales profondes pour l'apprentissage de représentation multimodales de données multimédias (Deep neural architectures for learning multimodal representations of multimedia data)

Vukotic, Vedran

Publication date

September 2017

Publisher

HAL CCSD

Abstract

In this dissertation, we discuss the thesis that deep neural networks are well suited to the analysis of visual, textual, and fused visual and textual content. This work evaluates the ability of deep neural networks to automatically learn multimodal representations in either an unsupervised or a supervised manner and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, by utilizing solely one frame...