In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than a single modality can provide. Certain aspects of the data can be particularly useful here, for example correlations in the space or time domain across modalities, but they must be exploited carefully to realize their full predictive potential. We propose two deep learning architectures with multimodal cross-connections that allow for dataflow between several feature extractors (XFlow). Our models derive more interpretable features and achieve better performance than models which do not exchange representations, usefully exploiting correlations between audio and visual data, which have a dif...
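To make the idea of cross-connections between feature extractors concrete, the following is a minimal sketch in plain Python, not the XFlow architectures themselves. Everything in it is an assumption made for illustration: the single dense layer per stream, the ReLU activations, the merge by concatenation, the layer sizes, and all names (`make_params`, `forward`, `a2v`, `v2a`). The abstract does not specify how the exchanged representations are transformed or merged.

```python
import random


def linear(x, w):
    """Multiply feature vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init(n_in, n_out, rng):
    """Small random weight matrix with n_in rows and n_out columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def make_params(d_audio, d_visual, d_hidden, n_classes, seed=0):
    """Hypothetical parameter set for a two-stream model with one exchange point."""
    rng = random.Random(seed)
    return {
        "audio_in": init(d_audio, d_hidden, rng),
        "visual_in": init(d_visual, d_hidden, rng),
        "a2v": init(d_hidden, d_hidden, rng),  # cross-connection: audio -> visual
        "v2a": init(d_hidden, d_hidden, rng),  # cross-connection: visual -> audio
        "audio_out": init(2 * d_hidden, d_hidden, rng),
        "visual_out": init(2 * d_hidden, d_hidden, rng),
        "classifier": init(2 * d_hidden, n_classes, rng),
    }


def forward(params, audio, visual):
    """Run both streams, exchange intermediate features once, then classify."""
    # Each modality first passes through its own feature extractor.
    h_a = relu(linear(audio, params["audio_in"]))
    h_v = relu(linear(visual, params["visual_in"]))

    # Cross-connections: each stream sends a transformed copy of its
    # intermediate features to the other stream.
    a_to_v = relu(linear(h_a, params["a2v"]))
    v_to_a = relu(linear(h_v, params["v2a"]))

    # Each stream merges its own features with the received ones.
    # Concatenation (list "+") is an assumed merge operation.
    h_a2 = relu(linear(h_a + v_to_a, params["audio_out"]))
    h_v2 = relu(linear(h_v + a_to_v, params["visual_out"]))

    # Late fusion of the two streams, followed by a linear classifier.
    return linear(h_a2 + h_v2, params["classifier"])
```

A baseline that does not exchange representations would correspond to dropping `a_to_v` and `v_to_a` and feeding only `h_a` and `h_v` into the second layers (with correspondingly smaller weight matrices), which is the kind of comparison the abstract refers to.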
Intelligently reasoning about the world often requires integrating data from multiple modalities, as...
We propose a novel deep training algorithm for joint representation of audio and visual information ...
The heterogeneity gap problem is the main challenge in cross-modal retrieval. Because cross-modal da...
Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multime...
Paper presented at: 18th International Society for Music Information Retrieval Conference (ISM...
In this work, we explore the multi-modal information provided by the Youtube-8M dataset by projectin...
In this paper, we propose a multimodal deep learning architecture for emotion recognition in video r...
This work aims at investigating cross-modal connections between audio and video sources in the task ...
Machine learning algorithms can have difficulties adapting to data from different sources, for examp...
Common approaches to problems involving multiple modalities (classification, r...
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multipl...