Effective fusion of data from multiple modalities, such as video, speech, and text, is a challenging task due to the heterogeneous nature of multimodal data. In this work, we propose fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide “how” to combine given multimodal features more effectively. We propose two networks: 1) Auto-Fusion network, which aims to compress information from different modalities while preserving the context, and 2) GAN-Fusion, which regularizes the learned latent space given context from complementing modalities. A quantitative evaluation on the tasks of multimodal mac...
In this paper we present a novel approach towards multi-modal emotion recognition on a challenging d...
When effectively used in deep learning models for classification, multi-modal data can provide rich ...
Current multimodal data processing methods use deep learning to combine complementary visual and tex...
abstract: Modern machine learning systems leverage data and features from multiple modalities to gai...
During recent years, the advances in computational and information systems have contributed to the g...
Our perception is by nature multimodal, i.e. it appeals to many of our senses. To solve certain task...
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textua...
In this paper, we propose a multimodal deep learning architecture for emotion recognition in video r...
Notre perception est par nature multimodale, i.e. fait appel à plusieurs de nos sens. Pour résoudre ...
University of Technology Sydney. Faculty of Engineering and Information Technology.Multi-modal perce...
Multimodal datasets often feature a combination of continuous signals and a series of discrete event...
In this contribution, we investigate the effectiveness of deep fusion of text and audio features for...
Abstract: Representation learning methods have received a lot of attention by researchers and pract...
Multimodal Emotion Recognition in Conversation (ERC) has garnered growing attention from research co...
Multi-modal approaches employ data from multiple input streams such as textual and visual domains. D...
In this paper we present a novel approach towards multi-modal emotion recognition on a challenging d...
When effectively used in deep learning models for classification, multi-modal data can provide rich ...
Current multimodal data processing methods use deep learning to combine complementary visual and tex...
abstract: Modern machine learning systems leverage data and features from multiple modalities to gai...
During recent years, the advances in computational and information systems have contributed to the g...
Our perception is by nature multimodal, i.e. it appeals to many of our senses. To solve certain task...
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textua...
In this paper, we propose a multimodal deep learning architecture for emotion recognition in video r...
Notre perception est par nature multimodale, i.e. fait appel à plusieurs de nos sens. Pour résoudre ...
University of Technology Sydney. Faculty of Engineering and Information Technology.Multi-modal perce...
Multimodal datasets often feature a combination of continuous signals and a series of discrete event...
In this contribution, we investigate the effectiveness of deep fusion of text and audio features for...
Abstract: Representation learning methods have received a lot of attention by researchers and pract...
Multimodal Emotion Recognition in Conversation (ERC) has garnered growing attention from research co...
Multi-modal approaches employ data from multiple input streams such as textual and visual domains. D...
In this paper we present a novel approach towards multi-modal emotion recognition on a challenging d...
When effectively used in deep learning models for classification, multi-modal data can provide rich ...
Current multimodal data processing methods use deep learning to combine complementary visual and tex...