The enrichment of social media expression has made multimodal sentiment analysis a research hotspot. However, modality heterogeneity poses great difficulties for effective cross-modal fusion, especially the modality alignment problem and the uncontrolled vector offset during fusion. In this paper, we propose a bimodal multi-head attention network (BMAN) based on text and audio, which adaptively captures intramodal utterance features and complex intermodal alignment relationships. Specifically, we first set up two independent unimodal encoders to extract the semantic features within each modality. Considering that different modalities deserve different weights, we further build a joint decoder to fuse the audio information into the text repres...
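As a rough illustration of the fusion scheme sketched in this abstract (two independent unimodal encoders plus a joint decoder that attends from text to audio), here is a minimal PyTorch-style sketch. The class names, dimensions, layer counts, and pooling choices are assumptions for illustration only, not the authors' BMAN implementation.

```python
# Minimal sketch, assuming pre-extracted text and audio features of equal width.
# Text queries attend to audio keys/values via multi-head attention, so audio
# information is fused into the text representation.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        # Independent unimodal encoders for the text and audio streams.
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2,
        )
        self.audio_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2,
        )
        # Joint decoder step: text provides queries, audio provides keys/values.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, 1)  # e.g. a sentiment score

    def forward(self, text_feats, audio_feats):
        t = self.text_encoder(text_feats)    # (batch, text_len, d_model)
        a = self.audio_encoder(audio_feats)  # (batch, audio_len, d_model)
        fused, _ = self.cross_attn(query=t, key=a, value=a)
        # Pool over the utterance and predict sentiment.
        return self.classifier(fused.mean(dim=1))


if __name__ == "__main__":
    model = CrossModalFusion()
    text = torch.randn(4, 20, 256)   # placeholder text features
    audio = torch.randn(4, 50, 256)  # placeholder audio features
    print(model(text, audio).shape)  # torch.Size([4, 1])
```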
Emotion recognition is an increasingly important sub-field in artificial intelligence (AI). Advances...
Existing multimodal sentiment analysis models focus more on fusing highly correlated image-text pair...
Humans express their emotions via facial expressions, voice intonation and word choices. To infer th...
The rising use of online media has changed the social customs of the public. Users have become accus...
Multimodal affective computing, learning to recognize and interpret human affect and subjective info...
Emotion recognition has become one of the most researched subjects in the scientific community, espe...
Sentiment analysis (SA) has gained much traction in the field of artificial intelligence (AI) and na...
Multimodal sentiment analysis is an important area of artificial intelligence. It integrates multipl...
Human communication includes rich emotional content, thus the development of multimodal emotion reco...
Interactive fusion methods have been successfully applied to multimodal sentiment analysis, due to ...
Multimodal video sentiment analysis is a rapidly growing area. It combines verbal (i.e., linguistic)...
Sentiment analysis (SA), a buzzword in the fields of artificial intelligence (AI) and natu...
Semantic-rich speech emotion recognition has a high degree of popularity in a range of areas. Speech...
Multimodal sentiment analysis is a very actively growing field of research. A promising area of oppo...
Multimodal neural networks for sentiment analysis use video, text and audio. Pr...