In this paper, we tackle the problem of predicting the affective responses of movie viewers based on the content of the movies. Current studies on this topic focus on video representation learning and on fusion techniques that combine the extracted features for predicting affect. Yet these studies typically ignore both the correlation between multiple modality inputs and the correlation between temporal inputs (i.e., sequential features). To explore these correlations, a neural network architecture, namely AttendAffectNet (AAN), uses the self-attention mechanism for predicting the emotions of movie viewers from different input modalities. In particular, visual, audio, and text features are considered for predicting emotions (and expressed in term...
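The idea of applying self-attention across modality inputs can be illustrated with a minimal sketch. This is not the AttendAffectNet implementation: the feature sizes, the shared dimension `d_model`, the mean pooling, the linear output head, and all function and parameter names are assumptions made only for illustration, and the weights are random rather than trained. Each modality vector is projected to a common size, the three resulting tokens attend to one another with scaled dot-product self-attention, and the pooled result is mapped to two numbers standing in for valence and arousal.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a list of token vectors."""
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    attended, weights = [], []
    for q in queries:
        # One row of attention weights: how much this token attends to each token.
        row = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        weights.append(row)
        attended.append(
            [sum(w * v[i] for w, v in zip(row, values)) for i in range(len(values[0]))]
        )
    return attended, weights


def predict_affect(features, params):
    """Map {modality: feature vector} to (valence, arousal) plus attention weights."""
    names = sorted(features)  # fixed token order: audio, text, visual
    tokens = [matvec(params["proj"][name], features[name]) for name in names]
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    # Mean-pool the attended tokens (an assumed, simple fusion choice).
    pooled = [sum(column) / len(attended) for column in zip(*attended)]
    valence, arousal = matvec(params["w_out"], pooled)
    return (valence, arousal), weights


rng = random.Random(0)
dims = {"visual": 8, "audio": 6, "text": 4}  # toy feature sizes, hypothetical
d_model = 5  # shared token dimension, hypothetical
params = {
    "proj": {name: random_matrix(d_model, dim, rng) for name, dim in dims.items()},
    "w_q": random_matrix(d_model, d_model, rng),
    "w_k": random_matrix(d_model, d_model, rng),
    "w_v": random_matrix(d_model, d_model, rng),
    "w_out": random_matrix(2, d_model, rng),
}
features = {name: [rng.random() for _ in range(dim)] for name, dim in dims.items()}
(valence, arousal), weights = predict_affect(features, params)
```

Each row of `weights` sums to one and indicates how strongly one modality token draws on the others. The same `self_attention` function could be applied to a sequence of per-segment feature vectors to model the temporal correlation instead of the cross-modal one.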
Recognizing emotional reactions of movie audiences to affective movie content is a challenging task ...
Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedi...
Exploiting the multimodal and temporal interaction between audio-visual channels is essential for au...
Predicting the emotional response of movie audiences to affective movie content is a challenging tas...
In this paper, we present our submission to the 3rd Affective Behavior Analysis in-the-wild (ABAW) chall...
This study explored the feasibility of using shared neural patterns from brief affective episodes (v...
Emotion recognition is an increasingly important sub-field in artificial intelligence (AI). Advances...
Induced affect is the emotional effect of an object on an individual. It can be quantified through tw...
In Audio-Video Emotion Recognition (AVER), the idea is to have a human-level understanding of emotio...
The economic success of a movie depends on audience satisfaction and on how much they are emotionall...
We present our system description of input-level multimodal fusion of audio, video, and text for recog...
Emotions play a crucial role in human-human communications with complex socio-psychological nature. ...
This paper addresses the problem of continuous emotion prediction in movies from multimodal cues. Th...
With the surge of services offering video-on-demand through streaming and the increased competition ...
© 2017 IEEE. Stories can have tremendous power - not only useful for entertainment, they can activat...