Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance, and thus of understanding linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. Valence predictions of the transformer model are very reactive to positive and negative sentiment content, as well as nega...
Speech Emotion Recognition (SER) is a challenging task due to limited data and blurred boundaries of...
Recognizing emotions in spoken communication is crucial for advanced human-machine interaction. Curr...
Creating machines with the ability to reason, perceive, learn and make decisions based on a human li...
Human emotion understanding is pivotal in making conversational technology mainstream. We view speec...
Recent advances in transformer-based architectures which are pre-trained in a self-supervised manner h...
Self-supervised speech models have grown fast during the past few years and have proven feasible for...
We propose EmoDistill, a novel speech emotion recognition (SER) framework that leverages cross-modal...
Speech Emotion Recognition (SER) has been shown to benefit from many of the recent advances in deep ...
This work explores the effect of gender and linguistic-based vocal variations on the accuracy of emo...