Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method in both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
Comment: ICASSP 2
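The abstract does not say how the adversary is implemented. One common way to realize a gradient-based adversary of this kind is gradient reversal: a shared encoder descends the emotion loss while ascending the speaker loss. The sketch below illustrates only that sign flip, under that assumption; the function names and the weight `lam` are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a gradient-reversal style update for a shared encoder.
# Assumption: the adversary is realized via gradient reversal; the abstract
# does not confirm this. All names below are illustrative.

def reversed_encoder_gradient(grad_emotion, grad_speaker, lam=1.0):
    """Combine per-parameter gradients for the shared encoder.

    The encoder should lower the emotion loss but raise the speaker loss,
    so the speaker gradient enters with a flipped sign, scaled by lam.
    """
    if len(grad_emotion) != len(grad_speaker):
        raise ValueError("gradient vectors must have the same length")
    return [ge - lam * gs for ge, gs in zip(grad_emotion, grad_speaker)]


def sgd_step(params, grads, lr=0.1):
    """Plain gradient-descent update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    # Toy gradients for a two-parameter encoder.
    combined = reversed_encoder_gradient([1.0, 2.0], [0.5, -1.0], lam=2.0)
    print(combined)
    print(sgd_step([1.0, 1.0], combined, lr=0.5))
```

In this sketch a larger `lam` pushes the encoder harder toward features from which speaker identity cannot be predicted, at the possible cost of emotion accuracy; how that trade-off is tuned in the actual method is not stated in the abstract.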
Contending with signal variability due to source and channel effects is a critical problem in automa...
Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack gener...
Speech Emotion Recognition (SER) has been shown to benefit from many of the recent advances in deep ...
2018-12-13 Regularization is crucial to the success of many practical deep learning models, in partic...
State-of-the-art speaker verification systems are inherently dependent on some kind of human supervi...
There are individual differences in expressive behaviors driven by cultural norms and personality. T...
Speech Emotion Recognition (SER) is a challenging task due to limited data and blurred boundaries of...
Abstract: Detecting the mental state of a person has implications in psychiatry, medicine, psycholo...
Human emotion understanding is pivotal in making conversational technology mainstream. We view speec...
Recent advances in technology have given birth to intelligent speech assistants such as Siri and Ale...
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently ...
Self-supervised learning has recently been implemented widely in speech processing areas, replacing ...
Speech 'in-the-wild' is a handicap for speaker recognition systems due to the variability induced by...
Over the last few years, deep learning has grown in popularity for speaker verification, identificat...
Self-supervised speech models have grown fast during the past few years and have proven feasible for...