We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, taken from this year's Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are combined with typical audio descriptors by early fusion. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features. © 2011 IEEE
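As an illustration of what fusing features "in an early manner" usually means, the following is a minimal sketch of feature-level fusion: the audio and visual feature vectors of each frame are concatenated before any classifier sees them. It assumes both streams are already aligned to the same frame rate. The function name `early_fusion`, the feature dimensions, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def early_fusion(
    audio_frames: Sequence[Sequence[float]],
    visual_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-frame audio and visual feature vectors.

    Assumption: both streams have one feature vector per frame and
    cover the same frames in the same order.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    # One fused vector per frame: audio features first, then visual features.
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


# Toy example: 2 frames, 3 audio features and 2 visual features per frame.
audio = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
visual = [[1.0, 2.0], [3.0, 4.0]]
fused = early_fusion(audio, visual)
```

The fused sequence keeps one vector per frame, so it can be passed to a sequence classifier such as an LSTM, or summarised per segment for a static classifier such as an SVM.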
The task of classifying accent, as belonging to a native language speaker or a foreign language spea...
This paper presents a new speaker change detection system based on Long Short-Term Memory (L...
This thesis follows the trend of last decades in using neural networks in order to detect speech in ...
In this study, we investigate an audiovisual approach for classification of vocal outbursts (non-lin...
Non-linguistic Vocalization Recognition refers to the detection and classification of non-speech voi...
Speech and visual information are the most dominant modalities for a human to perceive emotion. A me...
Automatic speech recognition (ASR) permits effective interaction between humans and machines in envi...
Non-verbal speech cues play an important role in human communication such as expressing emotional st...
Currently, the most popular speech recognition systems are based on unit selection - decision tree a...
A novel, data-driven approach to voice activity detection is presented. The approach is based on Lon...
Prediction plays a key role in recent computational models of the brain and it has been suggested th...
Abstract. We apply Long Short-Term Memory (LSTM) recurrent neural networks to a large corpus of unpr...
This work explores the use of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) for aut...
Automatically recognizing human emotions from spontaneous and non-prototypical real-life data is cur...