Disfluency, though it originates in human spoken utterances, is primarily studied as a unimodal, text-based Natural Language Processing (NLP) task. In this paper, we propose a novel multimodal architecture for disfluency detection from individual utterances, based on early fusion and self-attention-based interaction between the text and acoustic modalities. Our architecture uses a multimodal dynamic fusion network that adds minimal parameters over an existing text encoder commonly used in prior work, allowing it to exploit the prosodic and acoustic cues hidden in speech. Through experiments, we show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior ...
Emotion Recognition (ER) aims to classify human utterances into different emotion categories. Based ...
Hough J, Schlangen D. Joint, Incremental Disfluency Detection and Utterance Segmentation from Speech...
Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expe...
We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-l...
Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of sp...
Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Jo...
Disfluent speech has been previously addressed from two main perspectives: the...
People rarely speak in the same manner that they write – they are generally disfluent. Disfluencies ...
Previous research has shown that speech disfluencies, speech errors that occur in spoken l...
Within the language system, several of the language production levels may be involved in the product...
Theoretical thesis. Bibliography: pages 43-46. 1. Introduction -- 2. Literature review -- 3. LSTM nois...
The synthesis of spontaneous natural speech is a challenge. One way to approach it is to introduce d...