Voice dictation is an increasingly important text input modality. Existing systems that allow both dictation and editing-by-voice restrict their command language to flat templates invoked by trigger words. In this work, we study the feasibility of allowing users to interrupt their dictation with spoken editing commands in open-ended natural language. We introduce a new task and dataset, TERTiUS, to experiment with such systems. To support this flexibility in real time, a system must incrementally segment and classify spans of speech as either dictation or command, and interpret the spans that are commands. We experiment with using large pre-trained language models to predict the edited text or, alternatively, to predict a small text-editing...
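To make the dictation/command loop described above concrete, the following is a minimal Python sketch. It is not the system studied in this work: it assumes the speech has already been transcribed and segmented into spans, and it replaces the learned classifier and the language-model interpreter with hand-written stand-ins. The names `COMMAND_VERBS`, `classify`, `interpret`, and `run` are hypothetical and exist only for illustration.

```python
from typing import List

# Assumption: a tiny keyword list stands in for a learned span classifier.
COMMAND_VERBS = ("delete", "replace")


def classify(span: str) -> str:
    """Label a span as 'command' or 'dictation' (toy heuristic)."""
    words = span.lower().split()
    first = words[0] if words else ""
    return "command" if first in COMMAND_VERBS else "dictation"


def interpret(command: str, document: str) -> str:
    """Apply a command to the document (toy interpreter, two command shapes)."""
    words = command.split()
    verb = words[0].lower()
    if verb == "delete" and len(words) >= 2:
        target = " ".join(words[1:])
        # Remove the first occurrence, then normalise whitespace.
        return " ".join(document.replace(target, "", 1).split())
    if verb == "replace" and "with" in words[1:]:
        i = words.index("with", 1)
        # Require non-empty text on both sides of "with".
        if 1 < i < len(words) - 1:
            old = " ".join(words[1:i])
            new = " ".join(words[i + 1:])
            return document.replace(old, new, 1)
    # Unrecognised commands leave the document unchanged.
    return document


def run(spans: List[str]) -> str:
    """Process spans in order: append dictation, execute commands."""
    document = ""
    for span in spans:
        if classify(span) == "command":
            document = interpret(span, document)
        else:
            document = (document + " " + span).strip()
    return document


if __name__ == "__main__":
    print(run(["see you on friday", "replace friday with monday", "at noon"]))
```

The sketch processes spans one at a time, which mirrors the incremental setting only loosely: a real system would also have to decide where span boundaries fall and handle commands phrased in open-ended language, neither of which a keyword list can do.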
This paper presents an exploratory work to automatically insert disfluencies i...
We present techniques for the incremental interpretation and prediction of utterance meaning in dia...
Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by ...
Audiobooks are a powerful source of rich information for speech synthesis. Recent work has been foc...
We present the design of a spoken dialogue system to provide feedback to users of an autonomous syst...
We present EdiTTS, an off-the-shelf speech editing methodology based on score-based generative model...
A spoken dialog system performs best when users speak within the grammar that the system understands...
In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing...
A spoken dialog system performs best when users speak within the grammar that the system understands...
In this paper, we report on a pilot mixed-methods experiment investigating the effects on productiv...
Atterer M, Baumann T, Schlangen D. Towards Incremental End-of-Utterance Detection in Dialogue System...
Spontaneous human speech is peppered with errors and disfluencies. Previous research has demonstrat...
Buß O, Baumann T, Schlangen D. Collaborating on Utterances with a Spoken Dialogue System Using an IS...
Distributional shift is a central challenge in the deployment of machine learning models as they can...