By definition, spontaneous speech is unscripted and created on the fly by the speaker. It is dramatically different from read speech, where the words are authored as text before they are spoken. Spontaneous speech is emergent and transient, whereas text read out loud is pre-planned. For this reason, it is unsuitable to evaluate the usability and appropriateness of spontaneous speech synthesis by having it read out written texts sampled from, for example, newspapers or books. Instead, we need to use transcriptions of speech as the target, and such transcriptions are much less readily available. In this paper, we introduce Starmap, a tool that allows developers to select a varied, representative set of utterances from a spoken genre, to be used for evaluat...
When creating voices for concatenative speech synthesis, several hours of speech uttered by a profes...
Freely available audiobooks are a rich resource of expressive speech recordings that can be used for...
This paper deals with the design of a speech corpus for a concatenation-based text-to-speech (TTS) s...
this paper is to show that the performance of our automatic transcription tool compares to that of e...
Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutr...
To make synthesized speech more natural and colloquial, the regularity of synthesized speech has to ...
In this work we design an approach for automatic feature selection and voice creation for expressive...
This paper reports various investigations on recognizing spontaneous presentation speech in connecti...
Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disf...
The lack of prosody variation in text-to-speech systems contributes to their perceived unnaturalness...
Audiobooks are a powerful source of rich information for speech synthesis. Recent work has been foc...
Getting a text-to-speech synthesis (TTS) system to speak lively animated stories like a human is ver...
Generating responses that take user preferences into account requires adaptation at all levels of th...
Text-to-speech synthesis (TTS) has progressed to such a stage that given a large, clean, phoneticall...