While text-to-speech has long been centered on the production of an intelligible message of good quality, interest has recently shifted to the generation of more natural and expressive speech. This shift responds to the widespread criticism that current speech synthesizers lack fundamental human components. This thesis tackles that issue by considering three fundamental stages of HMM-based speech synthesis: the phonetic and prosodic annotations of the training corpus, and their automatic alignment with the speech signal. We first propose a systematic, step-by-step study of HMM-based phonetic alignment in which the models are trained directly on the corpus to be aligned. Based on a detailed analysis of the errors made by this technique,...
Currently, a lot of work on expressive speech focuses on acoustic models and proso...
Text-to-speech (TTS) systems are built on speech corpora which are labeled wit...
This paper proposes a new prosody annotation protocol specific to live sports commentaries. Two leve...
While current research in speech synthesis focuses on the generation of various speaking styles or ...
This paper proposes the integration of a two-layer prosody annotation specific to live sports commen...
This study investigates the impact of phonetization and phonetic segmentation ...
This paper describes recent progress in our approach to generating expressive speech. A goal of text...
We analyse the contribution of higher-level elements of the linguistic specification of a data-drive...
Text-to-speech has long been centered on the production of an intelligible message of good quality. ...
Chironomic stylization is the process of real-time modification of intonation ...
Freely available audiobooks are a rich resource of expressive speech recordings that can be used for...
The paper assesses the capability of an HMM-based TTS system to produce German speech. The results a...
Incremental speech synthesis aims to deliver the synthetic voice while the ...