Expressive synthesis from text is a challenging problem. There are two issues. First, read text is often highly expressive in order to convey the emotion and scenario in the text. Second, since expressive training speech is not always available for different speakers, it is necessary to develop methods to share the expressive information across speakers. This paper investigates the approach of using very expressive, highly diverse audiobook data from multiple speakers to build an expressive speech synthesis system. Both problems are addressed by considering a factorized framework where speaker and emotion are modeled in separate sub-spaces of a cluster adaptive training (CAT) parametric speech synthesis system. The sub-spaces for the expres...
The main goal of this work is to generate expressive speech in different speak...
In this thesis, we study the expressivity of read speech with a particular type of data, which are...
In this thesis work, we address the expressivity of read speech with a type of data that is part...
Freely available audiobooks are a rich resource of expressive speech recordings that can be used for...
Generating expressive, natural-sounding speech from text using a speech synthesis (TTS) system is...
Automatically generating expressive speech from plain text is an important research topic in speech ...
Audiobooks are a powerful source of rich information for speech synthesis. Recent work has been foc...
Getting a text-to-speech synthesis (TTS) system to speak lively animated stories like a human is ver...
This work aims at creating expressive voices from audiobooks using semantic selection. First, for ea...
In this work we design an approach for automatic feature selection and voice creation for expressive...
This work presents a study on the suitability of prosodic and acoustic features, with a special focus...
Expressive speech synthesis using parametric approaches is constrained by the ...
Nowadays, especially with the upswing of neural networks, speech synthesis is almost totally data dr...
Text-to-speech synthesis (TTS) has progressed to such a stage that given a large, clean, phoneticall...
In modern days synthesis of human images and videos is arguably one of the most popular topics in th...