In this paper we present a DNN-based speech synthesis system trained on an audiobook, with input features that include sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features of the kind used in statistical parametric speech synthesis. The predicted parameters are transformed into speech using a conventional high-quality vocoder. In the proposed system, the conventional linguistic features are enriched with sentiment features. Several sentiment representations are considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the differe...
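As a rough illustration of how linguistic features might be enriched with sentiment features, the sketch below appends a sentiment probability vector to a linguistic feature vector before it is passed to a DNN. This is an assumption-laden sketch, not the paper's implementation: the function name `enrich_features`, the four-dimensional linguistic vector, and the five-class sentiment distribution are all hypothetical, and the hierarchical distance and context components mentioned in the abstract are not modelled because the abstract does not define them.

```python
from typing import List, Sequence


def enrich_features(linguistic: Sequence[float],
                    sentiment_probs: Sequence[float]) -> List[float]:
    """Append sentiment class probabilities to a linguistic feature vector.

    Hypothetical helper: the paper's actual feature layout is not given
    in the abstract.
    """
    # A probability distribution over sentiment classes should sum to 1.
    if abs(sum(sentiment_probs) - 1.0) > 1e-6:
        raise ValueError("sentiment probabilities must sum to 1")
    # Simple concatenation: the DNN input grows by the number of classes.
    return list(linguistic) + list(sentiment_probs)


# Assumed example values: 4 linguistic features and a 5-class
# sentiment distribution (very negative ... very positive).
linguistic = [1.0, 0.0, 0.0, 0.35]
sentiment = [0.05, 0.10, 0.50, 0.25, 0.10]
enriched = enrich_features(linguistic, sentiment)
```

Under these assumptions, the enriched vector has nine dimensions, and the baseline system corresponds to feeding only the first four to the network.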
This paper presents a weighted multi-distribution deep belief network (wMD-DBN) for context-dependen...
Generating expressive, natural-sounding speech from text using a text-to-speech (TTS) synthesis system is...
The goal of the study is to predict acoustic features of expressive speech from semantic vector spac...
Nowadays, especially with the upswing of neural networks, speech synthesis is almost totally data dr...
Today, the synthesis of human images and videos is arguably one of the most popular topics in th...
This work aims at creating expressive voices from audiobooks using semantic selection. First, for ea...
This paper proposes architectures that facilitate the extrapolation of emotional expressions in deep...
Deep neural networks have become the state of the art in speech synthesis. The...
Recently, text-to-speech (TTS) synthesis has gained immense success in the human-computer interactio...
Speech emotion conversion is the task of modifying the perceived emotion of a ...
Great improvement has been made in the field of expressive audiovisual Text-to...
Speech can express subjective meanings and intents that, in order to be fully understood, rely heavi...
PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014. Text-to-speech synthe...
This work presents a study on the suitability of prosodic and acoustic features, with a special focus...