The main goal of this work is to generate expressive speech in the voices of different speakers for whom no expressive speech data is available. The presented approach conditions Tacotron 2 speech synthesis on latent representations extracted from the text, the speaker identity, and a reference expressive Mel spectrogram. We propose using a multiclass N-pair loss in end-to-end (E2E) multispeaker expressive Text-To-Speech (TTS) to improve the transfer of expressivity to the target speaker’s voice. We jointly train the E2E TTS with the multiclass N-pair loss to discriminate between various emotions. This augmentation of the loss function during training paves the way to enhance the latent space representation of emoti...
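As a rough illustration of the loss named in the abstract, the sketch below computes a multiclass N-pair loss in its usual metric-learning form, log(1 + Σ_{j≠i} exp(a_i·p_j − a_i·p_i)), averaged over anchors. It assumes one anchor embedding and one positive embedding per emotion class, with every row belonging to a different class. The function name `n_pair_loss`, the plain-list embeddings, and the dot-product similarity are assumptions made for this example; the abstract does not specify how the authors implement the loss or combine it with the TTS objective.

```python
import math

def n_pair_loss(anchors, positives):
    """Multiclass N-pair loss over N (anchor, positive) embedding pairs.

    Assumption: anchors[i] and positives[i] share emotion class i,
    and all N classes are distinct, so positives[j] (j != i) act as
    negatives for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Dot-product similarity of anchor i to every positive.
        sims = [sum(a * p for a, p in zip(anchors[i], positives[j]))
                for j in range(n)]
        # log(1 + sum_{j != i} exp(s_ij - s_ii)) == logsumexp_j(s_ij) - s_ii,
        # computed with the max subtracted for numerical stability.
        m = max(sims)
        lse = m + math.log(sum(math.exp(s - m) for s in sims))
        total += lse - sims[i]
    return total / n

# Hypothetical 2-D emotion embeddings for two classes.
aligned = n_pair_loss([[5.0, 0.0], [0.0, 5.0]], [[5.0, 0.0], [0.0, 5.0]])
swapped = n_pair_loss([[5.0, 0.0], [0.0, 5.0]], [[0.0, 5.0], [5.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each anchor is closer to its own class's positive than to the positives of other classes, which is the behaviour a term discriminating between emotions would encourage in the latent space.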
This paper presents Daft-Exprt, a multi-speaker acoustic model advancing the state-of-the-art for cr...
Getting a text-to-speech synthesis (TTS) system to speak lively animated stories like a human is ver...
This paper proposes architectures that facilitate the extrapolation of emotional expressions in deep...
Proceedings published in Oct. 2020, but conference to be merged with SLSP 2021...
Recently, text-to-speech (TTS) synthesis has gained immense success in the human-computer interactio...
The main goal of this work is to provide fine-grained transfer of expressivity...
In this paper, we present a novel deep metric learning architecture along with...
Expressive speech synthesis using parametric approaches is constrained by the ...
The main objective of this work is to study the expressivity transfer in a spe...
Nowadays, the synthesis of human images and videos is arguably one of the most popular topics in th...
Expressive synthesis from text is a challenging problem. There are two issues. First, read text is o...
In this work, we attempt emotional style transfer on audio. In particular, MelGAN-VC architec...
A Text-to-Speech (TTS) synthesizer has to generate intelligible and natural speech while modeling li...
This paper aims to synthesize target speaker's speech with desired speaking style and emotion by tra...
Great improvement has been made in the field of expressive audiovisual Text-to...