Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Because speech emotion is hierarchically structured, it is challenging to disentangle emotional style across different speakers. Inspired by the recent success of speaker disentanglement with the variational autoencoder (VAE), we propose an any-to-any expressive voice conversion framework called StyleVC. StyleVC is designed to disentangle linguistic content, speaker identity, pitch, and emotional style information. We study the use of a style encoder to model emotional style explicitly. At run-time, StyleVC converts both speaker identity and emotional style for arbitrary speakers. Experiments valida...
Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to...
In this work, we attempt emotional style transfer on audio. In particular, MelGAN-VC architec...
This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "arti...
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preservi...
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preservi...
This paper aims to synthesize the target speaker's speech with desired speaking style and emotion by tra...
Speech anonymisation prevents misuse of spoken data by removing any personal identifier while preser...
Voice conversion (VC) transforms an utterance to sound like another person without changing the ling...
Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive...
Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while...
Voice conversion for highly expressive speech is challenging. Current approaches struggle with the b...
The problem of style transfer consists in transferring the style from one signal to another while pr...
Speech emotion conversion is the task of modifying the perceived emotion of a ...
Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a...
Human speech can be characterized by different components, including semantic content, speaker ident...