Voice conversion for highly expressive speech is challenging. Current approaches struggle to balance speaker similarity, intelligibility, and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that leverages the advantages of both the neural bottleneck feature (BNF) approach and the information perturbation approach. Specifically, we use a BNF encoder and a Perturbed-Wav encoder to form a content extractor that learns linguistic and para-linguistic features, respectively, where the BNFs come from a robust pre-trained ASR model and the perturbed wave becomes speaker-irrelevant after signal perturbation. We further fuse the linguistic and para-linguistic features through an attention ...
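To make the idea of fusing two content streams concrete, the following is a minimal sketch in plain Python. It is not the Expressive-VC fusion module: the abstract is cut off before the attention mechanism is described, so the per-frame softmax weighting, the function name `fuse_frames`, and the `query` vector used here are all assumptions chosen only for illustration. The two inputs stand in for frame-aligned outputs of the BNF encoder and the Perturbed-Wav encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frames(linguistic, paralinguistic, query):
    """Fuse two frame-aligned feature sequences with per-frame attention weights.

    Hypothetical illustration: each frame of each stream is scored by a dot
    product with `query`, the two scores are normalised with a softmax, and
    the fused frame is the weighted sum of the two input frames.
    """
    if len(linguistic) != len(paralinguistic):
        raise ValueError("both streams must have the same number of frames")
    fused = []
    for ling, para in zip(linguistic, paralinguistic):
        scores = [
            sum(q * x for q, x in zip(query, ling)),
            sum(q * x for q, x in zip(query, para)),
        ]
        w_ling, w_para = softmax(scores)
        fused.append([w_ling * a + w_para * b for a, b in zip(ling, para)])
    return fused


# Toy stand-ins for encoder outputs: 2 frames, 2 dimensions each.
bnf_like = [[1.0, 0.0], [0.0, 1.0]]
perturbed_like = [[0.0, 1.0], [1.0, 0.0]]

# A zero query scores both streams equally, so each frame is a plain average.
fused_equal = fuse_frames(bnf_like, perturbed_like, [0.0, 0.0])

# A query aligned with the first dimension favours whichever stream is
# stronger in that dimension for the given frame.
fused_biased = fuse_frames(bnf_like, perturbed_like, [10.0, 0.0])
```

In this toy setup the weights vary per frame, which is the property that lets a fusion module lean on linguistic content in some frames and on para-linguistic content in others. In a trained system the scoring would be learned rather than fixed.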
Voice conversion (VC) is a technique to transform a speaker identity included in a source speech wav...
Recent advances in neural text-to-speech research have been dominated by two-stage pipelines utilizi...
Many existing voice conversion (VC) systems are attractive owing to their high...
Voice conversion (VC) transforms an utterance to sound like another person without changing the ling...
Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R. Investigation into Target Speaking Rate ...
Expressive voice conversion performs identity conversion for emotional speakers by jointly convertin...
Streaming voice conversion (VC) is the task of converting the voice of one person to another in real...
This paper presents a study of the baseline system of the VoicePrivacy 2020 challenge. This baseline...
This paper aims to synthesize a target speaker's speech with a desired speaking style and emotion by tra...
Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part o...
This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "arti...
The objective of voice conversion techniques is to convert a source speaker's voice so that it sound...
We propose a voice conversion model from an arbitrary source speaker to an arbitrary target speaker with dis...
Better disentanglement of speech representation is essential to improve the quality of voice convers...
This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressiv...