This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to determine the attributes of the converted speech, our model adds versatility and specificity to voice conversion. The proposed VC model is a neural codec language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech. It utilizes text instructions as style prompts to modify the prosody and emotional information of the given speech. In contrast to previous approaches, which often rely on employing separate encoders like prosody and content encoders to handle diff...
Recently, a lot of work has been done in speech technology. Text-to-Speech and Automatic Speech Rec...
Expressive voice conversion performs identity conversion for emotional speakers by jointly convertin...
Accent plays a significant role in speech communication, influencing understanding capabilities and ...
We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and timbre o...
This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressiv...
Better disentanglement of speech representation is essential to improve the quality of voice convers...
Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive...
Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation...
Voice conversion (VC) transforms an utterance to sound like another person without changing the ling...
Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research top...
In this paper, we propose an end-to-end text-to-speech system deployment wherein a user feeds input ...
Voice conversion for highly expressive speech is challenging. Current approaches struggle with the b...
Speech emotion conversion is the task of modifying the perceived emotion of a ...
This article focuses on developing a system for high-quality synthesized and converted speech by add...
Using a text description as prompt to guide the generation of text or images (e.g., GPT-3 or DALLE-2...