Personalizing a speech synthesis system is a highly desired application: the system generates speech in the user's voice from only a few enrolled recordings. Recent work takes two main approaches to building such a system, speaker adaptation and speaker encoding. On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with a few enrolled samples. However, they require at least thousands of fine-tuning steps for high-quality adaptation, making them hard to apply on devices. On the other hand, speaker encoding methods encode enrollment utterances into a speaker embedding. The trained TTS model can then synthesize the user's speech conditioned on the corresponding speaker embedding. Nevertheles...
This paper describes a novel approach for the speaker adaptation of statistical parametric speech sy...
This paper describes a technique for synthesizing speech with any desired voice. The technique is ba...
Meta-learning has recently become a research hotspot in speaker verification (SV). We introduce two ...
Fine-tuning is a popular method for adapting text-to-speech (TTS) models to new speakers. However th...
The performance of automatic speech recognition systems degrades rapidly when there is a mismatch b...
We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approac...
Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation...
While most research into speech synthesis has focused on synthesizing high-quality speech for in-dat...
Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any adaptation time and...
Speaker adaptation in text-to-speech synthesis (TTS) fine-tunes a pre-trained TTS model to adapt...
Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voi...
For personalized speech generation, a neural text-to-speech (TTS) model must be successfully impleme...
This paper deals with the creation of multiple voices from a Hidden Markov Model based speech synthe...
Text-to-speech synthesis with little data is a challenging task, in particular wh...
Training a text-to-speech (TTS) model requires a large-scale, text-labeled speech corpus, which is tr...