This article focuses on developing a system for high-quality synthesized and converted speech by addressing three fundamental principles. First, because the noise-like component in state-of-the-art parametric vocoders (for example, STRAIGHT) is often not accurate enough, a novel analytical approach for modeling unvoiced excitations using a temporal envelope is proposed. Discrete All Pole, Frequency Domain Linear Prediction, Low Pass Filter, and True envelopes are studied and applied to the noise excitation signal in our continuous vocoder. Second, we build a deep-learning-based text-to-speech (TTS) system which converts written text into human-like speech with a feed-forward and several sequence-to-sequence models (long short-term memo...
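To make the idea of shaping a noise excitation with a temporal envelope concrete, here is a minimal sketch of a low-pass-filter style envelope. It is an illustration under stated assumptions, not the article's implementation: the function names, the full-wave rectification step, the one-pole smoother, and the 50 Hz cutoff are all hypothetical choices made for this example.

```python
import math
import random


def lowpass_envelope(signal, sample_rate, cutoff_hz=50.0):
    """Estimate a temporal envelope: rectify, then smooth with a one-pole low-pass.

    The 50 Hz default cutoff is an assumed value for illustration only.
    """
    # Smoothing coefficient of a one-pole low-pass filter for the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope = []
    state = 0.0
    for sample in signal:
        # Full-wave rectification (abs) followed by exponential smoothing.
        state += alpha * (abs(sample) - state)
        envelope.append(state)
    return envelope


def shape_noise(envelope, seed=0):
    """Modulate fresh white Gaussian noise with the estimated envelope."""
    rng = random.Random(seed)
    return [e * rng.gauss(0.0, 1.0) for e in envelope]


# Toy input: a decaying noise burst standing in for an unvoiced segment.
sample_rate = 16000
source_rng = random.Random(1)
burst = [math.exp(-n / 800.0) * source_rng.gauss(0.0, 1.0) for n in range(4000)]

env = lowpass_envelope(burst, sample_rate)
excitation = shape_noise(env)
```

The resulting `excitation` is white noise whose amplitude follows the slow energy contour of the input, which is the general role a temporal envelope plays for an unvoiced excitation. The other envelope types named above (Discrete All Pole, Frequency Domain Linear Prediction, True envelope) would replace `lowpass_envelope` with a different estimator.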
Recently, generative neural network models which operate directly on raw audio, such as WaveNet, hav...
Recent studies in text-to-speech synthesis have shown the benefit of using a continuous pitch estima...
In this work, accurate spectral envelope estimation is applied to Voice Conversion in order to achie...
Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the f...
The quality of the vocoder plays a crucial role in the performance of parametric speech synthesis sy...
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using ...
The quality of the vocoder plays a crucial role in the performance of parametric speech synthesis sy...
Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the f...
In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating...
Vocoders have received renewed attention as main components in statistical parametric text-to-speech (TTS...
The recent advances in text-to-speech have been awe-inspiring, to the point of synthesizing near-hum...
The excitation for LPC speech synthesis usually consists of two separate signals: a delta-function p...
Most text-to-speech (TTS) methods use high-quality speech corpora recorded in a well-designed enviro...
The main challenge introduced in current voice conversion is the tradeoff between speaker similarity...
Nowadays, the synthesis of human images and videos is arguably one of the most popular topics in th...