Fundamental frequency, or F0, is critical for high-quality HMM-based speech synthesis. Traditionally, F0 values are considered to depend on a binary voicing decision, such that they are continuous in voiced regions and undefined in unvoiced regions. The multi-space distribution HMM (MSDHMM) has been used to model this discontinuous F0. Recently, a continuous F0 modelling framework has been proposed and shown to be effective, in which continuous F0 observations are assumed to always exist and voicing labels are explicitly modelled by an independent stream. In this paper, a refined continuous F0 modelling approach is proposed. Here, F0 values are assumed to be dependent on voicing labels and both are jointly modelled in a singl...
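To make the distinction between the two F0 representations concrete, here is a minimal sketch that turns a discontinuous F0 track (unvoiced frames marked as None) into a continuous track plus a separate voicing-label stream. The function name to_continuous_f0 and the use of linear interpolation to fill unvoiced frames are assumptions made for illustration only; the abstract does not say how F0 values are obtained in unvoiced regions.

```python
from typing import List, Optional, Tuple


def to_continuous_f0(
    f0: List[Optional[float]],
) -> Tuple[List[float], List[int]]:
    """Split a discontinuous F0 track into (continuous F0, voicing labels).

    None marks an unvoiced frame. Unvoiced frames are filled by linear
    interpolation between the nearest voiced neighbours (an assumption for
    this sketch); leading and trailing gaps copy the nearest voiced value.
    """
    voiced = [i for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("at least one voiced frame is required")

    # Voicing label stream: 1 for voiced frames, 0 for unvoiced frames.
    labels = [0 if v is None else 1 for v in f0]

    continuous: List[float] = []
    for i, v in enumerate(f0):
        if v is not None:
            continuous.append(float(v))
            continue
        prev = max((j for j in voiced if j < i), default=None)
        nxt = min((j for j in voiced if j > i), default=None)
        if prev is None:
            continuous.append(float(f0[nxt]))
        elif nxt is None:
            continuous.append(float(f0[prev]))
        else:
            w = (i - prev) / (nxt - prev)
            continuous.append((1 - w) * f0[prev] + w * f0[nxt])
    return continuous, labels


if __name__ == "__main__":
    track = [None, 100.0, None, None, 130.0, None]
    cont, vuv = to_continuous_f0(track)
    print(cont)
    print(vuv)
```

In the discontinuous view, only the voiced entries of the input carry an F0 value; in the continuous view, every frame has a value and the voicing decision lives in the label stream.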
This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-fre...
This article focuses on developing a system for high-quality synthesized and converted speech by add...
Recent studies in text-to-speech synthesis have shown the benefit of using a continuous pitch estima...
ICASSP2009: IEEE International Conference on Acoustics, Speech, and Signal Processing, April 19-24...
HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, ...
Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 s...
The Multi-Space Probability Distribution Hidden Markov model (MSD-HMM) is a discrete model that lear...
Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of t...
This paper assesses the ability of a HMM-based speech synthesis systems to mod...
A uniform phase representation for the harmonic model in speech synthesis applications Gilles Degott...
The usual approach to automatic continuous speech recognition is what can be called the acoustic-pho...
Hidden Markov models (HMMs) are becoming the dominant approach for text-to-speech syn...
This paper describes a trainable excitation approach to eliminate the unnaturalness of HMM-based spe...
This paper proposes the use of the Liljencrants-Fant model (LFmodel) to represent the glottal source...
HMM-based speech synthesis generally suffers from typical buzziness due to over-simplified excitati...