We present an unsupervised approach to optimizing and evaluating the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract, targeting non-speech human audio signals, namely yawns. We selected and optimized the control parameters of the synthesizer to minimize the difference between real and generated audio. We validated the most common optimization techniques reported in the literature, as well as a specifically designed neural network. We evaluated several popular quality metrics as error functions, including both objective quality metrics and subjective-equivalent metrics. We compared the results in terms of total error and comput...
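The optimization loop described above (choose control parameters, synthesize, score against the real recording, keep the best setting) can be sketched as follows. This is a minimal illustration under loud assumptions: `toy_synth` is a hypothetical one-sinusoid stand-in for the Pink Trombone, whose real control parameters are not modelled here. The error function is a plain mean squared distance between magnitude spectra, not one of the quality metrics the abstract evaluates. Random search serves only as one simple example of an optimization technique.

```python
import cmath
import math
import random

SR = 8000  # sample rate in Hz (assumed for this toy example)
N = 128    # frame length in samples (assumed)


def toy_synth(f0, amp, n=N, sr=SR):
    """Hypothetical synthesizer stand-in: one sinusoid controlled by f0 and amp."""
    return [amp * math.sin(2 * math.pi * f0 * t / sr) for t in range(n)]


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..len(x)//2 (O(n^2), fine for tiny frames)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_error(params, target_mag):
    """Error function: mean squared difference between magnitude spectra."""
    mag = magnitude_spectrum(toy_synth(*params))
    return sum((a - b) ** 2 for a, b in zip(mag, target_mag)) / len(mag)


def random_search(target_mag, bounds, iters=150, seed=0):
    """Sample parameters uniformly within bounds and keep the lowest-error setting."""
    rng = random.Random(seed)
    best_params, best_err = None, math.inf
    history = []  # error of every candidate, in evaluation order
    for _ in range(iters):
        params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        err = spectral_error(params, target_mag)
        history.append(err)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err, history


if __name__ == "__main__":
    # The "real" recording is simulated here by synthesizing with known parameters.
    target = magnitude_spectrum(toy_synth(440.0, 0.8))
    params, err, _ = random_search(target, bounds=[(100.0, 1000.0), (0.1, 1.0)])
    print(params, err)
```

Because the optimizer and the error function are separate arguments, either can be swapped (for example a gradient-free optimizer other than random search, or a perceptual metric in place of `spectral_error`) without touching the rest of the loop, which is the kind of comparison the abstract describes.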
[IRCAM internal reference: Rioux99b] Voicing techniques are used by organ builder...
All Statistical Parametric Speech Synthesizers consist of a linear pipeline of components. This view...
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication dev...
With the similarity between music and speech synthesis from symbolic input and the rapid development...
A two-dimensional physical model of the human vocal tract is described. Such a system promises incre...
The process of voiced sound production can be described as follows: air coming from the lungs is fo...
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis ...
Parametric optimisation techniques are compared in their abilities to elicit parameter settings for ...
A text-to-speech (TTS) model typically factorizes speech attributes such as content, speaker and pro...
In their often-cited paper published in 1987 in IEEE Trans. ASSP, ...
Foley sound synthesis refers to the creation of authentic, diegetic sound effects for media, such as...
Generative probabilistic and neural models of the speech signal are shown to be effective in speech ...
Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging ...
The model of speech production generally used in speech synthesis is that of a source modified by a ...
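The generic source-filter idea (an excitation source shaped by a vocal-tract filter) can be illustrated with a minimal sketch. It assumes an impulse train as the glottal source and a cascade of second-order resonators, one per formant, as the filter. The function names and the formant frequencies and bandwidths in the usage line are illustrative assumptions, not values from the cited work.

```python
import math


def impulse_train(f0, sr, n):
    """Source stand-in: one unit impulse per pitch period (period rounded to samples)."""
    period = int(round(sr / f0))
    return [1.0 if t % period == 0 else 0.0 for t in range(n)]


def resonator(x, freq, bandwidth, sr):
    """Two-pole IIR resonator modelling a single formant of the vocal-tract filter."""
    r = math.exp(-math.pi * bandwidth / sr)  # pole radius from bandwidth (Hz)
    theta = 2 * math.pi * freq / sr          # pole angle from center frequency (Hz)
    a1 = 2 * r * math.cos(theta)
    a2 = -r * r
    gain = 1 - r  # simple input scaling to keep amplitudes moderate (not exact unity gain)
    y = []
    y1 = y2 = 0.0  # previous two output samples
    for sample in x:
        out = gain * sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def source_filter(f0, formants, sr=16000, n=1600):
    """Pass the source through a cascade of (frequency, bandwidth) formant resonators."""
    signal = impulse_train(f0, sr, n)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, sr)
    return signal


if __name__ == "__main__":
    # Illustrative formant values only.
    out = source_filter(120.0, [(700.0, 130.0), (1220.0, 70.0)])
    print(len(out), max(abs(v) for v in out))
```

Changing `f0` alters only the source (pitch), while changing the formant list alters only the filter (timbre); this independence is what makes the source-filter model convenient for synthesis.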
Manually configuring synthesizer parameters to reproduce a particular sound is a complex and challen...