The absence of convincing intonation makes current parametric speech synthesis systems sound dull and lifeless, even when trained on expressive speech data. Typically, these systems use regression techniques to predict the fundamental frequency (F0) frame-by-frame. This approach leads to overly smooth pitch contours and fails to construct an appropriate prosodic structure across the full utterance. To capture and reproduce larger-scale pitch patterns, we propose a template-based approach for automatic F0 generation, where per-syllable pitch-contour templates (from a small, automatically learned set) are predicted by a recurrent neural network (RNN). The use of syllable templates mitigates the over-smoothing problem and is a...
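To make the template idea concrete, the following is a minimal sketch of how a sequence of per-syllable template choices could be turned into a frame-level F0 contour. It is not the authors' implementation: the three hand-written templates, the fixed template length of 5 points, the use of Hz offsets around a constant `base_f0`, and the function names `resample`, `nearest_template`, and `render_f0` are all assumptions made for illustration. In the approach described above the templates are learned automatically and the template indices come from an RNN; here both are simply hard-coded.

```python
import numpy as np

# Hypothetical template set (assumption): each row is a pitch shape given as
# Hz offsets from a base F0, sampled at 5 points across one syllable.
TEMPLATES = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],      # flat
    [-10.0, -5.0, 0.0, 5.0, 10.0],  # rise
    [10.0, 5.0, 0.0, -5.0, -10.0],  # fall
])

def resample(contour, n_frames):
    """Linearly resample a 1-D contour to n_frames points."""
    contour = np.asarray(contour, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(contour))
    dst = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(dst, src, contour)

def nearest_template(syllable_f0, templates):
    """Index of the template closest (Euclidean) to a mean-removed syllable contour."""
    fixed = resample(syllable_f0, templates.shape[1])
    fixed = fixed - fixed.mean()
    dists = np.linalg.norm(templates - fixed, axis=1)
    return int(np.argmin(dists))

def render_f0(template_ids, durations, templates, base_f0=120.0):
    """Stretch each chosen template to its syllable duration (in frames) and concatenate."""
    pieces = [
        base_f0 + resample(templates[t], d)
        for t, d in zip(template_ids, durations)
    ]
    return np.concatenate(pieces)

# Stand-in for the RNN output (assumption): rise, flat, fall over three syllables.
f0 = render_f0([1, 0, 2], [8, 5, 10], TEMPLATES)
```

Because each syllable receives a whole shape rather than an independent per-frame regression value, the within-syllable movement is preserved by construction. How adjacent syllables are joined smoothly is not addressed in this sketch.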
This paper introduces the Tilt intonational model and describes how this model can be used to automa...
Text-to-Prosody systems based on the use of prosodic databases extracted from natural speech will be...
A new method for predicting prosodic parameters, i.e. phone durations and F0 targets, from preproce...
This thesis addresses the problem of generating a range of natural sounding pitch contours for spee...
This paper describes a general system which maps from a phonological specification of an utterance J...
The use of neural networks in speech synthesis has been especially successful in the domain of proso...
Statistical parametric speech synthesis (SPSS) has seen improvements over recent years, especially ...
This paper describes an implementation of the rise/fall/connection (RFC) model of intonation for us...
This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthe...
This paper addresses the problem of generating a full range of appropriate intonation contours for ...
End-to-end text-to-speech synthesis systems achieved immense success in recent times, with improved ...
The intonation produced by current text-to-speech systems is often either flat or artificial soundi...
Intonation plays a crucial role in making synthetic speech sound more natural. However, intonation m...
In this paper, we report on an effort to provide a general-purpose spoken language generation tool f...
This paper presents an intonation generation system for use in a text-to-speech synthesis system. T...