In unit selection-based concatenative speech synthesis, join cost (also known as concatenation cost), which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from the inventory. Usually, some form of local parameter smoothing is also needed to disguise the remaining discontinuities. This paper presents a subjective evaluation of three join cost functions and three smoothing methods. We also describe the design and performance of a listening test. The three join cost functions were taken from our previous study, where we proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinu...
Our goal is to automatically learn a perceptually optimal target cost function for a unit selection ...
In concatenative text-to-speech (TTS) synthesis systems, unit selection aims to reduce the number of ...
Synthesized speech from text-to-speech systems is generally produced from the concatenation of small...
In our previous papers, we have proposed join cost functions derived from spectral distances, which ...
In our previous papers, we have proposed join cost functions derived from spectral distances, which ...
Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high qu...
Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high qu...
In our previous papers, we have proposed join cost functions derived from spectral distances, which ...
The quality of unit selection based concatenative speech synthesis mainly depends on how well two su...
In unit selection based concatenative speech systems, join cost, which measures how well two units c...
In unit selection based concatenative speech systems, join cost, which measures how well two units c...
Unit selection synthesis predominates today, but is not yet of a quality to rival natural speech. U...
We introduce a new method for computing join cost in unit-selection speech synthesis which uses a li...
This project aims to contribute to current research on the quality of speech synthesis by conductin...
A significant problem with unit selection based speech synthesis is the listener perception of soun...