In this article, three adaptation methods are compared based on how well they change the speaking style of a neural-network-based text-to-speech (TTS) voice. The speaking style conversion adopted here is from normal to Lombard speech. The selected adaptation methods are auxiliary features (AF), learning hidden unit contribution (LHUC), and fine-tuning (FT). Furthermore, four state-of-the-art TTS vocoders are compared in the same context. The evaluated vocoders are GlottHMM, GlottDNN, STRAIGHT, and pulse model in log-domain (PML). Objective and subjective evaluations were conducted to study the performance of both the adaptation methods and the vocoders. In the subjective evaluations, speaking style similarity and speech intelligibility we...
The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispe...
Vocal tract length normalisation (VTLN) is well established as a speaker adaptation technique that c...
Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized high-quality ...
Currently, there is increasing interest in using sequence-to-sequence models in text-to-speech (TTS) s...
Speaking style conversion (SSC) is the technology of converting natural speech signals from one styl...
Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture specializing in mode...
Speaking style conversion is the technology of converting natural speech signals from one style to a...
Generating speech in different styles from any given style is a challenging research problem in spee...
Lombard speech is a speaking style associated with increased vocal effort that is naturally used by ...
During the 2000s, unit-selection based text-to-speech was the dominant commercial technology....
This paper investigates speaker adaptation techniques for bidirectional long ...
Speaking style conversion (SSC) is the technology of converting natural speech signals from one styl...
Speaking style conversion (SSC) is the technology of converting natural speech signals from one styl...
Speech synthesis technology has a wide range of applications such as voice assistants. In recent yea...
In spite of recent advances in automatic speech recognition, the performance of state-of-the-art spe...