Speech-to-speech translation (S2ST) converts input speech to speech in another language. A challenge of delivering S2ST in real time is the accumulated delay between the translation and speech synthesis modules. While incremental text-to-speech (iTTS) models have recently shown large quality improvements, they typically require additional future text inputs to reach optimal performance. In this work, we minimize the initial waiting time of iTTS by adapting the upstream speech translator to generate high-quality pseudo lookahead for the speech synthesizer. After mitigating the initial delay, we demonstrate that the duration of synthesized speech also plays a crucial role in latency. We formalize this as a latency metric and then present a si...
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which ...
This paper describes the ON-TRAC Consortium translation systems developed for ...
With the advent of high-quality speech synthesis, there is a lot of interest in controlling various ...
In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022. We e...
Simultaneous speech translation (SimulST) is a challenging task aiming to translate streaming speech...
Transformer-based models have gained increasing popularity achieving state-of-the-art performance in...
Simultaneous translation systems start producing the output while processing the partial source sent...
While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditi...
Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest pos...
In simultaneous speech translation (SimulST), finding the best trade-off between high translation qu...
The primary goal of FBK's systems submission to the IWSLT 2022 offline and simultaneous speech ...
We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained e...
Simultaneous speech translation (SimulST) is the task in which output generation has to be performed...
Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation...