Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech. However, experiments are inconclusive on whether the cascade or the direct model is stronger, and have only been conducted under the unrealistic assumption that both are trained on equal amounts of data, ignoring other available speech recognition and machine translation corpora. In this paper, we demonstrate that direct speech translation models r...
Data augmentation is a technique to generate new training data based on existing data. We evaluate t...
Neural machine translation has considerably improved the quality of automatic translations by learni...
In a pipeline speech translation system, the automatic speech recognition (ASR) system will transmit err...
When building state-of-the-art speech translation models, the need for large computational resources...
For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models th...
Speech-to-text translation (ST), which translates source language speech into target language text, ...
Direct speech-to-text translation (ST) is an emerging approach that consists in performing the ST ta...
Speech translation has been traditionally tackled under a cascade approach, chaining speech recognit...
This paper describes FBK’s submission to the end-to-end English-German speech translation task at IW...
Nowadays, training end-to-end neural models for spoken language translation (SLT) still has to confr...
End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or dec...
The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Aut...
Pretrained models in acoustic and textual modalities can potentially improve speech translation for ...
The primary goal of FBK’s systems submission to the IWSLT 2022 offline and simultaneous speech ...
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Mac...