Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to ASR. An ST encoder, stacked on top of the ASR encoder, then receives the filtered features from the (frozen) ASR encoder. We take L0DROP (Zhang et al., 2020) as the backbone for AFS, and adapt it to sparsify speech features with respect to both temporal and feature dimensions. Results on LibriSpeech En-Fr and MuST-C benchmarks show that AFS facilitates lear...
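The filtering step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hard-concrete style gate with stretch limits of -0.1 and 1.1, evaluated deterministically as at inference time, and the names `hard_concrete_gate`, `select_features`, `temporal_log_alpha` and `feature_log_alpha` are hypothetical. Frames whose temporal gate closes to exactly zero are dropped, and the remaining frames are scaled by per-dimension gates.

```python
import math

# Stretch limits of the gate; these values are an assumption for illustration.
GAMMA, ZETA = -0.1, 1.1


def hard_concrete_gate(log_alpha):
    """Deterministic gate value in [0, 1] for one learned gate parameter."""
    s = 1.0 / (1.0 + math.exp(-log_alpha))   # sigmoid
    stretched = s * (ZETA - GAMMA) + GAMMA   # stretch to (GAMMA, ZETA)
    return min(1.0, max(0.0, stretched))     # clip, so exact 0 and 1 occur


def select_features(features, temporal_log_alpha, feature_log_alpha=None):
    """Drop frames with a zero temporal gate and scale the kept frames.

    features: list of frames, each a list of floats (time x dim).
    temporal_log_alpha: one gate parameter per frame (hypothetical name).
    feature_log_alpha: optional gate parameter per feature dimension.
    """
    dim = len(features[0])
    if feature_log_alpha is None:
        dim_gates = [1.0] * dim
    else:
        dim_gates = [hard_concrete_gate(a) for a in feature_log_alpha]

    kept = []
    for frame, log_alpha in zip(features, temporal_log_alpha):
        gate = hard_concrete_gate(log_alpha)
        if gate == 0.0:
            continue  # this frame is judged uninformative and removed
        kept.append([gate * d * x for x, d in zip(frame, dim_gates)])
    return kept


if __name__ == "__main__":
    encoded = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # The middle frame receives a strongly negative gate parameter.
    print(select_features(encoded, [10.0, -10.0, 10.0]))
```

In this toy setting the downstream encoder would see a shorter sequence than the ASR encoder produced, which is the source of the sparsification along the temporal dimension that the abstract refers to.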
Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perf...
The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Aut...
This work investigates speaker adaptation and transfer learning for spoken lan...
End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or dec...
This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Mac...
Document-level contextual information has shown benefits to text-based machine translation, but whet...
This paper describes FBK’s participation in the IWSLT 2020 offline speech translation (ST) task. The...
For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models th...
This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. ...
The primary goal of FBK’s systems submission to the IWSLT 2022 offline and simultaneous speech ...
This paper describes FBK’s submission to the end-to-end English-German speech translation task at IW...
We present a method for introducing a text encoder into pre-trained end-to-end speech translation sy...
Transformer-based models have gained increasing popularity achieving state-of-the-art performance in...
Speech translation has traditionally been approached through cascaded models consisting of a speech ...