End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder. This leads to a significant gap between pre-training and fine-tuning, largely due to the modality differences between the speech outputs of the encoder and the text inputs expected by the decoder. In this work, we aim to bridge the modality gap between speech and text to improve translation quality. We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces features suited to speech-to-text translation by modelling both global and local dependencies of the speech sequence. Our experimental results show that our model outperforms a s...
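The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the general idea it describes: an adapter that shrinks a speech-encoder output sequence while modelling local context (here, via a strided convolution) and global context (here, via a standard Transformer encoder layer). The class name ShrinkingAdapter, the choice of convolutional downsampling, and all hyper-parameters are illustrative assumptions, not the paper's actual M-Adapter design.

```python
import torch
import torch.nn as nn


class ShrinkingAdapter(nn.Module):
    """Hypothetical sketch of a modality adapter: a strided 1D convolution
    shrinks the speech sequence and captures local dependencies, then a
    Transformer encoder layer models global dependencies over the shorter
    sequence. Not the published M-Adapter architecture."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, stride: int = 2):
        super().__init__()
        # Local modelling + length reduction: T -> roughly T / stride.
        self.shrink = nn.Conv1d(d_model, d_model, kernel_size=3,
                                stride=stride, padding=1)
        self.activation = nn.GELU()
        # Global modelling over the shrunk sequence.
        self.transformer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)

    def forward(self, speech_states: torch.Tensor) -> torch.Tensor:
        # speech_states: (batch, T, d_model) from a pre-trained speech encoder.
        x = self.shrink(speech_states.transpose(1, 2))   # (batch, d_model, T')
        x = self.activation(x).transpose(1, 2)           # (batch, T', d_model)
        return self.transformer(x)                       # (batch, T', d_model)


# Usage: shorten encoder outputs before feeding them to a text decoder.
adapter = ShrinkingAdapter(d_model=512, n_heads=8, stride=2)
out = adapter(torch.randn(4, 100, 512))
print(out.shape)  # torch.Size([4, 50, 512])
```

The length reduction matters because speech-encoder outputs are typically several times longer than the corresponding text, so shrinking the sequence brings the adapter's outputs closer to what a pre-trained text decoder was trained to consume.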