In this thesis, we propose a new approach to Speech-to-Text translation in which an efficient Transformer processes the spectrogram directly, without convolutional layers in front of the Transformer. The encoder therefore learns from the full spectrogram, so no information is lost before encoding, which we believe could be beneficial. We built an encoder-decoder model whose encoder is an efficient Transformer, the Longformer, and whose decoder is a standard Transformer decoder. We first trained the model on an Automatic Speech Recognition (ASR) task and then on Speech Translation, reusing the ASR pre-trained encoder. Our results are close to those obtained with convolutional layers and a regular Transformer, showing l...
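To illustrate why an efficient Transformer can take a long spectrogram as input, the following is a minimal sketch of the sliding-window attention pattern used by Longformer-style encoders. It is not the thesis implementation: the function names, the `window` parameter, and the single-head, unbatched layout are assumptions made for this example. The dense mask is used only for readability; a real implementation avoids computing the masked entries at all.

```python
import numpy as np


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry (i, j) is True when frame i may attend to frame j.

    Each frame sees at most window // 2 neighbours on either side
    (hypothetical helper, named for this sketch only).
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2


def local_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int) -> np.ndarray:
    """Single-head scaled dot-product attention restricted to a local window.

    q, k, v have shape (seq_len, dim), e.g. one row per spectrogram frame.
    """
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    mask = sliding_window_mask(seq_len, window)
    # Positions outside the window get -inf, so their softmax weight is 0.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row; the diagonal is always
    # unmasked, so every row has a finite maximum.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((1000, 80))  # assumed: 1000 frames, 80 features
    out = local_attention(frames, frames, frames, window=32)
    print(out.shape)
```

With a fixed window, the number of attended positions per frame stays constant, so the cost grows linearly with the number of frames rather than quadratically. That is what makes it plausible to feed the spectrogram to the encoder without first shortening it with convolutional layers.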
We present a method for introducing a text encoder into pre-trained end-to-end speech translation sy...
This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. ...
Speech-to-text translation (ST), which translates source language speech into target language text, ...
Speech Translation has been traditionally addressed with the concatenation of two tasks: Speech Reco...
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we in...
For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models th...
Pre-trained models used in the transfer-learning scenario are recently becoming very popular. Such m...
Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapp...
We introduce dual-decoder Transformer, a new model architecture that jointly p...
End-to-end neural machine translation does not require us to have specialized knowledge of investiga...
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
This chapter presents an overview of the state of the art in natural language processing, exploring ...
For years speech translation has been faced as concatenation of speech recognition and machine trans...
Given a large Transformer model, how can we obtain a small and computationally efficient model which...