Transformer models that use segment-based processing are an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference, which limits translation accuracy. We address this issue with Shiftable Context, a simple yet effective scheme that keeps segment and context sizes consistent throughout training and inference, even in the presence of partially filled segments caused by the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MUST-...
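As a rough illustration of the mismatch described in the abstract above, the sketch below contrasts a fixed segment grid, where the newest segment can be partially filled during streaming, with a window that is shifted back so the segment stays full. This is only one plausible reading of the abstract, not the authors' algorithm: the function names `fixed_window` and `shifted_window`, the parameters `segment_size` and `context_size`, and the frame-index convention are all assumptions made for illustration.

```python
from typing import Tuple

Window = Tuple[int, int, int]  # (context_start, segment_start, segment_end), end exclusive


def fixed_window(n_received: int, segment_size: int, context_size: int) -> Window:
    """Newest segment on a fixed grid; it may be only partially filled.

    Hypothetical helper: assumes n_received >= 1 frames have arrived so far.
    """
    segment_start = ((n_received - 1) // segment_size) * segment_size
    context_start = max(0, segment_start - context_size)
    return context_start, segment_start, n_received


def shifted_window(n_received: int, segment_size: int, context_size: int) -> Window:
    """Shift the segment start back so the segment is full whenever possible.

    Hypothetical helper: the left context is shifted by the same amount,
    so both sizes match the full-segment case once enough frames exist.
    """
    segment_start = max(0, n_received - segment_size)
    context_start = max(0, segment_start - context_size)
    return context_start, segment_start, n_received


if __name__ == "__main__":
    # 10 frames received, segments of 4 frames, 4 frames of left context.
    ctx, start, end = fixed_window(10, 4, 4)
    print("fixed:  ", (ctx, start, end), "segment length", end - start)
    ctx, start, end = shifted_window(10, 4, 4)
    print("shifted:", (ctx, start, end), "segment length", end - start)
```

With 10 frames received, the fixed grid yields a segment of only 2 frames, while the shifted window keeps a full 4-frame segment with a 4-frame left context, which is the kind of size consistency the abstract refers to.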
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we in...
Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real...
Simultaneous speech translation is an essential communication task difficult for humans whereby a tr...
End-to-end simultaneous speech translation (SimulST) outputs translation while receiving the streami...
Training speech translation (ST) models requires large and high-quality datasets. MuST-C is one of t...
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined fr...
Some Transformer-based models can perform cross-lingual transfer learning: those models can be train...
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which ...
One of the things that need to change when it comes to machine translation is the models' ability to...
Transformer-based models have gained increasing popularity achieving state-of-the-art performance in...
Traditional machine translation industrial systems usually handle sentences independently, neglectin...
This paper introduces a new data augmentation method for neural machine translation that can enforce...
Spoken language translation (SLT) exists within one of the most challenging intersections of speech ...
Simultaneous speech translation (SimulST) is a challenging task aiming to translate streaming speech...