Simultaneous speech translation is an essential communication task, difficult even for humans, in which a translation is generated concurrently with incoming speech. For such streaming tasks, transformers that use block processing to break the input sequence into segments have achieved state-of-the-art performance at reduced cost. Current methods for propagating information across segments, including left context and memory banks, have fallen short: they are insufficient as representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that retains memory implicitly through a new left context method, removing the need to represent memory explicitly with memory banks. We generate the left...
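The truncated abstract suggests that the left context is derived from the previous segment's attention output and folded into the current segment's keys and values. Below is a minimal PyTorch sketch of that idea under those assumptions; the module name SegmentAttention, the left_size parameter, and the use of detach() are illustrative choices, not the paper's implementation.

# A minimal sketch of block processing with an implicit left context, assuming
# (as the truncated abstract suggests) that the left context is taken from the
# previous segment's attention OUTPUT rather than from raw input frames.
# Names and hyperparameters here are hypothetical, not the authors' code.
import torch
import torch.nn.functional as F
from torch import nn

class SegmentAttention(nn.Module):
    def __init__(self, d_model: int, left_size: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.left_size = left_size

    def forward(self, segments):
        """segments: list of (batch, seg_len, d_model) tensors, arriving as a stream."""
        outputs, left_ctx = [], None
        for x in segments:
            # Keys/values cover the current segment plus the carried left context,
            # so information propagates across segment boundaries without an
            # explicit memory bank.
            kv_in = x if left_ctx is None else torch.cat([left_ctx, x], dim=1)
            q, k, v = self.q(x), self.k(kv_in), self.v(kv_in)
            attn = F.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
            out = attn @ v
            outputs.append(out)
            # The "implicit memory": reuse the tail of this segment's attention
            # output as the next segment's left context. Detaching keeps the
            # streaming cost of the sketch bounded; a training setup might
            # instead backpropagate through the context.
            left_ctx = out[:, -self.left_size:].detach()
        return torch.cat(outputs, dim=1)

In this toy setup, a 20-frame input split into four 5-frame segments attends across segment boundaries only through the carried context: SegmentAttention(d_model=8, left_size=2)(list(torch.randn(1, 20, 8).split(5, dim=1))) returns a (1, 20, 8) tensor.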
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which ...
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chine...
Transformer encoder-decoder models have shown impressive performance in dialogue modeling. However, ...
Transformer models using segment-based processing have been an effective architecture for simultaneo...
Transformer-based models have gained increasing popularity achieving state-of-the-art performance in...
Document-level Neural Machine Translation (DocNMT) has been proven crucial for handling discourse ph...
Simultaneous machine translation systems rely on a policy to schedule read and write operations in o...
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we in...
Simultaneous speech translation (SimulST) is a challenging task aiming to translate streaming speech...
Humans benefit from communication but suffer from language barriers. Machine translation (MT) aims t...
Attention-based autoregressive models have achieved state-of-the-art performance in various sequence...
Simultaneous translation systems start producing the output while processing the partial source sent...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
Speech-to-speech translation (S2ST) converts input speech to speech in another language. A challenge...
The integration of syntactic structures into Transformer machine translation has shown positive resu...