There are currently three main Transformer-encoder-based approaches to streaming End-to-End (E2E) Automatic Speech Recognition (ASR): time-restricted methods, chunk-wise methods, and memory-based methods. However, each falls short in at least one of three respects: global context modeling, linear computational complexity, or model parallelism. In this work, we aim to build a single model that achieves all three benefits for streaming E2E ASR. In particular, we propose a shifted chunk mechanism in place of the conventional chunk mechanism for the streaming Transformer and Conformer. This shifted chunk mechanism can significantly enhance modeling power by allowing chunk self-attention to capture global context across...
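The chunk and shifted-chunk idea above can be sketched as an attention mask: within one layer a frame attends only to frames in its own chunk, and alternating layers shift the chunk boundaries so that context propagates across chunk borders over depth. This is a minimal illustration under assumed conventions (the function name, the shift-by-offset formulation, and the boolean-mask representation are not the paper's exact formulation):

```python
def chunk_attention_mask(seq_len, chunk_size, shift=0):
    """Boolean mask: frame i may attend to frame j iff both fall in
    the same (possibly shifted) chunk.

    A sketch of chunk-wise self-attention masking; alternating
    shift=0 and shift=chunk_size//2 across layers lets information
    flow across the original chunk boundaries.
    """
    # Assign each time step to a chunk index, offset by `shift`.
    chunk_of = [(t + shift) // chunk_size for t in range(seq_len)]
    # Attention is allowed only within the same chunk.
    return [[chunk_of[i] == chunk_of[j] for j in range(seq_len)]
            for i in range(seq_len)]

# Layer with plain chunks of 4, then a layer with chunks shifted by 2:
plain = chunk_attention_mask(8, chunk_size=4, shift=0)
shifted = chunk_attention_mask(8, chunk_size=4, shift=2)
```

With `shift=0`, frames 0-3 and 4-7 form two isolated chunks; with `shift=2`, frames 2-5 share a chunk, so the shifted layer bridges the boundary between the plain layer's chunks.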
While transformers and their variant conformers show promising performance in speech recognition, th...
Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of perf...
This work studies the use of attention masking in transformer transducer based speech recognition fo...
This paper presents an in-depth study on a Sequentially Sampled Chunk Conformer, SSC-Conformer, for ...
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
Recently, self-attention-based transformers and conformers have been introduced as alternatives to R...
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we in...
Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer infer...
Although recent advances in deep learning technology have boosted automatic speech recognition (ASR)...
Training deep neural network based Automatic Speech Recognition (ASR) models often requires thousand...
History and future contextual information are known to be important for accurate acoustic modeling. ...
This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acou...
End-to-End automatic speech recognition (ASR) models aim to learn generalised representations of spe...
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours ...
The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Aut...