As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved steadily improving performance through increasingly sophisticated neural network models of growing size and longer training schedules. While large computation resources appear to be a prerequisite for training superior models, we attempt to overcome this by carefully designing a more efficient training pipeline. In this work, we propose an efficient 3-stage progressive training pipeline to build high-performing neural transducer models from scratch with very limited computation resources in a reasonably short time period. The effectiveness of each stage is experimentally verified on both the Librispeech and Switchboard corpora. The propo...
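The abstract names a 3-stage progressive pipeline but, as truncated here, does not spell the stages out. Purely as an illustration of the staged-training idea, the following is a minimal sketch of what such a driver could look like: the stage names, epoch counts, learning rates, and frame caps are hypothetical assumptions, not the schedule from the paper, and the quadratic loss is a placeholder for a real transducer loss.

```python
import torch
import torch.nn as nn

# Toy stand-in for an acoustic encoder; a real neural transducer would pair
# an encoder with a prediction network and a joint network.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 512))

# Hypothetical 3-stage schedule: each stage inherits the weights of the
# previous one, and only cheap-to-tune settings change between stages.
stages = [
    {"name": "stage1_short_utts", "epochs": 2, "lr": 1e-3, "max_frames": 400},
    {"name": "stage2_full_utts",  "epochs": 2, "lr": 5e-4, "max_frames": 1600},
    {"name": "stage3_finetune",   "epochs": 1, "lr": 1e-4, "max_frames": 1600},
]

for stage in stages:
    # Fresh optimizer per stage so the learning-rate setting restarts cleanly.
    opt = torch.optim.Adam(model.parameters(), lr=stage["lr"])
    frames = min(stage["max_frames"], 200)  # dummy utterance length
    for epoch in range(stage["epochs"]):
        x = torch.randn(8, frames, 80)      # (batch, frames, features)
        loss = model(x).pow(2).mean()       # placeholder for the RNN-T loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f'{stage["name"]} epoch {epoch}: loss {loss.item():.4f}')
```

The design intuition this sketch tries to capture is that early stages can run on capped-length utterances at a higher learning rate, making most training steps cheap, while only the later stages pay the full per-step cost; whether the paper's pipeline is organized this way is not recoverable from the truncated abstract above.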
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namel...
Training deep neural network based Automatic Speech Recognition (ASR) models often requires thousand...
Automatic speech recognition research focuses on training and evaluating on static datasets. Yet, as...
Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can...
The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are ...
Recurrent neural network language models (RNNLMs) are becoming increasingly popular for...
This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition...
While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesi...
Recurrent neural network language models (RNNLMs) are becoming increasingly popular for speech recog...
Conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol seq...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
Sequence transducers, such as the RNN-T and the Conformer-T, are one of the most promising models of...
Internal language model (ILM) subtraction has been widely applied to improve the performance of the ...
In this paper we describe the implementation of a complete ANN training procedure using the block m...