Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training has a strong correlation with ILM subtraction, from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mutual information (MMI) training shares a formula similar to that of ILM subtraction. Empirically, we show that ILM subtraction and sequence discriminative training achieve similar performance across a wide range of experiments on Librispeech, covering both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context.
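The decoding rule behind ILM subtraction, and the classical MMI objective the abstract relates it to, can be written compactly. The following LaTeX sketch uses illustrative notation (X for the acoustic input, Y for a label sequence, lambda and gamma for scaling factors); it follows common usage in the LM-fusion literature and is an assumption for illustration, not a formula taken from this paper:

% Minimal sketch (amsmath assumed). Notation is illustrative, not the paper's own.
% Decoding with external LM shallow fusion plus ILM subtraction:
\hat{Y} = \operatorname*{argmax}_{Y} \left[
    \log P_{\mathrm{ASR}}(Y \mid X)
    + \lambda_{1} \log P_{\mathrm{ELM}}(Y)
    - \lambda_{2} \log P_{\mathrm{ILM}}(Y) \right]

% Classical sequence-level MMI criterion over training pairs (X_r, Y_r),
% with an LM prior scaled by \gamma in numerator and denominator:
\mathcal{F}_{\mathrm{MMI}} = \sum_{r} \log
    \frac{p_{\theta}(X_r \mid Y_r)\, P_{\mathrm{LM}}(Y_r)^{\gamma}}
         {\sum_{Y'} p_{\theta}(X_r \mid Y')\, P_{\mathrm{LM}}(Y')^{\gamma}}

Read this way, the abstract's theoretical claim is that at the global optimum of MMI training the model implicitly divides out a learned label prior, mirroring the explicit -\lambda_{2} \log P_{\mathrm{ILM}}(Y) term in the subtraction-based decoding rule.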