Utilizing text-only data with an external language model (ELM) in an end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the implicitly learned internal language model (ILM) prior should first be subtracted from the RNN-T posterior in order to integrate the ELM. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained neural language model with full context, which may be inappropriate for the estimation of ILM and deteriorate the integration perfor...
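The per-token score combinations behind SF and ILME can be sketched as follows. This is a minimal illustration, not any paper's reference implementation: the function name `fused_score` and the interpolation weights are hypothetical, and in practice the weights are tuned on a development set and applied inside beam search.

```python
import math

def fused_score(log_p_asr, log_p_elm, log_p_ilm,
                lam_elm=0.6, lam_ilm=0.3):
    """Illustrative per-token log-score combination.

    Shallow fusion (SF):   log_p_asr + lam_elm * log_p_elm
    ILME-style fusion:     SF score  - lam_ilm * log_p_ilm
    (the ILM prior is subtracted before adding the ELM's influence,
    following the idea described above; DR is analogous but subtracts
    a source-domain LM score instead of the estimated ILM)
    """
    sf = log_p_asr + lam_elm * log_p_elm
    ilme = sf - lam_ilm * log_p_ilm
    return sf, ilme

# Example with made-up log-probabilities for a single token:
sf, ilme = fused_score(log_p_asr=-1.0, log_p_elm=-2.0, log_p_ilm=-0.5)
```

Because the ILM term enters with a negative weight, tokens that the ASR model favors merely due to its internal prior (high `log_p_ilm`) are penalized, letting the external LM's preferences dominate on text-only adaptation domains.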
While Automatic Speech Recognition (ASR) models have shown significant advances with the introductio...
Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully aco...
For resource rich languages, recent works have shown Neural Network based Language Models (NNLMs) t...
Text-only adaptation of an end-to-end (E2E) model remains a challenging task for automatic speech re...
ASR model deployment environment is ever-changing, and the incoming speech can be switched across di...
Internal language model (ILM) subtraction has been widely applied to improve the performance of the ...
An end-to-end (E2E) ASR model implicitly learns a prior Internal Language Model (ILM) from the train...
End-to-end automatic speech recognition suffers from adaptation to unknown target domain speech desp...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
In this paper, we investigate the usage of large language models (LLMs) to improve the performance o...
As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-...
The acoustic and linguistic features are important cues for the spoken language identification (LID)...
Speech and text are two major forms of human language. The research community has been focusing on m...
Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to atta...
The advent of large-scale pre-trained language models has contributed greatly to the recent progress...