Deep biasing for the Transducer can improve the recognition performance of rare words or contextual entities, which is essential in practical applications, especially for streaming Automatic Speech Recognition (ASR). However, deep biasing with large-scale rare words remains challenging, as performance drops significantly when the bias list contains more distractors and words with similar grapheme sequences. In this paper, we combine the phoneme and textual information of rare words in Transducers to distinguish words with similar pronunciation or spelling. Moreover, training with text-only data that contains more rare words benefits large-scale deep biasing. The experiments on the LibriSpeech corpus demonstrat...
In this thesis, we develop deep learning models in automatic speech recognition (ASR) for two contra...
While Automatic Speech Recognition (ASR) models have shown significant advances with the introductio...
Deep learning algorithms are increasingly used for speech enhancement (SE). In supervised methods, gl...
Language model fusion helps smart assistants recognize words which are rare in acoustic data but abu...
Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to atta...
Personalization of automatic speech recognition (ASR) models is a widely studied topic because of it...
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition ...
Contextual information plays a crucial role in speech recognition technologies and incorporating it ...
In recent years, the development of accurate deep keyword spotting (KWS) models has resulted in KWS ...
A common belief in the community is that deep learning requires large datasets to be effective. We s...
In this paper, we investigate the usage of large language models (LLMs) to improve the performance o...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recogniti...
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently ...
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance ...