Contextual information plays a crucial role in speech recognition technologies, and incorporating it into end-to-end speech recognition models has attracted considerable interest recently. However, previous deep biasing methods lacked explicit supervision for the bias task. In this study, we introduce a contextual phrase prediction network for an attention-based deep biasing method. This network predicts the context phrases present in an utterance from their contextual embeddings and computes a bias loss that assists the training of the contextualized model. Our method achieves a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement...
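The mechanism described above can be illustrated with a minimal sketch: decoder states attend over the contextual embeddings of a bias list, and a pooled prediction over the same attention weights is scored against phrase-occurrence labels with a binary cross-entropy "bias loss". All dimensions, the pooling choice, and the loss form here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustrative, not from the paper):
# 4 bias phrases, embedding dim 8, 6 decoder time steps.
num_phrases, dim, T = 4, 8, 6

phrase_emb = rng.normal(size=(num_phrases, dim))  # contextual embeddings of bias phrases
dec_states = rng.normal(size=(T, dim))            # per-step decoder hidden states


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Attention-based deep biasing: each decoder state attends over the bias list.
scores = dec_states @ phrase_emb.T / np.sqrt(dim)  # (T, num_phrases)
attn = softmax(scores, axis=-1)
bias_vec = attn @ phrase_emb  # context vectors fused back into the decoder

# Contextual phrase prediction: pool the attention over time to estimate which
# bias phrases occur in the utterance, then score with a BCE bias loss.
pred = attn.mean(axis=0)                 # (num_phrases,) pseudo-probabilities
target = np.array([1.0, 0.0, 1.0, 0.0])  # toy labels: phrases 0 and 2 occur
eps = 1e-9
bias_loss = -np.mean(
    target * np.log(pred + eps) + (1.0 - target) * np.log(1.0 - pred + eps)
)
print(bias_loss > 0.0)
```

In training, a loss like `bias_loss` would be added to the usual ASR objective so the biasing attention receives explicit supervision, which is the gap in prior deep biasing methods that the abstract identifies.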
In this project we have developed a language model based on Artificial Neural Networks (ANNs) for us...
Human speech processing is inherently multi-modal, where visual cues (lip movements) help better und...
In this thesis, we develop deep learning models in automatic speech recognition (ASR) for two contra...
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition ...
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR) models is a chal...
Deep biasing for the Transducer can improve the recognition performance of rare words or contextual ...
Speech technology has developed to levels equivalent with human parity through the use of deep neura...
Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to atta...
Previous studies have confirmed that by augmenting acoustic features with the place/manner of articu...
An automatic speech recognition (ASR) system contains three main parts: an acoustic model, a lexicon a...
Emotion recognition in conversations is essential for ensuring advanced human-machine interactions. ...
This paper demonstrates the significance of using contextual information in machine learning and spe...
State-of-the-art pretrained contextualized models (PCM), e.g. BERT, use tasks such as WiC and WSD to ev...
An end-to-end (E2E) ASR model implicitly learns a prior Internal Language Model (ILM) from the train...
In this work, we define barge-in verification as a supervised learning task where audio-only informa...