The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are the RNN-Transducer (RNN-T) and connectionist temporal classification (CTC) objectives. Both perform alignment-free training by marginalizing over all possible alignments, but they use different transition rules. Between these two loss types lie the monotonic RNN-T (MonoRNN-T) and the recently proposed CTC-like Transducer (CTC-T), both of which can be realized with the graph temporal classification-transducer (GTC-T) loss function. Monotonic transducers have a few advantages. First, RNN-T can suffer from runaway hallucination, where a model keeps emitting non-blank symbols without advancing in time, often in an infinite loop. Secon...
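To make "marginalizing over all possible alignments" concrete, here is a minimal sketch of the CTC forward (alpha) recursion in pure Python. The function name and interface are illustrative, not from any of the papers above; it computes the total probability of a target sequence by summing over every frame-level alignment that collapses to it under CTC's transition rules (stay, advance, or skip a blank between distinct labels).

```python
def ctc_forward_prob(probs, target, blank=0):
    """Probability of `target` under CTC, summed over all valid alignments.

    probs:  list of T per-frame distributions over V symbols (rows sum to 1).
    target: list of non-blank symbol ids.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    # alpha[s] = total probability of all alignment prefixes ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                                   # stay on the same state
            if s >= 1:
                total += alpha[s - 1]                          # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]                          # skip the blank between
            new[s] = total * probs[t][ext[s]]                  # distinct labels
        alpha = new

    # End in the final label or the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

For example, with two frames, vocabulary {blank=0, a=1}, uniform per-frame probabilities, and target [a], the three alignments "aa", "a-", and "-a" each have probability 0.25, so the marginal is 0.75. RNN-T and the monotonic variants above differ from CTC only in which transitions the lattice permits and how label emissions consume time steps.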
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
This study conducts a comparative analysis of three prominent machine learning models: Multi-Layer P...
Learning a set of tasks in sequence remains a challenge for artificial neural networks, which, in su...
As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-...
This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition...
Optimization of modern ASR architectures is among the highest priority tasks since it saves many com...
Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of perf...
Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignmen...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
The Transformer architecture model, based on self-attention and multi-head attention, has achieved r...
There is growing interest in unifying the streaming and full-context automatic speech recognition (A...
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namel...
This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connection...