Training deep neural network-based Automatic Speech Recognition (ASR) models often requires thousands of hours of transcribed data, limiting their use to only a few languages. Moreover, current state-of-the-art acoustic models are based on the Transformer architecture, which scales quadratically with sequence length, hindering their use for long sequences. This thesis aims to reduce (a) the data and (b) the compute requirements for developing state-of-the-art ASR systems with only a few hundred hours of transcribed data or less. The first part of this thesis focuses on reducing the amount of transcribed data required to train these models. We propose an approach that uses dropout for uncertainty-aware semi-supervised learning. We show that ou...
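The abstract above mentions using dropout for uncertainty-aware semi-supervised learning but is truncated before the details. As a rough illustration of the general idea only (not the thesis's actual method), the sketch below uses Monte Carlo dropout: dropout is left on at inference, several stochastic forward passes are averaged, and the predictive entropy of the averaged distribution scores uncertainty, so that only confidently pseudo-labelled utterances are kept for self-training. The toy two-layer network, the entropy threshold, and all names here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-in for an acoustic model: a fixed two-layer network whose
# hidden units are randomly dropped on *every* forward pass, i.e.
# dropout stays ON at inference time (Monte Carlo dropout).
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((16, 4))     # 4 hypothetical output classes

def stochastic_forward(x):
    h = np.maximum(x @ W1, 0.0)
    mask = (rng.random(h.shape) > 0.5) / 0.5   # inverted dropout, p = 0.5
    return (h * mask) @ W2

def mc_dropout_uncertainty(x, n_passes=30):
    """Average class probabilities over several stochastic passes and
    use the predictive entropy of the mean as the uncertainty score."""
    probs = np.stack([softmax(stochastic_forward(x))
                      for _ in range(n_passes)])
    mean = probs.mean(axis=0)
    entropy = -np.sum(mean * np.log(mean + 1e-12), axis=-1)
    return mean.argmax(axis=-1), entropy

# Pseudo-labelling step: score a batch of unlabelled "utterances" and
# keep only those the model is confident about (low entropy).
x_unlabelled = rng.standard_normal((32, 8))
labels, entropy = mc_dropout_uncertainty(x_unlabelled)
threshold = np.median(entropy)        # illustrative cut-off, not tuned
keep = entropy < threshold
pseudo_labelled = list(zip(x_unlabelled[keep], labels[keep]))
print(f"kept {keep.sum()} / {len(keep)} utterances for self-training")
```

In practice the threshold would be tuned on held-out data, and the selected pseudo-labelled utterances would be mixed back into the supervised training set for another training round.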
We propose a new technique for training deep neural networks (DNNs) as data-driven feature front-end...
Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challe...
Deep neural networks trained with supervised learning algorithms on large amounts of labeled speech ...
In this thesis, we develop deep learning models in automatic speech recognition (ASR) for two contra...
Many of today's state-of-the-art automatic speech recognition (ASR) systems are based on hybrid hidd...
Over the past decades, the dominant approach towards building automatic speech recognition (ASR) sys...
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
End-to-End automatic speech recognition (ASR) models aim to learn generalised representations of spe...
Training domain-specific automatic speech recognition (ASR) systems requires a suitable amount of da...
Self-supervised learning from raw speech has been proven beneficial to improve...
Neural networks, especially those with more than one hidden layer, have re-emerged in Automatic Spee...
Automatic speech recognition (ASR) is a key core technology for the information age. ASR systems hav...