The modern paradigm in speech processing has demonstrated the importance of scale and compute for end-to-end speech recognition and synthesis. For instance, state-of-the-art self-supervised speech representation learning models typically consist of more than 300M parameters and are trained on 24 GPUs. While this paradigm has proven effective in certain offline settings, it remains unclear to what extent it can be extended to online and small-device scenarios. This thesis is a step toward making advanced speech processing models more parameter-efficient. We aim to answer the following question: do sparse subnetworks exist in modern speech processing models, and if so, how can we discover them efficiently? The key contribution...
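As a concrete illustration of what "discovering a sparse subnetwork" can mean in practice, the sketch below performs one round of global unstructured magnitude pruning with PyTorch. It is only a minimal example under stated assumptions: the stand-in encoder, the 50% sparsity level, and the use of `torch.nn.utils.prune` are illustrative choices, not the method developed in this thesis.

```python
# Minimal sketch: one round of global magnitude pruning (lottery-ticket style).
# The model below is a stand-in for a speech encoder; all sizes are illustrative.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(80, 512), nn.ReLU(),   # e.g., 80-dim filterbank input
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 29),              # e.g., character vocabulary for CTC
)

# Collect every weight matrix so the threshold is applied across all layers jointly.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

# Zero out the 50% of weights with the smallest absolute value globally.
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.5)

# The surviving connections define a candidate sparse subnetwork (one binary mask
# per layer); in lottery-ticket-style experiments the mask is kept while the
# remaining weights are rewound and retrained.
zeros = sum(float((m.weight == 0).sum()) for m, _ in to_prune)
total = sum(m.weight.numel() for m, _ in to_prune)
print(f"global sparsity: {zeros / total:.2%}")
```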