While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation), which zeroes out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this frees up capacity for target-domain ASR fine-tuning. The redundant weights can be identified through various pruning strategies, which are discussed in detail in this work. Specifically, we investigate the effect of the recently proposed Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on t...
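To make the prune-then-fine-tune idea concrete, the sketch below applies unstructured magnitude pruning to the linear layers of a pre-trained encoder before target-domain fine-tuning. It is a minimal PyTorch illustration under stated assumptions: the function name magnitude_prune_, the 50% sparsity level, and the toy encoder are illustrative, not the paper's exact Task-Agnostic or Task-Aware procedures.

# A minimal sketch of the pruning-then-fine-tuning idea behind PADA, assuming
# unstructured magnitude pruning in PyTorch. The function name, the 50% sparsity
# level, and the toy encoder are illustrative assumptions, not the paper's exact
# Task-Agnostic or Task-Aware recipe.
import torch


def magnitude_prune_(model: torch.nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the `sparsity` fraction of smallest-magnitude weights in each Linear layer."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            weight = module.weight.data
            k = int(weight.numel() * sparsity)
            if k == 0:
                continue
            # The k-th smallest absolute value serves as the pruning threshold;
            # weights at or below it are treated as redundant and zeroed out.
            threshold = weight.abs().flatten().kthvalue(k).values
            mask = (weight.abs() > threshold).to(weight.dtype)
            weight.mul_(mask)


if __name__ == "__main__":
    # Toy stand-in for an OOD pre-trained encoder; in practice this would be an
    # SSL speech encoder, and target-domain ASR fine-tuning would follow.
    encoder = torch.nn.Sequential(
        torch.nn.Linear(512, 512), torch.nn.GELU(), torch.nn.Linear(512, 512)
    )
    magnitude_prune_(encoder, sparsity=0.5)
    zeros = sum((m.weight == 0).sum().item() for m in encoder if isinstance(m, torch.nn.Linear))
    print(f"zeroed weights: {zeros}")

The zeroed positions remain trainable and can be re-learned during target-domain fine-tuning, which is the "making space" intuition described above.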
Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing...
Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications ...
For personalized speech generation, a neural text-to-speech (TTS) model must be successfully impleme...
The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
The modern paradigm in speech processing has demonstrated the importance of scale and compute for en...
Advances in self-supervised learning have significantly reduced the amount of transcribed audio requ...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Automatic speech recognition models are often adapted to improve their accuracy in a new domain. A p...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
In real-world applications, speaker recognition models often face various domain-mismatch challenges...
There is growing interest in unifying the streaming and full-context automatic speech recognition (A...
Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
Speech representations learned from Self-supervised learning (SSL) models can benefit various speech...