Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks, which can mitigate the necessity of a large amount of transcribed speech and thus has driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly large, which contradicts the limited on-device resources. This gap could be more severe in multilingual/multitask scenarios requiring simultaneously recognizing multiple languages or executing multiple speech processing tasks. Additionally, strongly overparameterized speech SSL models tend to suffer from overfitting when being finetuned on low-resour...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
While self-supervised speech representation learning (SSL) models serve a variety of downstream task...
Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing...
Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) perfor...
The modern paradigm in speech processing has demonstrated the importance of scale and compute for en...
We investigate the performance of self-supervised pretraining frameworks on pathological speech data...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image an...
Speech representations learned from Self-supervised learning (SSL) models can benefit various speech...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Self-supervised learning (SSL) has shown tremendous success in various speech-related downstream tas...
The ubiquity of microphone-enabled devices has lead to large amounts of unlabelled audio data being ...
There is growing interest in unifying the streaming and full-context automatic speech recognition (A...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades signi...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
While self-supervised speech representation learning (SSL) models serve a variety of downstream task...
Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing...
Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) perfor...
The modern paradigm in speech processing has demonstrated the importance of scale and compute for en...
We investigate the performance of self-supervised pretraining frameworks on pathological speech data...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image an...
Speech representations learned from Self-supervised learning (SSL) models can benefit various speech...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Self-supervised learning (SSL) has shown tremendous success in various speech-related downstream tas...
The ubiquity of microphone-enabled devices has lead to large amounts of unlabelled audio data being ...
There is growing interest in unifying the streaming and full-context automatic speech recognition (A...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades signi...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
While self-supervised speech representation learning (SSL) models serve a variety of downstream task...