Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing. SSL models are typically pre-trained on large amounts of unlabelled data, and a large model size is preferred to increase modelling capacity. However, this can limit potential applications due to the expensive computation and memory costs introduced by the oversized model. Miniaturization of SSL models has therefore become an important research direction of practical value. To this end, we explore the effective distillation of HuBERT-based SSL models for automatic speech recognition (ASR). First, in order to establish a strong baseline, we conduct a comprehensive study of different student model structures. On top of this, as a supplemen...
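The distillation described in the abstract above can be illustrated with a minimal sketch of a feature-based (layer-wise) distillation objective, as commonly used for compressing HuBERT-style teachers into smaller students. The projection matrix, shapes, and function name here are illustrative assumptions for exposition, not the paper's exact setup.

```python
import numpy as np

def feature_distillation_loss(teacher_hidden, student_hidden, proj):
    """Mean-squared error between projected student features and teacher features.

    teacher_hidden: (T, D_t) hidden states from a frozen teacher layer.
    student_hidden: (T, D_s) hidden states from the student (D_s <= D_t).
    proj:           (D_s, D_t) learned linear map into the teacher's space.
    """
    projected = student_hidden @ proj        # lift student features to teacher dim
    diff = projected - teacher_hidden
    return float(np.mean(diff ** 2))         # averaged over time and feature dims

# Toy usage with random features standing in for real model activations.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))            # 4 frames, teacher dim 8
student = rng.normal(size=(4, 3))            # 4 frames, student dim 3
proj = rng.normal(size=(3, 8))
loss = feature_distillation_loss(teacher, student, proj)
```

In practice such a loss is summed over several teacher layers and minimized jointly with (or before) the downstream ASR objective; the linear projection is needed because the student's hidden dimension is usually smaller than the teacher's.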
While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditi...
Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) perfor...
Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low...
Large-scale speech self-supervised learning (SSL) has emerged as a main field of speech processing...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
Self-supervised learning (SSL) has shown tremendous success in various speech-related downstream tas...
Advances in self-supervised learning have significantly reduced the amount of transcribed audio requ...
SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades signi...
Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
The use of speech processing applications, particularly speech recognition, has received a lot of attenti...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image an...
We investigate the performance of self-supervised pretraining frameworks on pathological speech data...