Large-scale speech self-supervised learning (SSL) has emerged as a main field of speech processing; however, the computational cost arising from its vast model size creates a high entry barrier for academia. In addition, existing distillation techniques for speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which is thinner in dimension throughout almost all model components and deeper in layers compared to prior speech SSL distillation works. Moreover, we employ a time-reduction layer to speed up inference and propose a hint-based distillation method for less performance deg...
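Since the abstract is truncated here, the following is only a minimal sketch of the two mechanisms it names: a time-reduction layer that downsamples the frame rate ahead of the Transformer stack, and a hint-based (layer-to-layer) distillation loss that matches thin student layers to wide teacher layers. The module names, dimensions, stride, and the choice of an L1 objective are illustrative assumptions, not details confirmed by the paper.

```python
# Sketch only: NOT the authors' released code. Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeReduction(nn.Module):
    """Halve the sequence length by concatenating adjacent frames and
    projecting back to the model dimension (assumed realization)."""
    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.proj = nn.Linear(dim * stride, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); drop trailing frames so time % stride == 0
        b, t, d = x.shape
        t = t - t % self.stride
        x = x[:, :t].reshape(b, t // self.stride, d * self.stride)
        return self.proj(x)

def hint_loss(student_hidden, teacher_hidden, projections):
    """Sum an L1 'hint' loss over matched student/teacher layer pairs.
    Each linear in `projections` maps a thin student layer (e.g. 480-dim)
    up to the teacher width (e.g. 768-dim) before comparison."""
    loss = torch.zeros(())
    for s, t, proj in zip(student_hidden, teacher_hidden, projections):
        loss = loss + F.l1_loss(proj(s), t.detach())  # teacher is frozen
    return loss
```

Concatenate-and-project is one common way to realize time reduction; a strided convolution over the frame axis would be an equally plausible reading of the abstract.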
Self-supervised speech representation learning has shown promising results in various speech process...
Self-supervised learning (SSL) has shown tremendous success in various speech-related downstream tas...
Self-supervised speech recognition models require considerable labeled training data for learning hi...
Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing...
Self-supervised learning (SSL) representation for speech has achieved state-of-the-art (SOTA) perfor...
Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech...
The use of speech processing applications, particularly speech recognition, has received a lot of attenti...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different doma...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image an...
We investigate the performance of self-supervised pretraining frameworks on pathological speech data...
Advances in self-supervised learning have significantly reduced the amount of transcribed audio requ...