Self-supervised pre-training can effectively improve the performance of low-resource automatic speech recognition (ASR). However, existing self-supervised pre-training methods are task-agnostic, i.e., they can be applied to various downstream tasks. Although this enlarges the scope of application, the capacity of the pre-trained model is not fully utilized for the ASR task, and the learned representations may not be optimal for ASR. In this work, in order to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, which uses task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task and thus more effectively utilizes the capacity of the pre-trained model...
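The abstract above describes refining a self-supervised model with task-specific semi-supervised pre-training. The sketch below is a minimal PyTorch toy of that idea, not the paper's implementation: a miniature encoder is trained jointly with a wav2vec 2.0-style masked contrastive loss on unlabeled audio and a supervised CTC loss on a small labeled batch. All module sizes, the masking rate, and the loss weight `alpha` are illustrative assumptions, and the contrastive targets are the raw latents rather than the quantized codewords a real wav2vec 2.0 model would use.

```python
# Illustrative sketch: joint self-supervised + supervised (semi-supervised)
# pre-training for ASR. Assumed toy architecture, not wav2vec-S itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySpeechEncoder(nn.Module):
    """Toy stand-in for a wav2vec 2.0-style feature + context network."""
    def __init__(self, dim=64, vocab_size=32):
        super().__init__()
        self.feature = nn.Conv1d(1, dim, kernel_size=20, stride=10)  # waveform -> latent frames
        self.mask_emb = nn.Parameter(torch.randn(dim))               # learned mask embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=2)
        self.ctc_head = nn.Linear(dim, vocab_size)                   # supervised ASR branch

    def latents(self, wav):                                          # (B, T) -> (B, T', C)
        return self.feature(wav.unsqueeze(1)).transpose(1, 2)

    def contextualize(self, z, mask):
        z = z.clone()
        z[mask] = self.mask_emb                                      # replace masked frames
        return self.context(z)

def contrastive_loss(c, z, mask, temperature=0.1):
    """InfoNCE over masked positions: match context output to its true latent.
    Simplification: other masked frames in the batch serve as distractors."""
    c_m, z_m = c[mask], z[mask]                                      # (N, C) each
    logits = F.cosine_similarity(c_m.unsqueeze(1), z_m.unsqueeze(0), dim=-1) / temperature
    targets = torch.arange(c_m.size(0))                              # positives on the diagonal
    return F.cross_entropy(logits, targets)

model = TinySpeechEncoder()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
alpha = 0.5                                                          # assumed supervised-loss weight

# One illustrative step on random tensors standing in for real batches.
unlabeled = torch.randn(4, 4000)                                     # "audio" waveforms
labeled = torch.randn(2, 4000)
transcripts = torch.randint(1, 32, (2, 20))                          # fake label sequences

opt.zero_grad()
z = model.latents(unlabeled)
mask = torch.rand(z.shape[:2]) < 0.15                                # mask ~15% of frames
c = model.contextualize(z, mask)
loss_ssl = contrastive_loss(c, z.detach(), mask)                     # self-supervised term

log_probs = model.ctc_head(model.context(model.latents(labeled)))
log_probs = log_probs.log_softmax(-1).transpose(0, 1)                # (T, B, V) for CTCLoss
in_lens = torch.full((2,), log_probs.size(0), dtype=torch.long)
tgt_lens = torch.full((2,), 20, dtype=torch.long)
loss_sup = ctc(log_probs, transcripts, in_lens, tgt_lens)            # supervised term

(loss_ssl + alpha * loss_sup).backward()                             # joint semi-supervised update
opt.step()
```

The weighted sum of the two losses is what makes the pre-training task-specific: the contrastive term keeps exploiting unlabeled audio while the CTC term steers the representations toward ASR.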
Self-supervised speech recognition models require considerable labeled training data for learning hi...
Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challe...
Self-supervised speech models have advanced rapidly during the past few years and have proven feasible for...
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR)...
Self-supervised learning (SSL) has achieved great success in speech recognition, while limited explorati...
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations i...
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to...
Advances in self-supervised learning have significantly reduced the amount of transcribed audio requ...
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance ...
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-d...
Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize m...
The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Self-supervised learning from raw speech has been proven beneficial to improve...