Self-supervised learning (SSL) representations for speech have achieved state-of-the-art (SOTA) performance on several downstream tasks. However, there remains room for improvement on speech enhancement (SE) tasks. In this study, we use a cross-domain feature to address the problem that SSL embeddings may lack the fine-grained information needed to regenerate speech signals. By integrating the SSL representation with the spectrogram, SE performance can be significantly boosted. We further study the relationship between the noise robustness of SSL representations, measured via clean-noisy distance (CN distance), and layer importance for SE. Consequently, we find that SSL representations with lower noise robustness are more important. Furthermore, our experiments on the V...
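The clean-noisy (CN) distance described above can be sketched as a per-layer distance between SSL representations of a clean utterance and its noisy counterpart. The exact metric used in the study is not specified in this snippet; the sketch below assumes a mean frame-wise Euclidean distance per layer, and the array shapes (frames × feature dim per layer) are illustrative assumptions.

```python
import numpy as np

def clean_noisy_distance(clean_reps, noisy_reps):
    """Per-layer CN distance: for each SSL layer, the mean frame-wise
    Euclidean distance between clean and noisy representations of the
    same utterance. A larger value suggests lower noise robustness for
    that layer. (Metric choice is an assumption, not the paper's.)"""
    return [
        float(np.mean(np.linalg.norm(c - n, axis=-1)))
        for c, n in zip(clean_reps, noisy_reps)
    ]

# Toy example: 3 layers, each with 10 frames of 8-dim features.
rng = np.random.default_rng(0)
clean = [rng.standard_normal((10, 8)) for _ in range(3)]
noisy = [c + 0.1 * rng.standard_normal(c.shape) for c in clean]
distances = clean_noisy_distance(clean, noisy)
```

In practice, `clean_reps` and `noisy_reps` would be the hidden states extracted from each layer of a pretrained SSL model (e.g. one hidden-state matrix per transformer layer), averaged over an evaluation set.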
We present RemixIT, a simple yet effective self-supervised method for training speech enhancement wi...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
Self-supervised speech recognition models require considerable labeled training data for learning hi...
Large-scale speech self-supervised learning (SSL) has emerged to the main field of speech processing...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditi...
Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing...
We investigate the performance of self-supervised pretraining frameworks on pathological speech data...
Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low...
Self-supervised learning (SSL) achieves great success in monaural speech enhancement, while the accu...
Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades signi...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image an...
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for sp...
We propose RemixIT, a simple and novel self-supervised training method for speech enhancement. The p...
Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency...