Neural network-based single-channel speech source separation has made impressive progress in recent years. However, those improvements have mostly been reported on anechoic data, a condition rarely met in practice. Taking the SepFormer, which achieves state-of-the-art performance on anechoic mixtures, as a starting point, we gradually modify it to optimize its performance on reverberant mixtures. Although this improves the word error rate by 7 percentage points compared to the standard SepFormer implementation, the resulting system performs only marginally better than a PIT-BLSTM separation system that is optimized with rather straightforward means. This is surprising and at the same time sobering, challenging the...
Multichannel blind source separation performance degrades rapidly when the mixtures are highly rever...
Despite recent strides made in Speech Separation, most models are trained on datasets with neutral e...
Speech 'in-the-wild' is a handicap for speaker recognition systems due to the variability induced by...
This paper examines the performance of several source separation systems on a speech separation task...
With the advancements in deep learning approaches, the performance of speech enhancing systems in th...
Separation of speech mixtures in noisy and reverberant environments remains a challenging task for s...
Many speech technologies, such as automatic speech recognition and speaker identification, are conve...
In real world environments, the speech signals received by our ears are usually a combination of dif...
Ph.D. thesis. Monaural speech separation and enhancement aim to remove noise interference from the n...
Recording channel mismatch between training and testing conditions has been shown to be a serious pr...
In this paper, we compare different deep neural networks (DNN) in extracting speech signals from com...
© 2017 IEEE. Monaural source separation is an important research area which can help to improve t...
Although recent advances in deep learning technology improved automatic speech recognition (ASR), it...
We propose TF-GridNet for speech separation. The model is a novel multi-path deep neural network (DN...