Time-frequency masking and spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining the low latency during inference that suits real-time speech enhancement or assisted hearing applications. To assess our approach across model types and datasets, we evaluate it with a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR...
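The idea of an asymmetric analysis-synthesis window pair can be illustrated with a minimal NumPy sketch. This is not the window design used in the paper: the function names, the sine/cosine analysis shape, the lengths (512 and 128 samples), and the choice of a periodic Hann window as the analysis-synthesis product are all assumptions made for illustration. The sketch only shows the general principle: a long analysis window gives each frame better frequency resolution, while a short synthesis window, confined to the end of the frame, keeps the overlap-add span (and hence, roughly, the algorithmic latency) tied to the short length.

```python
import numpy as np

def asymmetric_window_pair(n_long=512, n_short=128):
    """Illustrative (assumed) design: long analysis window, short synthesis window.

    The product analysis * synthesis is a periodic Hann window over the last
    n_short samples, which sums to one under 50% overlap (hop = n_short // 2).
    """
    half = n_short // 2
    rise_len = n_long - half
    # Long rise and short fall; the 0.5 offsets keep the window strictly positive
    # so that the division below is well defined.
    rise = np.sin(0.5 * np.pi * (np.arange(rise_len) + 0.5) / rise_len)
    fall = np.cos(0.5 * np.pi * (np.arange(half) + 0.5) / half)
    analysis = np.concatenate([rise, fall])
    # Target product window: periodic Hann on the last n_short samples.
    product = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_short) / n_short)
    synthesis = np.zeros(n_long)
    synthesis[-n_short:] = product / analysis[-n_short:]
    return analysis, synthesis

def stft_istft_roundtrip(x, analysis, synthesis, hop):
    """Analyze with the long window, then overlap-add with the short one."""
    n_long = len(analysis)
    y = np.zeros_like(x)
    for start in range(0, len(x) - n_long + 1, hop):
        frame = x[start:start + n_long] * analysis
        spec = np.fft.rfft(frame)  # a separation model would modify spec here
        y[start:start + n_long] += np.fft.irfft(spec, n=n_long) * synthesis
    return y

if __name__ == "__main__":
    n_long, n_short = 512, 128
    hop = n_short // 2
    analysis, synthesis = asymmetric_window_pair(n_long, n_short)
    x = np.random.default_rng(0).standard_normal(n_long + 20 * hop)
    y = stft_istft_roundtrip(x, analysis, synthesis, hop)
    # Samples covered by two overlapping synthesis windows are reconstructed.
    inner = slice(n_long - hop, len(x) - hop)
    print("max reconstruction error:", np.max(np.abs(y[inner] - x[inner])))
```

With an unmodified spectrum, the interior of the signal is reconstructed up to floating-point error, because the two overlapping product windows sum to one. The first and last partially covered samples are not reconstructed in this sketch, since no padding is applied.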
Computational speech segregation attempts to automatically separate speech from noise. This is chall...
Although distinguishing different sounds in a noisy environment is a relatively easy task for humans, sour...
In real world environments, the speech signals received by our ears are usually a combination of dif...
In this thesis, a low-latency variant of the speaker-independent deep clustering method is proposed for...
In this paper, we compare different deep neural networks (DNNs) for extracting speech signals from com...
The human auditory cortex excels at selectively suppressing background noise to focus on a target speake...
Speech separation has long been an active research topic in the signal processing community with its...
This paper describes an in-depth investigation of training criteria, network architectures ...
Deep neural networks (DNNs) are often used to tackle the single channel source separation (SCSS) pro...
Deep learning based speech enhancement in the short-time Fourier transform (STFT) domain typically u...
Speech source separation aims to estimate one or more individual sources from mixtures of multiple s...
Distant speech recognition is being revolutionized by deep learning, which has contributed to signifi...
Binaural features of interaural level difference and interaural phase difference have proved to be v...
In recent research, deep neural networks (DNNs) have been used to solve the monaural source separation...