Deep clustering is a state-of-the-art deep learning-based method for multi-talker, speaker-independent speech separation. It solves the label ambiguity problem by mapping the time-frequency (TF) bins of the mixed spectrogram to an embedding space and assigning contrastive embedding vectors to different TF regions in order to predict the mask of each speaker's target spectrogram. The original deep clustering method transforms the speech into the TF domain through a short-time Fourier transform (STFT). However, the frequency components of the STFT are linearly spaced, whereas the frequency distribution of the human auditory system is nonlinear. We therefore propose to use the constant Q transform (CQT) instead of the STFT to achieve a better simulation of the frequency...
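The contrast between linear STFT spacing and the geometric spacing of a constant-Q filterbank can be illustrated with a minimal sketch. The function names and the parameter values (8 kHz sample rate, 256-point FFT, 50 Hz lowest bin, 12 bins per octave) are assumptions chosen for illustration and are not taken from the paper; the formulas are the standard textbook definitions of STFT and CQT center frequencies.

```python
import numpy as np


def stft_bin_frequencies(sample_rate, n_fft):
    """Center frequencies (Hz) of the one-sided STFT bins: k * sample_rate / n_fft."""
    return np.arange(n_fft // 2 + 1) * sample_rate / n_fft


def cqt_bin_frequencies(f_min, n_bins, bins_per_octave):
    """Center frequencies (Hz) of a constant-Q filterbank: f_min * 2**(k / bins_per_octave)."""
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)


def q_factor(bins_per_octave):
    """Ratio of center frequency to bandwidth, identical for every CQT bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative (assumed) settings, not the configuration used in the paper.
stft_freqs = stft_bin_frequencies(sample_rate=8000, n_fft=256)
cqt_freqs = cqt_bin_frequencies(f_min=50.0, n_bins=72, bins_per_octave=12)

# STFT: constant spacing in Hz between neighboring bins.
stft_spacing = np.diff(stft_freqs)
# CQT: constant ratio between neighboring bins, so spacing grows with frequency.
cqt_ratio = cqt_freqs[1:] / cqt_freqs[:-1]

print("STFT spacing (Hz), first three bins:", stft_spacing[:3])
print("CQT spacing (Hz), lowest and highest bins:", np.diff(cqt_freqs)[[0, -1]])
print("CQT neighbor ratio:", cqt_ratio[0], "Q factor:", q_factor(12))
```

With these settings the CQT places many narrow bins at low frequencies and fewer, wider bins at high frequencies, which is the nonlinear resolution the abstract refers to, while every STFT bin has the same width.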
This paper proposes an autoregressive approach to harness the power of deep learning for multi-speak...
This work proposes a multichannel speech separation method with narrow-band Conformer (named NBC). T...
In this paper we address the problem of multichannel speech enhancement in the short-time Fourier t...
Neural network (NN) and clustering are the two commonly used methods for speech separatio...
In this thesis, a low-latency variant of speaker-independent deep clustering method is proposed for...
In order to separate individual sources from convoluted speech mixtures, complex-domain independent ...
Source Separation (SS) refers to a problem in signal processing where two or more mixed signal sourc...
Despite the recent progress of automatic speech recognition (ASR) driven by deep learning, conversat...
This work analyzes the constant-Q filterbank-based time-frequency representati...
Deep neural networks with convolutional layers usually process the entire spectrogram of an audio si...
In this paper we investigate the use of observation weights and contextual time-frequency informatio...
Utterance level permutation invariant training (uPIT) technique is a state-of-the-art deep learning ...
Speech source separation aims to estimate one or more individual sources from mixtures of multiple s...