Utterance-level permutation invariant training (uPIT) is a state-of-the-art deep learning technique for speaker-independent multi-talker speech separation. uPIT solves the label ambiguity problem by minimizing the mean squared error (MSE) over all permutations between outputs and targets. However, uPIT may be sub-optimal at the segmental level because the optimization is not carried out over individual frames. In this paper, we propose a constrained uPIT (cuPIT) that solves this problem by computing a weighted MSE loss using dynamic information (i.e., delta and acceleration). The weighted loss ensures the temporal continuity of output frames belonging to the same speaker. Inspired by the heuristics (i.e., vocal tract continuity) in computational a...
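The permutation search that uPIT performs at the utterance level can be illustrated with a minimal sketch, assuming spectrogram-shaped estimates and targets of shape (speakers, frames, frequency bins); the function name and tensor layout are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a uPIT-style loss: take the minimum utterance-level MSE
# over all speaker permutations. Shapes and names are assumptions for
# illustration only.
from itertools import permutations
import numpy as np

def upit_mse_loss(estimates: np.ndarray, targets: np.ndarray) -> float:
    """Minimum MSE over all assignments of estimates to target speakers."""
    num_speakers = estimates.shape[0]
    best = np.inf
    for perm in permutations(range(num_speakers)):
        # One MSE per permutation, averaged over the whole utterance.
        mse = np.mean((estimates[list(perm)] - targets) ** 2)
        best = min(best, mse)
    return best

# Example: two speakers, 100 frames, 129 frequency bins.
rng = np.random.default_rng(0)
est = rng.standard_normal((2, 100, 129))
tgt = rng.standard_normal((2, 100, 129))
print(upit_mse_loss(est, tgt))
```

The cuPIT variant described above would additionally weight the squared errors using delta and acceleration features to encourage temporal continuity; that weighting is not shown here.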
Many apparently difficult problems can be solved by reduction to linear programming. Such problems a...
Neuroscience suggests that the sparse behavior of a neural population underlies the mechanisms of th...
We propose a multichannel speech enhancement method using a long short-term mem...
Speaker identification refers to the task of locating the face of a person with the same identity as...
The multichannel variational autoencoder (MVAE) integrates the rule-based update of a separation mat...
Despite the recent progress of automatic speech recognition (ASR) driven by deep learning, conversat...
Deep clustering is a state-of-the-art deep learning-based method for multi-talker speaker-...
The cocktail party problem, also known as a single-channel multi-...
Deep learning based approaches have achieved promising performance in speaker-dependent single-chann...
The current state-of-the-art tools for monaural speech separation rely on supervised learning. Thi...
Monaural speech separation aims to separate concurrent speakers from a single-microphone mixture rec...
Ensemble methods often yield significant gains for automatic speech recognition. One method to obtai...
We introduce a new paradigm for single-channel target source separation where the sources of interes...
We present an algorithm for separating multiple speakers from a mixed single channel recording. The ...