This work proposes a multichannel speech separation method based on a narrow-band Conformer (named NBC). The network is trained to automatically exploit narrow-band speech separation information, such as spatial vector clustering of multiple speakers. Specifically, in the short-time Fourier transform (STFT) domain, the network processes each frequency independently, and its parameters are shared by all frequencies. For one frequency, the network takes as input the STFT coefficients of the multichannel mixture signals and predicts the STFT coefficients of the separated speech signals. Clustering of spatial vectors shares a similar principle with the self-attention mechanism, in the sense that both compute the similarity of vectors and then aggregate similar vectors. Theref...
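The narrow-band idea described above can be illustrated with a minimal NumPy sketch. It is not the NBC architecture: it swaps in a single plain dot-product self-attention layer to show two points from the abstract, namely that each frequency is treated as its own sequence of time frames and that one set of weights is shared by all frequencies. The function names, the random weights, and the toy dimensions are all assumptions made for illustration.

```python
import numpy as np

def narrow_band_features(stft):
    """Stack real and imaginary parts of a complex STFT.

    stft: complex array of shape (F, T, C) = (frequencies, frames, channels).
    Returns a real array of shape (F, T, 2C), one sequence per frequency.
    """
    return np.concatenate([stft.real, stft.imag], axis=-1)

def shared_self_attention(x, w_q, w_k, w_v):
    """Single-head dot-product self-attention applied per frequency.

    x: (F, T, D). The same weight matrices are used for every frequency,
    and attention runs only across the T frames inside one frequency,
    so frequencies never exchange information.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                        # each (F, T, H)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])   # (F, T, T) frame similarities
    scores = scores - scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over frames
    return weights @ v, weights                                # aggregate similar frames

# Toy example (hypothetical sizes): 5 frequencies, 7 frames, 4 microphones.
rng = np.random.default_rng(0)
F, T, C, H = 5, 7, 4, 8
stft = rng.standard_normal((F, T, C)) + 1j * rng.standard_normal((F, T, C))
x = narrow_band_features(stft)
w_q, w_k, w_v = (rng.standard_normal((2 * C, H)) for _ in range(3))
out, weights = shared_self_attention(x, w_q, w_k, w_v)
```

In this sketch the frequency axis plays the role of a batch axis, so a change to the input at one frequency leaves the outputs at all other frequencies unchanged. The attention weights compare frames by similarity and then average similar frames, which is the loose analogy to spatial vector clustering that the abstract draws.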
Zegers J., Van hamme H., "Joint sound source separation and speaker recognition", 17th annual conf...
This paper addresses a method of multichannel signal separation (MSS) with its application to cockta...
Convolutional neural networks (CNN) and Transformer have widely succeeded in multimedia applications...
In this paper, we address the problem of multichannel speech enhancement in the short-time Fourier t...
In a scenario with multiple persons talking simultaneously, the spatial characteristics of the signa...
Target speaker extraction aims at extracting the target speaker from a mixture of multiple speakers ...
When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems s...
We present new results on single-channel speech separation and suggest a new separation approach to i...
We propose TF-GridNet for speech separation. The model is a novel multi-path deep neural network (DN...
Thesis (M.S.)--Boston University. Understanding how we perceive speech in the face of competing sound ...
Despite the recent progress of automatic speech recognition (ASR) driven by deep learning, conversat...
We propose a multichannel speech enhancement method using a long short-term mem...
In this paper, we present a novel framework that jointly performs speaker diarization, speech separa...
We present a novel single-channel separation approach to improve the separation performance while re...
Deep clustering technique is a state-of-the-art deep learning-based method for multi-talker speaker-...