Target speaker extraction aims to extract the speech of a target speaker from a mixture of multiple speakers by exploiting auxiliary information about that speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second architecture uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimenta...
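To make the conformer building block mentioned above concrete, the following is a minimal PyTorch sketch of a single conformer block in its commonly described form: a half-step feed-forward module, multi-head self-attention, a convolution module, a second half-step feed-forward module, and a final layer normalization. It is an illustration under stated assumptions, not the separator network of this paper. The class names, the feature dimension, the number of heads, and the kernel size are all hypothetical, and plain self-attention is used in place of any relative positional encoding.

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Position-wise feed-forward module (pre-norm); sizes are assumptions."""

    def __init__(self, dim, expansion=4, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, expansion * dim),
            nn.SiLU(),  # Swish activation
            nn.Dropout(dropout),
            nn.Linear(expansion * dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class ConvModule(nn.Module):
    """Pointwise conv + GLU, depthwise conv, batch norm, Swish, pointwise conv."""

    def __init__(self, dim, kernel_size=15, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.pointwise_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.glu = nn.GLU(dim=1)  # halves the channel count back to dim
        # An odd kernel with padding kernel_size // 2 keeps the sequence length.
        self.depthwise = nn.Conv1d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )
        self.bn = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()
        self.pointwise_out = nn.Conv1d(dim, dim, kernel_size=1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):  # x: (batch, time, dim)
        y = self.norm(x).transpose(1, 2)  # -> (batch, dim, time) for Conv1d
        y = self.glu(self.pointwise_in(y))
        y = self.act(self.bn(self.depthwise(y)))
        y = self.dropout(self.pointwise_out(y))
        return y.transpose(1, 2)  # -> (batch, time, dim)


class ConformerBlock(nn.Module):
    """One conformer block; hyperparameters are illustrative assumptions."""

    def __init__(self, dim=64, heads=4, kernel_size=15, dropout=0.1):
        super().__init__()
        self.ff1 = FeedForward(dim, dropout=dropout)
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            dim, heads, dropout=dropout, batch_first=True
        )
        self.conv = ConvModule(dim, kernel_size, dropout)
        self.ff2 = FeedForward(dim, dropout=dropout)
        self.out_norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (batch, time, dim)
        x = x + 0.5 * self.ff1(x)  # half-step feed-forward residual
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]  # self-attention
        x = x + self.conv(x)  # convolution module
        x = x + 0.5 * self.ff2(x)  # second half-step feed-forward residual
        return self.out_norm(x)
```

A block like this maps a feature sequence of shape (batch, time, dim) to a sequence of the same shape, so several blocks can be stacked or interleaved with other layers. How the blocks are combined with feed-forward or TCN blocks, and how the speaker embedding conditions them, is not specified in the text above and is left out of this sketch.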
Zegers J., Van hamme H., "Improving source separation via multi-speaker representations", 18th ann...
Despite the recent progress of automatic speech recognition (ASR) driven by deep learning, conversat...
Convolution-augmented transformers (Conformers) have recently been proposed in various speech-domain appli...
Owing to the loss of effective information and incomplete feature extraction caused by the convoluti...
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker spee...
In recent years, researchers have become increasingly interested in speaker extraction (SE), which i...
In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an easy-to-impl...
Source Separation (SS) refers to a problem in signal processing where two or more mixed signal sourc...
A strong representation of a target speaker can aid in extracting important information regarding th...
This work proposes a multichannel speech separation method with narrow-band Conformer (named NBC). T...
This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acou...
In this paper, we present a novel framework that jointly performs speaker diarization, speech separa...
Target speech extraction (TSE) extracts the speech of a target speaker in a mixture given auxiliary ...
Streaming recognition and segmentation of multi-party conversations with overlapping speech is cruci...
Many speech technology applications expect speech input from a single speaker and usually fail when ...