In this work, we propose a novel self-attention-based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network (CRNN), we investigate the benefit of replacing the recurrent layers with self-attention encoders inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data with up to 3 simultaneous speakers. The results indicate that most of the proposed architectures perform on par with, or outperform, the CRNN baseline, especially in the multi-source scenario. Moreover, by avoiding recurrent layers, the proposed models lend themselves to parallel computing, which is shown...
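The architectural change described above (a CNN front-end whose recurrent layers are swapped for a Transformer self-attention encoder, so that all time frames can be processed in parallel) can be sketched as follows. This is a minimal PyTorch sketch, not the paper's exact model: the 4-channel input (first-order Ambisonics features), the layer sizes, and the 429-dimensional per-frame output grid are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SelfAttentionLocalizer(nn.Module):
    """Sketch of a CRNN-style localizer whose recurrent (e.g. GRU) layers
    are replaced by a Transformer self-attention encoder."""

    def __init__(self, n_freq=64, d_model=128, n_heads=4, n_layers=2, n_out=429):
        super().__init__()
        # Convolutional front-end over (batch, channels, time, freq) features
        self.conv = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 8)),  # pool over frequency only; keep time resolution
        )
        self.proj = nn.Linear(64 * (n_freq // 8), d_model)
        # Self-attention encoder in place of the CRNN's recurrent layers:
        # every frame attends to every other frame, with no sequential dependency
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_out)  # per-frame localization output

    def forward(self, x):  # x: (batch, 4, time, freq)
        h = self.conv(x)                      # (batch, 64, time, freq // 8)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, time, 64 * freq // 8)
        h = self.proj(h)
        h = self.encoder(h)                   # all frames computed in parallel
        return torch.sigmoid(self.head(h))    # per-frame source activity map

x = torch.randn(2, 4, 25, 64)  # 2 clips, 4 channels, 25 frames, 64 freq bins
out = SelfAttentionLocalizer()(x)
print(out.shape)  # torch.Size([2, 25, 429])
```

Unlike a GRU, whose hidden state must be unrolled frame by frame, the encoder processes the whole sequence in one batched matrix operation, which is what makes the parallel-computing claim in the abstract possible.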
Speaker counting is the task of estimating the number of people that are simul...
The goal of sound event detection and localization (SELD) is to identify each individual so...
In this paper we investigate the importance of the extent of memory in sequential self-attention for...
In this work, we propose to extend a state-of-the-art multi-source localizatio...
Sound source localization (SSL) is a subtask of audio scene analysis that has challenged researchers...
Joint sound event localization and detection (SELD) is an emerging audio signal processing task addi...
Sound source localization is a subtask of sound scene analysis that has challenged...
Localizing audio sources is challenging in real reverberant environments, espe...
Transformer models are now widely used for speech processing tasks due to their powerful sequence mo...
This work was conducted in the fast-growing context of hands-free voice command. In domestic environ...
Generative adversarial networks (GANs) have shown their superiority for speech enhancement. Neverthe...
In this work we focus on the problem of estimating the number of concurrent sp...
Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance...
A speaker localization algorithm based on neural networks for multi-room domestic scenarios is propo...