Transformer models are now widely used for speech processing tasks due to their powerful sequence-modeling capabilities. Previous work found an efficient way to model speaker embeddings with the Transformer by combining transformers with convolutional networks. However, the traditional global self-attention mechanism struggles to capture local information. To alleviate this problem, we propose a novel global–local self-attention mechanism. Instead of using local or global multi-head attention alone, this method runs local and global attention in two parallel head groups, enhancing local modeling while reducing computational cost. To better handle local position information, we introduce locally enhanced location e...
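The parallel split described above — some heads attending globally, the others restricted to a local window — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the even head split, the band-mask window, and the function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention(q, k, v, n_heads=4, window=2):
    """Global–local self-attention sketch over a (T, d) sequence.

    The first half of the heads attends globally; the second half is
    masked to a +/-window band around each position (local group).
    """
    T, d = q.shape
    dh = d // n_heads  # per-head channel width
    out = np.zeros_like(q)

    # band mask used by the local head group
    idx = np.arange(T)
    band = np.abs(idx[:, None] - idx[None, :]) <= window

    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh)
        if h >= n_heads // 2:
            # local group: mask out positions outside the window
            scores = np.where(band, scores, -1e9)
        out[:, sl] = softmax(scores) @ v[:, sl]
    return out
```

Running both groups in parallel over disjoint head slices is what keeps the cost below that of stacking separate local and global attention layers: each position still produces one output vector, with half its channels shaped by global context and half by the local neighborhood.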
Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance...
Generative adversarial networks (GANs) have shown their superiority for speech enhancement. Neverthe...
Training deep neural network based Automatic Speech Recognition (ASR) models often requires thousand...
Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-att...
In this work, we propose a novel self-attention based neural network for robus...
Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapp...
Personalized voice triggering is a key technology in voice assistants and serves as the fir...
In this paper, a hierarchical attention network is proposed to generate robust utterance-level embed...
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
This paper explores three novel approaches to improve the performance of speaker verification (SV) s...
Most state-of-the-art Deep Learning systems for text-independent speaker verification are based on s...
In the recent past, deep neural networks have become the most successful approach to extract the speaker ...
Learning an effective speaker representation is crucial for achieving reliable performance in speake...
In recent years, the self-supervised learning paradigm has received extensive attention due to its grea...