The objective of this paper is speaker recognition 'in the wild', where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame-level) network and the method of temporal aggregation. We propose a powerful speaker recognition deep network that can be trained end-to-end, using a 'thin-ResNet' trunk architecture and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance le...
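As a rough illustration of the temporal aggregation step described above, the NumPy sketch below implements generic NetVLAD-style pooling with optional ghost clusters: each frame is softly assigned to K real plus G ghost clusters, residuals to the real cluster centres are summed over time, and the ghost assignments are simply discarded. This is not the authors' implementation; the function name `ghostvlad`, the randomly initialised parameters, and the sizes used are illustrative assumptions. In the actual network these parameters would be learned end-to-end together with the trunk.

```python
import numpy as np


def ghostvlad(features, centers, weights, biases, num_ghost):
    """Aggregate frame-level features of shape (T, D) into one K*D vector.

    Illustrative sketch (hypothetical helper, not the paper's code):
    centers: (K + G, D) cluster centres; the last num_ghost rows are ghosts.
    weights: (K + G, D) and biases: (K + G,) form the soft-assignment layer.
    """
    # Soft-assignment logits for every frame and every cluster: (T, K + G)
    logits = features @ weights.T + biases
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)  # softmax over all clusters

    num_real = centers.shape[0] - num_ghost
    # Ghost clusters absorb probability mass, but their residuals are dropped.
    assign = assign[:, :num_real]      # (T, K)
    real_centers = centers[:num_real]  # (K, D)

    # Weighted residuals summed over time: V[k] = sum_t a_k(x_t) (x_t - c_k)
    residuals = features[:, None, :] - real_centers[None, :, :]  # (T, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)          # (K, D)

    # Intra-normalisation (per cluster), then global L2 normalisation.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K, G = 50, 512, 8, 2  # illustrative sizes, not taken from the paper
    feats = rng.standard_normal((T, D))
    out = ghostvlad(feats,
                    rng.standard_normal((K + G, D)),
                    rng.standard_normal((K + G, D)),
                    np.zeros(K + G),
                    num_ghost=G)
    print(out.shape)  # (4096,)
```

Because the residuals are summed over frames, utterances of any length T map to a fixed-size K×D vector, which is what lets such a layer accept variable-length input; setting `num_ghost=0` reduces the sketch to plain NetVLAD.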
Speaker Recognition (SR) is a common task in AI-based sound analysis, involving structurally differe...
Artificial Intelligence plays a fundamental role in the speech-based interaction between humans and ...
Learning representations from audio data has shown advantages over handcrafted features such as m...
In speaker recognition tasks, convolutional neural network (CNN)-based approaches have shown signifi...
Convolutional neural networks (CNNs) have significantly promoted the development of speaker verifica...
Current speaker verification techniques rely on a neural network to extract speaker representations....
The objective of this paper is speaker recognition under noisy and unconstrained conditions. We mak...
Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance...
This work considers training neural networks for speaker recognition with a much smaller dataset siz...
The objective of this work is to study state-of-the-art deep neural network-based speaker verificat...
Deep neural networks have become a veritable alternative to classic speaker recognition and clusteri...
Despite achieving satisfactory performance in speaker verification using deep neural networks, varia...
This paper discusses a transition from the traditional methods to novel deep learning architectures ...
This paper presents the QUT speaker recognition system, as a competing system in the Speakers In The...