This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant with CE or AAM softmax loss, and an utterance-pair classification variant with BCE loss. Our best-performing variant, w2v2-aam, achieves a 1.88% EER on the extended VoxCeleb1 test set, compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at https://github.com/nikvaessen/w2v2-speaker. Comment: accepted to ICASSP 202
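The pooling step this abstract describes, collapsing a variable-length wav2vec2 output sequence into a fixed-length speaker embedding, can be sketched minimally as mean pooling followed by cosine scoring of an utterance pair. The function names, the 768-dim frame size, and the random stand-in frames are illustrative assumptions, not the paper's actual implementation (the paper compares several pooling strategies).

```python
import numpy as np

def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Pool a variable-length (T, D) frame sequence into a fixed D-dim embedding."""
    return frames.mean(axis=0)

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, a common way to score a trial between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for wav2vec2 outputs of two utterances (T frames, 768 dims).
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(120, 768))
utt_b = rng.normal(size=(90, 768))

emb_a = mean_pool(utt_a)  # shape (768,), independent of utterance length
emb_b = mean_pool(utt_b)
score = cosine_score(emb_a, emb_b)
```

Because the pooled embedding has a fixed size regardless of utterance length, utterances of different durations become directly comparable.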
This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. ...
This paper investigates self-supervised pre-training for audio-visual speaker representation learnin...
Recent advances in neural text-to-speech research have been dominated by two-stage pipelines utilizi...
This work considers training neural networks for speaker recognition with a much smaller dataset siz...
Learning music representations that are general-purpose offers the flexibility to finetune several d...
In this report, we describe our submitted system for track 2 of the VoxCeleb Speaker Recognition Cha...
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR)...
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations i...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
In recent years, the self-supervised learning paradigm has received extensive attention due to its great...
This technical report describes our system for track 1, 2 and 4 of the VoxCeleb Speaker Recognition ...
The objective of this paper is speaker recognition under noisy and unconstrained conditions. We mak...
In this technical report, we describe the Royalflush submissions for the VoxCeleb Speaker Recognitio...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to...