This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant with CE or AAM softmax loss, and an utterance-pair classification variant with BCE loss. Our best-performing variant, w2v2-aam, achieves a 1.88% EER on the extended VoxCeleb1 test set, compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at https://github.com/nikvaessen/w2v2-speaker. Comment: accepted to ICASSP 202
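The pooling step this abstract describes, collapsing a variable-length wav2vec2 output sequence into a fixed-length speaker embedding, can be sketched minimally as mean pooling followed by cosine scoring of an utterance pair. The function names, the 768-dim frame size, and the random stand-in frames are illustrative assumptions, not the paper's actual implementation (the paper compares several pooling strategies).

```python
import numpy as np

def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Pool a variable-length (T, D) frame sequence into a fixed D-dim embedding."""
    return frames.mean(axis=0)

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, a common way to score a trial between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for wav2vec2 outputs of two utterances (T frames, 768 dims).
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(120, 768))
utt_b = rng.normal(size=(90, 768))

emb_a = mean_pool(utt_a)  # shape (768,), independent of utterance length
emb_b = mean_pool(utt_b)
score = cosine_score(emb_a, emb_b)
```

Because the pooled embedding has a fixed size regardless of utterance length, utterances of different durations become directly comparable.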
This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. ...
This paper investigates self-supervised pre-training for audio-visual speaker representation learnin...
Recent advances in neural text-to-speech research have been dominated by two-stage pipelines utilizi...
This work considers training neural networks for speaker recognition with a much smaller dataset siz...
Learning music representations that are general-purpose offers the flexibility to finetune several d...
In this report, we describe our submitted system for track 2 of the VoxCeleb Speaker Recognition Cha...
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR)...
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations i...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
In recent years, the self-supervised learning paradigm has received extensive attention due to its great...
This technical report describes our system for track 1, 2 and 4 of the VoxCeleb Speaker Recognition ...
The objective of this paper is speaker recognition under noisy and unconstrained conditions. We mak...
In this technical report, we describe the Royalflush submissions for the VoxCeleb Speaker Recognitio...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to...