State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to the amount of data available today. In this study, we explore self-supervised learning for speaker verification by learning representations directly from raw audio. The objective is to produce robust speaker embeddings that have small intra-speaker and large inter-speaker variance. Our approach is based on recent information maximization learning frameworks and an intensive data augmentation pre-processing step. We evaluate the ability of these methods to work without contrastive samples before showing th...
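The embedding objective stated above (small intra-speaker, large inter-speaker variance) can be illustrated with a toy cosine-similarity check. The vectors below are random stand-ins for speaker embeddings, not outputs of any of the systems described in these abstracts.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Toy "speaker centroids": each speaker's utterances cluster around a
# distinct direction, mimicking low intra- / high inter-speaker variance.
spk_a = rng.normal(size=128)
spk_b = rng.normal(size=128)

utt_a1 = spk_a + 0.1 * rng.normal(size=128)  # two utterances of speaker A
utt_a2 = spk_a + 0.1 * rng.normal(size=128)
utt_b1 = spk_b + 0.1 * rng.normal(size=128)  # one utterance of speaker B

intra = cosine(utt_a1, utt_a2)  # same speaker: close to 1
inter = cosine(utt_a1, utt_b1)  # different speakers: near 0 in high dimensions

# A verification system accepts a trial when similarity exceeds a threshold;
# the objective above is what makes such a threshold separable.
assert intra > inter
```

In a real system the embeddings come from a trained encoder and the decision threshold is tuned on a development set; this sketch only shows why the variance objective makes thresholding possible.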
This paper explores three novel approaches to improve the performance of speaker verification (SV) s...
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases....
Data augmentation is vital to the generalization ability and robustness of deep neural networks (DNN...
Training robust speaker verification systems without speaker labels has long been a challenging task...
For self-supervised speaker verification, the quality of pseudo labels decides the upper bound of th...
Speaker recognition, recognizing speaker identities based on voice alone, enables important downstre...
In recent years, the self-supervised learning paradigm has received extensive attention due to its great...
Over the last few years, deep learning has grown in popularity for speaker verification, identificat...
The goal of this paper is to train effective self-supervised speaker representations without identit...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
This paper investigates self-supervised pre-training for audio-visual speaker representation learnin...
Voice cloning is a difficult task which requires robust and informative features incorporated in a h...
This paper presents the SJTU system for both text-dependent and text-independent tasks in short-dura...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
This paper proposes a novel formulation of prototypical loss with mixup for speaker verification. Mi...