Disentanglement is the task of learning representations that identify and separate the factors explaining the variation observed in data. Disentangled representations are useful for increasing the generalizability, explainability, and fairness of data-driven models. Little is known about how well such disentanglement works for speech representations. A major challenge when tackling disentanglement for speech representations is that the generative factors underlying the speech signal are unknown. In this work, we investigate to what degree speech representations encoding speaker identity can be disentangled. To quantify disentanglement, we identify acoustic features that are highly speaker-variant and can serve as proxies for the factors of variation...
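As a rough illustration of the proxy-based quantification described above, the sketch below probes how strongly a speaker embedding encodes a single speaker-variant acoustic feature (a synthetic stand-in for mean F0) using a cross-validated linear probe. The probe type, feature, and score are illustrative assumptions, not the exact protocol of this work.

```python
# Minimal sketch: probe how strongly a speaker embedding encodes an acoustic
# proxy feature (e.g., mean F0 per utterance). Embeddings and features are
# assumed to be precomputed; here they are synthetic for demonstration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def proxy_encoding_score(embeddings: np.ndarray, feature: np.ndarray) -> float:
    """Mean cross-validated R^2 of a linear probe predicting the acoustic
    feature from the embedding. A high score suggests the factor is strongly
    (linearly) encoded, i.e. not disentangled away from the representation."""
    probe = LinearRegression()
    scores = cross_val_score(probe, embeddings, feature, cv=5, scoring="r2")
    return float(scores.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(200, 192))  # e.g., 192-dim speaker embeddings
    # Synthetic "mean F0" that leaks through the first embedding dimension.
    f0 = emb[:, 0] * 20.0 + 120.0 + rng.normal(scale=5.0, size=200)
    print(f"proxy encoding score (R^2): {proxy_encoding_score(emb, f0):.3f}")
```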
Speaker embeddings represent a means to extract representative vectorial representations from a spee...
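To make the notion of a vectorial speaker representation concrete, here is a minimal extraction sketch using a publicly available pretrained ECAPA-TDNN encoder from SpeechBrain; the model identifier and input file are illustrative assumptions and are not tied to the cited work.

```python
# Illustrative sketch: extract a speaker embedding from one recording with a
# pretrained ECAPA-TDNN encoder (SpeechBrain). File name is a placeholder.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

signal, sample_rate = torchaudio.load("utterance.wav")  # hypothetical input
with torch.no_grad():
    embedding = encoder.encode_batch(signal)  # shape: [1, 1, embedding_dim]
print(embedding.squeeze().shape)
```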
We present a self-supervised method to disentangle factors of variation in high-dimensional data tha...
A large part of the literature on learning disentangled representations focuses on variational autoe...
Unsupervised speech disentanglement aims at separating fast varying from slowly varying components o...
A variety of informational factors are contained within the speech signal and a single short recordi...
Recently, a growing interest in unsupervised learning of disentangled represen...
Recently, end-to-end neural audio/speech coding has shown great potential to outperform tradition...
Speech intelligibility assessment plays an important role in the therapy of patients suffering from ...
While most research into speech synthesis has focused on synthesizing high-quality speech for in-dat...
Learning disentangled representations with variational autoencoders (VAEs) is often attributed to th...
Voice Conversion (VC) for unseen speakers, also known as zero-shot VC, is an attractive research top...
Speech signals contain a lot of sensitive information, such as the speaker's i...
State-of-the-art speaker verification systems are inherently dependent on some kind of human supervi...
In previous work, we proposed a variational autoencoder (VAE)-based Bayesian permutation training sp...
The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisati...