This paper proposes a hierarchical latent embedding structure for the Vector Quantized Variational Autoencoder (VQVAE) to improve the performance of non-parallel voice conversion (NPVC) models. Previous studies on NPVC based on vanilla VQVAE use a single codebook to encode the linguistic information at a fixed temporal scale. However, linguistic structure contains different semantic levels (e.g., phoneme, syllable, word) that span various temporal scales. As a result, the converted speech may contain unnatural pronunciations, which degrade its naturalness. To tackle this problem, we propose a hierarchical latent embedding structure which comprises several vector quantization blocks operating at different temporal s...
We propose a joint training scheme of an any-to-one voice conversion (VC) system with LPCNet to impr...
In the recent past, deep neural networks have been successfully employed to extract fixed-dimensiona...
Kuhlmann M, Seebauer FM, Ebbers J, Wagner P, Haeb-Umbach R. Investigation into Target Speaking Rate ...
In this paper, we present a nonparallel voice conversion (VC) approach that does not require paralle...
In this paper, we present a dictionary-based voice conversion (VC) approach that does not require pa...
Gburrek T, Ebbers J, Häb-Umbach R, Wagner P. Unsupervised Learning of a Disentangled Speech Represen...
Many existing voice conversion (VC) systems are attractive owing to their high...
Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework ...
We present an any-to-one voice conversion (VC) system, using an autoregressive model and LPCNet voco...
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vec...
We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical ...
We propose a voice conversion model from arbitrary source speaker to arbitrary target speaker with dis...
Recently, audiovisual speech enhancement has been tackled in the unsupervised ...
The objective of voice conversion techniques is to convert a source speaker's voice so that it sound...
Voice conversion (VC) transforms the speaking style of a source speaker to the speaking style of a t...