International audienceRecent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text. Is such a discrete bottleneck necessary, potentially introducing irreversible errors in the encoding of the speech signal, or could we learn a language model without discrete units at all? In this work, we study the role of discrete versus continuous representations in spoken language modeling. We show that discretization is indeed essential for good results in spoken language modeling. We show that discretization re...
Recently, there has been a growing interest in text-to-speech (TTS) methods that can be trained with...
Documenting languages helps to prevent the extinction of endangered dialects, many of which are othe...
In the domain of unsupervised learning most work on speech has focused on discovering low-level cons...
Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly ...
International audienceWe introduce Generative Spoken Language Modeling, the task of learning the aco...
14 pages, including references and supplementary materialInternational audienceWe introduce a new un...
Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588International aud...
Documenting languages helps to prevent the extinction of endangered dialects – many of which are oth...
International audienceRecent progress in self-supervised or unsupervised machine learning has opened...
The goal of voice conversion is to transform source speech into a target voice, keeping the content ...
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizin...
International audienceThis article analyzes the phonetic decoding performance obtained with differen...
International audienceUnsupervised models of representations based on Contrastive Predictive Coding ...
International audienceFinding word boundaries in continuous speech is challenging as there is little...
International audienceModels of the acquisition of word segmentation are typically evaluated using p...
Recently, there has been a growing interest in text-to-speech (TTS) methods that can be trained with...
Documenting languages helps to prevent the extinction of endangered dialects, many of which are othe...
In the domain of unsupervised learning most work on speech has focused on discovering low-level cons...
Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly ...
International audienceWe introduce Generative Spoken Language Modeling, the task of learning the aco...
14 pages, including references and supplementary materialInternational audienceWe introduce a new un...
Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588International aud...
Documenting languages helps to prevent the extinction of endangered dialects – many of which are oth...
International audienceRecent progress in self-supervised or unsupervised machine learning has opened...
The goal of voice conversion is to transform source speech into a target voice, keeping the content ...
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizin...
International audienceThis article analyzes the phonetic decoding performance obtained with differen...
International audienceUnsupervised models of representations based on Contrastive Predictive Coding ...
International audienceFinding word boundaries in continuous speech is challenging as there is little...
International audienceModels of the acquisition of word segmentation are typically evaluated using p...
Recently, there has been a growing interest in text-to-speech (TTS) methods that can be trained with...
Documenting languages helps to prevent the extinction of endangered dialects, many of which are othe...
In the domain of unsupervised learning most work on speech has focused on discovering low-level cons...