Documenting languages helps to prevent the extinction of endangered dialects, many of which are otherwise expected to disappear by the end of the century. When documenting oral languages, unsupervised word segmentation (UWS) from speech is a useful yet challenging task. It consists of producing time-stamps that slice utterances into smaller segments corresponding to words. UWS can be performed from phonetic transcriptions or, in their absence, from the output of unsupervised speech discretization models. These discretization models are trained on raw speech only and produce discrete speech units that can be used in downstream (text-based) tasks. In this paper we compare five of these models: three Bayesian and two neural approach...
The ability to discover groupings in continuous stimuli on the basis of distributional information i...
Humans, even from infancy, are capable of unsupervised (“statistical”) learning of linguistic info...
In recent years, there has been renewed interest in unsupervised sub-lexical and lexical unit d...
Documenting languages helps to prevent the extinction of endangered dialects – many of which are oth...
Current supervised speech technology relies heavily on transcribed speech and pronunciation diction...
This paper describes a variety of nonparametric Bayesian models of word segmentation based on Adapto...
Developing better methods for segmenting continuous text into words is important for improving the p...
We present a first attempt to perform attentional word segmentation directly ...
In this paper we consider unsupervised word discovery from phonetic input. We employ ...
Accepted to ICASSP 2018. Developing speech technologies for low-resource languag...
Language users process utterances by segmenting them into many cognitive units, which vary in their ...
In this paper we show that recently developed algorithms for unsupervised word segmentation can be a...
Zero resource speech processing refers to a scenario where no or minimal transcribed da...
In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our ap...
Language diversity is under considerable pressure: half of the world’s languages could disappear by ...