This is a modified version of a subset of the Device and Produced Speech (DAPS) dataset. The original dataset can be found here.

This dataset contains text-aligned audio of the first script of the "clean" partition of the DAPS dataset for all 20 speakers. We segment the audio and alignments into single sentences. Audio is provided as 44.1 kHz WAV files, and phoneme and word alignments are provided as JSON files. For each sentence, we additionally provide the raw text in a txt file.

If you use this work as part of an academic publication, please cite the paper corresponding to the original dataset: Gautham J. Mysore, “Can We Automatically Transform Speech Recorded on Common Consumer Devices in Real-World Environments into Professional Product...
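As a rough illustration of how the three files for one sentence fit together, the sketch below loads a WAV file, its JSON alignment, and its raw text. The naming convention (one shared file stem with `.wav`, `.json`, and `.txt` extensions) and the helper names `load_sentence` and `seconds_to_samples` are assumptions made for this example, not something documented by the dataset. The JSON is returned as parsed, because its schema is not described here.

```python
import json
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 44100  # the description states 44.1 kHz WAV audio


def load_sentence(stem):
    """Load audio metadata, alignment, and raw text for one sentence.

    `stem` is a path without extension. The shared-stem naming
    (.wav / .json / .txt) is an assumption, not a documented layout.
    """
    stem = str(stem)
    with wave.open(stem + ".wav", "rb") as wav:
        sample_rate = wav.getframerate()
        num_frames = wav.getnframes()
    # The alignment schema is not specified, so keep the parsed JSON as-is.
    alignment = json.loads(Path(stem + ".json").read_text(encoding="utf-8"))
    text = Path(stem + ".txt").read_text(encoding="utf-8").strip()
    return {
        "sample_rate": sample_rate,
        "duration": num_frames / sample_rate,  # seconds
        "alignment": alignment,
        "text": text,
    }


def seconds_to_samples(seconds, sample_rate=EXPECTED_SAMPLE_RATE):
    """Convert a time in seconds to the nearest sample index."""
    return round(seconds * sample_rate)
```

If the alignment entries store boundaries in seconds (also an assumption), `seconds_to_samples` maps them to sample offsets for slicing the waveform.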
Phonetic segmentation is the breakup and classification of the sound signal into a string of phones. T...
Concatenative Text-To-Speech synthesizers join pre-recorded segments of speech data in order to pr...
During Speech Prosody 2012, we presented SPPAS, SPeech Phonetization Alignment...
The DAPS (Device and Produced Speech) dataset is a collection of aligned versions of professionally ...
DDS (Device-Degraded Speech) dataset provides aligned parallel recordings of high-quality speech (re...
This paper describes the ALISA tool, which implements a lightly supervised method for sentence-level...
In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditi...
In this paper, we introduce an algorithm dedicated to speaker-based segmentation of audio material. ...
This dataset contains speech from Finnish parliament 2008-2020 plenary sessions, segmented and align...
This poster describes the creation of an automatic word and phoneme alignment between the audio reco...
Large multi paragraph speech databases encapsulate prosodic and contextual information beyond the se...
This paper investigates the issue of automatic segmentation of speech recordings for broadcast news ...
Automatic speech recognition (ASR) in the educational environment could be a solution to address the...