The Makerere AI Lab has built an end-to-end CTC Luganda ASR model using radio data. Having encountered data challenges in working with low-resource languages, we and our partners are releasing the first radio corpus for Luganda. The 155-hour corpus is publicly available online under the Creative Commons BY-NC-ND 4.0 license. The release comprises the following: 20 hours of human-transcribed radio speech, with 16 kHz, mono, 16-bit audio; and two CSV files for the 20-hour human-transcribed dataset, where cleaned.csv contains the cleaned transcripts and uncleaned.csv contains the uncleaned transcripts. The uncleaned transcripts contain extra speech details marked with tags such as [laughter]...
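As a rough illustration of working with the release described above, the sketch below strips bracketed tags such as [laughter] from an uncleaned transcript and checks an audio file against the stated format. It rests on assumptions the description does not confirm: that every tag is a square-bracketed token, that the audio is stored as WAV, and that "16 bit" refers to the sample width. The function names are hypothetical and this is not the procedure used to produce cleaned.csv.

```python
import re
import wave

# Assumption: non-speech annotations are square-bracketed tags, e.g. [laughter].
TAG_PATTERN = re.compile(r"\[[^\[\]]*\]")


def strip_tags(transcript: str) -> str:
    """Remove bracketed tags and collapse the whitespace they leave behind."""
    without_tags = TAG_PATTERN.sub(" ", transcript)
    return " ".join(without_tags.split())


def matches_release_format(path: str) -> bool:
    """Check a WAV file against the stated format: 16 kHz, mono, 16-bit."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000      # 16 kHz sample rate
            and wav.getnchannels() == 1      # mono
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16 bits
        )
```

Calling `strip_tags` on each uncleaned transcript gives a quick approximation of a tag-free text; it will not reproduce any further normalisation that may have been applied to the cleaned transcripts.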
The dataset was created to enable research on automatic speech recognition in the Boulé (Baule) language...
Most speech and language technologies are trained with massive amounts of spee...
This English-Luganda parallel sentence corpus was created by a team of researchers from AI & Data sc...
For many of the 700 million illiterate people around the world, speech recognition technology could ...
Farm Radio International (FRI) and the CGIAR Research Initiative on Digital Innovation have collabo...
This paper introduces a new corpus of read English speech, suitable for training and evaluating spee...
This paper presents the corpus developed by the LIUM for Automatic Speech Recognition (ASR), based o...
This dataset contains 100,000 Luganda sentences. For more information on how the dataset was created...
This English-Luganda parallel sentence corpus was created by a team of researchers from AI & Data sc...
Languages are disappearing at an alarming rate; linguistic rights of speakers of most of the 7000 l...
The components of the Frisian data collection are speech and language ...
DIT’s prototype speech corpus allows language learners and researchers access to real, informal dial...
This dataset contains the first electronic speech corpus of Maaloula Aramaic, an endangered Western ...
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian ...
The Linguistic Data Consortium’s Human Subjects Data Collection lab conducts cross-channel speech co...