This dataset is built from public recordings of Voice of America (https://ukrainian.voanews.com), with the audio extracted from their videos. It contains 398 hours of speech and was created with the ASR Corpus Creator (https://zenodo.org/record/7396705). The files are in WAV format, sampled at 16 kHz. The WAV files can be downloaded from: https://nx16725.your-storageshare.de/s/f4NYHXdEw2ykZK
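After downloading, it can be useful to confirm that a file matches the stated 16 kHz WAV format. The following is a minimal sketch using only Python's standard `wave` module. The function names `wav_info` and `is_16khz` are hypothetical helpers introduced for illustration, and the sketch assumes the files are uncompressed PCM WAV, which the description does not state explicitly.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def is_16khz(path):
    """Check whether the file's sample rate is 16 kHz, as stated for this dataset."""
    return wav_info(path)[0] == 16000
```

Calling `is_16khz("example.wav")` on a downloaded file (the file name here is a placeholder, not a real path in the dataset) returns `True` when the header reports a 16000 Hz sample rate.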
This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains record...
The Persian Speech to Text dataset is a collection of audio files and their corresponding transcript...
USPDATRO ========== Underrepresented Speech Dataset from Open Data: Case Study on the Romanian Lang...
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian ...
The corpus consists of transcribed recordings from the Czech political discussion broadcast “Otázky ...
The corpus consists of recordings from the Chamber of Deputies of the Parliament of the Czech Republ...
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti w...
The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of...
Manually transcribed and speech-to-text aligned Galician ASR corpus containing 53 hours of multi-dom...
In this paper, we present our progress in pretraining Czech monolingual audio transformers from a la...
The Makerere AI Lab has built an end-to-end CTC Luganda ASR model using radio data. Having encounter...
The components of the Frisian data collection are speech and language ...
In recent decades, broadcast archives have opened up their collections with automatic speech recogni...