Nos_ParlaSpeech-GL is an ASR corpus of more than 1,600 hours of automatically aligned speech and text, created from the audio and official transcripts of Galician parliamentary sessions held between 2015 and 2022. The content belongs to the Galician Parliament, and the data is released under the Parliament's terms of use. The corpus is split into two subcorpora, “clean” and “other”. The segments in the “clean” subcorpus passed several alignment quality filters, whereas the “other” subcorpus comprises the segments that were discarded in the filtering process. The details of both subcorpora can be found in the table below:

Subcorpus   No. of hours   No. of segments
clean       1,196.92       667,308
...
Collection of bilingual parallel English-Galician corpora. These corpora contain only synthetic text ...
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly...
ParlaMint 3.0 is a multilingual set of 26 comparable corpora containing parliamentary debates mostly...
Manually transcribed and speech-to-text aligned Galician ASR corpus containing 53 hours of multi-dom...
Galician TTS single speaker corpus of approximately 25 hours of speech. Nos_Celtia-GL is a phonetic...
Poster presented at the 9th Language Resources and Evaluation Conference (LREC 2014). Reykjavik, 26-31 ...
This is the ParlamentParla speech corpus for Catalan prepared by Col·lectivaT. The audio segments we...
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian ...
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian ...
The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of...
ParlaMint 4.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 2...
ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starti...
The corpus consists of recordings from the Chamber of Deputies of the Parliament of the Czech Republ...
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly...
The components of the Frisian data collection are speech and language ...