ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) across the whole Czech Republic. The corpus comprises 835 recordings from 2008–2011 that contain 2 785 189 words (i.e. 3 285 508 tokens including punctuation) uttered by 2 544 speakers, of whom 1 297 are unique. ORAL2013 is balanced in the main sociolinguistic categories of the speakers (gender, age group, education, region of childhood residence). The (anonymized) transcriptions are provided in the Transcriber XML format; the audio (with corresponding anonymization beeps) is in uncompressed 16-bit PCM WAV format (mono, 16 kHz). Another format option of the transcriptio...
We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximatel...
The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consi...
This paper presents the final version of the Czech Broadcast Conversation Corpus that will shortly b...
ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (priv...
ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (priv...
Balanced corpus of informal spoken Czech sized 1 MW. It contains transcriptions of 297 recordings ma...
The paper presents a corpus of spontaneous spoken Czech called ORAL2013, its design principles and p...
ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (pr...
ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (pr...
Corpus of informal spoken Czech sized 1 MW. It contains transcriptions of 221 recordings made in 200...
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which cont...
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which cont...
The corpus consists of recordings from the Chamber of Deputies of the Parliament of the Czech Republ...
The corpus contains speech data of 2 Czech native speakers, male and female. The speech is very prec...
PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 m...