The KAS-mag corpus of Slovene MSc/MA theses consists of almost 16,000 texts (1.36 million pages or 500 million tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institutions via the Slovene Open Science portal (http://openscience.si). Each thesis is accompanied by extensive metadata, and the corpus contains only its textual body, i.e. without the front and back matter. The body is divided into pages, the pages into paragraphs, and the paragraphs into sentences. The tokens in each sentence are morphosyntactically annotated, words are lemmatised, and English-Slovene pairs of term candidates are marked up and linked. The corpus is distributed in the canonical TEI encoding, in the so-called ...
ccUčbeniki includes 32 openly available textbooks for Slovenian primary and secondary education, publ...
The KUUS corpus comprises 17 textbooks and 7 workbooks (over 700,000 words) for Slovenian as a secon...
The siParl corpus contains minutes of the Assembly of the Republic of Slovenia for the 11th legislative ...
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600...
The KAS-dr corpus of Slovene PhD theses consists of almost 1,600 texts (266 thousand pages or 100 mi...
The KAS-dipl corpus of Slovene BSc/BA theses consists of almost 65,000 texts (3.5 million pages or 1...
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600...
The Corpus of Academic Slovene (KAS) contains Slovene BSc/BA, MSc/MA, and PhD theses from 2000 to 2018. W...
The KAS-abs corpus contains 108,254 automatically identified Slovenian and/or English abstracts (30 ...
MAKS (MlAdinski KorpuS, i.e. the Youth Corpus) includes texts from literature, newspapers, and, to a...
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 8,3...
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 6,3...
The ccGigafida corpus consists of paragraph samples from 31,722 documents, each containing information a...
The JOS morphosyntactic resources for Slovene consist of the specifications, lexicon, and two corpor...
Summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.ne...