The summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.net/11356/1448) and the abstracts from the KAS-Abs 2.0 corpus (http://hdl.handle.net/11356/1449). The monolingual slo2slo dataset contains 69,730 Slovene abstracts and Slovene body texts. The cross-lingual slo2eng dataset contains 52,351 Slovene body texts and English abstracts; it is suitable for building cross-lingual summarization models. The total number of words is the sum of the words in the bodies, Slovene abstracts, and English abstracts. The files are stored in the same manner as the complete KAS corpus, i.e. in 1,000 directories with the same filename prefix as in KAS. They are in JSON format and contain chapter-segmented text. ...
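The storage layout described above (JSON files spread over many directories) could be traversed roughly as follows. This is a minimal sketch under stated assumptions, not the corpus authors' tooling: the function name `iter_records`, the `.json` file extension, and the single level of subdirectories are all hypothetical, and no particular JSON field names are assumed because the description does not list them.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_records(root: Path) -> Iterator[Tuple[Path, Any]]:
    """Yield (file path, parsed JSON) for each .json file one level below root.

    Assumptions (not confirmed by the dataset description): files end in
    ".json" and sit directly inside the subdirectories of `root`.
    """
    # Sort directories and files so the iteration order is deterministic.
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        for json_path in sorted(directory.glob("*.json")):
            with json_path.open(encoding="utf-8") as handle:
                yield json_path, json.load(handle)
```

Calling `iter_records(Path("path/to/dataset"))` with a placeholder path yields each parsed document without holding the whole dataset in memory. How the chapter-segmented text is laid out inside each document has to be checked against the actual files.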
The KUUS corpus comprises 17 textbooks and 7 workbooks (over 700,000 words) for Slovenian as a secon...
In this thesis we address the automatic summarization of Slovene documents. We live in a time when we have ...
The corpus of Slovene as a foreign language KOST (Korpus slovenščine kot tujega jezika) contains 6,3...
The Corpus of Academic Slovene (KAS) contains Slovene BSc/BA, MSc/MA, and PhD theses from 2000 to 2018. W...
The KAS-abs corpus contains 108,254 automatically identified Slovenian and/or English abstracts (30 ...
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600...
The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600...
The KAS-dr corpus of Slovene PhD theses consists of almost 1,600 texts (266 thousand pages or 100 mi...
As part of my thesis, I developed a model that summarizes longer texts in Slovene. In doing so ...
The KAS-mag corpus of Slovene MSc/MA theses consists of almost 16,000 texts (1,360 thousand pages or...
The KAS-dipl corpus of Slovene BSc/BA theses consists of almost 65,000 texts (3.5 million pages or 1...
The Machine Translation datasets KAS-MT 1.0 contain automatically sentence-aligned Slovene and Engli...
A text summarisation task aims to convert a longer text into a shorter text while preserving the ess...
The JOS morphosyntactic resources for Slovene consist of the specifications, lexicon, and two corpor...
Automatic text summarization is a process of extracting important information from texts and present...