The childPoeDE corpus is a collection of 1082 German poems for children created within the CHYLSA project. The poems were taken from anthologies published between 1991 and 2019. This publication includes poem-level metadata for each poem: the author, the poem's length, data on case, punctuation, layout and rhyme, type-token ratio (TTR and MATTR) and lexical density. It also includes token-level metadata, namely word length and position, POS tags at different levels of granularity, and data on onomatopoeia and sonority. Furthermore, this publication provides a word frequency table and the Python script used to extract some of the metadata from the texts (poemtool.py). The childPoeDE corpus does not contai...
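As an illustration of the poem-level measures named above, the following minimal sketch computes TTR, MATTR and lexical density for a single tokenized poem. It is not taken from poemtool.py. The function names, the lowercasing step, the default window size of 50 and the use of Universal POS tags (NOUN, VERB, ADJ, ADV) to define content words are all assumptions made for this example; the corpus may define these measures differently.

```python
from typing import Sequence

# Assumption: content words are identified by these Universal POS tags.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def ttr(tokens: Sequence[str]) -> float:
    """Type-token ratio: distinct word forms divided by total tokens."""
    if not tokens:
        return 0.0
    lowered = [t.lower() for t in tokens]  # assumption: case-insensitive types
    return len(set(lowered)) / len(lowered)


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over all sliding windows of fixed size.

    Falls back to plain TTR when the text is not longer than the window.
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    if n <= window:
        return ttr(tokens)
    ratios = [ttr(tokens[i:i + window]) for i in range(n - window + 1)]
    return sum(ratios) / len(ratios)


def lexical_density(pos_tags: Sequence[str]) -> float:
    """Share of tokens whose POS tag marks a content word."""
    if not pos_tags:
        return 0.0
    content = sum(1 for tag in pos_tags if tag in CONTENT_TAGS)
    return content / len(pos_tags)


if __name__ == "__main__":
    # Hypothetical two-line poem, already tokenized and tagged.
    tokens = ["Der", "Mond", "ist", "rund", "der", "Mond", "ist", "hell"]
    tags = ["DET", "NOUN", "AUX", "ADJ", "DET", "NOUN", "AUX", "ADJ"]
    print(ttr(tokens), mattr(tokens, window=5), lexical_density(tags))
```

Because plain TTR falls as texts get longer, MATTR averages the ratio over fixed-size windows, which makes values more comparable across poems of different lengths. Very short poems fall back to plain TTR in this sketch.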
Vocabulary Profilers (VPrs) are deeply rooted in pedagogical purposes. The current investigation, ho...
This paper describes the corpus of recordings of children’s speech which was collected as part of th...
The childPoe corpus is a collection of 1082 German children's poems created within the CHYLSA projec...
First release of the lexica corpus: a corpus for German text simplification. The corpus consists of...
Second release of the lexica corpus: a corpus for German text simplification, total size now 3270 fi...
We report upon a digital humanities project on the acquisition and analysis of a corpus of German on...
A prerequisite for the computational study of literature is the availability of properly digitized t...
We present a digital corpus of plays in Alsatian language varieties with rich metadata. We transcrib...
The Oupoco Database is a collection of 4870 French sonnets developed in the framework of the Oupoco ...
The paper considers the principles for selecting poems for the Russian National Corpus’ poetical subcorp...
The computational analysis of poetry is limited by the scarcity of tools to automatically analyze an...
PREPRINT Abstract: With DISCO, the DIachronic Spanish Sonnet COrpus, we collected 4085 sonnets, from...
The XSample corpus was created in the project XSample (https://www.izus.uni-stuttgart.de/fokus/fdm-p...
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) informa...