There exist as many as 7000 natural languages in the world, and a huge number of documents describing those languages have been produced over the years. Most of those documents are in paper format, so any attempt to process them with modern computational techniques and tools requires that they be digitized first. In this paper, we report on a multilingual, digitized version of thousands of such documents, searchable through well-established corpus infrastructures. The corpus is annotated with various meta-, word-, and text-level attributes to make searching and analysis easier and more useful.

NWO335-54-102 Descriptive and Comparative Linguistic
The paper discusses several key concepts related to the development of corpora and reconsiders them ...
Document servers complying to the standards of the Open Archives Initiative (OAI) are rich, yet seld...
"To understand the role of machine-readable text corpora in linguistics it is necessary to consider ...
© European Language Resources Association (ELRA), licensed under CC-BY-NC
This paper introduces our approach towards annotating a large heritage corpus, which spans over 100 ...
A frequently overlooked benefit of open access publications is that they are an easily accessible and ...
Linguistically annotated corpora are becoming a central part of the corpus linguistics field. One of...
This contribution explores the potentials of combining corpora of language use data with language de...
Nowadays a corpus is typically a large collection of text excerpts, representing a range of register...
A corpus is a collection of texts in electronic form that are the object of literary or linguistic s...
This paper introduces Kwaras and Namuti, two new tools for building, managing, accessing, and mobili...
This is a "white paper" proposing the construction of a "universal corpus" containing digitizations ...
Corpus data have emerged as the raw data/benchmark for several NLP applications. Corpus is described...