The Leipzig Corpora Collection offers free online access to 136 monolingual dictionaries enriched with statistical information. In this paper we describe current advances of the project in collecting and processing text data automatically for a large number of languages. Our main interest lies in languages of “low density”, where only little text data exists online. The aim of this approach is to create monolingual dictionaries and statistical information for a large number of new languages and to expand the existing dictionaries, opening up new possibilities for linguistic typology and other research. This paper focuses on the infrastructure for the automatic acquisition of large amounts of monolingual text in many languages from ...
This paper presents a new research and development project called Papillon. It started as a French-J...
A Danish corpus, holding 40 million words of general language from the period 1983-92, was designed ...
The MotAMot project aims to develop a multilingual lexical network focuse...
Collected from newspaper texts, webcrawling, etc.: words (+frequency), cooccurrences (+graph), left...
In this paper we describe a flexible and portable infrastructure for setting up large monolingual la...
Since 2011 the comprehensive, electronically available sources of the Leipzig Corpora Collection hav...
We have built a corpus containing texts in 106 languages from texts available on the Internet and on...
This paper deals with multilingual database generation from parallel corpora. The idea is to contrib...
The goal of this thesis is to implement a system capable of extracting bilingual dictionaries from parall...
Many translation scholars have proposed the use of corpora to allow professional translator...
This paper proposes approaches to automatically create a large number of new bilingual dictionaries f...
The paper describes the algorithmic methods used in a German monolingual lexicon project dealing wit...
The aim of our software presentation is to demonstrate that corpus-driven bilingual dictionaries gen...
This paper presents SwissCrawl, the largest Swiss German text corpus to date. Composed of more than ...
An algorithm for creating parallel text corpora is presented. The algorithm is based on the us...