Large textual resources are the basis for a variety of applications in the field of corpus linguistics. For most languages spoken by large user groups, a comprehensive set of such corpora is constantly generated and exploited. Unfortunately, for modern Indian languages there are still shortcomings that interfere with systematic text analysis. This paper describes the Indian part of the Leipzig Corpora Collection, which is a provider of freely available resources for more than 200 languages. This project focuses on providing modern text corpora and wordlists via web-based interfaces for the academic community. As an example of the exploitation of these resources, it will be shown that they can be used for the visualization of semantic context...
The Web, teeming as it is with language data, of all manner of varieties and languages, in vast ...
This paper describes the work carried out on the EMILLE Project (Enabling Minority Language Engineer...
In this paper we have developed an open-source online computational framework that can be used by di...
The paper compares systematically the utility of specially-made text corpora and the textual resourc...
Recent work has established the efficacy of Amazon’s Mechanical Turk for constructing parallel corpo...
Recently there have been several initiatives to create locally accessible large scale corpora based ...
This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Ita...
We have built a corpus containing texts in 106 languages from texts available on the Internet and on...
Paper presented at: EACL '06: Eleventh Conference of the European Chapter of the Association f...
Language identification (LI) in textual documents is the process of automatically detecting the lang...
The utility of a language corpus is drastically enhanced when it is properly processed in various wa...
The paper aims at capturing the importance and viability of corpora in language teaching. These larg...
This article examines the impact of Corpus Linguistics on language learning, teaching and testing, f...
We investigate the potential of using the web as a huge corpus for language studies. We test the hyp...
Fast adaptation of language technologies for any language hinges on the availability of funda...