Abstract: Everyone working on general language would like their corpus to be bigger, wider-coverage, cleaner, duplicate-free, and with richer metadata. As a response to that wish, Lexical Computing Ltd. has a programme to develop very large ‘TenTen’ web corpora. In this paper we introduce the Spanish corpus, esTenTen, of 8 billion words and 19 different national varieties of Spanish. We investigate the distance between the national varieties as represented in the corpus, and examine in detail the keywords of Peninsular Spanish vs. American Spanish, finding a wide range of linguistic, cultural and political contrasts.
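The abstract does not say how the Peninsular vs. American keywords were scored. A common choice in Sketch Engine is the "simple maths" keyness score. It compares a word's frequency per million in a focus corpus with its frequency per million in a reference corpus, and adds a smoothing constant n to both sides. The sketch below assumes that score. The function names, the default n = 1, and the toy tokens are hypothetical and used only for illustration. They are not taken from the paper or from esTenTen data.

```python
from collections import Counter


def per_million(counts: Counter, total: int) -> dict[str, float]:
    """Normalise raw counts to frequency per million tokens."""
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def simple_maths_keywords(focus_tokens, reference_tokens, n=1.0, top=10):
    """Hypothetical sketch: rank focus-corpus words by the assumed
    "simple maths" keyness score (fpm_focus + n) / (fpm_reference + n).

    A larger n shifts the ranking toward higher-frequency words, and a
    smaller n favours rarer words.
    """
    focus = Counter(focus_tokens)
    ref = Counter(reference_tokens)
    f_total = sum(focus.values())
    r_total = sum(ref.values())
    if f_total == 0 or r_total == 0:
        raise ValueError("both corpora must be non-empty")
    f_pm = per_million(focus, f_total)
    r_pm = per_million(ref, r_total)
    # Words absent from the reference corpus get a reference frequency of 0.
    scores = {w: (f_pm[w] + n) / (r_pm.get(w, 0.0) + n) for w in f_pm}
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical toy tokens, not drawn from esTenTen.
peninsular = ["coche", "coche", "ordenador", "casa"]
american = ["carro", "carro", "computadora", "casa"]
for word, score in simple_maths_keywords(peninsular, american):
    print(f"{word}\t{score:.1f}")
```

A word shared equally by both varieties (here "casa") scores exactly 1.0. Words that occur only in the focus corpus rise to the top.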
We have built a corpus containing texts in 106 languages from texts available on the Internet and on...
This article is concerned with the choice of a corpus to be used as the empirical basis of a bilingu...
We perform a large-scale analysis of language diatopic variation using geotagg...
This paper outlines current work on the construction of a high-quality, richly-annotated and...
Iberia is a synchronic corpus of scientific Spanish designed mainly for terminological studies. In t...
This paper maps the large-scale variation of the Spanish language by employing a corpus based on geo...
This paper describes a pilot study in lexical encoding of multi-word expressio...
Wikipedia Corpus is a bilingual—Spanish-English—single-label corpus composed of 3,019 documents abou...
In this study we consider the problem of determining whether an English corpus constructed from a gi...
Historical corpora offer many potentialities for linguistic research. Thus, the present article prov...
The University of León has been engaged for several years now in a wide-reaching project on corpus-d...
This proposal requests Level 1 funding to develop a novel Spanish-language corpus, ACTIV-ES. This el...
In this paper, we introduce the Spanish FrameNet Project which is creating an online lexical resourc...
DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexi...