Paper presented at the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial’2017), held in Valencia on 3 April 2017. In the last few years, microblogging platforms such as Twitter have given rise to a deluge of textual data that can be used for the analysis of informal communication between millions of individuals. In this work, we propose an information-theoretic approach to geographic language variation using a corpus based on Twitter. We test our models with tens of concepts and their associated keywords detected in Spanish tweets geolocated in Spain. We employ dialectometric measures (cosine similarity and Jensen-Shannon divergence) to quantify the linguistic distance on the lexical level between cel...
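The two dialectometric measures named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the base-2 logarithm, and the keyword counts for the two geographic units are hypothetical, chosen only to show how each measure compares the keyword usage for one concept in two places.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two keyword-count vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # assumption: a unit with no observations is treated as dissimilar
    return dot / (norm_p * norm_q)


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]


def kl_divergence(p, m):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, m) if a > 0)


def jensen_shannon_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, 1 for disjoint ones."""
    p = normalize(p_counts)
    q = normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical counts of three keyword variants of one concept in two units.
unit_a = [30, 10, 0]
unit_b = [5, 5, 40]

print("cosine similarity:", cosine_similarity(unit_a, unit_b))
print("Jensen-Shannon divergence:", jensen_shannon_divergence(unit_a, unit_b))
```

Cosine similarity grows as two units use the same keywords in similar proportions, whereas Jensen-Shannon divergence grows as their usage diverges, so the two are read in opposite directions when interpreted as a linguistic distance.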
The use of both production and perceptual data has the potential to provide a more complete picture ...
Recent research on dialect variation using social media data has so far provided evidence that spell...
Computer-mediated communication is driving fundamental changes in the nature of written language. We...
This paper maps the large-scale variation of the Spanish language by employing a corpus based on geo...
We perform a large-scale analysis of language diatopic variation using geotagged microblogging data...
Most NLP applications assume that a particular language is homogeneous in the regions where it is sp...
In this paper we present a new computational technique to detect and analyze statistically significa...
There is a growing trend in sociolinguistics and dialectology to analyse large corpora of social med...
Geotagged Twitter data allows us to investigate correlations of geographic language variation, both ...
Spanish is one of the most widely spoken languages in the world, but Spanish is not necessarily written and...
Electronic social media offers new opportunities for informal communication in written language, whi...
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the re...
Having access to content of messages sent by some given group of subscribers of a social ne...
The task of detecting regionalisms (expressions or words used in certain regions) has traditionally ...