We study regional similarities and differences in language use on an anonymous mobile chat application in the German-speaking area. We use a neural network on 2.3 million online conversations to automatically learn representations of words and cities. These linguistic-use-based representations capture regional distinctions in a high-dimensional vector space that can be clustered and visualized to discover patterns in the data. We find that the resulting regional patterns are closely linked to the traditional division of German dialects, even though most of the conversations are written in standard German. The resulting maps correspond to traditional dialect divisions and language-external spatial structures, with a few notable exceptions th...
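As an illustration of the clustering step mentioned in the abstract, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which clustering algorithm is used, so average-linkage agglomerative clustering over cosine distance is an assumption here, as are the function names `cosine_distance` and `average_linkage_clusters`. The vectors are made-up toy values with placeholder labels, not learned city representations.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage_clusters(vectors, n_clusters):
    """Greedy agglomerative clustering with average linkage.

    `vectors` maps a label (here: a placeholder city name) to its vector.
    Returns a list of sets of labels.
    """
    # Start with every label in its own cluster.
    clusters = [{name} for name in vectors]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [
                    cosine_distance(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so deleting j keeps i valid).
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 3-dimensional toy vectors, for illustration only.
toy_city_vectors = {
    "city_a": [0.9, 0.1, 0.0],
    "city_b": [0.8, 0.2, 0.1],
    "city_c": [0.1, 0.9, 0.2],
    "city_d": [0.0, 0.8, 0.3],
}

clusters = average_linkage_clusters(toy_city_vectors, n_clusters=2)
```

With these toy values, the two placeholder cities whose vectors point in similar directions end up in the same cluster. With real learned representations, the resulting groups could then be drawn on a map.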
Spanish is one of the most spoken languages in the world, but Spanish is not necessarily written and...
Having access to the content of messages sent by a given group of subscribers of a social ne...
Paper presented at the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDi...
We study regional variation in social media communications collected from the chat app "Jodel". Usin...
In this paper we present a new computational technique to detect and analyze statistically significa...
Electronic social media offers new opportunities for informal communication in written language, whi...
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the re...
Geotagged Twitter data allows us to investigate correlations of geographic language variation, both ...
Research on regional linguistic variation typically involves data collection in the field. This proc...
We present a first attempt at classifying German tweets by region using only the text of the tweets....
Principal component analysis (PCA) and related techniques have been successfully employed in natura...