This paper measures similarity both within and between 84 language varieties across nine languages. The corpora are drawn from digital sources (the web and tweets), allowing us to evaluate whether such geo-referenced corpora are reliable for modelling linguistic variation. The basic idea is that, if each source adequately represents a single underlying language variety, then the similarity between these sources should be stable across all languages and countries. Using frequency-based corpus similarity measures, the paper shows that the sources agree consistently. This provides further evidence that digital geo-referenced corpora consistently represent local language varieties.
The goal of this paper is to provide a complete representation of regional linguistic variation on a...
There has been a lot of recent interest in the natural language processing (NLP) community in the co...
Large-scale dialect surveys have long been a fundamental component of sociolinguistics and variatio...
This paper evaluates large georeferenced corpora, taken from both web-crawled and social media sour...
This paper measures the stability of cross-linguistic register variation. A register is a variety of...
In this study we consider the problem of determining whether an English corpus constructed from a gi...
Computational measures of linguistic diversity help us understand the linguistic landscape using dig...
There is a growing trend in sociolinguistics and dialectology to analyse large corpora of social med...
Inspired by work in comparative sociolinguistics and quantitative dialectometry, we sketch a corpus-...
This paper evaluates global-scale dialect identification for 14 national varieties of English as a me...
A neural language model trained on a text corpus can be used to induce distributed representations o...
As the quality and availability of corpora of lesser-documented languages grow...
This dissertation takes a quantitative perspective on variation in English world-wide. It applies a ...