Many people are multilingual, and they may draw on multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. We then identify words that are composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
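The two steps described above (segment a word into subunits, then check whether the subunits are associated with different languages) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the greedy longest-match segmenter stands in for whatever segmentation method the paper actually uses, and the subunit inventory `SUBUNIT_LANGS`, its entries, and the function names are hypothetical toy values, not the paper's data or implementation.

```python
from typing import Dict, List, Set

# Hypothetical toy inventory mapping subunits to associated languages.
# These entries are illustrative placeholders, not data from the paper.
SUBUNIT_LANGS: Dict[str, Set[str]] = {
    "huis": {"dutch"},
    "huus": {"limburgish"},
    "je": {"dutch"},
    "ke": {"limburgish"},
    "s": {"dutch", "limburgish"},
}


def segment(word: str, inventory: Dict[str, Set[str]]) -> List[str]:
    """Greedy longest-match segmentation (an assumed stand-in segmenter).

    Characters that match no known subunit are emitted one at a time.
    """
    units: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in inventory:
                units.append(word[i:j])
                i = j
                break
        else:
            # No known subunit starts here; keep the single character.
            units.append(word[i])
            i += 1
    return units


def exclusive_languages(word: str, inventory: Dict[str, Set[str]]) -> Set[str]:
    """Collect languages from subunits tied to exactly one language."""
    langs: Set[str] = set()
    for unit in segment(word.lower(), inventory):
        associated = inventory.get(unit, set())
        if len(associated) == 1:
            langs |= associated
    return langs


def is_intra_word_switch(
    word: str, inventory: Dict[str, Set[str]] = SUBUNIT_LANGS
) -> bool:
    """Flag a word whose subunits point to more than one language."""
    return len(exclusive_languages(word, inventory)) > 1


if __name__ == "__main__":
    for w in ["huisje", "huuske", "huiske"]:
        print(w, segment(w, SUBUNIT_LANGS), is_intra_word_switch(w))
```

In this sketch, subunits shared by both varieties are ignored when deciding whether a word is mixed, so only subunits that are exclusive to one language count as evidence. That is a design choice of the example; a real system would more likely use graded association scores than a hard lexicon lookup.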
This paper investigates the usability of Twitter as a resource for the study of language change in p...
The study of code-switching (CS) speech has produced a wealth of knowledge in the understanding of b...
We present a new corpus of Twitter data annotated for codeswitching and borrowing between Spanish an...
We present a novel lexicon-based classification approach for code-switching detection on Twitter. Th...
Immigrant communities host multilingual speakers who switch across languages and cultures in their d...
When code switching, individuals incorporate elements of multiple languages into the same utterance...
Automatic understanding of noisy social media text is one of the prime present-day resear...
Language identification at the document level has been considered an almost solved problem in some a...
This study examines lexical borrowing, code switching, and polylanguaging in Valencian Spanish to be...
This paper discusses the extent to which two characteristics of digital data make such data suitable...
Geotagged Twitter data allows us to investigate correlations of geographic language variation, both ...
This study explores some important issues, namely the occurrences of code switching types, languages...