In social media communication, multilingual speakers often switch between languages, and, in such an environment, automatic language identification becomes both a necessary and challenging task. In this paper, we describe our work in progress on the problem of automatic language identification for the language of social media. We describe a new dataset that we are in the process of creating, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi. We also present some preliminary word-level language identification experiments using this dataset. Different techniques are employed, including a simple unsupervised dictionary-based approach, supervised word-level classification with and with...
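To make the idea of a dictionary-based word-level approach concrete, the following is a minimal sketch and not the method used in the paper. The tiny romanized word lists, the `LEXICONS` name, the `label_tokens` helper, and the rule that unseen or ambiguous tokens receive an `unk` label are all assumptions made for illustration; a real system would rely on much larger lexicons and a considered tie-breaking strategy.

```python
# Hypothetical toy lexicons (romanized); real word lists would be far larger.
LEXICONS = {
    "en": {"the", "is", "good", "movie", "very"},
    "hi": {"hai", "bahut", "accha", "nahi", "kya"},
    "bn": {"ami", "tumi", "bhalo", "khub", "na"},
}


def label_tokens(tokens, lexicons=LEXICONS, unknown="unk"):
    """Label each token with the single language whose lexicon contains it.

    Tokens found in no lexicon, or in more than one, are labeled `unknown`
    (an assumed fallback, chosen only to keep the sketch simple).
    """
    labels = []
    for tok in tokens:
        hits = [lang for lang, words in lexicons.items() if tok.lower() in words]
        labels.append(hits[0] if len(hits) == 1 else unknown)
    return labels


if __name__ == "__main__":
    tokens = "movie bahut accha hai".split()
    print(list(zip(tokens, label_tokens(tokens))))
```

A lookup of this kind needs no labeled training data, which is why it can serve as an unsupervised baseline, but it cannot resolve words that appear in several languages or that are spelled in unexpected ways, both of which are common in social media text.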
Natural Language Processing (NLP) tools typically struggle to process code-switched data and so ling...
Code-switching is the practice of moving back and forth between two languages in spoken or written f...
The paper reports work on collecting and annotating code-mixed English-Hindi social media text ...
In social media communication, multilingual speakers often switch between languages, and, in such ...
In social media communication, multilingual speakers often switch between languages, and, in such an...
In social media communication, multilingual speakers often switch between languages, and, in such an...
ABSTRACT: Automatic understanding of noisy social media text is one of the prime present-day resear...
Language identification at the document level has been considered an almost solved problem in some a...
Language identification at the document level has been considered an almost solved problem in ...
Code-mixing or language-mixing is a linguistic phenomenon where multiple languages mix together durin...
Code-mixing is frequently observed in user generated content on social media, especially from multil...
Code-mixing is frequently observed in user generated content on social media, especially from multil...
Automatic analyzing and extracting useful information from the noisy social media content are curren...
Code-Mixing is a frequently observed phenomenon in social media content generated by multi-lingual ...
The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twi...