Offering access to information in microblog posts requires successful language identification. Language identification on sparse and noisy data can be challenging. In this paper we explore the performance of a state-of-the-art n-gram-based language identifier, and we introduce two semi-supervised priors to enhance performance at the microblog post level: (i) a blogger-based prior, using previous posts by the same blogger, and (ii) a link-based prior, using the pages linked to from the post. We test our models on five languages (Dutch, English, French, German, and Spanish), and a set of 1,000 tweets per language. Results show that our priors improve accuracy, but that there is still room for improvement.
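The idea of combining an n-gram identifier with a blogger-based prior can be illustrated with a minimal sketch. This is not the authors' implementation: the character-trigram model with add-one smoothing, the function names (`char_ngrams`, `train`, `blogger_prior`, `identify`), the smoothing constant `alpha`, and the toy training sentences are all assumptions chosen for illustration. The prior here is simply a smoothed distribution over the language labels assigned to a blogger's earlier posts, added to the n-gram score in log space.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    """Build per-language n-gram counts from a {language: [texts]} mapping."""
    return {lang: Counter(g for t in texts for g in char_ngrams(t))
            for lang, texts in samples.items()}


def log_likelihoods(text, models):
    """Score the text under each language model with add-one smoothing."""
    vocab = len(set().union(*models.values()))
    scores = {}
    for lang, counts in models.items():
        total = sum(counts.values())
        scores[lang] = sum(math.log((counts[g] + 1) / (total + vocab + 1))
                           for g in char_ngrams(text))
    return scores


def blogger_prior(previous_labels, languages, alpha=1.0):
    """Smoothed language distribution from labels of a blogger's earlier posts.

    Hypothetical helper: alpha is an assumed smoothing constant.
    """
    seen = Counter(previous_labels)
    total = len(previous_labels) + alpha * len(languages)
    return {lang: (seen[lang] + alpha) / total for lang in languages}


def identify(text, models, prior=None):
    """Pick the language with the highest (optionally prior-weighted) score."""
    scores = log_likelihoods(text, models)
    if prior:
        scores = {lang: s + math.log(prior[lang]) for lang, s in scores.items()}
    return max(scores, key=scores.get)


# Toy training data (illustrative only, far too small for real use).
models = train({
    "en": ["the cat is on the table", "this is the house of my friend"],
    "nl": ["de kat zit op de tafel", "dit is het huis van mijn vriend"],
})
prior = blogger_prior(["nl", "nl", "nl"], list(models))
```

With a post that carries little or no textual evidence, the n-gram scores alone cannot separate the languages, and the prior decides the outcome. A link-based prior could be plugged into `identify` in the same way, with the distribution estimated from the linked pages rather than from earlier posts.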
Open-source software available (Microblog Explorer: https://github.com/adbar/microblog-explorer) Inte...
Native Language Identification is one of the growing subfields in Natural Language Processing (NLP)....
Multilingual speakers communicate in more than one language in daily life and on social media...
Multilingual posts can potentially affect the outcomes of content analysis on microblog platforms. T...
We present an evaluation of “off-the-shelf” language identification systems as applied to microblog...
Automatic Language Identification (LI) is a widely addressed task, but not all users (for example li...
Microblogging websites, such as Twitter, provide a seemingly endless amount of textual information on ...
In social media communication, multilingual speakers often switch between languages, and, in such ...
A raw stream of posts from a microblogging platform such as Twitter contains text written in a large...
We describe the IUCL+ system for the shared task of the First Workshop on Computational Approaches t...
Many algorithms for natural language processing rely on manual feature engineering. In this paper, w...
Abstract: This paper presents the results of several experiments that make use of an algorithm sen...