The xLiMe Twitter Corpus contains tweets in German, Italian, and Spanish, manually annotated with part-of-speech tags, named entities, and message-level sentiment polarity. In total, the corpus contains almost 20K annotated messages and 350K tokens. The corpus is described in: Luis Rei, Dunja Mladenić, Simon Krek. A Multilingual Social Media Linguistic Corpus. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities, 27–28 September 2016, Ljubljana, Slovenia. http://nl.ijs.si/janes/cmc-corpora2016/proceedings
Registers such as casual, neutral, and formal are an immediate phenomenon...
This work reviews recent publications addressing the Twitter translation task, and highlights the la...
In recent years, social networks, microblogs, and short message service have deeply penetrated peopl...
Janes-Tweet is an annotated corpus of almost 10 million tweets posted from 2013-06 to 2017-06 by app...
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracte...
The paper investigates if and to what extent some linguistic traits are peculiar to the messages pro...
The casual, neutral, and formal language registers are highly perceptible in d...
We present a new corpus of German tweets. Due to the relatively small number of German messages on T...
The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators...
This contribution focuses on a linguistic analysis of written Italian on the social platform...
Social networks like Twitter are increasingly important in the creation of new ways of communication...
The purpose of this study is to develop a corpus, which consists of 2 (two) languages: Bahasa Indone...
Janes-Vejica is a corpus of Slovene tweets where commas are annotated with the reason for their (in)...
This poster aims to describe issues encountered whilst structuring a corpus of...
A trilingual Latvian-Russian-English corpus of tweets is presented with an analysis of users, langua...