Twitter has become a rich source of linguistic data. This paper investigates the possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia. Such a corpus, once constructed, could serve multiple purposes, such as training machine translation models, examining cross-lingual phenomena, and studying the population of Riga. By building and analysing a pilot corpus, this study shows that such a resource is feasible to build. The pilot corpus is made publicly available and can be used to construct a large comparable corpus.
This paper presents the current status of the Latvian-Russian parallel corpus, which is an ongoing p...
The increasing popularity of electronic messages challenges social science researchers, particularly...
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been dialectologic...
A trilingual Latvian-Russian-English corpus of tweets is presented with an analysis of users, langua...
This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer ...
This paper presents the Nordic Tweet Stream, a cross-disciplinary digital humanities project that do...
We present a 78.8-million-tweet, 1.3-billion-word corpus aimed at studying reg...
We carried out a study in which we explored the feasibility of machine translation for Twitter for t...
This dataset is created by leveraging social media platforms such as Twitter for developing corp...
This paper describes the collection and classification of a multi-dialectal corpus of Arabic based...
We present a new corpus of German tweets. Due to the relatively small number of German messages on T...
The paper investigates if and to what extent some linguistic traits are peculiar to the messages pro...
The paper describes the design and implementation of a system for human and machine translation of t...
Janes-Tweet is an annotated corpus of almost 10 million tweets posted from 2013-06 to 2017-06 by app...