This dataset for emoji topic prediction was collected by scraping ~1M tweets. We kept only the 24,794 tweets that are written in Hindi and contain at least one emoji. Each tweet containing multiple emojis was duplicated once per emoji it contains, with a single emoji assigned to each copy, which resulted in a final dataset of 118,030 tweets with 700 unique emojis. Because the distribution of emojis in our dataset is imbalanced, we assigned the emojis to 10 coarse-grained categories. This reduction, i.e., from multi-label to multi-class and from unique emojis to categories, risks losing the semantic meaning of emojis. Our decision is motivated by how challenging emoji prediction is without such reductions. We pre-processed our data to limit the ri...
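The multi-label to multi-class reduction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The mapping `EMOJI_TO_CATEGORY`, its category names, and the helpers `expand_tweet` and `category_counts` are hypothetical, since the description does not list the actual 10 categories. The sketch also assumes one copy per emoji occurrence and that each copy keeps the original text unchanged.

```python
from collections import Counter

# Hypothetical emoji-to-category mapping (assumption): the dataset's actual
# 10 coarse-grained categories are not listed in the description.
EMOJI_TO_CATEGORY = {
    "😂": "smileys",
    "😍": "smileys",
    "🙏": "gestures",
    "🔥": "nature",
}


def expand_tweet(text, emoji_to_category):
    """Return one (text, emoji, category) row per known emoji occurrence.

    A tweet with k known emojis yields k single-label copies, which turns a
    multi-label example into k multi-class examples.
    """
    rows = []
    for ch in text:
        if ch in emoji_to_category:
            rows.append((text, ch, emoji_to_category[ch]))
    return rows


def category_counts(rows):
    """Count how many single-label rows fall into each coarse category."""
    return Counter(category for _, _, category in rows)


if __name__ == "__main__":
    tweets = ["नमस्ते 😂🙏", "बहुत अच्छा 😍", "कोई इमोजी नहीं"]
    rows = [row for tweet in tweets for row in expand_tweet(tweet, EMOJI_TO_CATEGORY)]
    for row in rows:
        print(row)
    print(category_counts(rows))
```

Counting rows per category, as `category_counts` does, is one simple way to inspect how imbalanced the labels remain after grouping emojis into categories.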
In this paper we present the system submitted to the SemEval2018 task2: Multi...
Every day, we send almost 6 billion emoji from our smartphones, but what kinds of patterns can you f...
This paper describes the results of the first shared task on Multilingual Emoji Prediction, organize...
We provide a filtered, pre-processed and anonymized dataset collected originally from the Twitter de...
Emoji usage has drastically increased recently; they are becoming some of the most...
Paper presented at the 12th International AAAI Conference on Web and Social Media (ICWSM 2018) ...
This dataset was created by leveraging social media platforms such as Twitter for developing corp...
As part of a SemEval 2018 shared task, an attempt was made to build a system capable of predicting th...
We present our submission to the SemEval 2018 task on emoji prediction. We used a random forest, wit...
This paper presents the release of EmojiNet, the largest machine-readable emoji sense inventory that...
Over the past decade, emoji have emerged as a new and widespread form of digital communication, span...
Paper presented at the 15th Conference of the European Chapter of the Association for Computati...
Paper presented at the Conference on Empirical Methods in Natural Language Processing, held...
Emoji are pictographs commonly used in microblogs as emotion markers, but they can also represent a ...