We present TwHIN-BERT, a multilingual language model trained on in-domain data from the popular social network Twitter. TwHIN-BERT differs from prior pre-trained language models in that it is trained not only with text-based self-supervision but also with a social objective based on the rich social engagements within a Twitter heterogeneous information network (TwHIN). Our model is trained on 7 billion tweets covering over 100 distinct languages, providing a valuable representation for modeling short, noisy, user-generated text. We evaluate our model on a variety of multilingual social recommendation and semantic understanding tasks and demonstrate significant metric improvements over established pre-trained language models. We will freely open-sourc...
Multiword expression (MWE) identification in tweets is a complex task due to t...
Online social networks are widespread means to enact interactive collaboration among people by, e.g....
Information is spread as individuals engage with other users in t...
In online domain-specific customer service applications, many companies struggle to deploy advanced ...
Social networks are enormous sources of human-generated content. Users continuously create informa...
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracte...
This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9 "Multilingual Tweet Inti...
In recent years, the Natural Language Processing community has been moving from uncontextualiz...
Social networks like Twitter are increasingly important in the creation of new ways of communication...
We introduce BERTweetFR, the first large-scale pre-trained language model for F...
With the proliferation of social media, many studies resort to social media to construct datasets fo...
This paper presents the different models submitted by the LT@Helsinki team for the SemEval2020 Share...
In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a la...
Obtaining meaning-rich representations of social media inputs, such as Tweets (unstructured and nois...
Since BERT appeared, Transformer language models and transfer learning have become state-of-the-art ...