In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful, which we have collected and made publicly available. We present the results of a detailed comparison between a general pre-trained language model and the abuse-inclined version obtained by retraining on posts from the banned communities, evaluated on three English datasets for offensive language, abusive language, and hate speech detection tasks. On all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the general pre-...
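The retraining described above relies on BERT's standard masked-language-model (MLM) objective applied to the RAL-E comments. As a minimal sketch of that objective's token-corruption step, the pure-Python function below implements the standard BERT masking scheme (select ~15% of positions; of those, 80% become `[MASK]`, 10% become a random token, 10% stay unchanged). The function `mask_tokens` and the toy `VOCAB` are illustrative names, not the authors' code; a real run would use a subword tokenizer and a training framework.

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "runs", "fast", "the"]  # toy vocabulary for the random-replacement branch

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption.

    Each position is selected with probability mask_prob. Selected
    positions get a label (the original token); of those, 80% are
    replaced with [MASK], 10% with a random vocabulary token, and
    10% are left unchanged. Unselected positions have label None
    and contribute nothing to the MLM loss.
    """
    rng = rng or random.Random(0)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK  # 80%: replace with the mask token
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return corrupted, labels
```

Retraining a general BERT into HateBERT then amounts to continuing gradient updates on this objective over in-domain (abusive) text, so the model's representations adapt to the register of the banned communities without changing the architecture.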
The massive spread of hate speech, hateful content targeted at specific subpopulations, is a problem...
Automated hate speech detection systems have great potential in the realm of social media but have s...
The state-of-the-art abusive language detection models report great in-corpus ...
Over the past two decades, online discussion has skyrocketed in scope and scale. However, so has the...
This paper presents the results and main findings of the HASOC-2021 Hate/Offensive Languag...
Datasets to train models for abusive language detection are at the same time necessary and still sca...
This report was written to describe the systems that were submitted by the team “TheNorth” for the H...
Hateful and toxic content generated by a portion of users in social media is a...
Disparate biases associated with datasets and trained classifiers in hateful and abusive content ide...
In the past decade, usage of social media platforms has increased significantly. People use these pl...
The popularity of social media platforms has led to an increase in user-generated content being post...
In this paper we present our submission to sub-task A at SemEval 2020 Task 12: Multilingual Offensiv...
This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch....
This paper presents the different models submitted by the LT@Helsinki team for the SemEval2020 Share...