We discuss the impact of data bias on abusive language detection. We show that classification scores on popular datasets reported in previous work are much lower under realistic settings in which this bias is reduced. Such biases are most notably observed in datasets created by focused sampling rather than random sampling. Datasets with a higher proportion of implicit abuse are more affected than datasets with a lower proportion.
Disparate biases associated with datasets and trained classifiers in hateful and abusive content ide...
Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. ...
A considerable body of research deals with the automatic identification of hate speech and related ...
Avoiding reliance on dataset artifacts to predict hate speech is a cornerstone of robust and fai...
Abusive language detection is an emerging field in natural language processing which has received a ...
Datasets to train models for abusive language detection are at the same time necessary and still sca...
Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of ...
The use of abusive language online has become an increasingly pervasive problem that damages both in...
Recent research has demonstrated how racial biases against users who write African American English ...
Thesis (Master's), University of Washington, 2021. Biased associations have been a challenge in the de...
We examine the task of detecting implicitly abusive comparisons (e.g. “Your hair looks like you have...
The datasets most widely used for abusive language detection contain lists of messages, usually twe...
While social media offers freedom of self-expression, abusive language carries significant negative so...
Algorithms are widely applied to detect hate speech and abusive language in social media. We investi...
Abusive language is a massive problem in online social platforms. Existing abusive language detectio...