The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases. We seek to understand the who, why, and what behind biases in toxicity annotations. In two online studies with demographically and politically diverse participants, we investigate the effect of annotator identities (who) and beliefs (why), drawing from social psychology research about hate speech, free speech, racist beliefs, political leaning, and more. We disentangle what is annotated as toxic by considering posts with three characteristics: anti-Black language, African American English (AAE) dialect, and vulgarity. Our results show strong a...
In this paper we present a proposal to address the problem of the pricey and unreliable human annota...
Researchers in computer science have spent considerable time developing methods to increase the accu...
Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is i...
Social Networking Sites are home to different forms of hate, including "Misogynoir", which specifica...
Recent research has demonstrated how racial biases against users who write African American English ...
Annotators are not fungible. Their demographics, life experiences, and backgrounds all contribute to...
Due to the rise in toxic speech on social media and other online platforms, there is a growing need ...
Crowdsourced annotation is vital to both collecting labelled data to train and test automated conten...
The rise of toxicity and hate speech on social media has become a cause for concern due to their eff...
Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is...
Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though...
Identifying and annotating toxic online content on social media platforms is an extremely challengin...