Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. Data augmentation – generating new synthetic data from a labeled seed dataset – can help. The efficacy of data augmentation on toxic language classification has not been fully explored. We present the first systematic study on how data augmentation techniques impact performance across toxic language classifiers, ranging from shallow logistic regression architectures to BERT – a state-of-the-art pre-trained Transformer network. We compare the performance of eight techniques on very scarce seed datasets. We show that while BERT performed the best, shallow classifiers performed comparably when trained on data augmented with a combination of thre...
This work examines the role of both cross-lingual zero-shot learning and data augmentation in detect...
Social networks sometimes become a medium for threats, insults, and other types of cyberbullying. A ...
Transformer-based Language Models (LMs) have achieved impressive results on natural language underst...
Thesis (Master's)--University of Washington, 2021. Biased associations have been a challenge in the de...
Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we syst...
Toxic language detection systems often falsely flag text that contains minority group mentions as to...
Transformer-based language models are able to generate fluent text and be efficiently adapted across...
A considerable body of research deals with the automatic identification of hate speech and related ...
Deep Neural Network (DNN) based classifiers have gained increased attention in...
As user-generated content thrives, so does the spread of toxic comments. Therefore, detecting toxic c...
This paper presents the results and main findings of the HASOC-2021 Hate/Offensive Languag...
Datasets to train models for abusive language detection are at the same time necessary and still sca...
We discuss the impact of data bias on abusive language detection. We show that classification scores...
Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of ...