When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: They often generate racist, sexist, violent, or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we first demonstrate a surprising finding: Pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. We refer to this capability as self-diagnosis. Based on this finding, we then propose a decoding algorithm that, given only a textual description of the undesir...
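To make the self-diagnosis idea above concrete, here is a minimal sketch of how such a probe could look, assuming a Hugging Face causal LM (GPT-2 is used here purely as a stand-in); the prompt template, attribute wording, and function name are illustrative assumptions, not the paper's exact implementation. The model is asked whether a given text contains the undesired attribute, and the probability mass it assigns to "Yes" is compared with that for "No".

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of a self-diagnosis probe. Model name, template wording, and
# attribute phrasing are assumptions chosen for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def self_diagnosis_score(text: str, attribute: str) -> float:
    """Share of probability mass on 'Yes' (vs. 'No') when the model is
    asked whether `text` contains the undesired `attribute`."""
    prompt = f'"{text}"\nQuestion: Does the above text contain {attribute}?\nAnswer:'
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    probs = torch.softmax(next_token_logits, dim=-1)
    yes_id = tokenizer(" Yes").input_ids[0]  # leading space: GPT-2 BPE token for " Yes"
    no_id = tokenizer(" No").input_ids[0]
    p_yes, p_no = probs[yes_id].item(), probs[no_id].item()
    return p_yes / (p_yes + p_no)

# Example: a clearly benign sentence should receive a low score.
print(self_diagnosis_score("Thank you so much for your help!", "a threat"))
```

Comparing only the "Yes"/"No" mass keeps the probe independent of the rest of the vocabulary; the particular question template is one reasonable choice among several, not necessarily the one used in the paper.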
Language models (LMs) can reproduce (or amplify) toxic language seen during training, which poses a ...
Natural language understanding (NLU) models often rely on dataset biases rather than intended task-r...
Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, boo...
When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce al...
Warning: this paper contains model outputs exhibiting offensiveness and biases. Recently pre-trained...
NLU models often exploit biases to achieve high dataset-specific performance without properly learni...
Cheap-to-Build Very Large-Language Models (CtB-LLMs) with affordable training are emerging as the ne...
Thesis (Master's), University of Washington, 2021. Biased associations have been a challenge in the de...
Despite their impressive performance in a wide range of NLP tasks, Large Language Models (LLMs) have...
Datasets to train models for abusive language detection are at the same time necessary and still sca...
Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesi...
Recent discoveries have revealed that deep neural networks might behave in a biased manner in many r...
Automated hate speech detection systems have great potential in the realm of social media but have s...
This paper is a summary of the work in my PhD thesis, in which I investigate the impact of bias in ...
Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we syst...