Model interpretability methods are often used to explain NLP model decisions on tasks such as text classification, where the output space is relatively small. However, when applied to language generation, where the output space often consists of tens of thousands of tokens, these methods are unable to provide informative explanations. Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics. Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding. To disentangle the different decisions in language modeling, we focus on explaining language models contrastively: we look for salient in...
Analogies play a central role in human commonsense reasoning. The ability to recognize analogies suc...
Contrastive explanations, where one decision is explained in contrast to another, are supposed to be...
Natural Language Inference (NLI) models are known to learn from biases and artefacts within their tr...
Language models learn and represent language differently than humans; they learn the form and not th...
Thesis (Ph.D.)--University of Washington, 2023. The rise of large language models as the workhorse of ...
Funding Information: Supported by EPSRC DTP Grant Number EP/N509814/1.
Natural language explanations (NLEs) are a special form of data annotation in which annotators ident...
Pre-trained language models (PLMs) like BERT are being used for almost all language-related tasks, b...
Recent work on interpretability in machine learning and AI has focused on the building of simplified...
Recent research on model interpretability in natural language processing extensively uses feature sc...
Deep Neural Networks such as Recurrent Neural Networks and Transformer models are widely adopted for...
Language Generation Models produce words based on the previous context. Although existing methods of...
Do state-of-the-art models for language understanding already have, or can they easily learn, abilit...