Model interpretability methods are often used to explain NLP model decisions on tasks such as text classification, where the output space is relatively small. However, when applied to language generation, where the output space often consists of tens of thousands of tokens, these methods are unable to provide informative explanations. Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics. Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding. To disentangle the different decisions in language modeling, we focus on explaining language models contrastively: we look for salient in...
Analogies play a central role in human commonsense reasoning. The ability to recognize analogies suc...
Contrastive explanations, where one decision is explained in contrast to another, are supposed to be...
Natural Language Inference (NLI) models are known to learn from biases and artefacts within their tr...
Language models learn and represent language differently than humans; they learn the form and not th...
Thesis (Ph.D.)--University of Washington, 2023. The rise of large language models as the workhorse of ...
Funding Information: Supported by EPSRC DTP Grant Number EP/N509814/1.
Natural language explanations (NLEs) are a special form of data annotation in which annotators ident...
Pre-trained language models (PLMs) like BERT are being used for almost all language-related tasks, b...
Recent work on interpretability in machine learning and AI has focused on the building of simplified...
Recent research on model interpretability in natural language processing extensively uses feature sc...
Deep Neural Networks such as Recurrent Neural Networks and Transformer models are widely adopted for...
Language Generation Models produce words based on the previous context. Although existing methods of...
Do state-of-the-art models for language understanding already have, or can they easily learn, abilit...