A growing line of work has investigated neural NLP models that produce rationales, subsets of the input that explain the model's predictions. In this paper, we ask whether such rationale models can also provide robustness to adversarial attacks, in addition to being interpretable. Since these models first generate a rationale ("rationalizer") before making a prediction ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of "AddText" attacks for both token- and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale...
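The select-then-predict pipeline described above can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: the learned rationalizer is stood in for by a trivial lexicon filter, and the predictor is a simple vote over the selected tokens. The point is structural: because the predictor sees only the rationale, AddText-style distractor text that the rationalizer masks out cannot affect the prediction.

```python
# Toy select-then-predict rationale model (hypothetical sketch).
# The lexicons below are illustrative stand-ins for a learned selector.
POSITIVE = {"great", "wonderful", "excellent"}
NEGATIVE = {"awful", "terrible", "boring"}

def rationalizer(tokens):
    """Select the subset of tokens the predictor is allowed to see."""
    return [t for t in tokens if t in POSITIVE | NEGATIVE]

def predictor(rationale):
    """Classify from the rationale alone, by a simple sentiment vote."""
    score = sum(1 if t in POSITIVE else -1 for t in rationale)
    return "pos" if score >= 0 else "neg"

def classify(text):
    return predictor(rationalizer(text.lower().split()))

# An AddText-style attack appends distractor text. If the rationalizer
# excludes the inserted tokens from the rationale, the prediction is
# unchanged -- the robustness mechanism the abstract describes.
clean = "a wonderful and excellent film"
attacked = clean + " this review was posted yesterday by a user"
assert rationalizer(attacked.lower().split()) == ["wonderful", "excellent"]
print(classify(clean), classify(attacked))  # same prediction for both
```

In a real rationale model the selector is trained (e.g. with sparsity constraints) rather than hand-coded, and robustness depends on it actually masking the attack text, which is exactly what the paper evaluates.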
To increase trust in artificial intelligence systems, a promising research direction consists of des...
The black-box nature of neural models has motivated a line of research that aims to generate natural...
Explaining the predictions of AI models is paramount in safety-critical applications, such as in leg...
Recent works have shown explainability and robustness are two crucial ingredients of trustworthy and...
In interpretable NLP, we require faithful rationales that reflect the model's decision-making proces...
The monumental achievements of deep learning (DL) systems seem to guarantee the absolute superiority...
In recent years, deep learning models have become very powerful – even outperforming humans on a va...
Neural language models show vulnerability to adversarial examples which are semantically similar to ...
With recent advances in natural language processing, rationalization becomes an essential self-expla...
Deep neural networks have proven remarkably effective at solving many classification problems, but h...
Recent research on model interpretability in natural language processing extensively uses feature sc...
Neural language models (LMs) have achieved impressive results on various language-based reasoning ta...
We build on abduction-based explanations for machine learning and develop a method for computing loc...