Robustness and counterfactual bias are usually evaluated on a test dataset. However, are these evaluations themselves robust? In other words, if a model is robust or unbiased on a test set, will those properties still hold on a slightly perturbed test set? In this paper, we propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. The framework first perturbs the test dataset to construct abundant natural sentences similar to the test data, and then diagnoses the prediction change under a single-word substitution. We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias. (1) For robustness, we focus on synonym substitutions and ide...
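To make the two-step procedure concrete, the following is a minimal Python sketch of the idea, not the paper's implementation: the classifier, the synonym table, and the example sentence are all toy assumptions standing in for a trained NLP model and a real lexical resource. The first perturbation constructs natural sentences one word substitution away from the test sentence; the second checks whether any of those constructed sentences flips its prediction under one further single-word swap.

SYNONYMS = {  # hypothetical synonym sets; the real framework would use a lexical resource
    "movie": ["film"],
    "boring": ["dull", "tedious"],
    "dull": ["boring", "tedious"],
    "tedious": ["boring", "dull"],
}

def predict(tokens):
    # Toy stand-in for a trained classifier: deliberately brittle, since it
    # reacts to "dull" but misses its synonyms (0 = negative, 1 = positive).
    return 0 if "dull" in tokens else 1

def neighbors(tokens):
    # First perturbation: sentences one single-word substitution away.
    for i, tok in enumerate(tokens):
        for sub in SYNONYMS.get(tok, []):
            yield tokens[:i] + [sub] + tokens[i + 1:]

def find_vulnerable(tokens):
    # Second perturbation: flag constructed sentences whose prediction
    # flips under one further single-word substitution.
    found = []
    for sent in neighbors(tokens):
        label = predict(sent)
        for attacked in neighbors(sent):
            if predict(attacked) != label:
                found.append((" ".join(sent), " ".join(attacked)))
    return found

for pair in find_vulnerable("the movie was boring".split()):
    print(pair)

Even when the original test sentence is classified consistently, the sketch surfaces nearby sentences (e.g., "the film was boring" versus "the film was dull") whose labels disagree, which is exactly the kind of weakness a fixed test set would not reveal.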
Counterfactual explanations have become a mainstay of the XAI field. This part...
Counterfactual prediction methods are required when a model will be deployed in a setting where trea...
Model Zoo (PyTorch) of non-adversarially trained models for Robust Models are less Over-Confident (N...
The use of counterfactual explanations (CFXs) is an increasingly popular explanation strategy for ma...
Recent studies have revealed that Machine Learning (ML) models are vulnerable to adversarial perturb...
Correctly quantifying the robustness of machine learning models is a central aspect in judging their...
Spurious correlations threaten the validity of statistical classifiers. While model accuracy may app...
Deep learning is a common method to create models for the binary task of classifying comments on onl...
State-of-the-art deep NLP models have achieved impressive improvements on many tasks. However, they ...
Neural language models show vulnerability to adversarial examples which are semantically similar to ...
Counterfactual explanations are a prominent example of post-hoc interpretability methods in the expl...
Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algor...
Despite the success of convolutional neural networks (CNNs) in many academic benchmarks for computer...
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, lea...