Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning methods substantially improve accuracy on a given target distribution, they often reduce robustness to distribution shifts. We address this tension by introducing a simple and effective method for improving robustness while fine-tuning: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution. On ImageNet and five derived distribution shifts,...
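The weight-space ensembling described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released implementation: it assumes both models share the same architecture and floating-point parameters, and the mixing coefficient alpha is an assumed hyperparameter (alpha=0 recovers the zero-shot weights, alpha=1 the fine-tuned weights).

    import copy
    import torch

    def wise_ft(zero_shot_model, fine_tuned_model, alpha=0.5):
        # Interpolate parameters of the zero-shot and fine-tuned models.
        # Assumes identical architectures and float-valued state dicts.
        zs_state = zero_shot_model.state_dict()
        ft_state = fine_tuned_model.state_dict()
        merged = {
            key: (1 - alpha) * zs_state[key] + alpha * ft_state[key]
            for key in zs_state
        }
        # Load the interpolated weights into a copy of the zero-shot model.
        ensembled = copy.deepcopy(zero_shot_model)
        ensembled.load_state_dict(merged)
        return ensembled

In practice one would evaluate the ensembled model over a grid of alpha values and pick the trade-off between target-distribution and distribution-shift accuracy.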
Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples ...
In machine learning, we traditionally evaluate the performance of a single model, averaged over a co...
We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target...
Robustness to natural distribution shifts has seen remarkable progress thanks to recent pre-training...
The conventional recipe for maximizing model accuracy is to (1) train multiple models with various h...
Large pre-trained, zero-shot capable models have shown considerable success both for standard transf...
Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inferenc...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accura...
Cross-domain few-shot learning (CD-FSL), where there are few target samples under extreme difference...
Certified robustness in machine learning has primarily focused on adversarial perturbations of the i...
Nowadays, owing to the superior capacity of the large pre-trained language models (PLM), the PLM-bas...
When deployed in the real world, machine learning models inevitably encounter changes in the data di...
Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift wher...
Real world uses of deep learning require predictable model behavior under distribution shifts. Model...