The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-tra...
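To make the averaging step concrete, here is a minimal sketch of a uniform "soup" over several fine-tuned checkpoints. The function name, the placeholder file names, and the assumption that each checkpoint stores a compatible PyTorch state dict are illustrative assumptions, not the authors' released recipe.

```python
# Hypothetical sketch of a uniform model soup: average the weights of several
# checkpoints fine-tuned from the same pre-trained initialization.
import torch

def uniform_soup(checkpoint_paths):
    """Return the element-wise mean of the state dicts at checkpoint_paths."""
    soup = None
    for i, path in enumerate(checkpoint_paths):
        state = torch.load(path, map_location="cpu")  # assumed to be a state dict
        if soup is None:
            # Initialize the running average with the first checkpoint.
            soup = {k: v.clone().float() for k, v in state.items()}
        else:
            # Incremental running mean of each parameter tensor.
            for k in soup:
                soup[k] += (state[k].float() - soup[k]) / (i + 1)
    return soup

# Usage (paths are placeholders): the averaged weights load into a single model,
# so inference and memory costs match one model rather than an ensemble.
# model.load_state_dict(uniform_soup(["ft_lr1e-5.pt", "ft_lr3e-5.pt", "ft_lr1e-4.pt"]))
```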
Hyperparameter optimization (HPO) is crucial ...
In model averaging a weighted estimator is constructed based on a set of models, extending mo...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data dist...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
Models trained on different datasets can be merged by a weighted-averaging of their parameters, but ...
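For contrast with the uniform soup above, a hedged sketch of the simple baseline this entry alludes to, a convex weighted average of two models' parameters, follows; `weighted_merge`, the checkpoint names, and the single scalar weight `alpha` are assumptions made for illustration.

```python
# Hypothetical sketch: merge two models trained on different datasets by a
# weighted average of their parameters (a single scalar weight per model).
import torch

def weighted_merge(state_a, state_b, alpha=0.5):
    """Return a state dict with parameters alpha * A + (1 - alpha) * B."""
    return {
        key: alpha * state_a[key].float() + (1.0 - alpha) * state_b[key].float()
        for key in state_a
    }

# Example (paths are placeholders): weight model A more heavily with alpha = 0.7.
# merged = weighted_merge(torch.load("model_a.pt"), torch.load("model_b.pt"), alpha=0.7)
# model.load_state_dict(merged)
```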
Fine-tuning from a collection of models pre-trained on different domains (a “model zoo”) is emerging...
The standard recipe applied in transfer learning is to finetune a pretrained model on the task-speci...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
As language models scale up, it becomes increasingly expensive to verify research ideas because conc...
We consider prediction based on a main model. When the main model shares partial parameters with sev...
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, lea...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
Deep networks are typically trained with many more parameters than the size of the training dataset....