The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-tra...
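To make the averaging step concrete, here is a minimal sketch of a uniform "soup" over several fine-tuned checkpoints. The function name, the placeholder file names, and the assumption that each checkpoint stores a compatible PyTorch state dict are illustrative assumptions, not the authors' released recipe.

```python
# Hypothetical sketch of a uniform model soup: average the weights of several
# checkpoints fine-tuned from the same pre-trained initialization.
import torch

def uniform_soup(checkpoint_paths):
    """Return the element-wise mean of the state dicts at checkpoint_paths."""
    soup = None
    for i, path in enumerate(checkpoint_paths):
        state = torch.load(path, map_location="cpu")  # assumed to be a state dict
        if soup is None:
            # Initialize the running average with the first checkpoint.
            soup = {k: v.clone().float() for k, v in state.items()}
        else:
            # Incremental running mean of each parameter tensor.
            for k in soup:
                soup[k] += (state[k].float() - soup[k]) / (i + 1)
    return soup

# Usage (paths are placeholders): the averaged weights load into a single model,
# so inference and memory costs match one model rather than an ensemble.
# model.load_state_dict(uniform_soup(["ft_lr1e-5.pt", "ft_lr3e-5.pt", "ft_lr1e-4.pt"]))
```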
Hyperparameter optimization (HPO) is crucial ...
In model averaging a weighted estimator is constructed based on a set of models, extending mo...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data dist...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
Models trained on different datasets can be merged by a weighted-averaging of their parameters, but ...
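For contrast with the uniform soup above, a hedged sketch of the simple baseline this entry alludes to, a convex weighted average of two models' parameters, follows; `weighted_merge`, the checkpoint names, and the single scalar weight `alpha` are assumptions made for illustration.

```python
# Hypothetical sketch: merge two models trained on different datasets by a
# weighted average of their parameters (a single scalar weight per model).
import torch

def weighted_merge(state_a, state_b, alpha=0.5):
    """Return a state dict with parameters alpha * A + (1 - alpha) * B."""
    return {
        key: alpha * state_a[key].float() + (1.0 - alpha) * state_b[key].float()
        for key in state_a
    }

# Example (paths are placeholders): weight model A more heavily with alpha = 0.7.
# merged = weighted_merge(torch.load("model_a.pt"), torch.load("model_b.pt"), alpha=0.7)
# model.load_state_dict(merged)
```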
Fine-tuning from a collection of models pre-trained on different domains (a “model zoo”) is emerging...
The standard recipe applied in transfer learning is to finetune a pretrained model on the task-speci...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
As language models scale up, it becomes increasingly expensive to verify research ideas because conc...
We consider prediction based on a main model. When the main model shares partial parameters with sev...
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, lea...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
Deep networks are typically trained with many more parameters than the size of the training dataset....