Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we inves...
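To make the setup described above concrete, below is a minimal sketch of masked-language-model pretraining from scratch under a fixed wall-clock budget on a single GPU. The model size (a default BERT configuration), the reuse of the bert-base-uncased tokenizer, the wikitext-103 corpus, the hyperparameters, and the simple time-based stopping rule are all illustrative assumptions, not the pipeline studied in the paper.

```python
# Sketch: single-GPU masked-language-model pretraining with a one-day wall-clock budget.
# All concrete choices (dataset, tokenizer, sequence length, batch size, learning rate)
# are assumptions for illustration only.
import time
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling)

device = "cuda"
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Load a public corpus, drop empty lines, and tokenize into short sequences.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
raw = raw.filter(lambda example: len(example["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     return_special_tokens_mask=True)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# The collator pads each batch and applies 15% random token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
loader = DataLoader(tokenized, batch_size=128, shuffle=True, collate_fn=collator)

# A BERT-sized encoder initialized from scratch (no pretrained weights).
model = BertForMaskedLM(BertConfig()).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

budget_seconds = 24 * 3600  # the one-day wall-clock budget
start = time.time()
model.train()
for batch in loader:
    if time.time() - start > budget_seconds:
        break  # stop when the compute budget is exhausted
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In this budget-limited regime the binding constraint is wall-clock time rather than dataset size, which is why the loop above terminates on elapsed time instead of on a fixed number of epochs.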
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have ...
Distilling state-of-the-art transformer models into lightweight student models is an effective way t...
The crystallization of modeling methods around the Transformer architecture has been a boon for prac...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
Scaling language models with more data, compute and parameters has driven significant progress in na...
In recent years, the number of parameters in a single deep learning (DL) model has been growing much fast...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
One of the current major research trends is the evolution of heterogeneous parallel comp...
Multi-task learning (MTL), instruction tuning, and prompting have recently been shown to improve the...
Recent works have demonstrated great success in pre-training large-scale autoregressive language mod...
Deep learning models are trained on servers with many GPUs, and training must scale with the number o...
The pre-trained model (PTM) is revolutionizing Artificial Intelligence (AI) technology. However, the...
Deep learning's recent history has been one of achievement: from triumphing over humans in the game ...