The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization). Scaling the model and dataset size has helped improve the performance of LLMs, but unfortunately, this has also led to prohibitive computational costs. Pre-training LLMs often requires orders of magnitude more FLOPs than fine-tuning, and the model capacity often remains the same between the two phases. To achieve training efficiency w.r.t. training F...
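As a concrete illustration of the second phase of this paradigm, the following is a minimal sketch of fine-tuning a pre-trained encoder on a downstream classification task with the Hugging Face transformers API; the checkpoint name, toy data, and hyperparameters are illustrative placeholders rather than choices made in any of the works listed here.

```python
# Minimal sketch of the fine-tuning phase: a pre-trained checkpoint is loaded
# and all of its parameters are updated on a small task-specific dataset.
# Checkpoint, data, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # task head is newly initialized
)

texts = ["a great movie", "a dull movie"]          # toy downstream data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                  # a few fine-tuning steps
    outputs = model(**batch, labels=labels)         # loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```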
Large Language Models have become the core architecture upon which most modern natural language proc...
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...
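A common alternative to updating every parameter is to freeze the pre-trained weights and train only a small number of added parameters. The sketch below shows a generic low-rank adapter in the spirit of LoRA; the module, rank, and scaling are assumptions for illustration, not the method of the work quoted above.

```python
# Sketch of a parameter-efficient alternative to full fine-tuning: the
# pre-trained linear layer is frozen and only a small low-rank update
# (in the spirit of LoRA) is trained. Rank and scaling are illustrative.
import torch
import torch.nn as nn

class LowRankAdapterLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pre-trained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # frozen pre-trained projection + trainable low-rank correction
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LowRankAdapterLinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")         # only the two low-rank factors
```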
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
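To make the sparse MoE idea concrete, the following is a minimal sketch of an MoE feed-forward layer with top-k routing: a gating network selects a few experts per token, so parameter count grows with the number of experts while per-token compute stays roughly constant. Layer sizes, expert count, and k are assumptions for illustration.

```python
# Minimal sketch of a sparse Mixture-of-Experts feed-forward layer:
# a gating network routes each token to its top-k experts, so only a
# fraction of the added parameters is active per token. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (num_tokens, d_model)
        scores = self.gate(x)                  # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)  # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(SparseMoELayer()(tokens).shape)          # torch.Size([16, 512])
```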
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fin...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
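A simple example of this line of work is unstructured magnitude pruning, sketched below: low-magnitude weights are zeroed and a binary mask keeps them at zero during subsequent training. The sparsity level and layer are placeholders; this is a generic illustration, not the specific method of any paper listed here.

```python
# Illustrative sketch of unstructured magnitude pruning: weights with the
# smallest absolute values are zeroed out, and the resulting binary mask is
# re-applied after each update so the layer stays sparse.
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

layer = nn.Linear(1024, 1024)
mask = magnitude_mask(layer.weight.data, sparsity=0.9)   # keep the top 10% of weights
layer.weight.data *= mask                                # prune once

# ... after every optimizer.step() during sparse (re-)training:
layer.weight.data *= mask                                # keep pruned weights at zero
print(f"remaining nonzeros: {int(mask.sum())} / {mask.numel()}")
```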
Training large, deep neural networks to convergence can be prohibitively expensive. As a result, oft...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processin...
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide ran...
Distilling state-of-the-art transformer models into lightweight student models is an effective way t...
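The standard distillation objective combines a soft-target term against the teacher's temperature-softened logits with the usual cross-entropy on labels; a minimal sketch is given below, with the temperature and mixing weight as assumed placeholder values.

```python
# Generic sketch of a distillation objective for training a lightweight student:
# a KL term matches temperature-softened teacher logits, blended with the usual
# cross-entropy on the labels. Temperature and mixing weight are placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets from the (frozen) teacher, scaled by T^2 as is conventional
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```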
The training of sparse neural networks is becoming an increasingly important tool for reducing the ...