Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models, either by finetuning with human labels or by distilling with LLM-generated labels. However, both finetuning and distillation require large amounts of training data to match LLM performance. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so using less training data than finetuning or distillation requires. Our method extracts LLM rationales as additional supervision for training small models within a multi-task framework. We present three findings across 4...
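The multi-task setup this abstract describes, training one small model to both predict the task label and reproduce the LLM-generated rationale, can be sketched roughly as follows. This is a minimal illustration only: it assumes a T5-style seq2seq student, simple "[label]"/"[rationale]" task prefixes, and a weighted sum of the two losses with an assumed weight LAMBDA; none of these details are taken from the paper's exact recipe.

```python
# Minimal sketch of multi-task training with an LLM rationale as extra supervision.
# Model size, task prefixes, and LAMBDA are illustrative assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

LAMBDA = 0.5  # assumed weight on the rationale-generation task

# One toy training example: input, gold label, and an LLM-generated rationale.
question = "If there are 3 cars and each car has 4 wheels, how many wheels are there?"
label = "12"
rationale = "Each of the 3 cars has 4 wheels, so 3 * 4 = 12 wheels."

def seq2seq_loss(prefix, source, target):
    """Conditional-generation loss for one (source, target) pair under a task prefix."""
    enc = tokenizer(prefix + source, return_tensors="pt", truncation=True)
    tgt = tokenizer(target, return_tensors="pt", truncation=True)
    out = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                labels=tgt.input_ids)
    return out.loss

model.train()
# Task 1: predict the label. Task 2: generate the rationale.
# Both tasks share the same small model; their losses are combined into one objective.
loss = seq2seq_loss("[label] ", question, label) \
     + LAMBDA * seq2seq_loss("[rationale] ", question, rationale)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

At inference time only the label task would be used, so the rationale head adds no deployment cost in this kind of setup.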
Large language models (LLMs) have shown incredible performance in completing various real-world task...
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new langu...
The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translati...
Scaling language models with more data, compute and parameters has driven significant progress in na...
We present a new method, LiST (short for Lite Prompted Self-Training), for parameter-efficient fine-t...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language proce...
Large language models (LLMs) have achieved remarkable advancements in the field of natural language ...
Pretrained language models have become the standard approach for many NLP tasks due to strong perfor...
Thesis (Ph.D.), University of Washington, 2023. Language models (LMs) are at the core of almost all st...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
Language models (LMs) with less than 100B parameters are known to perform poorly on chain-of-thought...