Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., computer vision (CV) and natural language processing (NLP). However, these large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. In particular, random-LTD achieves considerable speedups and comparable accuracy to the standard training baseline. Compared to other token dropping methods, random-LTD does not require (1) any importance-score-based metrics, (2) any special token treatment (e.g., [CLS]), or (3) training many layers on the full sequence length, apart from the first and the last layers. Bes...
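As a rough illustration of the random-LTD idea described in the abstract above (not the authors' implementation), the sketch below assumes `layers` is a list of callable transformer blocks mapping a `(batch, seq, dim)` tensor to a tensor of the same shape; the function name `random_ltd_forward` and the `keep_ratio` parameter are hypothetical. The first and last blocks see the full sequence, each middle block processes only a uniformly random subset of tokens, and the dropped tokens pass through that layer unchanged, so no importance scores or special-token handling are needed.

```python
import torch

def random_ltd_forward(layers, hidden_states, keep_ratio=0.5):
    """Minimal sketch of random, layerwise token dropping (random-LTD).

    First and last layers run on the full sequence; every middle layer
    computes only on a random subset of tokens, and the remaining tokens
    skip that layer unchanged.
    """
    batch, seq_len, _ = hidden_states.shape
    num_keep = max(1, int(seq_len * keep_ratio))

    # First layer: full sequence.
    hidden_states = layers[0](hidden_states)

    # Middle layers: random token subset, no importance-score metric.
    for layer in layers[1:-1]:
        keep_idx = torch.randperm(seq_len, device=hidden_states.device)[:num_keep]
        kept = hidden_states[:, keep_idx, :]   # gather the kept tokens
        kept = layer(kept)                     # compute only on the subset
        hidden_states = hidden_states.clone()  # avoid in-place autograd issues
        hidden_states[:, keep_idx, :] = kept   # scatter results back

    # Last layer: full sequence again.
    hidden_states = layers[-1](hidden_states)
    return hidden_states
```

With blocks such as `torch.nn.TransformerEncoderLayer(..., batch_first=True)`, setting `keep_ratio=0.5` roughly halves the attention and MLP work in every middle layer while the first and last layers still train on the full sequence.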
We introduce token-consistent stochastic layers in vision transformers, without causing any severe d...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
Limited computational budgets often prevent transformers from being used in production and from havi...
Transformer models are widely used in AI applications such as Natural Language Processing (NLP), Com...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
Solid results from Transformers have made them prevailing architectures in various natural language ...
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due ...
Several recent works demonstrate that transformers can implement algorithms like gradient descent. B...
Deployment of Transformer models on edge devices is becoming increasingly challenging due to the exp...
Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior ...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...
Self-supervised training methods for transformers have demonstrated remarkable performance across va...
Recently, the development of pre-trained language models has brought natural language processing (NL...
The objective of this paper is an efficient training method for video tasks. We make three contribut...