Transformer models are widely used in AI applications such as Natural Language Processing (NLP), Computer Vision (CV), etc. However, the enormous computation workload becomes an obstacle to training large transformer models efficiently. Recently, some methods have focused on reducing the computation workload during training by skipping some layers. However, these methods rely on simple probability distributions and coarse-grained probability calculations, which significantly degrade model accuracy. To address this issue, in this paper we propose a novel method to accelerate training, Sensitivity-Based Layer Dropping (SBLD). SBLD uses layer-wise sensitivity data to switch transformer layers on and off in the proper order to maintain high accuracy. Besides, we adjus...
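As a rough illustration of the idea described in this abstract, the sketch below drops whole transformer layers stochastically during training, with each layer's keep-probability derived from a per-layer sensitivity score. The class name, the min_keep floor, and the min-max mapping from sensitivity to keep-probability are illustrative assumptions for this sketch, not the exact SBLD scheme.

import torch
import torch.nn as nn

class SensitivityLayerDrop(nn.Module):
    """Transformer encoder stack that stochastically skips layers during
    training. Each layer's keep-probability is derived from a per-layer
    sensitivity score supplied by the caller; less sensitive layers are
    dropped more often, reducing compute per step.
    Note: the sensitivity-to-probability mapping below is an illustrative
    assumption, not the exact rule from the SBLD paper."""

    def __init__(self, d_model, n_heads, n_layers, sensitivities, min_keep=0.5):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # Normalize sensitivities to [min_keep, 1.0]: the most sensitive
        # layer is always kept, the least sensitive is kept with prob min_keep.
        s = torch.as_tensor(sensitivities, dtype=torch.float32)
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)
        self.keep_probs = min_keep + (1.0 - min_keep) * s

    def forward(self, x):
        for layer, p in zip(self.layers, self.keep_probs):
            if self.training and torch.rand(()) > p:
                continue  # skip this layer for the current training step
            x = layer(x)
        return x

# Usage: 6 layers with hypothetical sensitivity scores.
model = SensitivityLayerDrop(d_model=64, n_heads=4, n_layers=6,
                             sensitivities=[0.9, 0.7, 0.4, 0.3, 0.5, 0.95])
out = model(torch.randn(2, 10, 64))  # (batch, seq_len, d_model)

At evaluation time no layers are skipped, so the full stack is always used for inference; only the training cost per step is reduced.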
The great success of transformer-based models in natural language processing (NLP) has led to variou...
There has been an explosion of interest in designing high-performance Transformers. While Transforme...
Recently, large-scale transformer-based models have been proven to be effective over various tasks a...
Large-scale transformer models have become the de-facto architectures for various machine learning a...
Teams that have trained large Transformer-based models have reported training instabilities at large...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
Training large transformer models is one of the most important computational challenges of modern AI...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
Solid results from Transformers have made them prevailing architectures in various natural language ...
Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior ...
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due ...
Transformer-based models are used to achieve state-of-the-art performance on various deep learning t...
Transformer models cannot easily scale to long sequences due to their O(N^2) time and space complexi...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...