Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. Warmup is one of the nontrivial techniques used to stabilize the convergence of large-batch training. However, warmup is an empirical method, and it is still unknown whether there is a better algorithm with theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We prove the convergence of our algorithm by introducing a new fine-grained analysis of gradient-based methods. Furthermore, the new analysis also helps to understand two other empirical tricks, layer-wise adaptive rate scaling and linear learning rate scaling. We conduct extensi...
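The abstract above does not spell out the CLARS update itself; as a rough, non-authoritative sketch of the layer-wise adaptive rate scaling idea it builds on, here is a minimal NumPy version of a LARS-style step. The trust coefficient `eta`, the weight decay, and the toy layer shapes are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def lars_layer_update(w, grad, base_lr, eta=0.001, weight_decay=1e-4, eps=1e-8):
    """One layer-wise adaptive rate scaling (LARS-style) step.

    The layer's learning rate is scaled by the ratio of the weight norm to the
    gradient norm, so large-batch updates stay proportionate to the layer's own
    scale. `eta` is the trust coefficient (assumed value, not from the paper).
    """
    g = grad + weight_decay * w                               # L2-regularized gradient
    local_lr = eta * np.linalg.norm(w) / (np.linalg.norm(g) + eps)
    return w - base_lr * local_lr * g                         # scaled SGD step

# toy usage on one layer's weights
w = np.random.randn(256, 128).astype(np.float32)
g = np.random.randn(256, 128).astype(np.float32)
w = lars_layer_update(w, g, base_lr=0.1)
```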
Deep neural network models can achieve greater performance in numerous machine learning tasks by rai...
We present a comprehensive framework of search methods, such as simulated annealing and batch traini...
In this paper we propose a framework for developing globally convergent batch training al...
Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that itera...
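To make the iterative update these abstracts refer to concrete, here is a minimal mini-batch SGD loop on a toy least-squares problem; the objective, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 10)), rng.normal(size=1024)       # toy regression data
w, lr, batch_size = np.zeros(10), 0.05, 64

for step in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)    # sample a mini-batch
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size                # gradient of mean squared error
    w -= lr * grad                                              # SGD update: w <- w - lr * grad
```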
A wide variety of Remote Sensing (RS) missions are continuously acquiring a large volume of data ever...
Deep neural networks have become the state-of-the-art tool to solve many computer vision problems. H...
Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s input...
This thesis is done as part of a service development task of distributed deep learning on the CSC pr...
Batch Normalization (BatchNorm) is a widely adopte...
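Since several of the abstracts above center on Batch Normalization, a minimal sketch of the training-time BatchNorm transform may help; the feature dimension is arbitrary, and the running statistics used at inference time are omitted for brevity.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift.

    x: (batch, features); gamma and beta are the learnable scale and shift.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta

x = np.random.randn(32, 64)
out = batch_norm_forward(x, gamma=np.ones(64), beta=np.zeros(64))
```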
There have been several recent work claiming record times for ImageNet training. This is achieved by...
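The fast ImageNet results referenced here rely on very large mini-batches, which the main abstract connects to linear learning rate scaling and warmup; below is a minimal sketch of those two rules. The reference batch size of 256 and the warmup length are assumed values, not figures from these papers.

```python
def scaled_lr(base_lr, batch_size, base_batch=256):
    """Linear scaling rule: grow the learning rate in proportion to the batch size."""
    return base_lr * batch_size / base_batch

def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup: ramp the learning rate linearly up to its target."""
    return target_lr * min(1.0, (step + 1) / warmup_steps)

lr = scaled_lr(0.1, batch_size=8192)           # e.g. 0.1 * 8192 / 256 = 3.2
for step in range(5):
    print(warmup_lr(lr, step, warmup_steps=500))
```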
Deep neural networks (DNNs) have achieved great success in recent decades. DNNs are optimized using ...
Optimization is the key component of deep learning. Increasing depth, which is vital for reaching a...
Recently, deep learning based techniques have garnered significant interest and popularity in a vari...
In this paper, we incorporate the Barzilai-Borwein step size into gradient descent methods used to t...
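As a sketch of how a Barzilai-Borwein step size plugs into plain gradient descent, the following toy implementation uses the first BB formula, alpha_k = (s^T s) / (s^T y) with s = x_k - x_{k-1} and y = g_k - g_{k-1}; the fallback step size and the quadratic test function are assumptions for illustration, not details from the cited paper.

```python
import numpy as np

def bb_gradient_descent(grad, x0, lr0=0.01, steps=100):
    """Gradient descent with the (first) Barzilai-Borwein step size.

    alpha_k = (s^T s) / (s^T y), with s = x_k - x_{k-1} and y = g_k - g_{k-1};
    the very first step uses a fixed fallback rate lr0.
    """
    x_prev, g_prev = x0, grad(x0)
    x = x_prev - lr0 * g_prev
    for _ in range(steps):
        g = grad(x)
        s, y = x - x_prev, g - g_prev
        denom = s @ y
        alpha = (s @ s) / denom if abs(denom) > 1e-12 else lr0
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# toy quadratic: minimize ||x||^2, whose gradient is 2x
x_star = bb_gradient_descent(lambda x: 2 * x, np.ones(5))
```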