First-order methods such as stochastic gradient descent (SGD) have recently become the most popular optimization methods for training deep neural networks (DNNs) with good generalization; however, they require long training times. Second-order methods, which can shorten training, are rarely used because of the high computational cost of obtaining second-order information. Many works therefore approximate the Hessian matrix to reduce this cost, although the approximate Hessian can deviate substantially from the true one. In this paper, we explore the convexity of the Hessian matrix of partial parameters and propose the damped Newton stochastic gradient descent (DN-SGD) method and the stochastic gradient descent damped Newton (SGD-DN) method to train DNNs f...
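As a rough illustration of the idea in this abstract, the sketch below alternates an SGD step on hidden-layer weights with a damped Newton step on the output-layer weights, whose Hessian under squared loss is exactly H1.T @ H1 / n and hence positive semidefinite. This is a minimal NumPy sketch of the general recipe only; the network shape, the damping constant `lam`, the step size `lr`, and the update order are illustrative assumptions, not the authors' DN-SGD/SGD-DN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression problem: x -> y with a one-hidden-layer net.
X = rng.normal(size=(256, 10))
y = np.sin(X @ rng.normal(size=10))[:, None]

W1 = rng.normal(scale=0.5, size=(10, 32))   # hidden weights: SGD group
w2 = rng.normal(scale=0.5, size=(32, 1))    # output weights: Newton group
lr, lam = 1e-2, 1e-2                        # SGD step size, Newton damping (assumed values)

for step in range(200):
    idx = rng.choice(len(X), size=64, replace=False)   # mini-batch
    Xb, yb = X[idx], y[idx]

    H1 = np.tanh(Xb @ W1)          # hidden activations
    pred = H1 @ w2
    err = pred - yb                # residual of the squared loss

    # --- SGD step on the hidden-layer weights ---
    dH1 = (err @ w2.T) * (1 - H1**2)       # backprop through tanh
    gW1 = Xb.T @ dH1 / len(Xb)
    W1 -= lr * gW1

    # --- damped Newton step on the (convex-in-w2) output layer ---
    g2 = H1.T @ err / len(Xb)              # gradient w.r.t. w2
    Hess = H1.T @ H1 / len(Xb)             # exact Hessian of the MSE in w2 (PSD)
    w2 -= np.linalg.solve(Hess + lam * np.eye(len(Hess)), g2)
```

The damping term lam * I keeps the linear solve well conditioned even when the activations are rank deficient, which is the usual motivation for damped rather than plain Newton updates.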
The deep structure of Convolutional Neural Networks (CNN) has recently gained intense attention...
Since the discovery of the back-propagation method, many modified and new algorithms have been proposed...
In recent years, neural networks, as part of deep learning, have become popular because of their ability to...
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
While first-order methods are popular for solving optimization problems that arise in large-scale de...
We propose a fast second-order method that can be used as a drop-in replacement for current deep learning...
Neural networks are an important class of highly flexible and powerful models inspired by the structure...
This paper proposes a new family of algorithms for training neural networks (NNs). These...
This paper proposes an improved stochastic second-order learning algorithm for supervised neural network...
Physics-informed neural networks (PINNs) have been effectively demonstrated in solving forward and inverse...
Deep neural networks have achieved significant success in a number of challenging engineering problems...
Stochastic Gradient Descent (SGD) algorithms remain popular optimizers for deep learning networks a...
Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent...
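For context on the HF recipe itself: it never materializes the Hessian, instead feeding Hessian-vector products into conjugate gradient to solve the damped Newton system. Below is a minimal NumPy sketch of that general mechanism on a toy objective; the finite-difference Hessian-vector product, the damping value `lam`, and the CG iteration budget are illustrative assumptions, not the specific scheme of this paper.

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-4):
    """Hessian-vector product via a central difference of gradients:
    H v ~= (g(theta + eps*v) - g(theta - eps*v)) / (2*eps)."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def cg_solve(matvec, b, iters=50, tol=1e-8):
    """Conjugate gradient for matvec(x) = b, never materializing the matrix."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Example: one damped HF step on a toy objective (gradient given in closed form).
A = np.diag([1.0, 10.0, 100.0])
grad_fn = lambda th: A @ th + np.sin(th)        # gradient of an ill-conditioned toy loss
theta = np.ones(3)
g = grad_fn(theta)
lam = 1e-2                                      # Tikhonov damping (assumed value)
step = cg_solve(lambda v: hvp(grad_fn, theta, v) + lam * v, g)
theta -= step                                   # Newton-like update; H was never formed
```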
In the past decade, neural networks have demonstrated impressive performance in supervised learning...
Deep learning has recently become one of the most predominantly used techniques in the field...