Second-order optimization methods can accelerate convergence by modifying the gradient through the curvature matrix. There have been many attempts to use second-order optimization methods for training deep neural networks. In this work, inspired by diagonal approximations and factored approximations such as Kronecker-factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM) called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, we decompose each block of the approximate FIM as a Kronecker product of two smaller matrices scaled by a coefficient related to ...
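To make the idea concrete, below is a minimal NumPy sketch of a trace-matched Kronecker approximation of a single layer's FIM block, assuming the standard layer-wise block E[(aaᵀ)⊗(ggᵀ)] built from input activations a and back-propagated output gradients g; the function name `tkfac_block` and the exact form of the scaling coefficient are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def tkfac_block(acts, grads):
    """Trace-matched Kronecker approximation of one layer's FIM block (sketch).

    acts:  (N, d_in)  layer input activations for N samples
    grads: (N, d_out) back-propagated output gradients for N samples

    The exact block is E[(a a^T) kron (g g^T)]. We return pi * (A kron B) with
    A = E[a a^T], B = E[g g^T], and pi chosen so the traces of the exact and
    approximate blocks agree.
    """
    N = acts.shape[0]
    A = acts.T @ acts / N          # E[a a^T], shape (d_in, d_in)
    B = grads.T @ grads / N        # E[g g^T], shape (d_out, d_out)
    # Exact trace: E[||a||^2 * ||g||^2];  approximate trace: tr(A) * tr(B).
    exact_trace = np.mean(np.sum(acts**2, axis=1) * np.sum(grads**2, axis=1))
    pi = exact_trace / (np.trace(A) * np.trace(B))
    return pi * np.kron(A, B)

# Toy usage: tr(F_hat) now equals the exact block trace by construction.
rng = np.random.default_rng(0)
F_hat = tkfac_block(rng.normal(size=(256, 20)), rng.normal(size=(256, 10)))
print(F_hat.shape)  # (200, 200)
```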
The highly non-linear nature of deep neural networks causes them to be susceptible to adversarial ex...
In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate ...
For training fully-connected neural networks (FCNNs), we propose a practical approximate second-orde...
Second-order optimization methods applied to train deep neural networks use the curvature informat...
The current scalable Bayesian methods for Deep Neural Networks (DNNs) often rely on the Fisher Infor...
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
Several studies have shown the ability of natural gradient descent to minimize the objective functio...
We design four novel approximations of the Fisher Information Matrix (FIM) that plays a central role...
Neural networks are an important class of highly flexible and powerful models inspired by the struct...
The stochastic gradient method is currently the prevailing technology for training neural networks. ...
Second-order optimizers are thought to hold the potential to speed up neural network training, but d...
Deep learning has recently become one of the most predominantly used techniques in the fiel...
It is well-known that second-order optimizers can accelerate the training of deep neural networks, ho...
For a long time, second-order optimization methods have been regarded as computationally inefficient...
Efficiently approximating local curvature information of the loss function is a key tool for optimiz...