Incorporating second-order curvature information into machine learning optimization algorithms can be subtle, and doing so naïvely can lead to high per-iteration costs associated with forming the Hessian and performing the associated linear system solve. To address this, we introduce ADAHESSIAN, a new stochastic optimization algorithm. ADAHESSIAN directly incorporates approximate curvature information from the loss function, and it includes several novel performance-improving features, including: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) spatial averaging to reduce the variance of the second derivative; and (iii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations.
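Component (i) admits a compact sketch. The snippet below is a minimal, illustrative reconstruction of Hutchinson's diagonal-Hessian estimator in PyTorch, not the authors' released implementation; the function name hutchinson_diag_hessian, the number of probe vectors n_probes, and the toy quadratic loss are assumptions made for this example.

```python
# Illustrative sketch of Hutchinson's diagonal-Hessian estimator (not the
# authors' code). Estimates diag(H) via E[z * (H z)] with Rademacher probes z.
import torch

def hutchinson_diag_hessian(loss, params, n_probes=1):
    """Return a per-parameter estimate of the Hessian diagonal of `loss`."""
    # First backward pass with create_graph=True so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag_est = [torch.zeros_like(p) for p in params]
    for _ in range(n_probes):
        # Rademacher probes: entries are +1 or -1 with equal probability.
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product H z via a second backward pass.
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs,
                                   retain_graph=True)
        for d, z, hv in zip(diag_est, zs, hvps):
            d.add_(z * hv / n_probes)  # average z ⊙ (H z) over probes
    return diag_est

# Usage on a tiny quadratic, where the Hessian diagonal is known exactly:
w = torch.randn(5, requires_grad=True)
loss = (torch.arange(1.0, 6.0) * w**2).sum()  # H = diag(2, 4, 6, 8, 10)
print(hutchinson_diag_hessian(loss, [w], n_probes=100)[0])
```

For a Rademacher vector z, E[z ⊙ Hz] = diag(H), so each probe costs only one Hessian-vector product (two backward passes) rather than forming the full Hessian, which is what keeps the per-iteration overhead low.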
This paper proposes an improved stochastic second-order learning algorithm for supervised neural net...
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appea...
Gradient-based optimization and Markov Chain Monte Carlo sampling can be found at the heart of a mul...
Hessian-based analysis/computation is widely used in scientific computing. However, due to the (inco...
Neural networks are an important class of highly flexible and powerful models inspired by the struct...
This thesis presents a family of adaptive curvature methods for gradient-based stochastic ...
The emergent field of machine learning has by now become the main proponent of data-driven discovery...
The interplay between optimization and machine learning is one of the most important developments in...
In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic opti...
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate ...
This work considers optimization methods for large-scale machine learning (ML). Optimization in ML ...
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization meth...
We propose a fast second-order method that can be used as a drop-in replacement for current deep lea...
In the past decade, neural networks have demonstrated impressive performance in supervised learning....