Hessian-based analysis and computation are widely used in scientific computing. However, due to the (incorrect, but in our experience widespread) belief that Hessian-based computations are infeasible for large machine learning (ML) problems, the majority of work in ML (except on quite small problems) relies only on first-order methods. Yet, using sub-sampling and randomized numerical linear algebra, second-order information can be extracted efficiently for large-scale machine learning problems. In this thesis, we consider three use cases of second-order methods: (i) for non-convex optimization and/or ML problems, we propose inexact variants of three classic Newton-type methods: Trust Region method, C...
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions unde...
Training deep neural networks consumes increasing computational resource shares in many compute cent...
Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundament...
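For context, such Hessian-vector products can be formed without ever materializing the Hessian, using forward-over-reverse automatic differentiation. The sketch below is a minimal illustration of that idea in JAX; the least-squares loss and all names are assumptions for the example, not taken from the work summarized above.

```python
# Minimal sketch: a Hessian-vector product via forward-over-reverse autodiff.
# The objective and variable names are illustrative assumptions.
import jax
import jax.numpy as jnp

def loss(w, X, y):
    # simple least-squares objective, purely for illustration
    return 0.5 * jnp.mean((X @ w - y) ** 2)

def hvp(w, v, X, y):
    # H(w) @ v is the directional derivative of the gradient along v
    grad_fn = lambda w_: jax.grad(loss)(w_, X, y)
    return jax.jvp(grad_fn, (w,), (v,))[1]

# usage: the 10x10 Hessian is never formed explicitly
kx, ky = jax.random.split(jax.random.PRNGKey(0))
X = jax.random.normal(kx, (100, 10))
y = jax.random.normal(ky, (100,))
w = jnp.zeros(10)
v = jnp.ones(10)
print(hvp(w, v, X, y))  # shape (10,)
```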
We consider variants of trust-region and adaptive cubic regularization methods for non-convex optimi...
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
There are several benefits of taking the Hessian of the objective function into account when designi...
Neural networks are an important class of highly flexible and powerful models inspired by the struct...
We propose a fast second-order method that can be used as a drop-in replacement for current deep lea...
Learning non-use-case-specific models has been shown to be a challenging task in Deep Learning (DL)...
Newton methods can be applied in many supervised learning approaches. However, for large-scale data,...
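One common way to make a Newton step affordable at scale, in the spirit of sub-sampled methods, is to touch the Hessian only through products with vectors evaluated on a random subsample of the data, and to solve the Newton system inexactly with conjugate gradient. The sketch below is a generic illustration under those assumptions; the batch size, damping, and all names are hypothetical and do not describe the specific algorithm of any abstract listed here.

```python
# Hedged sketch of one sub-sampled Newton-CG step (illustrative only).
# The Hessian appears solely through matrix-vector products on a subsample.
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

def loss(w, X, y):
    return 0.5 * jnp.mean((X @ w - y) ** 2)      # illustrative objective

def hvp(w, v, X, y):
    grad_fn = lambda w_: jax.grad(loss)(w_, X, y)
    return jax.jvp(grad_fn, (w,), (v,))[1]

def subsampled_newton_step(w, X, y, key, batch=256, damping=1e-3):
    idx = jax.random.choice(key, X.shape[0], shape=(batch,), replace=False)
    Xs, ys = X[idx], y[idx]
    g = jax.grad(loss)(w, X, y)                          # full gradient
    matvec = lambda v: hvp(w, v, Xs, ys) + damping * v   # sub-sampled H + damping
    p, _ = cg(matvec, g, maxiter=20)                     # inexact solve of (H + eps*I) p = g
    return w - p
```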
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appea...
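The mechanism shared by TR-type methods is to compare the decrease predicted by a local quadratic model against the decrease actually achieved, then grow or shrink the trust radius accordingly. The sketch below uses common textbook thresholds and illustrative names; it is not the specific update rule of the works summarized here.

```python
# Illustrative sketch of the standard trust-region acceptance/radius update.
# Thresholds (0.1, 0.25, 0.75) are common textbook defaults, chosen for illustration.
import jax.numpy as jnp

def tr_update(f, w, p, g, hvp_fn, radius):
    predicted = -(g @ p + 0.5 * p @ hvp_fn(p))    # model decrease m(0) - m(p)
    actual = f(w) - f(w + p)                      # true decrease in the objective
    rho = actual / predicted
    if rho < 0.25:
        radius *= 0.25                            # poor agreement: shrink the region
    elif rho > 0.75 and jnp.linalg.norm(p) >= 0.99 * radius:
        radius *= 2.0                             # good agreement on the boundary: expand
    accept = rho > 0.1
    return accept, radius
```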
Incorporating second-order curvature information into machine learning optimization algorithms can b...
While first-order methods are popular for solving optimization problems that arise in large-scale de...
In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations...
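A "first-order (Hessian-free)" curvature probe of this kind can be built from gradient evaluations alone, since H(w) v is approximately (grad f(w + eps*v) - grad f(w)) / eps, and a "zero-order" variant replaces the gradients themselves with function differences. The sketch below only illustrates those identities; eps and all names are assumptions, not the implementation of the work above.

```python
# Sketch of Hessian-free and derivative-free curvature/gradient probes.
# No second derivatives are used; the step sizes are illustrative choices.
import jax
import jax.numpy as jnp

def loss(w, X, y):
    return 0.5 * jnp.mean((X @ w - y) ** 2)      # illustrative objective

def hvp_fd(w, v, X, y, eps=1e-4):
    # first-order (Hessian-free): Hv from two gradient calls
    g_plus = jax.grad(loss)(w + eps * v, X, y)
    g_zero = jax.grad(loss)(w, X, y)
    return (g_plus - g_zero) / eps

def grad_fd(w, X, y, eps=1e-5):
    # zero-order (derivative-free): gradient estimate via central differences
    f = lambda w_: loss(w_, X, y)
    e = jnp.eye(w.shape[0])
    return jnp.array([(f(w + eps * e[i]) - f(w - eps * e[i])) / (2 * eps)
                      for i in range(w.shape[0])])
```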
Efficiently approximating local curvature information of the loss function is a key tool for optimiz...
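One widely used way to approximate local curvature cheaply is Hutchinson-style randomized estimation: products of the Hessian with random sign vectors give an unbiased estimate of its diagonal (or trace). The sketch below assumes JAX-style autodiff and illustrative names; it is a generic example of the technique, not the method of the abstract above.

```python
# Sketch of a Hutchinson-style estimate of the Hessian diagonal:
# for Rademacher z, E[z * (H z)] = diag(H). All names are illustrative.
import jax
import jax.numpy as jnp

def loss(w, X, y):
    return 0.5 * jnp.mean((X @ w - y) ** 2)      # illustrative objective

def hvp(w, v, X, y):
    grad_fn = lambda w_: jax.grad(loss)(w_, X, y)
    return jax.jvp(grad_fn, (w,), (v,))[1]

def hessian_diag_estimate(w, X, y, key, num_samples=16):
    keys = jax.random.split(key, num_samples)
    est = jnp.zeros_like(w)
    for k in keys:
        z = jax.random.rademacher(k, w.shape, dtype=w.dtype)  # random +/-1 vector
        est = est + z * hvp(w, z, X, y)
    return est / num_samples
```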