Modern machine learning models are complex, hierarchical, and large-scale, and they are trained using non-convex objective functions. The algorithms used to train these models, however, are simple, incremental, first-order gradient-based methods such as gradient descent and Langevin Monte Carlo. Why and when do these seemingly simple algorithms succeed? This question is the focus of this thesis. We consider three problems. The first problem concerns training deep neural network classifiers with gradient descent on the logistic loss. We establish conditions under which gradient descent drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations of the ReLU activation...
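To make the setting concrete, the following minimal sketch (not taken from the thesis) runs full-batch gradient descent on the logistic loss for a two-layer network with a softplus activation, i.e. a smoothed approximation to the ReLU, and then shows a single Langevin-style update for comparison. The toy data, network sizes, step size, and noise scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative; not the data studied in the thesis).
n, d, m = 200, 10, 64                                   # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)    # labels in {-1, +1}

def softplus(z, beta=5.0):
    # Smoothed approximation to the ReLU; larger beta gives a closer approximation.
    return np.logaddexp(0.0, beta * z) / beta

def softplus_grad(z, beta=5.0):
    # Derivative of softplus: sigmoid(beta * z), written to avoid overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * beta * z))

def scores(W1, w2, X):
    # Real-valued network output f(x) for a two-layer network.
    return softplus(X @ W1) @ w2

def logistic_loss(s, y):
    # Mean of log(1 + exp(-y * f(x))) over the sample.
    return np.mean(np.logaddexp(0.0, -y * s))

def gradients(W1, w2, X, y):
    Z = X @ W1
    H = softplus(Z)
    s = H @ w2
    # d/ds log(1 + exp(-y s)) = -y * sigmoid(-y s), averaged over the sample.
    g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * s)) / len(y)
    grad_w2 = H.T @ g
    grad_W1 = X.T @ (np.outer(g, w2) * softplus_grad(Z))
    return grad_W1, grad_w2

W1 = rng.normal(size=(d, m)) / np.sqrt(d)
w2 = rng.normal(size=(m,)) / np.sqrt(m)
eta = 0.5                                   # step size (illustrative choice)

# Full-batch gradient descent on the logistic loss.
for t in range(500):
    gW1, gw2 = gradients(W1, w2, X, y)
    W1 -= eta * gW1
    w2 -= eta * gw2

print("final logistic loss:", logistic_loss(scores(W1, w2, X), y))

# A single unadjusted Langevin Monte Carlo update on the output weights:
# the same gradient step plus isotropic Gaussian noise of scale sqrt(2 * eta * tau).
tau = 1e-3                                  # temperature / noise scale (illustrative)
gW1, gw2 = gradients(W1, w2, X, y)
w2 = w2 - eta * gw2 + np.sqrt(2.0 * eta * tau) * rng.normal(size=w2.shape)
```

The only difference between the two updates is the injected Gaussian noise, which is what turns the deterministic gradient descent iteration into a sampling algorithm.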