Modern machine learning models are complex, hierarchical, and large-scale, and they are trained using non-convex objective functions. The algorithms used to train these models, however, are simple, incremental, first-order gradient-based methods such as gradient descent and Langevin Monte Carlo. Why and when do these seemingly simple algorithms succeed? This question is the focus of this thesis. We consider three problems. The first problem concerns training deep neural network classifiers with gradient descent on the logistic loss. We establish conditions under which gradient descent drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies to smoothed approximations of the ReLU activation...
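To make the setting concrete, the following minimal sketch (not taken from the thesis) runs full-batch gradient descent on the logistic loss for a two-layer network with a softplus activation, i.e. a smoothed approximation to the ReLU, and then shows a single Langevin-style update for comparison. The toy data, network sizes, step size, and noise scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative; not the data studied in the thesis).
n, d, m = 200, 10, 64                                   # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)    # labels in {-1, +1}

def softplus(z, beta=5.0):
    # Smoothed approximation to the ReLU; larger beta gives a closer approximation.
    return np.logaddexp(0.0, beta * z) / beta

def softplus_grad(z, beta=5.0):
    # Derivative of softplus: sigmoid(beta * z), written to avoid overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * beta * z))

def scores(W1, w2, X):
    # Real-valued network output f(x) for a two-layer network.
    return softplus(X @ W1) @ w2

def logistic_loss(s, y):
    # Mean of log(1 + exp(-y * f(x))) over the sample.
    return np.mean(np.logaddexp(0.0, -y * s))

def gradients(W1, w2, X, y):
    Z = X @ W1
    H = softplus(Z)
    s = H @ w2
    # d/ds log(1 + exp(-y s)) = -y * sigmoid(-y s), averaged over the sample.
    g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * s)) / len(y)
    grad_w2 = H.T @ g
    grad_W1 = X.T @ (np.outer(g, w2) * softplus_grad(Z))
    return grad_W1, grad_w2

W1 = rng.normal(size=(d, m)) / np.sqrt(d)
w2 = rng.normal(size=(m,)) / np.sqrt(m)
eta = 0.5                                   # step size (illustrative choice)

# Full-batch gradient descent on the logistic loss.
for t in range(500):
    gW1, gw2 = gradients(W1, w2, X, y)
    W1 -= eta * gW1
    w2 -= eta * gw2

print("final logistic loss:", logistic_loss(scores(W1, w2, X), y))

# A single unadjusted Langevin Monte Carlo update on the output weights:
# the same gradient step plus isotropic Gaussian noise of scale sqrt(2 * eta * tau).
tau = 1e-3                                  # temperature / noise scale (illustrative)
gW1, gw2 = gradients(W1, w2, X, y)
w2 = w2 - eta * gw2 + np.sqrt(2.0 * eta * tau) * rng.normal(size=w2.shape)
```

The only difference between the two updates is the injected Gaussian noise, which is what turns the deterministic gradient descent iteration into a sampling algorithm.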