Deep neural networks have become the state-of-the-art tool for solving many computer vision problems. However, these algorithms face significant computational and optimization challenges. For example, a) training deep networks is not only computationally intensive but also requires substantial manual effort and parameter tuning, and b) for some use cases, such as adversarial deep networks, it is difficult even to optimize them to good or stable performance. In this dissertation, we address these challenges by targeting the following closely related problems. First, we focus on automating the step-size and decay parameters in the training of deep networks. Classical stochastic gradient methods for optimization rely on...
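To make the step-size and decay terminology concrete, the generic textbook SGD update with a hand-tuned decay schedule can be written as follows; the particular schedule shown is only one common illustrative choice, not the method developed in this dissertation:

```latex
% One stochastic gradient step on example i_t with a decaying step size:
%   w_{t+1} = w_t - \eta_t \, \nabla f_{i_t}(w_t),
%   \eta_t  = \frac{\eta_0}{1 + \gamma t},
% where \eta_0 (initial step size) and \gamma (decay rate) are exactly the
% kind of parameters that typically require manual tuning.
```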
Large-scale machine learning problems can be reduced to non-convex optimization problems if state-of...
A number of results have recently demonstrated the benefits of incorporating various constraints whe...
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization meth...
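As a rough illustration of the general idea of layer-wise step sizes (a hypothetical sketch using a LARS-style norm ratio, not the procedure proposed in the abstract above), one SGD step with a per-layer scale might look like this:

```python
import numpy as np

def per_layer_sgd_step(params, grads, base_lr=0.1, eps=1e-8):
    """One SGD step with a per-layer step size.

    Hypothetical illustration only: each layer's step is scaled by the ratio
    of its parameter norm to its gradient norm, so layers with small gradients
    are not under-updated. This sketches the idea of layer-wise adaptive step
    sizes in general, not the specific per-layer procedure being proposed.
    """
    new_params = []
    for w, g in zip(params, grads):
        scale = np.linalg.norm(w) / (np.linalg.norm(g) + eps)
        new_params.append(w - base_lr * scale * g)
    return new_params

# Toy usage: two "layers" with random weights and gradients.
rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 3)), rng.normal(size=(3,))]
grads = [rng.normal(size=(4, 3)), rng.normal(size=(3,))]
params = per_layer_sgd_step(params, grads)
```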
Optimization is the key component of deep learning. Increasing depth, which is vital for reaching a...
In the recent decade, deep neural networks have solved ever more complex tasks across many fronts in...
Deep neural network models can achieve greater performance in numerous machine learning tasks by rai...
In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as e...
In modern supervised learning, many deep neural networks are able to interpolate the data: the empir...
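For reference, "interpolate the data" is standardly taken to mean that the trained network attains (near-)zero empirical loss on every training example:

```latex
% Interpolation: the fitted network matches every training label, so the
% empirical risk vanishes.
%   \hat{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big) = 0
%   \quad\Longleftrightarrow\quad f_\theta(x_i) = y_i \ \ \text{for all } i.
```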
Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that itera...
Stochastic Gradient Descent algorithms (SGD) remain a popular optimizer for deep learning networks a...
Learning a deep neural network requires solving a challenging optimization problem: it is a high-dim...
Deep learning has achieved great performance in various areas, such as computer vision, natural lang...
Over the past few years, there have been fundamental breakthroughs in core problems in machine learn...