To fully unlock the potential of deep neural networks (DNNs), various learning algorithms have been developed to improve their generalization ability. Recently, sharpness-aware minimization (SAM) established a generic scheme for improving generalization by minimizing a sharpness measure within a small neighborhood, achieving state-of-the-art performance. However, SAM requires two sequential gradient evaluations to solve its min-max problem and thus inevitably doubles the training time. In this paper, we resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM. Unlike the small adversarial perturbations in SAM, RWP is softer and allows a much larger perturbation magnitude....
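The contrast between the two schemes can be made concrete with a minimal PyTorch sketch. The SAM step below follows the standard published two-pass scheme (ascend to the adversarial point, then take the gradient there); the RWP step is one plausible reading of "filter-wise random weight perturbations," where each output filter is jittered by Gaussian noise scaled to that filter's own norm. The function names and the `rho`/`gamma` hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def sam_step(model, loss_fn, x, y, rho=0.05):
    """One SAM update direction: two sequential gradient evaluations."""
    params = list(model.parameters())
    # First gradient pass: g = dL/dw at the current weights.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    # Adversarial perturbation eps = rho * g / ||g||.
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    # Second gradient pass at the perturbed weights w + eps.
    loss_adv = loss_fn(model(x), y)
    sam_grads = torch.autograd.grad(loss_adv, params)
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)  # restore the original weights
    return sam_grads

def rwp_step(model, loss_fn, x, y, gamma=0.01):
    """One RWP update direction: a single gradient evaluation.

    `gamma` is an assumed name for the perturbation strength.
    """
    params = list(model.parameters())
    noises = []
    with torch.no_grad():
        for p in params:
            if p.dim() > 1:  # conv/linear weights: scale noise per output filter
                filter_norms = p.view(p.size(0), -1).norm(dim=1)
                shape = (p.size(0),) + (1,) * (p.dim() - 1)
                n = gamma * filter_norms.view(shape) * torch.randn_like(p)
            else:            # biases etc.: plain scaled Gaussian noise
                n = gamma * torch.randn_like(p)
            p.add_(n)
            noises.append(n)
    # Single gradient pass at the randomly perturbed weights.
    loss = loss_fn(model(x), y)
    rwp_grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, n in zip(params, noises):
            p.sub_(n)  # restore the original weights
    return rwp_grads
```

Because `rwp_step` needs only one forward-backward pass per batch, its per-iteration cost matches plain SGD, which is the source of the claimed speedup over SAM's two-pass update.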