The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are compleme...
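The abstract sketches the procedure only at a high level, so the following is a minimal, hypothetical illustration of the idea, not the paper's implementation: a small set of low-dimensional "late-phase" weights that modulate a shared base layer multiplicatively, one member being trained per SGD step in the late stage, and all members being averaged in weight space at the end to recover a single model. The class and parameter names (`LatePhaseLinear`, `n_members`, the per-output scaling vectors) are assumptions made for this sketch; the actual parameterization used in the paper may differ.

```python
# Hedged sketch of late-phase weights: K low-dimensional vectors that
# interact multiplicatively with shared base weights, trained in the late
# phase and averaged in weight space at the end. Names are illustrative.
import torch
import torch.nn as nn

class LatePhaseLinear(nn.Module):
    """Linear layer whose output is modulated by one of K per-member
    multiplicative late-phase vectors (an assumed parameterization)."""
    def __init__(self, in_features, out_features, n_members=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)   # shared (base) weights
        # K per-output scaling vectors, initialized to 1 (identity modulation)
        self.late = nn.Parameter(torch.ones(n_members, out_features))

    def forward(self, x, member):
        h = self.base(x)
        return h * self.late[member]                        # multiplicative interaction

def average_late_phase(layer):
    """Collapse the ensemble after training by averaging the late-phase
    weights in weight space, yielding a single deployable model."""
    with torch.no_grad():
        mean = layer.late.mean(dim=0, keepdim=True)
        layer.late.copy_(mean.expand_as(layer.late))

# Toy usage: in the late stage of learning, each SGD step updates the shared
# weights together with one randomly sampled late-phase member.
layer = LatePhaseLinear(8, 2, n_members=4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
for step in range(100):
    x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
    member = torch.randint(0, 4, (1,)).item()               # pick an ensemble member
    loss = nn.functional.cross_entropy(layer(x, member), y)
    opt.zero_grad(); loss.backward(); opt.step()
average_late_phase(layer)                                    # single averaged model
```

Because the late-phase vectors are low-dimensional relative to the base weights, maintaining several of them adds little memory or compute, which is the stated motivation for the multiplicative, low-dimensional family of late-phase models.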