Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling for a fixed budget of optimization steps. We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of networks and datasets. Our results show that batch augmentation reduces the number of necessary SGD updates to achieve the same accuracy as the stat...
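To make the replication step concrete, below is a minimal sketch of batch augmentation in PyTorch: each sample in the batch appears M times, and every copy receives an independently drawn random augmentation. The function name, the default M=4, and the generic `transform` callable are illustrative assumptions, not the authors' implementation.

```python
import torch

def batch_augment(images, labels, transform, m=4):
    """Replicate each sample `m` times with independent random augmentations.

    images: tensor of shape (B, C, H, W)
    labels: tensor of shape (B,)
    transform: any random per-image augmentation that maps a (C, H, W)
        tensor to a (C, H, W) tensor (e.g. a torchvision v2 transform).
    Returns an (m * B, C, H, W) batch and the matching (m * B,) labels.
    """
    copies = []
    for _ in range(m):
        # a fresh random augmentation is sampled for every replica
        copies.append(torch.stack([transform(img) for img in images]))
    aug_images = torch.cat(copies, dim=0)   # m * B augmented views
    aug_labels = labels.repeat(m)           # labels are simply repeated
    return aug_images, aug_labels
```

The augmented batch is M times larger while still drawing on only B distinct samples, so a fixed budget of optimization steps processes M times more augmented views of each example.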
Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s input...
Neural networks possess an ability to generalize well to data distribution, to an extent that they a...
Importance sampling, a variant of online sampling, is often used in neural network training to impro...
Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that itera...
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep n...
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Alt...
Batch Normalization (BatchNorm) is a widely adopte...
Utilizing recently introduced concepts from statistics and quantitative risk management, we present ...
Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image via stat...
In this work, we propose to progressively increase the training difficulty during learning a neural ...
We present a comprehensive framework of search methods, such as simulated annealing and batch traini...
In distributed training of deep neural networks, parallel minibatch SGD is widely used to speed up t...
Despite the significant success of deep learning in computer vision tasks, cross-domain tasks still ...
We propose a metric for evaluating the generalization ability of deep neural networks trained with m...
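Several of the snippets above describe Batch Normalization as normalizing each feature via statistics computed over a batch of images. For reference, a minimal training-mode sketch of that per-channel computation (running statistics and inference behaviour are omitted, and the names are illustrative):

```python
import torch

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for an NCHW tensor.

    x: tensor of shape (B, C, H, W)
    gamma, beta: learnable per-channel scale and shift, shape (C,)
    Each channel is normalized with the mean and variance computed over
    the batch and spatial dimensions, then rescaled by gamma and beta.
    """
    mean = x.mean(dim=(0, 2, 3), keepdim=True)                 # per-channel mean
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)   # per-channel variance
    x_hat = (x - mean) / torch.sqrt(var + eps)                 # normalize
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
```

At inference time, implementations typically substitute running averages of the mean and variance accumulated during training for the per-batch statistics.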