Training deep neural networks is inherently constrained by loss functions that are predefined and fixed throughout optimization. To improve learning efficiency, we develop the Stochastic Loss Function (SLF), which dynamically and automatically generates appropriate gradients for training deep networks within the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem over network weights and loss parameters. To guarantee the requisite efficiency, gradients with respect to the generic differentiable loss are leveraged both for selecting the loss function and for optimizing the network weights. Extensive experiments on a variety of popular datasets ...
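A minimal sketch, assuming a PyTorch setup, of the idea described in that abstract: a loss with its own trainable parameters is optimized jointly with the network weights, so a single backward pass yields gradients for both. The two-component parametric loss below (a learned mixture of cross-entropy and an L1 term) is an illustrative assumption, not the paper's actual formulation.

```python
# Sketch only: joint gradient updates for network weights and loss parameters.
# The mixture-of-losses form below is an assumption for illustration, not the
# SLF paper's actual parameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametricLoss(nn.Module):
    """A generic loss whose mixing weights are themselves trainable."""
    def __init__(self, num_components: int = 2):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_components))

    def forward(self, pred, target):
        w = torch.softmax(self.logits, dim=0)                 # loss-selection weights
        ce = F.cross_entropy(pred, target)                    # component 1: cross-entropy
        mae = F.l1_loss(F.softmax(pred, dim=1),               # component 2: L1 on probabilities
                        F.one_hot(target, pred.size(1)).float())
        return w[0] * ce + w[1] * mae

model = nn.Linear(10, 3)                                      # toy network
criterion = ParametricLoss()
opt = torch.optim.SGD(list(model.parameters()) + list(criterion.parameters()), lr=0.1)

x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
loss = criterion(model(x), y)
opt.zero_grad()
loss.backward()   # one backward pass: gradients for weights *and* loss parameters
opt.step()
```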
Short version of https://arxiv.org/abs/1709.01427. When applied to training deep...
In the past decade, neural networks have demonstrated impressive performance in supervised learning....
This thesis characterizes the training process of deep neural networks. We are driven by two apparen...
Many connectionist learning algorithms consist of minimizing a cost of the form C(w) = E(J(z; w)) ...
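A minimal sketch of the setting described above: minimizing C(w) = E(J(z; w)) by stochastic gradient descent, where each update uses the gradient of J at a single example z. The quadratic per-example cost J(z; w) = 0.5 * ||w - z||^2 is an assumption chosen only so the minimizer (the mean of z) is easy to verify.

```python
# Sketch only: SGD on C(w) = E(J(z; w)), approximated over sampled examples z.
# J(z; w) = 0.5 * ||w - z||^2 is an assumed toy cost whose gradient is w - z,
# so C(w) is minimized at w = E[z].
import numpy as np

rng = np.random.default_rng(0)
examples = rng.normal(loc=2.0, scale=1.0, size=(1000, 5))   # samples of z with mean 2
w = np.zeros(5)                                             # parameters to learn
lr = 0.05

for epoch in range(20):
    for z in rng.permutation(examples):                     # shuffle, then one update per example
        w -= lr * (w - z)                                    # stochastic gradient step

print(w)   # each coordinate approaches E[z] = 2
```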
In this thesis, we study model parameterization for deep learning applications. Part of the mathemat...
Stochastic Gradient Descent (SGD) algorithms remain popular optimizers for deep learning networks a...
Deep neural networks have achieved significant success in a number of challenging engineering proble...
The deep learning optimization community has observed how neural networks' generalization ability...
In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as e...
Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in ma...
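For concreteness, a tiny worked example (with made-up logits) of the cross-entropy quantity that such supervised training minimizes: the softmax probability of the true class is computed and its negative log is the per-example loss.

```python
# Sketch only: the per-example cross-entropy objective, written out by hand.
# Logits and the target class are made-up values.
import numpy as np

logits = np.array([2.0, 0.5, -1.0])               # network outputs for 3 classes
target = 0                                         # index of the true class

probs = np.exp(logits) / np.exp(logits).sum()      # softmax
loss = -np.log(probs[target])                      # cross-entropy = -log p(true class)
print(loss)                                        # ~0.24; training drives this toward 0
```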