It has been hypothesized that neural network models with cyclic connectivity may be more powerful than their feed-forward counterparts. This thesis investigates this hypothesis in several ways. We study the gradient estimation and optimization procedures for several variants of these networks. We show how the convergence of the gradient estimation procedures is related to the properties of the networks. We then consider how to tune the relative rates of gradient estimation and parameter adaptation to ensure successful optimization in these models. We also derive new gradient estimators for stochastic models. First, we port the forward sensitivity analysis method to the stochastic setting. Second, we show how to apply measure-valued differentiation...
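The forward sensitivity method referred to above propagates parameter derivatives forward alongside the network state rather than backwards through time. As a rough, self-contained sketch of the deterministic baseline (the tanh cell, the final-step loss, and the function name are assumptions for illustration, not the thesis's actual models):

```python
# Minimal sketch of forward sensitivity analysis (RTRL-style) for a
# deterministic vanilla RNN; shapes and names are illustrative assumptions.
import numpy as np

def forward_sensitivity_grad(W, U, xs, target):
    """Return dL/dvec(W) for L = 0.5*||h_T - target||^2, computed forward in time."""
    n = W.shape[0]
    h = np.zeros(n)
    S = np.zeros((n, n * n))           # S_t = d h_t / d vec(W), column-major vec
    for x in xs:
        a = W @ h + U @ x
        h_new = np.tanh(a)
        D = np.diag(1.0 - h_new ** 2)  # derivative of tanh at a
        # d a_t / d vec(W) = (h_{t-1}^T kron I_n) + W @ S_{t-1}
        S = D @ (np.kron(h.reshape(1, -1), np.eye(n)) + W @ S)
        h = h_new
    return (h - target) @ S            # chain rule through the final-step loss

rng = np.random.default_rng(0)
n, m, T = 4, 3, 5
W, U = 0.1 * rng.standard_normal((n, n)), 0.1 * rng.standard_normal((n, m))
xs = rng.standard_normal((T, m))
g = forward_sensitivity_grad(W, U, xs, target=np.ones(n))
print(g.shape)  # (16,): one partial derivative per entry of W
```

Unlike back-propagation through time, the sensitivity matrix is updated as the sequence is consumed, which is what makes the method a natural candidate for porting to stochastic, continuously running networks.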
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Mul...
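For orientation, a minimal sketch of the multi-index setting: high-dimensional Gaussian inputs whose label depends only on a few linear projections, fitted by a small network with small gradient steps as a crude stand-in for gradient flow. All model details below are illustrative assumptions, not the paper's setup.

```python
# Multi-index regression toy example: y depends on x only through U^T x.
import numpy as np

rng = np.random.default_rng(0)
d, n, k, width = 50, 2000, 2, 32
U = np.linalg.qr(rng.standard_normal((d, k)))[0]       # k hidden index directions
X = rng.standard_normal((n, d))                        # high-dimensional Gaussian data
y = np.tanh(X @ U[:, 0]) * (X @ U[:, 1])               # label uses only the projections

W = rng.standard_normal((d, width)) / np.sqrt(d)       # first-layer weights
a = rng.standard_normal(width) / np.sqrt(width)        # second-layer weights
lr = 1e-2
for _ in range(2000):                                  # small steps approximate gradient flow
    H = np.maximum(X @ W, 0.0)                         # ReLU features
    r = H @ a - y
    a -= lr * (H.T @ r / n)
    W -= lr * (X.T @ ((r[:, None] * (H > 0)) * a) / n)
print(0.5 * np.mean((np.maximum(X @ W, 0.0) @ a - y) ** 2))  # training loss after a short run
```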
Since the discovery of the back-propagation method, many modified and new algorithms have been propo...
We prove that two-layer (Leaky)ReLU networks with one-dimensional input and output trained using gra...
The paper studies a stochastic extension of continuous recurrent neural networks and analyzes gradie...
We study probabilistic generative models parameterized by feedforward neural networks. An attrac...
This thesis aims to characterize the statistical properties of Monte Carlo simulation-based gradient...
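A standard instance of such Monte Carlo simulation-based gradient estimation is the score-function (likelihood-ratio) estimator; the Gaussian sampling distribution, the toy objective, and the sample size below are assumptions chosen only to illustrate the estimator and its sampling noise.

```python
# Score-function (likelihood-ratio) Monte Carlo estimator of
# d/dtheta E_{z ~ N(theta, 1)}[f(z)]; f is an illustrative assumption.
import numpy as np

def score_function_grad(theta, f, n_samples=10_000, rng=None):
    rng = rng or np.random.default_rng(0)
    z = rng.normal(loc=theta, scale=1.0, size=n_samples)
    # grad_theta log N(z; theta, 1) = (z - theta), so average f(z) * (z - theta)
    return np.mean(f(z) * (z - theta))

f = lambda z: z ** 2                 # E[z^2] = theta^2 + 1, so the true gradient is 2*theta
print(score_function_grad(1.5, f))   # close to 3.0, up to Monte Carlo noise
```

The estimator is unbiased but its variance depends strongly on f and on the sampling distribution, which is exactly the kind of statistical property such an analysis has to characterize.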
Many novel graph neural network models have reported impressive performance on benchmark datasets,...
Many connectionist learning algorithms consist of minimizing a cost of the form C(w) = E(J(z; w))
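For a cost of this form, online (stochastic) gradient descent updates w using the gradient of J evaluated on a single freshly drawn z, which is an unbiased estimate of the gradient of C. A minimal sketch, assuming a least-squares J and a Gaussian data model (both assumptions, not part of the original text):

```python
# Stochastic gradient descent on C(w) = E_z[J(z; w)] with
# J(z; w) = 0.5 * (y - w.x)^2 for z = (x, y); the data model is assumed.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
lr = 0.05
for t in range(5000):
    x = rng.standard_normal(3)                  # draw one sample z = (x, y)
    y = w_true @ x + 0.1 * rng.standard_normal()
    grad = (w @ x - y) * x                      # dJ/dw on this sample, unbiased for dC/dw
    w -= lr * grad                              # online update
print(w)  # approaches w_true
```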
The deep learning optimization community has observed how the neural networks' generalization ability...
Optimizing via stochastic gradients is a powerful and flexible technique ubiquitously used in machine ...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...
We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical ...
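Natural gradient learning preconditions the ordinary gradient by the inverse Fisher information, i.e. updates of the form w <- w - lr * F(w)^{-1} * grad C(w). A toy sketch for logistic regression follows; the model, damping term, and step size are illustrative assumptions, not the paper's two-layer setup.

```python
# Toy natural-gradient update for logistic regression.
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def natural_gradient_step(w, X, y, lr=0.5, damping=1e-3):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)                     # ordinary gradient of the average log-loss
    F = (X * (p * (1 - p))[:, None]).T @ X / len(y)   # Fisher information of the Bernoulli model
    F += damping * np.eye(len(w))                     # damping keeps F well conditioned
    return w - lr * np.linalg.solve(F, grad)          # precondition the gradient by F^{-1}

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((200, 3))
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)  # Bernoulli labels
w = np.zeros(3)
for _ in range(50):
    w = natural_gradient_step(w, X, y)
print(w)  # roughly recovers w_true
```

The preconditioning removes the dependence on the parameterization of the model, which is what the statistical-mechanics analysis exploits to explain the faster convergence of natural gradient learning.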
It seems that in the current age, computers, computation, and data have an increasingly imp...