We explicitly analyze the trajectories of learning near singularities in hierarchical networks, such as multilayer perceptrons and radial basis function networks, which include permutation symmetry of hidden nodes, and show their general properties. Such symmetry induces singularities in the parameter space, where the Fisher information matrix degenerates and unusual learning behaviors, especially plateaus in gradient descent learning, arise from the geometric structure of the singularity. We plot dynamic vector fields to demonstrate the universal trajectories of learning near singularities. The singularity induces two types of plateaus, the on-singularity plateau and the near-singularity plateau, depending on the stabil...
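To illustrate how permutation symmetry makes the Fisher information matrix degenerate, here is a minimal sketch. It is not the authors' setup: the toy model f(x) = v1 tanh(w1 x) + v2 tanh(w2 x), the unit-variance Gaussian noise assumption, the sample size, the parameter values, and the function names are all assumptions chosen for illustration. Under that assumption the Fisher information is the expected outer product of the output gradient, and it loses rank when the two hidden units coincide (w1 = w2).

```python
import numpy as np


def output_gradient(x, w1, w2, v1, v2):
    """Gradient of f(x) = v1*tanh(w1*x) + v2*tanh(w2*x) w.r.t. (w1, w2, v1, v2).

    Returns an array of shape (len(x), 4), one gradient row per input sample.
    """
    sech2_1 = 1.0 - np.tanh(w1 * x) ** 2  # derivative of tanh at w1*x
    sech2_2 = 1.0 - np.tanh(w2 * x) ** 2  # derivative of tanh at w2*x
    return np.stack(
        [v1 * x * sech2_1, v2 * x * sech2_2, np.tanh(w1 * x), np.tanh(w2 * x)],
        axis=1,
    )


def fisher_information(xs, theta):
    """Empirical Fisher information for regression with unit-variance Gaussian noise
    (an assumption of this sketch): the sample mean of grad * grad^T."""
    g = output_gradient(xs, *theta)
    return g.T @ g / len(xs)


rng = np.random.default_rng(0)
xs = rng.normal(size=2000)  # hypothetical input distribution

# Generic point: the two hidden units have different input weights.
fisher_regular = fisher_information(xs, (0.5, -1.3, 0.8, 0.4))
# Singular point: both hidden units share the same input weight (w1 == w2),
# so their gradient components become linearly dependent.
fisher_singular = fisher_information(xs, (0.7, 0.7, 0.8, 0.4))

rank_regular = np.linalg.matrix_rank(fisher_regular, tol=1e-10)
rank_singular = np.linalg.matrix_rank(fisher_singular, tol=1e-10)
print("rank at generic point: ", rank_regular)
print("rank at singular point:", rank_singular)
```

At the point with w1 = w2, the two output-weight gradient components are identical and the two input-weight components are proportional, so the matrix cannot have full rank. In gradient descent this corresponds to directions in which the loss is locally flat.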
Recurrent neural networks have been extensively studied in the context of neur...
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2002. I...
The lack of crisp mathematical models that capture the structure of real-world data sets is a major ...
The existence of singularities often affects the learning dynamics in feedforward neural networks. I...
We numerically and theoretically demonstrate various singularities, as a dynamical syste...
While the empirical success of self-supervised learning (SSL) heavily relies on the usage of deep no...
We study the effect of learning dynamics on network topology. Firstly, a network of discrete dynamic...
We consider the learning coefficients in learning theory and give two new methods for obtaining thes...
Gradient descent learning algorithms may get stuck in local minima, thus making the learning subopti...
Recently, singular learning theory has been analyzed using algebraic geometry as its basis. It is es...
We study on-line gradient-descent learning in multilayer networks analytically and numerically. The ...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-la...
We show how a Hopfield network with modifiable recurrent connections undergoin...
Rumelhart, Hinton and Williams [Rumelhart et al. 86] describe a learning procedure for layered netwo...