Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters; however, this requires the network's activation functions to exhibit a degree of continuity, which limits the model's uniform approximation capacity to continuous functions. This paper focuses on the case where the discontinuities arise from distinct sub-patterns, each defined on a different part of the input space. We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure that avoids passing gradient updates through the network's single, strategically placed discontinuous unit. We provide approximation guarantees for our architecture in the space of ...
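To make the "decoupled two-step procedure" concrete, here is a minimal sketch of one plausible reading of the abstract: the single discontinuous unit is taken to be a hard threshold inserted between two sub-networks, the first stage fits the pre-threshold sub-network to hypothetical region labels, and the second stage fits the post-threshold sub-network by ordinary gradient descent with the threshold's output detached, so no gradient ever crosses the discontinuity. All names (`pre`, `post`, `step_unit`, `region_labels`) are illustrative assumptions, not the paper's actual architecture or training algorithm.

```python
# Hedged sketch of a decoupled two-step training loop around a single
# discontinuous (hard-threshold) unit. Everything here is an assumption
# for illustration; the paper's exact procedure is not reproduced.
import torch
import torch.nn as nn

class DiscontinuousNet(nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        # `pre` produces the scalar that is thresholded; `post` consumes
        # the input together with the resulting 0/1 gate.
        self.pre = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, 1))
        self.post = nn.Sequential(nn.Linear(d_in + 1, d_hidden), nn.ReLU(),
                                  nn.Linear(d_hidden, d_out))

    def step_unit(self, z):
        # The single discontinuous unit: a hard 0/1 threshold.
        return (z > 0).float()

    def forward(self, x):
        gate = self.step_unit(self.pre(x))
        # detach() blocks any gradient from flowing back through the
        # discontinuous unit into `pre`.
        return self.post(torch.cat([x, gate.detach()], dim=1))

def train_two_step(model, x, y, region_labels, epochs=100):
    """Step 1: fit `pre` on (hypothetical) region labels, so the threshold
    learns where the sub-patterns live. Step 2: freeze `pre` and fit `post`
    by ordinary SGD; no gradient update ever passes through the step unit."""
    opt_pre = torch.optim.Adam(model.pre.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt_pre.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(
            model.pre(x).squeeze(1), region_labels)
        loss.backward()
        opt_pre.step()

    opt_post = torch.optim.Adam(model.post.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt_post.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt_post.step()
```

Under these assumptions the gradient flow is decoupled by construction: the first loop only touches `pre`, the second only touches `post`, and the detached threshold output is treated as a constant feature during the second step.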