The activation function deployed in a deep neural network has a great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both of these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b² of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in terms of training time.
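To make the design principle concrete, the sketch below implements a hypothetical activation that is exactly the identity on a linear region [-a, a] around the origin and saturates smoothly (tanh-like) outside it. The function name linearised_tanh, the half-width parameter a, and the loose rule that a should be chosen large relative to the bias variance σ_b² are illustrative assumptions, not the paper's specific construction or condition.

```python
import numpy as np

def linearised_tanh(x, a=1.0):
    """Hypothetical activation with a linear region of half-width `a` around the origin.

    Inside [-a, a] the function is exactly the identity; outside it saturates
    like tanh. Both the value and the first derivative are continuous at |x| = a.
    The half-width `a` stands in for "sufficiently large relative to the bias
    variance sigma_b^2"; the paper's precise condition is not reproduced here.
    """
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= a
    # Identity on the linear region, shifted tanh saturation outside it.
    outside = np.sign(x) * (a + np.tanh(np.abs(x) - a))
    return np.where(inside, x, outside)

# Example: pre-activations that stay within [-a, a] at initialisation pass
# through with slope exactly 1, i.e. the layer behaves linearly on them,
# which is the regime the abstract associates with avoiding rapid correlation
# convergence and vanishing/exploding gradients.
z = np.linspace(-3, 3, 7)
print(linearised_tanh(z, a=1.0))
```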