This paper introduces a new activation checkpointing method that significantly decreases memory usage when training Deep Neural Networks with the back-propagation algorithm. Similarly to checkpointing techniques from the Automatic Differentiation literature, it consists in dynamically selecting the forward activations that are saved during the training phase, and then automatically recomputing the missing activations from those previously recorded. We propose an original computation model that combines two types of activation savings: either storing only the layer inputs, or recording the complete history of operations that produced the outputs (this uses more memory, but requires fewer recomputations in the backward phase)...
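The first of the two saving modes (store only a segment's input, then recompute its internal activations during the backward phase) can be illustrated with a short sketch. The code below is a minimal illustration assuming PyTorch's generic `torch.utils.checkpoint` API; the model, the split into two segments, and all sizes are assumptions made for the example, not the paper's own implementation.

```python
# Minimal sketch of input-only activation checkpointing, assuming the
# generic torch.utils.checkpoint API (not the paper's implementation).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self, width=256, depth=8):
        super().__init__()
        half = depth // 2
        block = lambda: nn.Sequential(nn.Linear(width, width), nn.ReLU())
        # Two segments: during the forward pass, only each segment's
        # input is kept; its internal activations are discarded.
        self.seg1 = nn.Sequential(*[block() for _ in range(half)])
        self.seg2 = nn.Sequential(*[block() for _ in range(half)])

    def forward(self, x):
        # checkpoint() drops the segment's intermediate activations and
        # re-runs the segment's forward pass during backpropagation.
        x = checkpoint(self.seg1, x, use_reentrant=False)
        x = checkpoint(self.seg2, x, use_reentrant=False)
        return x

x = torch.randn(32, 256, requires_grad=True)
loss = CheckpointedMLP()(x).sum()
loss.backward()  # seg1 and seg2 forwards are recomputed here
```

The second saving mode described in the abstract (recording the full operation history of a segment) corresponds to simply not wrapping that segment in `checkpoint`, trading memory for the avoided recomputation.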
Naive backpropagation through time has a memory footprint that grows linearly in the sequence length...
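To make the linear growth concrete: storing one hidden state of size H per step costs T·H floats, whereas keeping a checkpoint every k steps and recomputing within each segment holds only about T/k + k states at once (roughly 2·sqrt(T)·H when k = sqrt(T)). The toy sketch below uses an assumed linear-tanh recurrence and assumed sizes (nothing from the cited work) to show the recompute-from-checkpoint pattern for one backward segment.

```python
# Toy illustration of checkpointed backprop through time: keep every
# k-th hidden state, recompute the rest during the backward pass.
# The recurrence and all sizes are assumptions for this example.
import numpy as np

T, H, k = 16, 4, 4                      # steps, state size, checkpoint stride
W = np.eye(H) * 0.9                     # fixed recurrence weight (toy)
xs = [np.random.randn(H) for _ in range(T)]

def step(h, x):                         # h_{t+1} = tanh(W h_t + x_t)
    return np.tanh(W @ h + x)

# Forward pass: store only hidden states at multiples of k (O(T/k) memory).
h = np.zeros(H)
ckpts = {0: h.copy()}
for t in range(T):
    h = step(h, xs[t])
    if (t + 1) % k == 0:
        ckpts[t + 1] = h.copy()

# Backward over the last segment: recompute its hidden states from the
# nearest checkpoint (the gradient propagation through the recomputed
# states is omitted here for brevity).
seg_start = T - k
h = ckpts[seg_start].copy()
segment = [h.copy()]
for t in range(seg_start, T):
    h = step(h, xs[t])
    segment.append(h.copy())            # only k+1 states live at once
print(len(segment), "states held instead of", T + 1)
```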
Overparameterized deep neural networks have redundant neurons that do not contribute to the network'...
The training of deep neural networks utilizes the backpropagation algorithm which consists of the fo...
In the context of Deep Learning training, the memory needed to store activations can prevent ...
Artificial Intelligence is a field that has received a lot of attention recently. Its success is due...
Training Deep Neural Networks is known to be an expensive operation, both in t...
Motivated by the goal of enabling energy-efficient and/or lower-cost hardware implementations of dee...
Artificial neural networks have gained popularity in various domains. However...
Rematerialization and offloading are two well-known strategies to save memory ...
With the emergence of versatile storage systems, multi-level checkpointing (ML...
Memory efficiency is crucial in training deep learning networks on resource-restricted devices. Duri...
One of the recent developments in deep learning is the ability to train extrem...
An activation function is an element-wise mathematical function and plays a crucial role in deep neu...
Deep Learning training memory needs can prevent the user from considering large mode...
The lifecycle of a deep learning application consists of five phases: Data collection, Architecture ...