Memory efficiency is crucial when training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Instead of keeping all of those dependencies in memory until they are reused in backpropagation, some forward tensors can be discarded and later recomputed from saved tensors, so-called checkpoints. This allows, in particular, resource-constrained heterogeneous environments to make use of all available compute devices. Unfortunately, defining these checkpoints is a non-trivial problem and poses a challenge to the programmer: improper or excessive recomputations negate the benefit of checkpointing. In this article, we present XEngine, an approach that sche...
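The recompute-instead-of-store idea described above can be made concrete with a small example. The sketch below uses PyTorch's torch.utils.checkpoint (assuming a PyTorch release recent enough to accept the use_reentrant flag); the model, segment boundaries, and tensor shapes are illustrative assumptions and do not represent XEngine's actual operator schedule.

```python
# Minimal sketch of activation checkpointing (rematerialization) in PyTorch.
# The network and its split into segments are illustrative assumptions, not a
# schedule computed by XEngine. torch.utils.checkpoint discards a wrapped
# segment's intermediate activations after the forward pass and recomputes
# them from the segment's input (the checkpoint) during backpropagation.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Segment(nn.Module):
    """A block whose intermediate activations we are willing to recompute."""

    def __init__(self, dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.layers(x)


class CheckpointedNet(nn.Module):
    def __init__(self, dim=256, num_segments=4):
        super().__init__()
        self.segments = nn.ModuleList(Segment(dim) for _ in range(num_segments))
        self.head = nn.Linear(dim, 10)

    def forward(self, x):
        for seg in self.segments:
            # Only the segment input is kept in memory; the activations inside
            # the segment are recomputed when the backward pass reaches it.
            x = checkpoint(seg, x, use_reentrant=False)
        return self.head(x)


if __name__ == "__main__":
    net = CheckpointedNet()
    x = torch.randn(32, 256, requires_grad=True)
    loss = net(x).sum()
    loss.backward()  # triggers recomputation of each checkpointed segment
    print(x.grad.shape)
```

The trade-off is the one the abstract alludes to: each checkpointed segment saves memory but costs an extra forward computation, so too many or poorly chosen checkpoints can erase the benefit.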
Machine learning has gained success in many application domains including medical data analysis, fin...
Deep learning-based solutions and, in particular, deep neural networks (DNNs) are at the heart of se...
This paper addresses the design of accelerators using systolic architectures for training of neural netw...
Rematerialization and offloading are two well-known strategies to save memory ...
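Where the earlier sketch illustrated rematerialization, the snippet below illustrates the second strategy named in this entry, offloading. It is a hedged sketch built on PyTorch's torch.autograd.graph.save_on_cpu context manager (available since PyTorch 1.10); the model and tensor sizes are illustrative assumptions.

```python
# Minimal sketch of activation offloading in PyTorch (assumes PyTorch >= 1.10;
# the model and tensor sizes are illustrative only). Inside the context
# manager, tensors saved for the backward pass are kept in host (CPU) memory
# and copied back to the compute device on demand during backpropagation.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).to(device)

x = torch.randn(64, 1024, device=device, requires_grad=True)

# Offloading only pays off on an accelerator; on a CPU-only setup this context
# manager is effectively a no-op.
with torch.autograd.graph.save_on_cpu(pin_memory=(device == "cuda")):
    loss = model(x).sum()  # activations saved for backward now live on the host

loss.backward()  # offloaded activations are transferred back as needed
print(x.grad.shape)
```

Offloading trades transfer time over the host interconnect for device memory, rather than trading recomputation time, which is why the two strategies are often combined.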
This paper introduces a new activation checkpointing method which allows one to significantly decrease m...
The emergence of deep learning has spurred many works on deep learning accelerators. To fully reali...
Deploying deep learning models on various devices has become an important topic. The wave of hardwar...
Auto-scheduling for tensor programs is a process where a search algorithm automatically explores can...
In this thesis, we develop high-performance algorithms for certain computations involving dense tens...
The memory space taken to host and process large tensor graphs is a limiting factor for embedded Con...
Training Deep Neural Networks is known to be an expensive operation, both in t...
Thesis (Ph.D.), University of Washington, 2022. As the scaling and performance demands for deep learni...
Improving the efficiency of neural networks has great potential impact due to their wide range of pos...
High-performance tensor programs are crucial to guarantee efficient execution of deep neural network...
Going deeper and wider in neural architectures improves their accuracy, while the limite...