The rise of deep learning (DL) has been fuelled by improvements in hardware accelerators. Owing to its unique features, the GPU remains the most widely used accelerator for DL applications. In this paper, we present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review techniques for both inference and training, and for both single-GPU and distributed systems with multiple GPUs. We bring out the similarities and differences between the surveyed works and highlight their key attributes. This survey will be useful for both novices and experts in the fields of machine learning, processor architecture, and high-performance computing.