Deep learning frameworks optimize computation graphs and intra-operator computation to boost inference performance on GPUs, but they usually ignore inter-operator parallelism. This paper proposes AutoGraph, a unified framework that produces highly optimized computation graphs favoring parallel execution of GPU kernels. A novel dynamic programming algorithm, combined with backtracking search, explores the optimal graph optimization solution, using a mixed critical path cost for fast performance estimation. Accurate runtime information, measured from GPU multi-stream execution launched with CUDA Graph, determines when the optimization has converged. Experimental results demonstrate that our method achieves...
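To make the idea of a critical-path estimate concrete, the following is a minimal sketch, not AutoGraph's actual cost model. The abstract does not define the "mixed" critical path cost, so this sketch assumes a plain longest-path cost over an operator DAG. The function name `critical_path_cost`, the operator names, and the per-operator timings are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_cost(op_cost, deps):
    """Longest-path latency through an operator DAG.

    op_cost: {op_name: estimated kernel time in ms} (hypothetical numbers)
    deps:    {op_name: set of ops that must finish before op starts}

    Assumes independent operators overlap perfectly on separate streams,
    so the result is an optimistic lower bound, not a measured runtime.
    """
    graph = {op: deps.get(op, ()) for op in op_cost}
    finish = {}
    # static_order() yields every operator after all of its predecessors.
    for op in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[op]), default=0.0)
        finish[op] = start + op_cost[op]
    return max(finish.values(), default=0.0)


# Three independent convolutions feeding one concat (illustrative only).
op_cost = {"conv_a": 3.0, "conv_b": 2.0, "conv_c": 4.0, "concat": 1.0}
deps = {"concat": {"conv_a", "conv_b", "conv_c"}}

serial_estimate = sum(op_cost.values())                 # single stream
parallel_estimate = critical_path_cost(op_cost, deps)   # ideal multi-stream
```

In this toy graph the single-stream estimate is the sum of all kernel times, while the critical-path estimate counts only the slowest branch plus the concat. An estimate like this is cheap to compute but idealized, which is consistent with the abstract's use of measured runtime information to decide convergence.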
The spread of deep learning on embedded devices has prompted the development of numerous methods to ...
We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose ...
A graph is a ubiquitous data structure that models entities and their interactions through the colle...
Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intellige...
In recent years, machine learning (ML) and, more noticeably, deep learning (DL), have become incre...
We present a library that provides optimized implementations for deep learning primitives. Deep lear...
Graph Neural Networks (GNNs) are an important tool for extracting value from relational and unstruct...
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to its uniq...
Dynamic graph neural network (DGNN) is becoming increasingly popular because of its widespread use i...
Neural networks stand out from artificial intelligence because they can complete challenging tasks, ...
Our work seeks to improve and adapt computing systems and machine learning (ML) algorithms to match ...
Thesis (Master's)--University of Washington, 2018. Embedded platforms with integrated graphics process...
We propose a generalized method for adapting and optimizing algorithms for efficient execution on mo...
Deep neural networks (DNN) have recently achieved extraordinary results in domains like computer vis...