Automatic Fusions of CUDA-GPU Kernels for Parallel Map

Matúš Madzin

Publication date

October 2015

Abstract

When implementing a function mapping on a contemporary GPU, several contradictory performance factors that affect how the computation is distributed among GPU kernels have to be balanced. A decomposition-fusion scheme suggests decomposing the computational problem into several simple functions implemented as standalone kernels, and later fusing some of these functions into more complex kernels to improve memory locality. In this paper, a prototype of a source-to-source compiler automating the fusion phase is presented, and the impact of the fusions generated by the compiler, as well as the compiler's efficiency, is experimentally evaluated.
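The decomposition-fusion idea can be illustrated with a minimal CPU-side sketch. It is not the paper's compiler or its generated CUDA code. It models each kernel as a loop over a list and counts element reads and writes as a rough stand-in for global-memory traffic. The functions `scale` and `offset`, the helpers `run_unfused` and `run_fused`, and the traffic counter are all hypothetical names introduced only for this illustration.

```python
# Hypothetical CPU-side model of fusing two map kernels.
# All names are illustrative assumptions, not taken from the paper.

def scale(x):
    """First simple map function (assumed for illustration)."""
    return 2.0 * x


def offset(x):
    """Second simple map function (assumed for illustration)."""
    return x + 1.0


def run_unfused(data):
    """Two standalone 'kernels': the intermediate array round-trips through memory."""
    traffic = 0  # element reads + writes to modeled global memory
    tmp = []
    for x in data:  # kernel 1: read input, write intermediate
        tmp.append(scale(x))
        traffic += 2
    out = []
    for t in tmp:  # kernel 2: read intermediate, write output
        out.append(offset(t))
        traffic += 2
    return out, traffic


def run_fused(data):
    """One fused 'kernel': the intermediate value stays in a local variable."""
    traffic = 0
    out = []
    for x in data:
        t = scale(x)  # plays the role of a register; never written to memory
        out.append(offset(t))
        traffic += 2  # one read of the input, one write of the output
    return out, traffic


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 3.0]
    print(run_unfused(data))  # same values as the fused version, more traffic
    print(run_fused(data))
```

Under this simplified model both versions produce the same output, while the fused version performs half as many modeled memory accesses. On a real GPU the benefit depends on further factors, such as register pressure and occupancy, which is the kind of trade-off the abstract refers to.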
