When implementing a function mapping on a contemporary GPU, several conflicting performance factors that affect how computation is distributed among GPU kernels have to be balanced. A decomposition-fusion scheme proposes decomposing the computational problem into several simple functions implemented as standalone kernels, and later fusing some of these functions into more complex kernels to improve memory locality. This paper presents a prototype source-to-source compiler that automates the fusion phase, and experimentally evaluates both the impact of the fusions generated by the compiler and the efficiency of the compiler itself.
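To make the memory-locality argument concrete, the following is a minimal CPU-side sketch in Python, not CUDA and not the output of the compiler described here. It assumes two hypothetical element-wise functions (`scale_kernel` and `add_kernel`) and a toy `GlobalMemory` counter standing in for GPU global memory; all of these names are illustrative. The unfused version writes an intermediate array and reads it back, whereas the fused version keeps the intermediate value in a local variable, which plays the role of a register.

```python
# Illustrative CPU analogy of kernel fusion; every name here is hypothetical.

class GlobalMemory:
    """Toy stand-in for GPU global memory that counts element reads and writes."""

    def __init__(self):
        self.reads = 0
        self.writes = 0

    def load(self, array, i):
        self.reads += 1
        return array[i]

    def store(self, array, i, value):
        self.writes += 1
        array[i] = value


def scale_kernel(mem, src, dst, factor):
    """Standalone 'kernel': dst[i] = factor * src[i]."""
    for i in range(len(src)):
        mem.store(dst, i, factor * mem.load(src, i))


def add_kernel(mem, a, b, dst):
    """Standalone 'kernel': dst[i] = a[i] + b[i]."""
    for i in range(len(a)):
        mem.store(dst, i, mem.load(a, i) + mem.load(b, i))


def unfused(mem, x, y, factor):
    """Two separate kernels; the intermediate array round-trips through memory."""
    tmp = [0.0] * len(x)
    out = [0.0] * len(x)
    scale_kernel(mem, x, tmp, factor)
    add_kernel(mem, tmp, y, out)
    return out


def fused(mem, x, y, factor):
    """One fused kernel; the intermediate value never leaves a local variable."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        scaled = factor * mem.load(x, i)  # plays the role of a register
        mem.store(out, i, scaled + mem.load(y, i))
    return out
```

In this toy model the two versions compute the same result, but the fused version performs fewer simulated memory accesses because the intermediate array is never stored or reloaded. Real GPU behavior also depends on factors this sketch ignores, such as register pressure and occupancy, which is part of why the performance factors have to be balanced.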
2012-05-02 Graphics Processing Units (GPUs) have evolved to devices with teraflop-level performance p...
This paper presents a novel optimizing compiler for general purpose computation on graphics processi...
GPUs are a class of specialized parallel architectures with tremendous computational powe...
Modern GPUs are able to perform significantly more arithmetic operations than transfers of a single ...
This artifact describes the steps to reproduce the results for the CUDA code generation with kernel ...
Recently, parallel architectures have entered every area of computing, from multi-core proce...
Employing general-purpose graphics processing units (GPGPU) with the help of OpenCL has resulted in ...
Recent advances in multi-core and many-core processors require programmers to exploit an increasin...
In this paper we introduce a novel transformation pass written using LLVM that performs kernel fusio...
CUDA is a data parallel programming model that supports several key abstractions: thread b...
GPUs are getting more and more important in scientific computing, slowly growing from peripheral acc...
The shift toward parallel processor architectures has made programming and code generation increasin...