We propose a compiler analysis pass for programs expressed in the Single Program, Multiple Data (SPMD) programming model. The pass statically identifies several kinds of regular patterns that can occur across adjacent threads: common computations, memory accesses to consecutive locations or to the same location, and uniform control flow. SPMD compilers targeting SIMD architectures can exploit this knowledge. We present a compiler pass, developed within the Ocelot framework, that performs this analysis on NVIDIA CUDA programs at the level of the PTX intermediate language. Results are compared with optima obtained by simulation of several sets of CUDA benchmarks.
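To make the kinds of patterns concrete, here is a minimal sketch of how a static analysis might tag each value as uniform across threads, affine in the thread index, or unknown. It is an illustration under stated assumptions, not the Ocelot pass itself: the three-operation toy IR, the `Tag` lattice, and all names (`analyze`, `scale`, `PROGRAM`, `INPUTS`) are hypothetical, and control flow is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical abstract value: how a register varies across adjacent threads.

    stride == 0    -> uniform (same value in every thread)
    stride == k    -> affine (value = base + k * thread_id)
    stride is None -> unknown (no regular pattern proven)
    """
    stride: Optional[int]


UNIFORM = Tag(0)
THREAD_ID = Tag(1)
UNKNOWN = Tag(None)


def add(a: Tag, b: Tag) -> Tag:
    # The sum of two affine values is affine; the strides add up.
    if a.stride is None or b.stride is None:
        return UNKNOWN
    return Tag(a.stride + b.stride)


def scale(a: Tag, k: int) -> Tag:
    # Multiplying by a compile-time constant k scales the stride.
    if a.stride is None:
        return UNKNOWN
    return Tag(a.stride * k)


def mul(a: Tag, b: Tag) -> Tag:
    # Only uniform * uniform is provably regular in this toy model.
    if a == UNIFORM and b == UNIFORM:
        return UNIFORM
    return UNKNOWN


def analyze(program, inputs):
    """Propagate tags through a straight-line list of (dst, op, a, b) tuples."""
    tags = dict(inputs)
    for dst, op, a, b in program:
        if op == "add":
            tags[dst] = add(tags[a], tags[b])
        elif op == "mul":
            tags[dst] = mul(tags[a], tags[b])
        elif op == "scale":
            tags[dst] = scale(tags[a], b)  # b is an integer constant here
        else:
            raise ValueError(f"unsupported op: {op}")
    return tags


def describe(tag: Tag) -> str:
    if tag.stride is None:
        return "unknown"
    if tag.stride == 0:
        return "uniform"
    return f"affine(stride={tag.stride})"


# Toy kernel body: addr = base + 4 * tid, c = alpha * beta, x = tid * tid
INPUTS = {"tid": THREAD_ID, "base": UNIFORM, "alpha": UNIFORM, "beta": UNIFORM}
PROGRAM = [
    ("off", "scale", "tid", 4),
    ("addr", "add", "base", "off"),
    ("c", "mul", "alpha", "beta"),
    ("x", "mul", "tid", "tid"),
]

if __name__ == "__main__":
    for name, tag in analyze(PROGRAM, INPUTS).items():
        print(name, describe(tag))
```

In this sketch, `addr` is tagged affine with stride 4, which for 4-byte elements corresponds to adjacent threads accessing consecutive locations. `c` is tagged uniform, a computation common to all threads, and `x` falls back to unknown.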
The past decade has produced numerous CPU architectural innovations. These have included mul...
Data-parallel architectures must provide efficient support for complex control-flow constru...
The tutorial at CONCUR will provide a practical overview of work undertaken over the last six years ...
CUDA is a data parallel programming model that supports several key abstractions- thread b...
Growing interest in graphics processing units has brought renewed attention to...
With serial, or sequential, computational operations' growth rate slowing over the past few years...
The most popular multithreaded languages based on the fork-join concurrency model (Cilk Plus, OpenMP)...
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data...
Parallel programming requires a significant amount of developer effort, and creating optimized paral...
General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effe...
With GPU architectures becoming increasingly important due to their large number of parallel process...
During the past few years the increase of computational power has been realized using more ...
In Compute Unified Device Architecture (CUDA), programmers must manage memory operations, synchroniz...
Enhancing the match between software executions and hardware features is key to computing efficiency...