Abstract—Data-parallel architectures must provide efficient support for complex control-flow constructs to support sophisticated applications coded in modern single-program multiple-data languages. As these architectures have wide datapaths that process a single instruction across parallel threads, a mechanism is needed to track and sequence threads as they traverse potentially divergent control paths through the program. The design space for divergence management ranges from software-only approaches where divergence is explicitly managed by the compiler, to hardware solutions where divergence is managed implicitly by the microarchitecture. In this paper, we explore this space and propose a new predication-based approach for handling cont...
Thread divergence optimization in GPU architectures has long been hindered by...
The increasing popularity of Graphics Processing Units (GPUs) has brought ren...
Parallel programming involves finding the potential parallelism in an application, choosing an algo...
The energy costs of data movement are limiting the performance scaling of future generations of high...
Graphics processing units (GPUs) are composed of a group of single-instruction multiple-data (SIMD) s...
Parallel architectures following the SIMT model such as GPUs benefit from application regularity by ...
This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-...
Growing interest in graphics processing units has brought renewed attention to...
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-pred...
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data...
Parallel architectures following the SIMT model such as GPUs benefit from application regularity by ...
Branch divergence is a very commonly occurring performance problem in GPGPU in which the execution o...
We present compiler optimization techniques for explicitly parallel programs that communicate thro...
Parallel architectures following the SIMT model such as GPUs benefit from applicati...
We propose a compiler analysis pass for programs expressed in the Single Program, Multiple Data (SPM...