The increasing popularity of Graphics Processing Units (GPUs) has brought renewed attention to old problems related to the Single Instruction, Multiple Data execution model. One of these problems is the reconvergence of divergent threads. A divergence happens at a conditional branch when different threads disagree on the path to follow upon reaching this split point. Divergences may impose a heavy burden on the performance of parallel programs. In this paper we propose a compiler-level optimization to mitigate this performance loss. This optimization consists of merging function call sites located on different paths that sprout from the same branch. We show that our optimization adds negligible overhead on the compile...
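The idea of merging call sites that sit on different paths of the same branch can be illustrated with a toy model. The sketch below is not the authors' algorithm or implementation: it only simulates a group of threads running in lockstep and counts how often a call is issued. The names `expensive`, `run_unmerged`, and `run_merged`, the even/odd branch condition, and the cost model (one issue per call site that has at least one active thread) are all assumptions made for illustration.

```python
def expensive(x):
    """Hypothetical stand-in for a costly function called on both branch paths."""
    return x * x + 1


def run_unmerged(values):
    """One call site per path: a divergent group issues the call twice."""
    taken = [v % 2 == 0 for v in values]  # assumed branch condition
    results = [None] * len(values)
    calls_issued = 0

    # "Then" path: only threads whose condition holds are active.
    if any(taken):
        calls_issued += 1
        for i, v in enumerate(values):
            if taken[i]:
                results[i] = expensive(v + 1)

    # "Else" path: the remaining threads are active; the paths run one after the other.
    if not all(taken):
        calls_issued += 1
        for i, v in enumerate(values):
            if not taken[i]:
                results[i] = expensive(v - 1)

    return results, calls_issued


def run_merged(values):
    """Merged call site: the branch only selects the argument, the call is shared."""
    args = [v + 1 if v % 2 == 0 else v - 1 for v in values]
    calls_issued = 1  # every thread reaches the single call together
    results = [expensive(a) for a in args]
    return results, calls_issued


if __name__ == "__main__":
    print(run_unmerged([0, 1, 2, 3]))
    print(run_merged([0, 1, 2, 3]))
```

Both versions compute the same per-thread results; in this model the merged version issues the expensive call once even when the threads disagree on the branch, while the unmerged version issues it once per path taken.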
For a wide variety of applications, both task and data parallelism must be exploited to achieve the ...
As transistor sizes shrink and architects put more and more cores on chip, computer systems become ...
Data-parallel architectures must provide efficient support for complex control-flow constru...
Graphics processing units (GPUs) are composed of a group of single-instruction multiple data (SIMD) s...
Irregular control-flow structures like deeply nested conditional branches are common in real-world s...
Branch divergence is a very commonly occurring performance problem in GPGPU in which the execution o...
This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider ...
Branch divergence has a significant impact on the performance of GPU programs. We propose two novel...
Growing interest in graphics processing units has brought renewed attention to...
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-pred...
Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in G...
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TL...
Loops in scientific and engineering applications provide a rich source of parallelism. In order to o...
GPU’s SIMD architecture is a double-edged sword confronting parallel tasks with control flow diverg...