Abstract. We explore the backtracking paradigm with properties seen as sub-optimal for GPU architectures, using the maximal clique enumeration problem as a case study, and find that the presence of these properties limits GPU performance to approximately 1.4--2.25 times that of a single CPU core. The GPU performance "lessons" we find critical to achieving this performance include a coarse-and-fine-grain parallelization of the search space; a low-overhead, load-balanced distribution of work; global memory latency hiding through coalescence, saturation, and shared memory utilization; and the use of GPU output buffering as a solution to irregular workloads and a large solution domain. We also find a strong reliance on an efficient global problem struc...
Analytical performance models yield valuable architectural insight without incurring the excessive r...
We present an iterative breadth-first approach to maximum clique enumeration on the GPU. The memory ...
We propose a generalized method for adapting and optimizing algorithms for efficient execution on mo...
Advances in parallel computing architectures (e.g., Graphics Processing Units (GPUs)) have had great...
New GPGPU technologies, such as CUDA Dynamic Parallelism (CDP), can help deal with recursive patt...
The last few years have seen an explosion of effort in designing algorithms that harness the power of...
In recent years the power wall has prevented the continued scaling of single core performance. This ...
With the massive multithreading execution feature, graphics processing units (GPUs) have b...
Although volunteer computing with a huge number of high-performance game consoles connected ...
Back-Projection is the major algorithm in Computed Tomography to reconstruct images from a set of re...
Many applications with regular parallelism have been shown to benefit from using Graphics Processing...
Graphics Processing Units (GPUs) are a fast evolving architecture. Over the last decade their progra...
Heterogeneous processors with accelerators provide an opportunity to improve performance within a...