We propose and evaluate a novel strategy for tuning the performance of a class of stencil computations on Graphics Processing Units. The strategy uses a machine learning model to predict the optimal way to load data from memory followed by a heuristic that divides other optimizations into groups and exhaustively explores one group at a time. We use a set of 104 synthetic OpenCL stencil benchmarks that are representative of many real stencil computations. We first demonstrate the need for auto-tuning by showing that the optimization space is sufficiently complex that simple approaches to determining a high-performing configuration fail. We then demonstrate the effectiveness of our approach on NVIDIA and AMD GPUs. Relative to a random samplin...
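The group-at-a-time heuristic described above can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names (`wg_x`, `wg_y`, `unroll`, `pad`, `load`), the grouping, and the `toy_cost` function are all hypothetical stand-ins. The sketch assumes the data-loading strategy has already been fixed by some predictive model and is carried along unchanged in the starting configuration. In a real tuner, `cost` would compile and time a kernel.

```python
from itertools import product


def groupwise_search(groups, cost, start):
    """Tune one group of parameters at a time, exhaustively within each group.

    groups: list of dicts mapping parameter name -> list of candidate values
    cost:   function(config dict) -> measured cost (lower is better)
    start:  initial full configuration (dict); parameters outside every
            group (e.g. a predicted load strategy) are left untouched
    """
    best = dict(start)
    best_cost = cost(best)
    evaluations = 1
    for group in groups:
        names = list(group)
        # Enumerate every combination of values inside this group only.
        for values in product(*(group[n] for n in names)):
            candidate = dict(best)
            candidate.update(zip(names, values))
            c = cost(candidate)
            evaluations += 1
            if c < best_cost:
                best, best_cost = candidate, c
    return best, best_cost, evaluations


def toy_cost(cfg):
    """Hypothetical stand-in for a kernel timing; not a real performance model."""
    return (abs(cfg["wg_x"] - 32) + abs(cfg["wg_y"] - 4)
            + abs(cfg["unroll"] - 2) + (0 if cfg["pad"] else 1))


# Hypothetical grouping: work-group shape first, then loop-level options.
groups = [
    {"wg_x": [8, 16, 32, 64], "wg_y": [1, 2, 4, 8]},
    {"unroll": [1, 2, 4], "pad": [False, True]},
]

# "load" is assumed to come from a predictive model and is never searched.
start = {"load": "local", "wg_x": 8, "wg_y": 1, "unroll": 1, "pad": False}

best, best_cost, evaluations = groupwise_search(groups, toy_cost, start)
print(best, best_cost, evaluations)
```

Searching the groups one after another evaluates the sum of the group sizes rather than their product, which is the source of the savings. The trade-off is that interactions between parameters in different groups are not explored.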
Stencil computation is of paramount importance in many fields, in image processing, structur...
Graphics hardware's performance is advancing much faster than the performance of conventional microp...
We propose a generalized method for adapting and optimizing algorithms for efficient execution on mo...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
Stencil computations form the basis for computer simulations across almost every field of science, s...
Stencil computations are a class of algorithms operating on multi-dimensional arrays, which update a...
Stencil computations arise in many scientific computing domains, and often represent time-critical ...
Understanding the most efficient design and utilization of emerging multicore systems is one of the ...
It is crucial to optimize stencil computations since they are the core (and most computation...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...
This work introduces a generalized framework for automatically tuning stencil computations to achiev...
In this paper we investigate how stencil computations can be implemented on state-of-the-art...
We present a novel strategy for automatic performance tuning of GPU programs. The strategy combines ...