The focus of this work is the automatic performance tuning of stencil computations on Graphics Processing Units (GPUs). A strategy is presented that uses machine learning to determine the best way to use the GPU memory, followed by a heuristic that divides the remaining optimizations into groups and exhaustively explores one group at a time. The strategy is evaluated using 104 synthetically generated OpenCL stencil kernels on an Nvidia GTX Titan GPU. It is assessed both in terms of the number of configurations explored during auto-tuning and the quality of the best configuration obtained. Two alternative heuristics that use different groupings of the optimizations are also explored. Relative to a random sampling of the space and an exp...
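The group-at-a-time exploration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the parameter names (`wg_x`, `wg_y`, `unroll`, `coarsen`, `use_local_memory`), the grouping, and the cost function `toy_cost` are all hypothetical stand-ins, and the memory choice that the abstract assigns to a machine-learning model is simply fixed in the starting configuration.

```python
from itertools import product

def groupwise_search(groups, cost, start):
    """Exhaustively explore one parameter group at a time.

    groups: list of dicts mapping parameter name -> candidate values.
    cost:   function taking a configuration dict, lower is better.
    start:  initial configuration (includes the fixed memory choice).
    """
    best = dict(start)
    best_cost = cost(best)
    evaluated = 1
    for group in groups:
        names = list(group)
        # Try every combination inside this group; parameters in
        # other groups keep the best values found so far.
        for values in product(*(group[n] for n in names)):
            candidate = dict(best)
            candidate.update(zip(names, values))
            c = cost(candidate)
            evaluated += 1
            if c < best_cost:
                best, best_cost = candidate, c
    return best, best_cost, evaluated

# Hypothetical optimization groups (assumption, not from the paper).
GROUPS = [
    {"wg_x": [8, 16, 32], "wg_y": [1, 2, 4]},
    {"unroll": [1, 2, 4], "coarsen": [1, 2]},
]

# Memory usage is assumed to be decided beforehand (by the ML step).
START = {"use_local_memory": True, "wg_x": 8, "wg_y": 1,
         "unroll": 1, "coarsen": 1}

def toy_cost(cfg):
    """Stand-in for a measured kernel run time (purely illustrative)."""
    penalty = 0 if cfg["use_local_memory"] else 5
    return (abs(cfg["wg_x"] - 16) + abs(cfg["wg_y"] - 4)
            + abs(cfg["unroll"] - 2) + abs(cfg["coarsen"] - 2) + penalty)

best, best_cost, evaluated = groupwise_search(GROUPS, toy_cost, START)
print(best, best_cost, evaluated)
```

In this toy setting the search evaluates 16 configurations, whereas the full cross product of the four parameters has 54. The saving comes at a price: because groups are tuned independently, the result is only guaranteed to be optimal when parameters in different groups do not interact strongly, which is why the choice of grouping matters.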
Selecting an appropriate workgroup size is critical for the performance of OpenCL kernels, and requi...
Stencil computation is of paramount importance in many fields, in image processing, structur...
Understanding the most efficient design and utilization of emerging multicore systems is one of the ...
We propose and evaluate a novel strategy for tuning the performance of a class of stencil computatio...
We present a novel strategy for automatic performance tuning of GPU programs. The strategy combines ...
Stencil computations form the basis for computer simulations across almost every field of science, s...
Stencil computations are a class of algorithms operating on multi-dimensional arrays, which update a...
This work introduces a generalized framework for automatically tuning stencil computations to achiev...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Proc...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...