Selecting an appropriate workgroup size is critical for the performance of OpenCL kernels, and requires knowledge of the underlying hardware, the data being operated on, and the implementation of the kernel. This makes portable performance of OpenCL programs a challenging goal, since simple heuristics and statically chosen values fail to exploit the available performance. To address this, we propose the use of machine learning-enabled autotuning to automatically predict workgroup sizes for stencil patterns on CPUs and multi-GPUs. We present three methodologies for predicting workgroup sizes. The first uses classifiers to select the optimal workgroup size. The second and third proposed methodologies employ the novel use of regressors fo...
In the last decade graphics processors (GPUs) have been extensively used to solve computationally i...
Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizati...
Understanding the most efficient design and utilization of emerging multicore systems is one of the ...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
We propose and evaluate a novel strategy for tuning the performance of a class of stencil computatio...
The OpenCL-based high-level synthesis framework is becoming popular for programming FPGAs as a n...
The implementation of stencil computations on modern, massively parallel systems with GPUs and other...
Open Computing Language (OpenCL) is emerging as a standard for parallel programming of heterogeneous...
In this paper, we present Patus, a code generation and auto-tuning framework for stencil computation...
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today’s embedd...
This work introduces a generalized framework for automatically tuning stencil computations to achiev...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using...
OpenCL has been designed to achieve functional portability across multi-core devices from different ...