PRACE 2IP White Paper

On multi-core clusters and supercomputers, achieving good performance when running high performance computing (HPC) applications is a main concern. This report presents performance-oriented auto-tuning strategies and experimental results for stencil HPC applications on multi-core parallel machines. A typical 2D Jacobi benchmark is chosen as the experimental stencil application. The main tuning strategies cover data partitioning within a multi-core node, the number of threads within a multi-core node, data partitioning across nodes, and the number of nodes in a multi-core cluster system. The experimental results are based on multi-core parallel machines from PRACE or Grid'5000, such as Curie, and Stremi clu...
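To make the idea of tuning a 2D Jacobi benchmark concrete, the following is a minimal sketch, not the report's implementation. It assumes a 5-point stencil, a fixed top boundary of 1.0, and a single hypothetical tuning parameter, `block_rows` (the number of grid rows processed per block), as a stand-in for the data-partitioning parameters named above. All function names are illustrative. In pure Python the timing differences are mostly noise, so the sketch shows only the shape of the search loop.

```python
import time


def make_grid(n):
    """Build an n x n grid of zeros with the top boundary row held at 1.0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [1.0] * n
    return grid


def jacobi_sweep(src, dst, n, block_rows):
    """One 5-point Jacobi sweep over the interior, visited in blocks of rows."""
    for start in range(1, n - 1, block_rows):
        stop = min(start + block_rows, n - 1)
        for i in range(start, stop):
            up, row, down, out = src[i - 1], src[i], src[i + 1], dst[i]
            for j in range(1, n - 1):
                # Each interior cell becomes the average of its four neighbours.
                out[j] = 0.25 * (up[j] + down[j] + row[j - 1] + row[j + 1])


def run(n, sweeps, block_rows):
    """Run the given number of sweeps, swapping the two buffers after each one."""
    a, b = make_grid(n), make_grid(n)
    for _ in range(sweeps):
        jacobi_sweep(a, b, n, block_rows)
        a, b = b, a
    return a


def autotune(n, sweeps, candidates):
    """Time every candidate block size; return the fastest and all timings."""
    timings = {}
    for block_rows in candidates:
        t0 = time.perf_counter()
        run(n, sweeps, block_rows)
        timings[block_rows] = time.perf_counter() - t0
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    best, timings = autotune(n=64, sweeps=10, candidates=[1, 4, 16, 62])
    print("fastest block_rows in this run:", best)
```

The search here is exhaustive over a short candidate list. An auto-tuner for threads per node or nodes per cluster would follow the same pattern, with a larger parameter space and measurements taken on the target machine.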
This work introduces a generalized framework for automatically tuning stencil computations to achiev...
Communicated by Guest Editors. Our aim is to apply program transformations to stencil codes in order ...
Communicated by Guest Editors. The implementation of stencil computations on modern, massively parall...
Understanding the most efficient design and utilization of emerging multicore systems is one of the ...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...
High Performance Computing (HPC) can be defined as the practice of combining computing power to atta...
This study focuses on the key numerical technique of stencil computations, used in many different sc...
The recent transformation from an environment where gains in computational performance came from inc...
Although modern supercomputers are composed of multicore machines, one can find scientists that stil...
We propose and evaluate a novel strategy for tuning the performance of a class of stencil computatio...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
In this paper, we present Patus, a code generation and auto-tuning framework for stencil computation...
Hardware accelerators are classic scientific coprocessors in HPC machines. How...
Stencil computation represents an important numerical kernel in scientific com...