Achieving good performance for high-performance computing (HPC) applications is a main concern on multi-core clusters and supercomputers. This report presents performance-oriented auto-tuning strategies and experimental results for stencil HPC applications on multi-core parallel machines. A typical 2D Jacobi benchmark is chosen as the experimental stencil application. The main tuning strategies cover four parameters: the data partitioning within a multi-core node, the number of threads within a multi-core node, the data partitioning across nodes, and the number of nodes in a multi-core cluster system. The experiments were run on multi-core parallel machines from PRACE or Grid'5000, such as Curie and the Stremi cluster.
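To make the tuning idea concrete, the following is a minimal serial sketch, not the report's actual benchmark or tuner. It assumes a 5-point 2D Jacobi stencil and treats the tile shape of the data partitioning as the only tunable parameter; thread and node counts are left out. The names `jacobi_sweep`, `make_grid`, `run`, and `autotune` are hypothetical and chosen only for illustration.

```python
import time


def jacobi_sweep(src, dst, block_rows, block_cols):
    """One 5-point Jacobi sweep over the interior of src, written into dst.

    The interior is visited tile by tile; the tile shape
    (block_rows x block_cols) stands in for the data partitioning.
    """
    n, m = len(src), len(src[0])
    for i0 in range(1, n - 1, block_rows):
        i1 = min(i0 + block_rows, n - 1)
        for j0 in range(1, m - 1, block_cols):
            j1 = min(j0 + block_cols, m - 1)
            for i in range(i0, i1):
                for j in range(j0, j1):
                    dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                        + src[i][j - 1] + src[i][j + 1])


def make_grid(n, m):
    """Grid with the top boundary row fixed at 1.0 and zeros elsewhere."""
    grid = [[0.0] * m for _ in range(n)]
    grid[0] = [1.0] * m
    return grid


def run(n, m, sweeps, block_rows, block_cols):
    """Run a fixed number of sweeps, swapping the two buffers each time."""
    a, b = make_grid(n, m), make_grid(n, m)
    for _ in range(sweeps):
        jacobi_sweep(a, b, block_rows, block_cols)
        a, b = b, a
    return a


def autotune(n, m, sweeps, candidates):
    """Exhaustive search: time every candidate tile shape, keep the fastest."""
    best, best_time = None, float("inf")
    for block_rows, block_cols in candidates:
        start = time.perf_counter()
        run(n, m, sweeps, block_rows, block_cols)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = (block_rows, block_cols), elapsed
    return best, best_time


if __name__ == "__main__":
    shapes = [(8, 8), (16, 64), (64, 16), (126, 126)]
    best_shape, seconds = autotune(128, 128, 5, shapes)
    print("fastest tile shape:", best_shape, "time (s):", round(seconds, 4))
```

Because every tile shape computes each interior point from the same four neighbours, the numerical result does not depend on the partitioning; only the run time does, which is what makes the parameter safe to tune. A tuner on a real machine would add the thread count and node count to the same search, and would typically replace the exhaustive loop once the search space grows large.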
This work introduces a generalized framework for automatically tuning stencil computations to achiev...
We present an auto-tuning approach to optimize application performance on emerging multicore archite...
Stencil based computation on structured grids is a kernel at the heart of a la...
PRACE 2IP White Paper. On multi-core clusters or supercomputers, how to get good performance when runn...
Understanding the most efficient design and utilization of emerging multicore systems is one of the ...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...
High Performance Computing (HPC) can be defined as the practice of combining computing power to atta...
The recent transformation from an environment where gains in computational performance came from inc...
This study focuses on the key numerical technique of stencil computations, used in many different sc...
We propose and evaluate a novel strategy for tuning the performance of a class of stencil computatio...
The focus of this work is the automatic performance tuning of stencil computations on Graphics Proce...
Hardware accelerators are classic scientific coprocessors in HPC machines. How...
Although modern supercomputers are composed of multicore machines, one can find scientists that stil...
In this paper, we present Patus, a code generation and auto-tuning framework for stencil computation...
Stencil computation represents an important numerical kernel in scientific com...