This paper presents work carried out by CINECA within the PRACE-2IP project to improve the performance of the FFTW library by refining the auto-tuning mechanism it already implements. The work comprised four activities: identifying the major bottlenecks in the current FFTW implementation; investigating FFTW's auto-tuning mechanism to understand how domain decomposition affects performance; introducing a new parallel domain decomposition; and building a library to improve the performance of the auto-tuning mechanism. In particular, we have compared the performance of the standard Slab Decomposition a...
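As an illustration of the slab decomposition mentioned in the abstract above, the following minimal sketch splits the first axis of a 3D grid into contiguous slabs, one per process. The function names `slab_bounds` and `slab_layout` are hypothetical, and the balanced split used here is an assumption chosen for clarity; it is not taken from the paper and is not necessarily the distribution FFTW itself uses.

```python
def slab_bounds(n0, nprocs, rank):
    """Return (start, count) of the planes along axis 0 owned by `rank`.

    Hypothetical helper: planes are split as evenly as possible, with the
    first `n0 % nprocs` ranks receiving one extra plane each.
    """
    if nprocs <= 0 or not 0 <= rank < nprocs:
        raise ValueError("need nprocs > 0 and 0 <= rank < nprocs")
    base, extra = divmod(n0, nprocs)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def slab_layout(n0, nprocs):
    """Return the (start, count) pair for every rank, in rank order."""
    return [slab_bounds(n0, nprocs, r) for r in range(nprocs)]


if __name__ == "__main__":
    # 10 planes over 4 processes: every rank owns a contiguous slab.
    print(slab_layout(10, 4))
    # 2 planes over 4 processes: two ranks are left without any plane.
    print(slab_layout(2, 4))
```

The second call shows the well-known limit of a one-dimensional split: with more processes than planes along the decomposed axis, some ranks receive no data, which is one reason alternative decompositions are of interest.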
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)...
FFT implementations today generally fall into two categories: Library generators (such as FFTW and S...
Abstract. We present an MPI based software library for computing fast Fourier transforms (FFTs) on m...
The aim of this paper is to provide a strategy for overcoming the limits of codes employing the FFTW...
We present an auto-tuning framework for FFTs on graphics processors (GPUs). Due to complex design o...
The development of applications so as to reach performance levels close to the theoretical levels...
In high-performance computing, excellent node-level performance is required for the efficient use of...
Polynomial multiplication is a key algorithm underlying computer algebra systems (CAS) and its effic...
Several SOA (state of the art) self-tuning software libraries exist, such as the Fastest Fourier Tra...
Abstract. This paper introduces a formal framework for automatically generating performance optimize...
Fourier methods have revolutionized many fields of science and engineering, such as astronomy, medic...
We present an MPI-based software library for computing the fast Fourier transforms on massively paral...
Abstract. Achieving peak performance in important numerical kernels such as dense matrix multiply or...
On multi-core clusters or supercomputers, how to get good performance when running high performance ...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural r...