This paper presents a strategy for overcoming the limits of codes that use the FFTW library: implementing a more powerful parallel domain decomposition algorithm and refining the auto-tuning mechanism already built into the library. In the first part of the paper we identify some of the major performance bottlenecks in the current FFTW implementation, in particular in its auto-tuning mechanism. To do this, we tested, for the first time on a Blue Gene/Q system, the 2D parallel domain decomposition algorithm provided by the 2DECOMP&FFT library. We found that on massively parallel supercomputers such as Blue Gene/Q clusters this new algorithm performs significantly better. ...
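As a rough illustration of the decomposition contrast drawn in the abstract above, the following Python sketch compares how many ranks a 1D (slab) decomposition, the kind used by FFTW's MPI interface, and a 2D (pencil) decomposition can keep busy on a cubic grid, and computes the local block one rank would own. It is not code from FFTW or 2DECOMP&FFT: the function names (`split_extent`, `x_pencil_shape`, `max_ranks`) and the near-equal block-splitting rule are assumptions made for illustration only.

```python
def split_extent(n, parts, index):
    """Return (start, size) of block `index` when `n` points are split
    into `parts` contiguous, near-equal blocks (assumed splitting rule)."""
    if not 0 <= index < parts:
        raise ValueError("index must satisfy 0 <= index < parts")
    base, extra = divmod(n, parts)
    # The first `extra` blocks receive one additional point.
    size = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return start, size


def x_pencil_shape(grid, p_row, p_col, row, col):
    """Local array shape for an x-aligned pencil: x stays whole,
    y is split over `p_row` ranks and z over `p_col` ranks."""
    nx, ny, nz = grid
    _, ny_local = split_extent(ny, p_row, row)
    _, nz_local = split_extent(nz, p_col, col)
    return nx, ny_local, nz_local


def max_ranks(n, decomposition):
    """Upper bound on ranks that own at least one point of an n^3 grid."""
    if decomposition == "slab":      # only one axis is split
        return n
    if decomposition == "pencil":    # two axes are split
        return n * n
    raise ValueError("decomposition must be 'slab' or 'pencil'")


if __name__ == "__main__":
    n = 1024
    print("slab limit:  ", max_ranks(n, "slab"))
    print("pencil limit:", max_ranks(n, "pencil"))
    # Hypothetical 32 x 64 process grid, rank at position (0, 0).
    print("local x-pencil shape:", x_pencil_shape((n, n, n), 32, 64, 0, 0))
```

On an n^3 grid the slab bound grows as n while the pencil bound grows as n^2, which is one reason a 2D decomposition is attractive on machines with very large core counts.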
We present a parallel FFT algorithm for SIMD systems following the "Transpose Algorithm" approach. T...
In this paper we have introduced a parallelization for the calculation of fluid flow problems on unstru...
With modern advancements in hardware and software technology scaling towards new limits, our compute...
In this paper we present the work carried out by CINECA in the framework of the PRACE-2IP project wh...
Fast Fourier Transform is a class of efficient algorithms used to compute Discrete Fourier Transform...
In this paper we will present part of the work carried out by CINECA in the framework of the PRACE-2...
We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. T...
Physics-based simulation, Computational Fluid Dynamics (CFD) in particular, has substantially reshap...
This paper presents performance characteristics of a communications-intensive kernel, the complex da...
Computational Fluid Dynamics (CFD) applications are highly demanding for parallel computing. Many su...
We present an MPI-based software library for computing the fast Fourier transforms on massively paral...
This figure shows the weak scaling of a parallel FMM-based fluid solver on GPUs, from 1 to 4096 p...
The focus of the subject DOE sponsored research concerns parallel methods, algorithms, and software ...
Domain decomposition is the most widely used technique to achieve parallelism in CFD applications. F...
We present an MPI-based software library for computing fast Fourier transforms (FFTs) on m...