We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (FMM) for solving N-body problems. Our research specifically addresses two key challenges. The first challenge is how to engineer fast code for today’s platforms. We present the first in-depth study of multicore optimizations and tuning for FMM, along with a systematic approach for transforming a conventionally parallelized FMM into a highly tuned one. We introduce novel optimizations that significantly improve the within-node scalability of the FMM, thereby enabling high performance on multicore and manycore systems. The second challenge is how to understand scalability on future systems. We present a new algorithmic complexit...
In today's MD simulations the scaling bottleneck is shifted more and more from computation towards c...
O(N) algorithms for N-body simulations enable the simulation of particle systems with up to 100 mill...
In this paper, we analyze the communication pattern and study the scalability of a distributed memor...
The present work attempts to integrate the independent efforts in the fast N-body community to crea...
This thesis presents a top-to-bottom analysis of designing and implementing fast algorithms for curr...
In the last two decades, physical constraints in chip design have spawned a paradigm shift in comput...
We describe the design of several portable and efficient parallel implementations of adaptive N-body...
This work presents the first extensive study of single-node performance optimization, tuning, and a...
We present parallel versions of a representative N-body application that uses Greengard and Rokhlin&...
The fast multipole method is an algorithm first developed to approximately solve the N-body problem ...
Among the algorithms that are likely to play a major role in future exascale computing, the fast mul...
<b>Invited Lecture at the SIAM <i>"Encuentro Nacional de Ingeniería Matemática,"</i> at Pontificia U...
The N-body problem appears in many computational physics simulations. At each time step the computat...
The simulation of N-body systems has been used extensively in biophysics and chemistry to investigate...
The Fast Multipole Method (FMM) is well known to possess a bottleneck arising from decreasing worklo...