We present parallel versions of a representative N-body application that uses Greengard and Rokhlin's adaptive Fast Multipole Method (FMM). While parallel implementations of the uniform FMM are straightforward and have been developed on different architectures, the adaptive version complicates the task of obtaining effective parallel performance owing to the nonuniform and dynamically changing nature of the problem domains to which it is applied. We propose and evaluate two techniques for providing load balancing and data locality, both of which take advantage of key insights into the method and its typical applications. Using the better of these techniques, we demonstrate 45-fold speedups on galactic simulations on a 48-processor Stan...
Learn about the fast multipole method (FMM) and its optimization on NVIDIA GPU...
The Fast Multipole Method allows the rapid evaluation of sums of radial basis functions centered at ...
We present an efficient and provably good partitioning and load balancing algorithm for parallel adapti...
We describe the design of several portable and efficient parallel implementations of adaptive N-body...
It has been shown that fast multipole methods can achieve good scalability on multi-core architectur...
Among the algorithms that are likely to play a major role in future exascale computing, the fast mul...
We present efficient algorithms to build data structures and the lists needed for fast multipole met...
Hierarchical N-body methods, which are based on a fundamental insight into the nature of many physic...
We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (F...
Invited Lecture at the SIAM "Encuentro Nacional de Ingeniería Matemática," at Pontificia U...
This paper presents a parallel version of the fast multipole method (FMM). The FMM is a rece...
This work presents the first extensive study of single-node performance optimization, tuning, and a...
The Fast Multipole Method (FMM) is well known to possess a bottleneck arising from decreasing worklo...
The simulation of N-body systems has been used extensively in biophysics and chemistry to investigate...