In today's molecular dynamics (MD) simulations, the scaling bottleneck is shifting steadily from computation towards communication. Calculating long-range interactions in O(n) amplifies this imbalance further. The Fast Multipole Method used in this project is one algorithm that calculates long-range interactions with linear complexity. How can we reduce the total run-time further? To reduce the latency and bandwidth costs along the critical communication path, we must change the communication pattern. This leads to additional replicated computation, which can eventually be overlapped with communication. To reduce the visibility of such communication inside the algorithm, we introduced several abstraction layers. This allows an easy excha...
To efficiently scale dense linear algebra problems to future exascale systems, communication cost mu...
In the last two decades, physical constraints in chip design have spawned a paradigm shift in comput...
PhD Thesis. This thesis develops and evaluates a number of efficient algorithms for performing paralle...
We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (F...
In this book chapter, the authors discuss some important communication issues to obtain a highly sca...
Simulations of interacting particles are common in science and engineering, appearing in such divers...
In this paper, we analyze the communication pattern and study the scalability of a distributed memor...
We consider the problem of communication avoidance in computing interactions between a set of partic...
Parallelizing sparse irregular applications on distributed memory systems poses serious scalability c...
Parallelizing large problems on parallel systems has always been a challenge for programmers. Th...
A common approach in HPC applications is to use MPI and OpenMP programming mod...
In parallel computing environments from multicore systems to cloud computers and supercomputers, dat...
This paper describes experiments with two parallel implementations of the Fast Multipole Method { o...
Many parallel algorithms exhibit a hypercube communication topology. Such algorithms can easily be e...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...