A common approach in HPC applications is to use the MPI and OpenMP programming models to express parallelism. More refined solutions use asynchronous communications to benefit from communication/computation overlap. In this paper we propose an early implementation of the ML-FMM algorithm using GASPI asynchronous one-sided communications to demonstrate how PGAS and task-based programming can impact code performance. Early results, on 32 nodes, show a 49% improvement in communication time over the optimized MPI+OpenMP version.
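For readers unfamiliar with the GASPI model, the following is a minimal sketch of the notification-based one-sided pattern the abstract alludes to, written against the GPI-2 implementation of the GASPI standard. The segment id, notification id, payload size, and ring-style neighbour exchange are illustrative assumptions for this sketch, not details taken from the paper's ML-FMM code.

```c
/* Minimal GASPI one-sided sketch (GPI-2). Segment/notification ids and
 * the payload size are illustrative, not from the paper's ML-FMM code.
 * Build roughly as: gcc ring.c -lGPI2 -lpthread                         */
#include <GASPI.h>
#include <stdlib.h>

#define SEG_ID   0          /* illustrative segment id        */
#define NOTIF_ID 0          /* illustrative notification id   */
#define NELEM    1024       /* illustrative payload (doubles) */

static void die_on_error(gaspi_return_t ret)
{
  if (ret != GASPI_SUCCESS)
    exit(EXIT_FAILURE);
}

int main(void)
{
  gaspi_rank_t rank, nprocs;

  die_on_error(gaspi_proc_init(GASPI_BLOCK));
  die_on_error(gaspi_proc_rank(&rank));
  die_on_error(gaspi_proc_num(&nprocs));

  /* One RDMA-visible segment: first half = send buffer, second half = recv. */
  die_on_error(gaspi_segment_create(SEG_ID, 2 * NELEM * sizeof(double),
                                    GASPI_GROUP_ALL, GASPI_BLOCK,
                                    GASPI_MEM_INITIALIZED));

  gaspi_rank_t right = (rank + 1) % nprocs;

  /* Asynchronous one-sided put plus notification to the right neighbour.
   * The call only enqueues the transfer, so local computation can proceed
   * and overlap with the communication, which is the point of the approach. */
  die_on_error(gaspi_write_notify(SEG_ID, 0,                      /* local  */
                                  right,
                                  SEG_ID, NELEM * sizeof(double), /* remote */
                                  NELEM * sizeof(double),
                                  NOTIF_ID, 1 /* notification value */,
                                  0 /* queue */, GASPI_BLOCK));

  /* ... overlapped local work (e.g. near-field operators) would go here ... */

  /* Block until the left neighbour's data has arrived, then reset the flag. */
  gaspi_notification_id_t got;
  gaspi_notification_t    val;
  die_on_error(gaspi_notify_waitsome(SEG_ID, NOTIF_ID, 1, &got, GASPI_BLOCK));
  die_on_error(gaspi_notify_reset(SEG_ID, got, &val));

  /* Flush our own outgoing queue before shutting down. */
  die_on_error(gaspi_wait(0, GASPI_BLOCK));

  die_on_error(gaspi_proc_term(GASPI_BLOCK));
  return 0;
}
```

Unlike two-sided MPI send/receive pairs, the receiver here is never actively involved in the transfer; it only polls a cheap local notification, which is what makes this style attractive for hiding the ML-FMM's irregular neighbour communication behind computation.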
In the last two decades, physical constraints in chip design have spawned a paradigm shift in comput...
We examined hybrid parallel infrastructures in order to ensure performance and scalability for beam ...
Fast summation methods like the FMM are the backbone of a multitude of simulations in MD, astrophysi...
In this paper, a new strategy for the parallelization of the multilevel fast multipole algorithm (ML...
The computational solution of large-scale linear systems of equations necessitates the use of fast a...
Today's supercomputers often consist of clusters of SMP nodes. Both OpenMP and MPI are programming ...
In today's MD simulations the scaling bottleneck is shifted more and more from computation towards c...
As the dawn of the exascale era arrives, high-performance computing (HPC) researchers continue to se...
With a large variety and complexity of existing HPC machines and uncertainty regarding exact future ...
We present efficient algorithms to build data structures and the lists needed for fast multipole met...
The availability of cheap computers with outstanding single-processor performance coupled with Ether...
In this paper, we analyze the communication pattern and study the scalability of a distributed memor...
The paper presents Heterogeneous MPI (HMPI), an extension of MPI for programming high-performance co...
Over the last few decades, Message Passing Interface (MPI) has become the parallel-communication sta...