Abstract We describe a technique for speeding up the performance of global collective operations on a cluster of symmetric multi-processor (SMP) parallel computers. Global collective operations are inherently faster within an SMP computer than between such computers. This algorithm takes advantage of this fact and performs the global collective operations first within each SMP machine, and then completes the operations between the machines. This yields a significant improvement in global collective performance, almost twice as fast as conventional MPI global reduction calls in some cases.
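The two-phase scheme described above can be illustrated with a minimal simulation. This is plain Python with no MPI; the node-to-rank mapping and the reduction operator are illustrative assumptions, not the paper's implementation:

```python
from functools import reduce

def hierarchical_reduce(values, node_of_rank, op=lambda a, b: a + b):
    """Two-phase reduction: combine contributions within each SMP node
    first, then combine the per-node partial results across nodes."""
    # Phase 1: intra-node reduction (the fast shared-memory step on real hardware)
    partials = {}
    for rank, value in enumerate(values):
        node = node_of_rank[rank]
        partials[node] = op(partials[node], value) if node in partials else value
    # Phase 2: inter-node reduction over one partial result per node
    return reduce(op, partials.values())

# Example: 8 ranks spread over 2 four-way SMP nodes
values = [1, 2, 3, 4, 5, 6, 7, 8]
node_of_rank = [0, 0, 0, 0, 1, 1, 1, 1]
print(hierarchical_reduce(values, node_of_rank))  # → 36
```

In a real MPI program the same structure would typically use a per-node communicator (e.g. one obtained via `MPI_Comm_split_type`) for phase 1 and a communicator of node leaders for phase 2, so that only one message per node crosses the slower inter-node network.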
The emergence of meta computers and computational grids makes it feasible to run parallel programs o...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
As many scientific applications require large data processing, the importance of parallel I/O has be...
We describe a methodology for developing high performance programs running on clusters of SMP nodes....
This paper describes a novel methodology for implementing a common set of collective communication o...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Collective communications occupy 20-90% of total execution times in many MPI applications. In this p...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
Further performance improvements of parallel simulation applications will not be reached by simply s...
Workstation cluster multicomputers are increasingly being applied for solving scientific problems th...
Although cluster environments have an enormous potential processing power, real applications that ta...
Many parallel applications from scientific computing use MPI collective communication operations to ...
Several MPI systems for Grid environment, in which clusters are connected by wide-area networks, hav...
Abstract Many parallel applications from scientific computing use collective MPI communication oper-...