This paper describes a novel methodology for implementing a common set of collective communication operations on clusters of symmetric multiprocessor (SMP) nodes. Called Shared-Remote-Memory collectives, or SRM, our approach replaces the point-to-point message passing traditionally used to implement collective operations with a combination of shared-memory and remote memory access (RMA) protocols that implement the semantics of the collective operations directly. Appropriate embedding of the communication graphs in a cluster maximizes the use of shared memory and reduces network communication. Substantial performance improvements are achieved over the highly optimized commercial IBM implementation and the op...
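The idea of embedding a communication graph so that shared memory carries as much traffic as possible can be illustrated with a small sketch. This is not the SRM algorithm from the abstract above; it is a simplified two-level broadcast, assuming one hypothetical "leader" rank per SMP node, a binomial tree across leaders for the network stage, and a flat fan-out inside each node for the shared-memory stage. The function name `two_level_broadcast` and the `node_of` mapping are illustrative assumptions.

```python
from collections import defaultdict


def two_level_broadcast(node_of, root):
    """Plan a broadcast as (network_edges, shared_memory_edges).

    node_of[rank] is the SMP node hosting that rank (hypothetical input).
    Each edge is a (sender, receiver) pair of ranks.
    """
    ranks_on = defaultdict(list)
    for rank, node in enumerate(node_of):
        ranks_on[node].append(rank)

    # Assumption: one leader per node; the root leads its own node.
    leaders = {node: ranks[0] for node, ranks in ranks_on.items()}
    leaders[node_of[root]] = root

    # Inter-node stage: binomial tree over the leaders, rooted at `root`.
    order = [root] + sorted(l for l in leaders.values() if l != root)
    network_edges = []
    step = 1
    while step < len(order):
        # Positions below `step` already hold the data and can send.
        for i in range(step):
            if i + step < len(order):
                network_edges.append((order[i], order[i + step]))
        step *= 2

    # Intra-node stage: each leader passes data to its local ranks,
    # which on an SMP node can go through shared memory.
    shared_edges = [
        (leaders[node], rank)
        for node, ranks in ranks_on.items()
        for rank in ranks
        if rank != leaders[node]
    ]
    return network_edges, shared_edges


if __name__ == "__main__":
    # 3 nodes with 4 ranks each; rank 5 is the broadcast root.
    layout = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    net, shm = two_level_broadcast(layout, root=5)
    print("network edges:", net)
    print("shared-memory edges:", shm)
```

In this sketch the number of network transfers equals the number of nodes minus one, regardless of how many ranks each node hosts; all remaining transfers stay inside a node.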
We describe a methodology for developing high performance programs running on clusters of SMP nodes....
Technology trends suggest that future machines will rely on parallelism to meet increasing performan...
In order for collective communication routines to achieve high performance on different platforms, t...
High performance scientific applications require efficient and fast collective communication operati...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
Collective communication allows efficient communication and synchronization among a collection of pr...
We discuss the design and high-performance implementation of collective communications operations on...
In this paper we investigate a tunable MPI collective communications library on a cluster of SMPs. M...
Parallel computing on clusters of workstations and personal computers has very high potential, sinc...
This paper describes the design and implementation of mechanisms for latency tolerance in the remote...
Parallel computing on clusters of workstations and personal computers has very high potential, since...
Networks of Workstations (NOW) have become an attractive alternative platform for high performance c...
The emergence of meta computers and computational grids makes it feasible to run parallel programs o...
Collective communication is an important subset of the Message Passing Interface. Improving the perform...
The emergence of meta computers and computational grids makes it feasible to run parallel programs o...