The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, and 3) per-node and per-core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics, respectively. Our results indicate that relaxing the message delivery order can improve performance by up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take ad...
Technology trends suggest that future machines will rely on parallelism to meet increasing performan...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
The performance evaluation of multiprocessor interconnects cannot be divorced from issues of traffic...
This paper demonstrates that the one-sided communication used in languages like UPC can provide a signifi...
One-sided communication in MPI requires the use of one of three different synchronization mechanism...
Today's high performance systems are typically built from shared memory nodes connected by a high sp...
In earlier work, we showed that the one-sided communication model found in PGAS languages (such as U...
Summarization: Every HPC system consists of numerous processing nodes interconnected using a number of...
The one-sided communication model (or remote memory access) supported by MPI-2 is more convenient to...
MPI-2 provides interfaces for one-sided communication, which is becoming increasingly important in s...
Communication hardware and software have a significant impact on the performance of clusters and sup...
One-sided communication is important to enable asynchronous communication and data movement...
We systematically evaluate the performance of five implementations of a single, user-level communica...
Thesis (Ph.D.) - Indiana University, Computer Sciences, 2009. Parallel programming presents a number o...