We conduct a detailed study of the performance effects of irregular communication patterns on the CM-2. We characterize the communications capabilities of the CM-2 under a variety of controlled conditions. In the process of carrying out our performance evaluation, we develop and make extensive use of a parameterized synthetic mesh. In addition, we carry out timings with unstructured meshes generated for aerodynamic codes and with a set of sparse matrices that have banded patterns of non-zeros. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Our benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.
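To make concrete how a banded sparsity pattern turns into a communication pattern, here is a minimal sketch. It assumes a 1D sparse matrix-vector product y = A x with rows and vector entries split into contiguous blocks across p processors, and A[i][j] non-zero whenever |i - j| <= half_bandwidth. The data layout, the function names `block_owner` and `banded_comm_pattern`, and the parameter names are illustrative assumptions, not the CM-2 layout or code used in the study.

```python
from collections import defaultdict


def block_owner(i, n, p):
    """Processor owning index i when n indices are split into p contiguous blocks.

    Hypothetical layout: block size is ceil(n / p).
    """
    block = -(-n // p)  # ceiling division
    return i // block


def banded_comm_pattern(n, p, half_bandwidth):
    """Count x entries each processor must receive from each other processor.

    Assumes A[i][j] != 0 exactly when |i - j| <= half_bandwidth.
    Returns a dict mapping (receiver, sender) -> number of distinct x entries.
    """
    needed = defaultdict(set)
    for i in range(n):
        owner_i = block_owner(i, n, p)
        lo = max(0, i - half_bandwidth)
        hi = min(n - 1, i + half_bandwidth)
        for j in range(lo, hi + 1):
            owner_j = block_owner(j, n, p)
            if owner_j != owner_i:
                # Row i on owner_i needs x[j], which lives on owner_j.
                needed[(owner_i, owner_j)].add(j)
    return {pair: len(cols) for pair, cols in needed.items()}


if __name__ == "__main__":
    # Narrow band: each processor talks only to its immediate neighbours.
    print(banded_comm_pattern(16, 4, 1))
    # Wider band: processor 0 now also needs an entry from processor 2.
    print(banded_comm_pattern(16, 4, 5))
```

With a half-bandwidth of 1, each processor exchanges a single entry with each neighbouring block. Once the half-bandwidth exceeds the block size, messages start going to non-adjacent processors. Varying the bandwidth therefore gives one simple knob for how widely communication spreads, though the study's actual matrices and mapping may differ.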
Communications overhead is one of the most important factors affecting performance in mess...
In this paper we review network related performance issues for current Massively Parallel Processors...
Motivated by observations about job runtimes on the CPlant system, we use a trace-driven microsimula...
In this paper, we study the communication characteristics of the CM-5 and the performance effects of...
The Unified Parallel C (UPC) programming language offers parallelism via logically partitioned share...
The Unified Parallel C (UPC) programming language offers parallelism via logically partitioned share...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The ...
For parallel computers, the execution time of communication routines is an important determinant of ...
Interprocessor communication overhead is a crucial measure of the power of parallel computing system...
This paper describes a number of optimizations that can be used to support the efficient execution o...
Parallel runtime systems such as MPI or task-based libraries provide models to...
In this paper, we develop portable and scalable algorithms for performing irregular all-to-all commu...
Interprocessor communication overhead is a crucial measure of the power of parallel computing system...
Thinking Machines' CM-5 machine is a distributed-memory, message-passing computer. In this paper ...
Parallelizing sparse irregular applications on distributed memory systems poses serious scalability c...