Achieving high-speed network I/O on distributed-memory systems is a hard problem because their architectures are, in general, ill-suited for communication with the external world. One of the problems is that messages are distributed over the private memories of the distributed-memory system. This can result in poor performance, since communication then requires a complex scatter/gather operation. This paper presents a strategy in which the task of creating large contiguous messages is performed on the distributed-memory system itself, thus minimizing the overhead on the network interface. Performance results for an implementation of this strategy on an iWarp system with a HIPPI interface board are presented.
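The core idea above, assembling one large contiguous message from fragments scattered across private memories before handing it to the network interface, can be sketched as a simple simulation. This is illustrative only (plain Python, not the paper's iWarp/HIPPI implementation); the function name and fragment layout are assumptions for the example.

```python
# Illustrative simulation: each node holds a private fragment of an outgoing
# message; the gather step concatenates the fragments in node order into one
# contiguous buffer, so the network interface sees a single large transfer
# instead of a scatter/gather list of small pieces.

def assemble_contiguous_message(fragments):
    """Concatenate per-node fragments (keyed and ordered by node id)."""
    message = bytearray()
    for node_id in sorted(fragments):
        message.extend(fragments[node_id])
    return bytes(message)

# Fragments of one message spread over four private memories.
fragments = {0: b"HEL", 1: b"LOW", 2: b"OR", 3: b"LD"}
print(assemble_contiguous_message(fragments))  # b'HELLOWORLD'
```

The point of doing this assembly on the distributed-memory system is that the (comparatively slow) interface board then performs one contiguous DMA rather than many small transfers.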
In distributed systems, the lack of global information about data transfer between clients and serve...
With the current continuation of Moore’s law and the presumed end of improved single core performanc...
The current trends in high performance computing show that large machines with tens of thousands of ...
Heterogeneity is becoming quite common in distributed parallel computing systems, both in processor ...
Distributed memory multiprocessor architectures offer enormous computational power, by exploiting th...
In this work we analyze the communication load imbalance generated by irregular-data applications ru...
The scalability and performance of parallel applications on distributed-memory multiprocessors depen...
Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistribute...
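The two-phase redistribution mentioned here can be sketched as follows. This is a hedged, single-process simulation of the idea (the function and names are illustrative, not the MPI-IO API): phase one shuffles each process's interleaved pieces to "aggregators" that own contiguous file ranges; phase two lets each aggregator issue one large write.

```python
# Sketch of two-phase I/O: small, interleaved pieces are first redistributed
# so that each aggregator holds a contiguous file range (phase 1); each
# aggregator can then perform a single large, contiguous write (phase 2).

def two_phase_write(pieces, file_size, n_aggregators):
    """pieces: {file_offset: bytes} scattered across processes (flattened
    here for illustration). Returns one contiguous buffer per aggregator,
    covering equal-sized file ranges."""
    assert file_size % n_aggregators == 0
    chunk = file_size // n_aggregators
    ranges = [bytearray(chunk) for _ in range(n_aggregators)]
    for offset, data in pieces.items():          # phase 1: redistribute
        for i, byte in enumerate(data):
            pos = offset + i
            ranges[pos // chunk][pos % chunk] = byte
    return [bytes(r) for r in ranges]            # phase 2: one write each

# Four 2-byte pieces interleaved over an 8-byte file, two aggregators.
pieces = {0: b"ab", 4: b"ef", 2: b"cd", 6: b"gh"}
print(two_phase_write(pieces, 8, 2))  # [b'abcd', b'efgh']
```

The extra communication of phase one pays off because a few large contiguous file accesses are far cheaper than many small noncontiguous ones.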
The increasing number of cores per node has propelled the performance of leadership-scale systems fro...
This paper describes a new host interface architecture for high-speed networks operating at 800 M...
In this paper we present several algorithms for performing all-to-many personalized communication on...
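One simple family of schedules for all-to-many personalized communication proceeds in permutation rounds: in round r, sender i targets receiver (i + r) mod p, so no receiver is hit by two senders at once. The sketch below simulates this pattern; it is one illustrative schedule under assumed names, not a specific algorithm from the paper.

```python
# Contention-free round schedule for all-to-many personalized communication:
# each round is a permutation of receivers, so every receiver gets at most
# one message per round, and after p rounds every sender has reached each
# of its intended receivers exactly once.

def all_to_many_rounds(p, payload):
    """payload[i][j] is sender i's personalized data for receiver j
    (absent if i has nothing for j). Returns delivered[j][i]."""
    delivered = [dict() for _ in range(p)]
    for r in range(p):
        for i in range(p):
            j = (i + r) % p                  # round-r permutation target
            if payload[i].get(j) is not None:
                delivered[j][i] = payload[i][j]
    return delivered

# Sender 1 has personalized messages for receivers 0 and 2, etc.
out = all_to_many_rounds(4, [{1: "a"}, {0: "b", 2: "c"}, {}, {1: "d"}])
print(out[1])  # {0: 'a', 3: 'd'}
```

Real algorithms of this kind differ mainly in how they skip empty rounds and balance link load when the destination sets are sparse or skewed.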
Distributed shared-memory systems provide scalable performance and a convenient model for parallel p...
Since the invention of the transistor, clock frequency increase was the primary method of improving ...
We will cover distributed memory programming of high-performance supercomputers and datacenter compu...
Passing messages between programs using shared memory, what we refer to as memory-based messaging, i...
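Memory-based messaging, as described here, exchanges data by writing into and reading from a shared buffer rather than copying through a kernel channel. A minimal sketch is a single-producer/single-consumer ring buffer; the class below is illustrative only (plain Python, names assumed), and real systems would add synchronization and memory fences.

```python
# Minimal SPSC ring buffer sketching memory-based messaging: sender and
# receiver share one buffer and communicate by moving head/tail indices,
# so a "send" is just a memory write visible to the receiver.

class RingChannel:
    def __init__(self, capacity):
        self.buf = bytearray(capacity)   # the shared memory region
        self.cap = capacity
        self.head = 0                    # next byte to read
        self.tail = 0                    # next byte to write
        self.used = 0

    def send(self, data):
        assert self.used + len(data) <= self.cap, "channel full"
        for b in data:
            self.buf[self.tail] = b
            self.tail = (self.tail + 1) % self.cap
        self.used += len(data)

    def recv(self, n):
        n = min(n, self.used)
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.head])
            self.head = (self.head + 1) % self.cap
        self.used -= n
        return bytes(out)

ch = RingChannel(8)
ch.send(b"hi")
print(ch.recv(2))  # b'hi'
```

The attraction of this style is that, once the buffer is mapped into both address spaces, message transfer costs a memory copy instead of a system call per message.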