Many-core processors provide the raw computation power required by modern high-performance multimedia and signal processing workloads. The conversion of this computation power into execution performance is often constrained by the overheads of communication between concurrent tasks. This paper presents Pronto, a low-overhead message passing system which simplifies the semantics of data movement between communicating tasks by performing buffer management, message synchronization and address translation directly in hardware. The integration of these functions into hardware results in transfer latencies up to 30% shorter than state-of-the-art MPI derivatives. The overheads for communication with Pronto in an 18-core processor array are under 5 ...
We describe the design and implementation of MPI-NP, a Myrinet communication system tailored to sup...
This paper discusses some of the issues involved in implementing a shared-address space programming ...
Abstract. The BlueGene/L supercomputer, with 65,536 dual-processor compute nodes, was designed from t...
The design challenge for huge-scale multiprocessors is (1) to minimize communication overhead, (2) ...
Moving data between processes has often been discussed as one of the major bottlenecks in parallel c...
This paper describes the design of LWSLT, a robust, portable, high-performance message-passing librar...
Abstract. Modern high-end computing systems utilize specialized offload engines to enhance various a...
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) ...
Summarization: Highly parallel systems are becoming mainstream in a wide range of sectors ranging fr...
Shared-memory and message-passing are two opposite models to develop parallel computations. The sh...
This paper presents the design and implementation of an efficient communication system, Pupa, devel...
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) ...
This paper describes an efficient mechanism of inter-processor message transfer on loosely-coupled/m...
With processor speeds no longer doubling every 18–24 months owing to the exponential increase in pow...