Abstract—Many-core processors provide the raw computation power required by modern high-performance multimedia and signal processing workloads. The translation of this into execution performance is often constrained by the overheads of communication between concurrent tasks. This paper presents Pronto, a low overhead message passing system which simplifies the semantics of data movement between communicating tasks by performing buffer management, message synchronization and address translation directly in hardware. The integration of these functions into hardware results in transfer latencies up to 30% shorter than state-of-the-art MPI derivatives. The overheads for communication in a 16-core processor array are under 5% for 64-word burst ...
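The abstract names three functions that Pronto moves into hardware: buffer management, message synchronization and address translation. As a rough software analogue only (this is not Pronto's interface or microarchitecture, and the names `Mailbox`, `Network` and `run_demo` are hypothetical), the sketch below gives each receiving task a bounded FIFO whose send blocks while it is full and whose receive blocks while it is empty, and maps logical task IDs to mailboxes in place of address translation. The lock and condition-variable work done on every message here is presumably the kind of software overhead a hardware implementation aims to remove.

```python
import threading
from collections import deque


class Mailbox:
    """Bounded FIFO of messages for one receiving task.

    Hypothetical software stand-in for the buffer management and message
    synchronization that the abstract says Pronto performs in hardware.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._buf: deque = deque()
        self._cond = threading.Condition()

    def send(self, msg: list[int]) -> None:
        # Block the sender while the buffer is full (back-pressure).
        with self._cond:
            while len(self._buf) >= self._capacity:
                self._cond.wait()
            self._buf.append(list(msg))  # copy models the data movement
            self._cond.notify_all()      # wake any waiting receiver

    def receive(self) -> list[int]:
        # Block the receiver until a message is available.
        with self._cond:
            while not self._buf:
                self._cond.wait()
            msg = self._buf.popleft()
            self._cond.notify_all()      # wake any sender waiting for space
            return msg


class Network:
    """Maps logical task IDs to mailboxes (hypothetical stand-in for address translation)."""

    def __init__(self, num_tasks: int, capacity: int) -> None:
        self._mailboxes = {tid: Mailbox(capacity) for tid in range(num_tasks)}

    def send(self, dest_task: int, msg: list[int]) -> None:
        self._mailboxes[dest_task].send(msg)

    def receive(self, own_task: int) -> list[int]:
        return self._mailboxes[own_task].receive()


def run_demo(num_messages: int = 10, burst_words: int = 64) -> list[list[int]]:
    """Task 0 sends bursts of `burst_words` words to task 1; returns what task 1 received."""
    net = Network(num_tasks=2, capacity=4)
    received: list[list[int]] = []

    def consumer() -> None:
        for _ in range(num_messages):
            received.append(net.receive(1))

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(num_messages):
        net.send(1, [i] * burst_words)  # task 0 -> task 1
    t.join()
    return received
```

With a single producer and a single consumer the FIFO preserves message order, and the bounded capacity forces the sender to stall whenever the receiver falls behind rather than letting buffers grow without limit.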
With processor speeds no longer doubling every 18–24 months owing to the exponential increase in pow...
This paper presents the design and implementation of an efficient communication system, Pupa, devel...
In this paper, we present a generic high performance architecture model for ma...
Shared-memory and message-passing are two opposite models to develop parallel computations. The sh...
Moving data between processes has often been discussed as one of the major bottlenecks in parallel c...
This paper describes an efficient mechanism of inter-processor message transfer on loosely-coupled/m...
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) ...
The emergence of multicore processors raises the need to efficiently transfer large amounts of data ...
This thesis presents the design and implementation of a Chip-Multiprocessor (CMP) targeted at stream...
The scalability and performance of parallel applications on distributed-memory multiprocessors depen...
Previous researchers in user-level message-passing parallel computing have attempted to reduce commu...
This paper describes the design of LWSLT, a robust, portable, high-performance message-passing librar...
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Compute...
In the exascale computing era, applications are executed at a larger scale than ever before, which results ...