Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlapping communication with computation or other communication operations. Using non-blocking communication raises two issues: performance and programmability. In terms of performance, optimizers need to find a good communication schedule and are sometimes constrained by a lack of full application knowledge. In terms of programmability, efficiently managing non-blocking communication can prove cumbersome for complex applications. In this paper we present the design principles of HUNT, a runtime system designed to search for and exploit some of the available overlap prese...
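Such runtime systems automate what programmers otherwise write by hand with the split-phase idiom: post the transfer, do unrelated work, and complete the operation only where the data is needed. A minimal hand-written sketch with non-blocking MPI (the ring exchange pattern and buffer sizes are illustrative; this is not HUNT's interface):

```c
#include <mpi.h>
#include <stdio.h>

#define N 1024

static void independent_compute(double *x, int n) {
    for (int i = 0; i < n; i++)      /* work that touches neither */
        x[i] = x[i] * 2.0 + 1.0;     /* in-flight buffer          */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendbuf[N], recvbuf[N], local[N];
    for (int i = 0; i < N; i++) { sendbuf[i] = rank; local[i] = i; }

    int next = (rank + 1) % size, prev = (rank + size - 1) % size;
    MPI_Request reqs[2];

    /* Split phase 1: post the transfers without blocking. */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Overlap window: compute while the network moves data. */
    independent_compute(local, N);

    /* Split phase 2: block only where recvbuf is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0)
        printf("rank 0 received %.1f from rank %d\n", recvbuf[0], prev);
    MPI_Finalize();
    return 0;
}
```

Everything between the posts and MPI_Waitall is the overlap window; the schedulers discussed in these papers try to widen exactly this window without the programmer restructuring the code.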
The performance of a High Performance Parallel or Distributed Computation depends heavily on minimiz...
Developers of scalable libraries and applications for distributed-memory parallel systems face many ...
The PGAS paradigm provides a shared-memory abstraction for programming distributed-memory machines. ...
Overlapping communication with computation is an important optimization on current cluster architect...
Partitioned Global Address Space (PGAS) languages appeared to address programmer productivity in lar...
Asynchronous task-based programming models are gaining popularity to address the programmability and...
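To make the task-based style concrete, here is a hedged sketch using OpenMP tasks, one of many such models (the stage functions are stand-ins): declared dependences let the runtime schedule the communication-like and compute tasks concurrently, instead of the programmer placing waits by hand.

```c
#include <omp.h>
#include <stdio.h>

static double fetch_remote(void) { return 42.0; } /* stand-in for a transfer */
static double local_work(void)   { return 1.0;  } /* independent computation */

int main(void) {
    double remote = 0.0, local = 0.0, result = 0.0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: remote)    /* "communication" task */
        remote = fetch_remote();

        #pragma omp task depend(out: local)     /* overlapped compute task */
        local = local_work();

        /* Consumer is released only after both producers finish. */
        #pragma omp task depend(in: remote, local)
        result = remote + local;

        #pragma omp taskwait
        printf("result = %.1f\n", result);
    }
    return 0;
}
```

The programmability gain these papers point to is visible even at this scale: ordering is expressed as data dependences, and the runtime, not the programmer, decides what runs while the "transfer" is in flight.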
Effective overlap of computation and communication is a well understood technique for latency hiding...
The Unified Parallel C (UPC) programming language offers parallelism via logically partitioned share...
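For readers unfamiliar with UPC, a minimal sketch of its logically partitioned shared address space (PER_THREAD is an arbitrary illustrative constant): every element of a shared array has an owning thread, and the affinity expression of upc_forall binds each iteration to that owner.

```c
#include <upc.h>
#include <stdio.h>

#define PER_THREAD 4

/* Cyclic distribution: element i has affinity to thread i % THREADS.
 * THREADS appears in the size so this also compiles when the thread
 * count is fixed only at job launch. */
shared double a[PER_THREAD * THREADS];

int main(void) {
    int i;

    /* Owner-computes loop: iteration i runs on the thread with
     * affinity to a[i], so every write below is local. */
    upc_forall (i = 0; i < PER_THREAD * THREADS; i++; &a[i])
        a[i] = MYTHREAD;

    upc_barrier;

    if (MYTHREAD == 0)                  /* the read of a[1] may be remote; */
        printf("a[1] = %.0f\n", a[1]);  /* the language fetches it         */
    return 0;
}
```

Remote reads such as the access to a[1] from thread 0 are precisely where the communication scheduling and overlap optimizations surveyed here apply.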
We present a dynamic program analysis approach to optimize communication overl...
Multicomputers (distributed-memory MIMD machines) have emerged as inexpensive, yet powerful parallel...
Parallel applications commonly face the problem of sitting idle while waiting for remote data to bec...
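A common remedy for this idling is software pipelining with a double buffer: request the next chunk of remote data before processing the current one. A hedged MPI sketch (the chunk size, tag scheme, and producer protocol are assumptions; the sending side is omitted):

```c
#include <mpi.h>

#define CHUNK   256
#define NCHUNKS 8

static void process(double *buf, int n) {
    for (int i = 0; i < n; i++)
        buf[i] *= 2.0;              /* placeholder computation */
}

/* Consume NCHUNKS chunks sent by producer_rank, chunk k carrying tag k. */
void consume_stream(int producer_rank) {
    double buf[2][CHUNK];           /* one buffer in flight, one in use */
    MPI_Request req;

    /* Prime the pipeline: request chunk 0 before the loop. */
    MPI_Irecv(buf[0], CHUNK, MPI_DOUBLE, producer_rank, 0,
              MPI_COMM_WORLD, &req);

    for (int k = 0; k < NCHUNKS; k++) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);          /* chunk k arrived */

        if (k + 1 < NCHUNKS)                        /* prefetch chunk k+1   */
            MPI_Irecv(buf[(k + 1) % 2], CHUNK,      /* while k is processed */
                      MPI_DOUBLE, producer_rank, k + 1,
                      MPI_COMM_WORLD, &req);

        process(buf[k % 2], CHUNK);                 /* overlapped compute */
    }
}
```

If process() takes at least as long as a chunk transfer, each MPI_Wait returns immediately and the communication cost is fully hidden.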
In this thesis, we studied the behavior of parallel programs to understand how to automate the task...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
Communications overhead is one of the most important factors affecting performance in mess...