The increasing number of cores per processor is making multicore-based systems pervasive. This involves dealing with multiple levels of memory in NUMA systems, accessible via complex interconnects in order to dispatch the increasing amount of data required. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid synchronization between pairs of processes in collective operations implemented using two-sided point-to-point functions. This thesis proposes a series of collective algorithms that provide good performance and scalability. They use hierarchical trees, ove...
More memory hierarchies, NUMA architectures and network-style interconnection are widely used in mod...
The next generations of supercomputers are projected to have hundreds of thousands of processors. H...
In earlier work, we showed that the one-sided communication model found in PGAS languages (such as U...
The increasing number of cores per processor is making manycore-based systems pervasive. This in...
Optimized collective operations are a crucial performance factor for many scientific applications. T...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
To amortize the cost of MPI collective operations, nonblocking collectives hav...
Collective communication allows efficient communication and synchronization among a collection of pr...
To amortize the cost of MPI collective operations, non-blocking collectives have been proposed so a...
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
This whitepaper studies the various aspects and challenges of performance scaling on large scale sha...
The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Technology trends suggest that future machines will rely on parallelism to meet increasing performan...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...