The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with hierarchies of processor cores, accessible via complex interconnects, in order to deliver the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. One-sided communication becomes more important in these systems because it avoids the unnecessary synchronization between pairs of processes that arises when collective operations are implemented in terms of two-sided point-to-point functions. This work proposes a series of a...
The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express...
The next generations of supercomputers are projected to have hundreds of thousands of processors. H...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...
The increasing number of cores per processor is making multicore-based systems pervasive. This i...
This is a post-peer-review, pre-copyedit version of an article published in [insert journal title]. ...
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
Optimized collective operations are a crucial performance factor for many scientific applications. T...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
The increasing number of cores led to scalability issues in modern servers tha...
This whitepaper studies the various aspects and challenges of performance scaling on large scale sha...
More memory hierarchies, NUMA architectures and network-style interconnection are widely used in mod...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...
The Partitioned Global Address Space (PGAS) model has been widely used in multi-core clusters as an ...
Current generations of NUMA node clusters feature multicore or manycore processors. Programming such...
To amortize the cost of MPI collective operations, non-blocking collectives have been proposed so a...