The increasing number of cores led to scalability issues in modern servers, which were addressed by using non-uniform memory interconnects such as HyperTransport and QPI. These technologies reintroduced Non-Uniform Memory Access (NUMA) architectures. They are also responsible for Non-Uniform Input/Output Access (NUIOA), as I/O devices may be directly connected to a single processor, thus getting faster access to some cores and memory banks than to others. In this paper, we propose to adapt MPI collective operations to NUIOA constraints. These operations are now often based on a combination of multiple strategies depending on the underlying cluster topology, with local leader processes used as intermediaries. Ou...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 201...
Modern multicore systems are based on a Non-Uniform Memory Access (NUMA) design. In a NUMA system, c...
Multicore processors have not only reintroduced Non-Uniform Memory Access (NUM...
This is a post-peer-review, pre-copyedit version of an article published in [insert journal title]. ...
More memory hierarchies, NUMA architectures and network-style interconnection are widely used in mod...
Current generations of NUMA node clusters feature multicore or manycore proces...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Due to their excellent price-performance ratio, clusters built from commodity nodes have become broa...
Nowadays, virtualization is a central element in data centers as it allows sha...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...
The ever-growing level of parallelism within the multi-core and multi-processo...
This work presents and evaluates algorithms for MPI collective communication operations on high perf...
Two-phase I/O is a well-known strategy for implementing collective MPI-IO functions. It redistribute...
The increasing number of cores per processor is making multicore-based systems pervasive. This i...