Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea to improve the performance of parallel applications is to decrease their communication costs by matching the communication pattern to the underlying hardware architecture. In this report, we detail the algorithm and techniques proposed to achieve such a result: first, we gather both the communication pattern information and the hardware details. Then we compute a relevant reordering of the various process ranks of the application. Finally, those new ranks are used to reduce the ...
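The three steps named above (gather the communication pattern and the hardware details, compute a rank reordering, use the new ranks) can be illustrated with a small sketch. This is not the algorithm from the report: it assumes the communication pattern is available as a symmetric volume matrix `comm`, that the hardware is summarized by a core-to-core distance matrix `dist`, and it swaps in an exhaustive search that is only viable for a handful of processes. All names and numbers below are hypothetical.

```python
from itertools import permutations


def communication_cost(comm, dist, placement):
    """Sum over rank pairs of traffic volume times hardware distance.

    placement[i] is the core that rank i is mapped to.
    """
    n = len(comm)
    return sum(
        comm[i][j] * dist[placement[i]][placement[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_placement(comm, dist):
    """Try every rank-to-core placement and keep the cheapest one.

    Exhaustive search is factorial in the number of ranks, so this is
    an illustration only, not a practical reordering algorithm.
    """
    n = len(comm)
    return min(
        permutations(range(n)),
        key=lambda p: communication_cost(comm, dist, p),
    )


# Hypothetical communication pattern: ranks 0 and 2 exchange a lot of
# data, as do ranks 1 and 3; all other pairs exchange very little.
comm = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]

# Hypothetical hardware: cores 0 and 1 share one NUMA node, cores 2 and 3
# share another; crossing nodes is assumed to be ten times more expensive.
dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]

identity = tuple(range(4))
placement = best_placement(comm, dist)

print("default cost:  ", communication_cost(comm, dist, identity))
print("reordered cost:", communication_cost(comm, dist, placement))
print("rank -> core:  ", placement)
```

With these made-up inputs, the reordering co-locates the heavily communicating pairs on the same NUMA node. In an MPI application, the resulting permutation would typically be applied by creating a new communicator with reordered ranks; how the report does this is not shown in the text above.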
Nowadays, the scientific applications are developed with more complexity and accuracy and these prec...
The increasing number of cores per processor is making multicore-based systems pervasive. This i...
The increasing number of cores led to scalability issues in modern servers tha...
Current generations of NUMA node clusters feature multicore or manycore proces...
This paper presents a method to efficiently place MPI processes on multicore m...
This is a post-peer-review, pre-copyedit version of an article published in [insert journal title]. ...
The emergence of multicore processors led to an increasing complexity inside the modern servers, wit...
Due to the advent of modern hardware architectures of high-performance comput-...
To amortize the cost of MPI collective operations, non-blocking collectives ha...
MATCHING COMMUNICATION PATTERN WITH UNDERLYING HARDWARE ARCHITECTUR
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014...
Programming multicore or manycore architectures is a hard challenge particular...
Modern multicore systems are based on a Non-Uniform Memory Access (NUMA) design. In a NUMA system, c...
Multicore processors have not only reintroduced Non-Uniform Memory Access (NUM...