International audience. The recent addition of data dependencies to the OpenMP 4.0 standard provides the application programmer with a more flexible way of synchronizing tasks. Using such an approach allows both the compiler and the runtime system to know exactly which data are read or written by a given task, and how these data will be used throughout the program lifetime. Data placement and task scheduling strategies have a significant impact on performance on NUMA architectures. While numerous papers focus on these topics, none of them has made extensive use of the information available through dependencies. One can use this information to modify the behavior of the application at several levels: during initialization to contr...
International audience. We present a joint scheduling and memory allocation algorithm for efficient ex...
International audience. Anticipating the behavior of applications, studying, and designing algorithms ...
Current high performance computing architectures are composed of large shared memory NUMA nodes, amo...
International audience. The recent addition of data dependencies to the OpenMP 4.0 standard provides t...
International audience. Exploiting the full computational power of current hierarchical multiprocessor...
International audience. OpenMP 4.0 introduced dependent tasks, which give the programmer a way to expr...
Processors with multiple sockets or chiplets are becoming more conventional. These kinds of processo...
International audience. Dynamic task-parallel programming models are popular on shared-memory systems,...
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can...
Modern architectures have multiple processors, each of which contains multiple cores, connected to d...
International audience. Nowadays shared memory HPC platforms expose a large number of cores organized ...
Within the last decade, microprocessor development reached a point at which higher clock rates and m...
International audience. Approaching the theoretical performance of hierarchical multicore machines req...
International audience. Nowadays, NUMA architectures are common in compute-intensive systems. Achievin...
The complexity of shared memory systems is becoming more relevant as the number of memory domains in...