Exploiting the full computational power of increasingly deep hierarchical multiprocessor machines requires a very careful distribution of threads and data across the underlying non-uniform architecture. The emergence of multi-core chips and NUMA machines makes it important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization steps. By using the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, we can easily transpose the affinities of thread teams into scheduling hints, using abstractions called bubbles. We then propose a scheduling strategy suited to nested OpenMP parallelism. The resulting preliminary performance evaluat...
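The bubble idea above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not BubbleSched's API or its actual scheduling strategy: the `Bubble` and `Level` classes and the greedy "burst then spread" policy in `distribute` are hypothetical, chosen only to show how the nesting of OpenMP teams can be kept as an affinity hint when threads are mapped onto a machine tree (machine, NUMA nodes, cores).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so list.remove() drops this exact bubble
class Bubble:
    """Hypothetical model: a thread (no children) or a team of nested bubbles."""
    name: str
    children: list["Bubble"] = field(default_factory=list)

    def threads(self) -> list[str]:
        # A leaf bubble stands for a single thread.
        if not self.children:
            return [self.name]
        return [t for c in self.children for t in c.threads()]


@dataclass
class Level:
    """Hypothetical model of a topology node (machine, NUMA node, core...)."""
    name: str
    children: list["Level"] = field(default_factory=list)


def distribute(bubbles: list[Bubble], level: Level) -> dict[str, list[str]]:
    """Map each core (leaf Level) under `level` to the threads placed on it."""
    if not level.children:
        # Leaf of the topology: every thread of every bubble runs on this core.
        return {level.name: [t for b in bubbles for t in b.threads()]}

    work = list(bubbles)
    # Burst the largest team only while there are fewer bubbles than
    # sub-levels, so that teams stay together as deep as possible.
    while len(work) < len(level.children):
        teams = [b for b in work if b.children]
        if not teams:
            break
        largest = max(teams, key=lambda b: len(b.threads()))
        work.remove(largest)
        work.extend(largest.children)

    # Greedy spread: biggest bubbles first, each to the least-loaded sub-level.
    assigned: list[list[Bubble]] = [[] for _ in level.children]
    load = [0] * len(level.children)
    for b in sorted(work, key=lambda b: len(b.threads()), reverse=True):
        i = load.index(min(load))
        assigned[i].append(b)
        load[i] += len(b.threads())

    placement: dict[str, list[str]] = {}
    for child, group in zip(level.children, assigned):
        placement.update(distribute(group, child))
    return placement


# Usage: an outer team of two inner teams on 2 NUMA nodes x 2 cores.
machine = Level("machine", [
    Level("numa0", [Level("core0"), Level("core1")]),
    Level("numa1", [Level("core2"), Level("core3")]),
])
outer = Bubble("outer", [
    Bubble("teamA", [Bubble("a0"), Bubble("a1")]),
    Bubble("teamB", [Bubble("b0"), Bubble("b1")]),
])
print(distribute([outer], machine))
# {'core0': ['a0'], 'core1': ['a1'], 'core2': ['b0'], 'core3': ['b1']}
```

Bursting a bubble only when a level has more sub-levels than bubbles keeps both threads of each inner team under the same NUMA node, which is the kind of team affinity described above; a real scheduler would also have to weigh load imbalance, memory placement and synchronization costs.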
Exploiting the full computational power of hierarchical multiprocessor machines with ir...
OpenMP 4.0 introduced dependent tasks, which give the programmer a way to expr...
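As a minimal sketch of what the `depend` clause expresses, the following Python model computes, for sibling tasks listed in creation order, which earlier tasks each one must wait for under the OpenMP 4.0 `in`/`out`/`inout` rules: two tasks conflict on a variable unless both only read it. The function names `task_dependences` and `waves` are hypothetical; the model captures the ordering semantics only and does not use any OpenMP runtime.

```python
# Dependence kinds of the OpenMP 4.0 depend clause.
KINDS = {"in", "out", "inout"}


def task_dependences(tasks: list[tuple[str, dict[str, str]]]) -> dict[str, list[str]]:
    """Map each task name to the earlier sibling tasks it must wait for.

    `tasks` lists (name, {variable: kind}) pairs in creation order; names are
    assumed unique. Two tasks conflict on a variable unless both are "in".
    """
    preds: dict[str, list[str]] = {}
    for i, (name, deps) in enumerate(tasks):
        if not set(deps.values()) <= KINDS:
            raise ValueError(f"unknown dependence kind in task {name!r}")
        preds[name] = []
        for prev_name, prev_deps in tasks[:i]:
            conflicts = any(
                var in prev_deps and not (kind == "in" and prev_deps[var] == "in")
                for var, kind in deps.items()
            )
            if conflicts:
                preds[name].append(prev_name)
    return preds


def waves(tasks: list[tuple[str, dict[str, str]]]) -> list[list[str]]:
    """Group tasks into successive sets whose dependences are all satisfied."""
    preds = task_dependences(tasks)
    done: set[str] = set()
    remaining = [name for name, _ in tasks]
    result: list[list[str]] = []
    while remaining:
        # The first remaining task only depends on earlier, finished tasks,
        # so every wave is non-empty and the loop terminates.
        ready = [n for n in remaining if all(p in done for p in preds[n])]
        result.append(ready)
        done.update(ready)
        remaining = [n for n in remaining if n not in done]
    return result


tasks = [
    ("produce_x", {"x": "out"}),
    ("read_x_1", {"x": "in"}),
    ("read_x_2", {"x": "in"}),
    ("update_x", {"x": "inout"}),
]
print(waves(tasks))  # [['produce_x'], ['read_x_1', 'read_x_2'], ['update_x']]
```

Tasks that only read the same variable land in the same wave and may run concurrently, while the `inout` task waits for every earlier reader and writer; a real runtime tracks these dependences incrementally instead of rescanning every earlier task.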
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can...
Approaching the theoretical performance of hierarchical multicore machines req...
Exploiting the full computational power of current hierarchical multiprocessor...
The now-commonplace multi-core chips have introduced, by design, a deep hierar...
The current trend among manufacturers of machines for scientific computing is toward an interweaving of technologi...
Today, shared-memory HPC platforms expose a large number of cores organized ...
Exploiting the full computational power of today's increasingly hierarchical mult...
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to expres...
Scientific applications, such as those involving numerical simulations, keep requiring more and more...
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify t...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extend...
Since the days of OpenMP 1.0, computer hardware has become more complex, typically by specializing co...