Efficiently programming shared-memory machines is a difficult challenge because the mapping of application threads onto the memory hierarchy strongly affects performance. However, optimizing such thread placement is difficult: architectures are becoming increasingly complex, and application behavior changes with implementations and input parameters, e.g., problem size and number of threads. In this work, we propose a fully automatic, abstracted and portable affinity module. It produces and implements an optimized affinity strategy that combines knowledge about application characteristics and the platform topology. Implemented in the back-end of our runtime system (ORWL), our approach was used to enhance the performance and t...
Petascale machines with hundreds of thousands of cores are being built. These machines have varying ...
Many-core processors are becoming mainstream computing platforms nowadays. How to map the applicatio...
We present a completely new kind of approach for mapping the computation of an application to MP-SOC...
The complexity of an efficient thread management steadily rises with the number of process...
Thread affinity has appeared as an important technique to improve the overall program perf...
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-bo...
With the introduction of multi-core processors, thread affinity has quickly ap...
The parallelism in shared-memory systems has increased significantly with the ...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
doi: 10.1007/s10766-013-0253-x. Memory affinity has become a key element to achi...
Multicore multiprocessors use Non Uniform Memory Architecture (NUMA) to improve their scalability. ...
Current and future architectures rely on thread-level parallelism to sustain p...
The ordered read-write lock model (ORWL) is a modern framework that proposes h...
Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Acc...