This paper presents techniques for efficient thread forking and joining in parallel execution environments, taking into account the physical structure of NUMA machines and support for multi-level parallelization and processor grouping. Two work generation schemes and one join mechanism are designed, implemented, evaluated, and compared with those used in the IRIX MP library, an efficient implementation that supports a single level of parallelism. Supporting multiple levels of parallelism is a current research goal on both shared-memory and distributed-memory machines. Our proposals include a first work generation scheme (GWD, or global work descriptor) that supports multiple levels of parallelism but not processor grouping. Th...
The problem of placement of threads, or virtual cores, on physical cores in a multicore system has b...
The parallelism in shared-memory systems has increased significantly with the ...
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-bo...
We introduce explicit multi-threading (XMT), a decentralized architecture that exploits fine-grained...
To amortize the cost of MPI collective operations, nonblocking collectives hav...
Parallel hardware has become a ubiquitous component in computer processing technology. Uniprocessor...
The use of multithreading can enhance the performance of a software system. However, its excessive u...
Non-blocking collectives have been proposed so as to allow communications to b...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
To amortize the cost of MPI collective operations, non-blocking collectives ha...
This paper introduces a learning-based framework for dynamic placement of threads of parallel applic...
To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruc...
This paper presents a set of proposals for the OpenMP shared-memory programming model oriented tow...