This paper presents techniques for efficient thread forking and joining in parallel execution environments, taking into account the physical structure of NUMA machines and the support for multi-level parallelization and processor grouping. Two work generation schemes and one join mechanism are designed, implemented, evaluated, and compared with those used in the IRIX MP library, an efficient implementation that supports a single level of parallelism. Supporting multiple levels of parallelism is a current research goal on both shared- and distributed-memory machines. Our proposals include a first work generation scheme (GWD, or global work descriptor), which supports multiple levels of parallelism but not processor grouping. Th...
NUMA multi-core systems divide system resources into several nodes. When an imbalance in the load be...
Current multi-socket systems have complex memory hierarchies with significant Non-Uniform Memory Acc...
The increasing number of cores per processor is making manycore-based systems pervasive. This in...
Multicore multiprocessors use a Non Uniform Memory Architecture (NUMA) to improve their scalability....
The problem of placement of threads, or virtual cores, on physical cores in a multicore system has b...
Embedded manycore architectures are often organized as fabrics of tightly-coupled shared memory clus...
Non-uniform memory access (NUMA) architectures are modern shared-memory, multi-core machines offerin...
Multicore multiprocessors use Non Uniform Memory Architecture (NUMA) to improve their scalability. ...
This paper introduces a learning-based framework for dynamic placement of threads of parallel applic...
Driven by the two main hardware trends, increasing main memory and massively parallel multi...
Large-scale Non-Uniform Memory Access (NUMA) multiprocessors are gaining increased attention due to ...
Non-blocking collectives have been proposed so as to allow communications to b...
To amortize the cost of MPI collective operations, nonblocking collectives hav...