The article of record as published may be found at https://doi.org/10.1007/BF02577870. In this paper we present Safe Self-Scheduling (SSS), a new scheduling scheme that schedules parallel loops whose variable-length iteration execution times are not known at compile time. The scheme assumes a shared memory space. SSS combines static scheduling with dynamic scheduling and draws on the advantages of each. First, it reduces the dynamic scheduling overhead by statically scheduling a major portion of the loop iterations. Second, the workload is balanced with a simple and efficient self-scheduling scheme by applying a new measure, the smallest critical chore size. Experimental results comparing SSS with other scheduling schemes indicate that SSS surpa...
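The two-phase idea described in this abstract (a static portion assigned up front, with the remainder self-scheduled at run time) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the SSS algorithm itself: the function name `hybrid_schedule`, the `static_fraction` and `chunk` parameters, and the fixed-size chunking are all hypothetical, and the sketch does not model the paper's "smallest critical chore size" measure or any dispatch overhead.

```python
import heapq

def hybrid_schedule(costs, num_procs, static_fraction=0.5, chunk=1):
    """Simulate a static-then-dynamic loop schedule.

    costs[i] is the (simulated) execution time of iteration i.
    Returns the finish time of each processor.
    All parameters are illustrative assumptions, not taken from the paper.
    """
    n = len(costs)
    n_static = int(n * static_fraction)

    # Phase 1 (static): split the first n_static iterations into
    # contiguous, near-equal blocks, one block per processor.
    finish = [0] * num_procs
    base, extra = divmod(n_static, num_procs)
    start = 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        finish[p] = sum(costs[start:start + size])
        start += size

    # Phase 2 (dynamic): the processor that becomes idle first
    # takes the next chunk of the remaining iterations.
    heap = [(t, p) for p, t in enumerate(finish)]
    heapq.heapify(heap)
    i = n_static
    while i < n:
        t, p = heapq.heappop(heap)
        t += sum(costs[i:i + chunk])
        i += chunk
        heapq.heappush(heap, (t, p))

    for t, p in heap:
        finish[p] = t
    return finish

# One expensive iteration followed by cheap ones, on two processors.
costs = [4, 1, 1, 1, 1, 1, 1, 1]
print(max(hybrid_schedule(costs, 2, static_fraction=1.0)))  # fully static
print(max(hybrid_schedule(costs, 2, static_fraction=0.5)))  # half static, half dynamic
```

In this toy workload the fully static split leaves one processor holding the expensive iteration plus a full share of the cheap ones, while the hybrid split lets the other processor absorb most of the remaining iterations. How large the static portion can safely be is exactly the question the abstract's new measure is said to address, and this sketch makes no attempt to answer it.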
The recent shift to multi-core computing has meant more programmers are required to write parallel p...
This paper proposes an efficient run-time system to schedule general nested loops on multiprocessors...
Using runtime information of load distributions and processor affinity, we propose an adapt...
Part 4: Applications of Parallel and Distributed Computing. Ordinary programs co...
In this paper, we present a new practical processor self-scheduling scheme, Trapezoid Self-Schedulin...
Link to published version: http://ieeexplore.ieee.org/iel2/390/6075/00236705.pdf?tp=&arnumber=236705...
We here present ATLS, a self-scheduling scheme designed for execution of parallel loops in d...
Efficiently scheduling parallel tasks onto the processors of a shared-memory multiprocessor is crit...
The 1st International Conference on Algorithms and Architectures for Parallel, Brisbane, Australia, ...
Funder: FP7 People: Marie‐Curie Actions; Id: http://dx.doi.org/10.13039/100011264; Grant(s): 327744S...
The limitation of vector supercomputing and of device speed has led to the development of multiproce...
Load imbalance is a serious impediment to achieving good performance in parallel processing. Global ...
In light of continued advances in loop scheduling, this work revisits the OpenMP loop scheduling by ...
The efficient implementation of parallel loops on distributed-memory multicomputers is a hot topic ...
The class of problems that can be effectively compiled by parallelizing compilers is discussed. This...