Abstract—Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions targeted at the GPU may decrease performance on the CPU. This problem is typically ameliorated by statically scheduling a fixed amount of work for agglomeration. However, determining the ideal amount of work to compose requires experimentation because it varies between architectures and problem co...
A personal computer can be considered as a one-node heterogeneous cluster that simultaneously proces...
In order to satisfy timing constraints, modern real-time applications require massively parallel acc...
Summary form only given. Scheduling policies are proposed for parallelizing data intensive particle ...
Computing platforms are now extremely complex providing an increasing number o...
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterog...
GPUs (Graphics Processing Units) have become one of the main co-processors that contributed to deskt...
In this paper, we consider task-based dense linear algebra applications on a single heterogeneous no...
Modern high-performance computers engage a variety of computing devices. Underutilization and oversu...
The ever increasing complexity of scientific applications has led to utilization of new HPC paradigm...
Recent advances in GPUs (graphics processing units) lead to massively parallel hardware that is eas...
The use of accelerators such as GPUs has become mainstream to achieve high per...
Recent accelerators such as GPUs achieve better cost-performance and watt-performance ratio, while t...
GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their...
Although the hardware has dramatically changed in the last few years, nodes of...