In recent years, computational science has benefited from emerging accelerators that increase the performance of scientific simulations, but using these devices complicates the programming task. This paper presents AMA: a set of optimization techniques to efficiently manage multi-accelerator systems. AMA maximizes the overlap of computation and communication without blocking, so the resulting spare time can be used to do other work while waiting for device operations to complete. We implemented AMA on top of a task-based framework, and its experimental evaluation on a quad-GPU node shows that it reaches the performance of hand-tuned native CUDA code, with the advantage of fully hiding device management. In addition, we obtain up to more than 2x performance...
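The blocking-free idea can be illustrated with a toy host-side loop. This is a minimal sketch, not AMA's implementation. It assumes that device operations are modeled as futures from a thread pool, standing in for asynchronous copies and kernels. The host polls these futures with `done()` instead of waiting on them, and runs queued host tasks in the meantime. The names `fake_device_op` and `run_nonblocking` are hypothetical.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def fake_device_op(name: str, seconds: float) -> str:
    """Hypothetical stand-in for an asynchronous device operation
    (e.g. a host-to-device copy or a kernel); it just sleeps."""
    time.sleep(seconds)
    return name


def run_nonblocking(device_ops, host_tasks):
    """Submit all (hypothetical) device operations, then poll them without
    blocking. While any operation is pending, run queued host tasks so the
    waiting time is spent on useful work instead of a blocking wait."""
    completed_device = []
    completed_host = []
    pending_host = deque(host_tasks)
    with ThreadPoolExecutor(max_workers=max(1, len(device_ops))) as pool:
        pending = [pool.submit(fake_device_op, n, s) for n, s in device_ops]
        while pending:
            still_pending = []
            for fut in pending:
                # done() returns immediately; result() is only called
                # once the operation has finished, so it never blocks.
                if fut.done():
                    completed_device.append(fut.result())
                else:
                    still_pending.append(fut)
            pending = still_pending
            if pending_host:
                # Use the spare time to run one host task.
                completed_host.append(pending_host.popleft()())
            elif pending:
                time.sleep(0.001)  # nothing else to do; yield briefly
    # Run any host tasks left over after all device operations finished.
    while pending_host:
        completed_host.append(pending_host.popleft()())
    return completed_device, completed_host


if __name__ == "__main__":
    ops = [("copy_in_gpu0", 0.05), ("kernel_gpu1", 0.05)]
    tasks = [lambda i=i: i * i for i in range(5)]
    dev, host = run_nonblocking(ops, tasks)
    print(sorted(dev), host)
```

Polling with `done()` plays the role that non-blocking status queries such as `cudaEventQuery` or `cudaStreamQuery` play in CUDA, as opposed to synchronizing calls that stall the host thread. A real runtime would also have to track dependencies between device operations and host tasks. This sketch ignores that.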
We present the design and first performance and usability evaluation of GeMTC, a novel execution mod...
In the past decade, accelerators, commonly Graphics Processing Units (GPUs), have played a key role ...
This work studies programmability enhancing abstractions in the context of accelerators and heteroge...
During the past decade, accelerators, such as NVIDIA CUDA GPUs and Intel Xeon Phis, have seen an inc...
Accelerators, such as GPUs and Intel Xeon Phis, have become the workhorses of high-performance compu...
Hardware accelerators have become permanent features in the post-Dennard computing landscape, displa...
The use of specialized accelerators is among the most promising paths to better energy efficiency fo...
In the last several years, there has been a growing interest in utilizing accelerator technologies w...
Modern high-performance computers engage a variety of computing devices. Underutilization and oversu...
Handheld devices are expected to start using fine-grained ASIC accelerators to meet energy-efficienc...
Data movement in high-performance computing systems accelerated by graphics processing unit...
Heterogeneous System-on-Chip (SoC) architectures combine general-purpose processors with many accele...
With processor clock speeds having stagnated, parallel computing architectures have achieved a break...