In recent years, the use of accelerators in conjunction with CPUs, known as heterogeneous computing, has brought about significant performance increases for scientific applications. One of the best examples of this is lattice quantum chromodynamics (QCD), a stencil operation based simulation. These simulations have a large memory footprint necessitating the use of many graphics processing units (GPUs) in parallel. This requires the use of a heterogeneous cluster with one or more GPUs per node. In order to obtain optimal performance, it is necessary to determine an efficient communication pattern between GPUs on the same node and between nodes. In this paper, we present a performance model based method for minimizing the communication time of ...
In this chapter we describe the architecture of a torus interconnect and its implementation on FPGAs...
A steady increase in accelerator performance has driven demand for faster interconnects to avert the...
We propose and evaluate a novel strategy for tuning the performance of a class of stencil computatio...
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types ...
A recent trend in modern high-performance computing environments is the introduction of powerful, en...
A model two-processor heterogeneous computer consisting of one scalar and one vector processor...
Today's heterogeneous architectures bring together multiple general purpose CPUs, domain specific GP...
Hardware accelerators are classic scientific coprocessors in HPC machines. How...
The Graphics Processing Unit (GPU) is present in almost every modern day personal computer. Despite...
Accelerated computing has become pervasive for increasing the computational power and energy efficie...
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
openaire: EC/H2020/818665/EU//UniSDyn Funding Information: This work was supported by the Academy ...
The molecular dynamics method, used by scientists across the fields of physics, materials science, a...
In this dissertation, a heterogeneous GPUs system means the system consists of a variety of differen...
Communication latency problems are universal and have become a major performan...