Abstract—Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement standards, thus providing applications with no direct mechanism to perform end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that allows end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. MPI-ACC...
Abstract. Over the last decade, Message Passing Interface (MPI) has become a very successful paralle...
Scientific developers face challenges adapting software to leverage increasingly heterogeneous archi...
Modern HPC platforms are using multiple CPUs, GPUs, and high-performance interconnects per node. Unfor...
Current trends in computing and system architecture point towards a need for accelerators such as GP...
Recently, MPI implementations have been extended to support accelerator devices, Intel Many Integrate...
Heterogeneous supercomputers are now considered the most valuable solution to ...
Abstract—Accelerator awareness has become a pressing issue in data movement models, such as MPI, bec...
Abstract—Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) ...
A steady increase in accelerator performance has driven demand for faster interconnects to avert the...
The introduction and rise of General Purpose Graphics Computing has significantly impacted parallel ...
A number of efforts have been undertaken to integrate GPU functionality into an HPC environment, wit...
High performance scientific applications are frequently multiphysics codes composed from single-phys...
The low-power Adapteva Epiphany RISC array processor offers high computational energy-efficiency and...
In order to reach exascale computing capability, accelerators have become a crucial part in developi...