Abstract—Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication, where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques that significantly reduce the cost of intranode communication involving one or more GPUs. Experimental results show up to a 2x increase in bandwidth, resulting in an average 4.3% improvement in the total execution time of a halo exchange benchmark.
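To illustrate the extra copies the abstract refers to, the following sketch contrasts the traditional host-staged transfer with a GPU-aware send in which the device pointer is handed to MPI directly. It is a minimal, hypothetical example rather than the paper's implementation: the buffer size, ranks, and tags are illustrative, and it assumes an MPI library built with CUDA support (as, for instance, MVAPICH2 and Open MPI provide).

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                      /* illustrative message size */
    double *d_buf;
    cudaMalloc((void **)&d_buf, (size_t)n * sizeof(double));

    if (rank == 0) {
        /* Traditional path: stage the device buffer through host memory,
         * paying an extra device-to-host copy before the MPI transfer. */
        double *h_buf = (double *)malloc((size_t)n * sizeof(double));
        cudaMemcpy(h_buf, d_buf, (size_t)n * sizeof(double), cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);

        /* GPU-aware path: pass the device pointer to MPI directly and let
         * the runtime choose the cheapest intranode copy mechanism. */
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        free(h_buf);
    } else if (rank == 1) {
        /* Matching receives: host-staged first, then GPU-aware. */
        double *h_buf = (double *)malloc((size_t)n * sizeof(double));
        MPI_Recv(h_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_buf, (size_t)n * sizeof(double), cudaMemcpyHostToDevice);

        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        free(h_buf);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

When both ranks run on the same node, the staged path incurs a device-to-host copy, a host-to-host copy inside MPI, and a host-to-device copy, whereas a GPU-aware runtime can collapse these into fewer transfers, which is the source of the bandwidth gains reported above.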
Network communication on GPU-based systems is a significant roadblock for many applications with sma...
Due to their massive parallelism and high performance per Watt, GPUs have gained high popularity in ...
Abstract—Modern processors have multiple cores on a chip to overcome power consumption and heat di...
Abstract—Accelerator awareness has become a pressing issue in data movement models, such as MPI, bec...
Modern multi-core clusters are increasingly using GPUs to achieve higher performance and power effic...
Current trends in computing and system architecture point towards a need for accelerators such as GP...
Heterogeneous supercomputers are now considered the most valuable solution to ...
Abstract—Data movement in high-performance computing systems accelerated by graphics processing unit...
After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any genera...
Abstract—We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an ...
Today, GPUs and other parallel accelerators are widely used in high performance computing, due to th...
A steady increase in accelerator performance has driven demand for faster interconnects to avert the...
Graphics Processing Units (GPUs) are widely used in high performance computing, due to their high com...
Recently, MPI implementations have been extended to support accelerator devices, Intel Many Integrate...
GPUs are frequently used to accelerate data-parallel workloads across a wide variety of application ...