Abstract—Accelerator awareness has become a pressing issue in data movement models, such as MPI, because of the rapid deployment of systems that utilize accelerators. In our previous work, we developed techniques to enhance MPI with accelerator awareness, thus allowing applications to easily and efficiently communicate data between accelerator memories. In this paper, we extend this work with techniques to perform efficient data movement between accelerators within the same node using a DMA-assisted, peer-to-peer intranode communication technique that was recently introduced for NVIDIA GPUs. We present a detailed design of our new approach to intranode communication and evaluate its improvement to communication and application performance u...
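The abstract above refers to a DMA-assisted, peer-to-peer intranode communication technique recently introduced for NVIDIA GPUs without spelling out the mechanism. Below is a minimal sketch, assuming the underlying primitive is CUDA's peer-to-peer copy path (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer); the buffer size and device indices are illustrative only, not taken from the paper.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: copy a buffer from GPU 0 to GPU 1 within one node.
 * With peer access enabled, cudaMemcpyPeer can move the data in a single
 * DMA transfer over the interconnect instead of staging through host memory. */
int main(void)
{
    const size_t bytes = 1 << 20;   /* 1 MiB test buffer (arbitrary) */
    int can_access = 0;

    /* Ask the runtime whether device 0 may directly address device 1. */
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (!can_access) {
        fprintf(stderr, "peer access between GPU 0 and GPU 1 not supported\n");
        return EXIT_FAILURE;
    }

    void *src = NULL, *dst = NULL;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  /* let device 0 map device 1's memory */
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    /* Device-to-device copy; the runtime issues the DMA between the GPUs. */
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return EXIT_SUCCESS;
}

In an MPI library, the same copy primitive would presumably be invoked inside the intranode channel once both ranks' device buffers are visible to each other (for example, via CUDA IPC handles); this is the kind of integration the abstract appears to describe.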
GPUs are widely used in high performance computing, due to their high computational power and high p...
Due to their massive parallelism and high performance per Watt, GPUs have gained high popularity in ...
Due to their massive parallelism and high performance per watt, GPUs have gained high popularity in high per...
Abstract—Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) ...
Abstract—Data movement in high-performance computing systems accelerated by graphics processing unit...
Modern multi-core clusters are increasingly using GPUs to achieve higher performance and power effic...
Current trends in computing and system architecture point towards a need for accelerators such as GP...
Today, GPUs and other parallel accelerators are widely used in high performance computing, due to th...
After the introduction of CUDA by NVIDIA, GPUs became devices capable of accelerating any genera...
This paper explores the challenges in implementing a message passing interface usable on systems wit...
A steady increase in accelerator performance has driven demand for faster interconnects to avert the...
Network communication on GPU-based systems is a significant roadblock for many applications with sma...
Abstract—We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an ...
GPUs are frequently used to accelerate data-parallel workloads across a wide variety of application ...
Graphics Processing Units (GPUs) are widely used in high performance computing, due to their high com...