Despite dramatic improvements in GPU and interconnect architectures, inter-GPU communication remains the most significant architectural bottleneck in multi-GPU systems. With hundreds of thousands of independent concurrently executing threads, maximizing interconnect utilization without degrading computational efficiency when strong-scaling HPC workloads is an open problem. In this dissertation, I will explore fine-grained peer-to-peer stores as the communication paradigm for improved multi-GPU strong scaling and propose three solutions to overcome the limitations of existing GPU and interconnect architectures to benefit from such transfers. First, I will detail PROACT, a joint compile and runtime system that transparently fine-tunes inter-G...
Summarization: Every HPC system consists of numerous processing nodes interconnected using a number of...
The continued growth of the computational capability of throughput processors has made throughput...
GPUs are frequently used to accelerate data-parallel workloads across a wide variety of application ...
General-purpose computing on GPUs has become more accessible due to features such as shared virtual ...
Graphics processing units (GPUs) have become prevalent in modern computing systems. While their high...
GPUs are being widely used to accelerate different workloads and multi-GPU systems can provide highe...
Modern GPUs are powerful high-core-count processors, which are no longer used solely for graphics ap...
Over the past years, GPUs became ubiquitous in HPC installations around the world. Today, they provi...
As GPUs evolved into popular computing platforms in the cloud, GPU virtualization has become a highl...
2018-02-23: Graphics Processing Units (GPUs) are designed primarily to execute multimedia, and game re...
Graphics Processing Unit (GPU) vendors have been scaling single-GPU architectures to satisfy the eve...
High compute-density with massive thread-level parallelism of Graphics Processing Units (GPUs) is be...
Coupling commodity CPUs and modern GPUs gives you heterogeneous systems that are cheap, high-performa...
In this dissertation, we explore multiple designs for a Distributed Transactional Memory framework f...
The Graphics Processing Unit (GPU) has become a mainstream computing platform for a wide range of ap...