Improving Multi-GPU Strong Scaling Through Optimization of Fine-Grained Transfers

Muthukrishnan, Harini

Publication date

January 2022

DOI

10.7302/4540

Abstract

Despite dramatic improvements in GPU and interconnect architectures, inter-GPU communication remains the most significant architectural bottleneck in multi-GPU systems. With hundreds of thousands of independent concurrently executing threads, maximizing interconnect utilization without degrading computational efficiency when strong-scaling HPC workloads is an open problem. In this dissertation, I will explore fine-grained peer-to-peer stores as the communication paradigm for improved multi-GPU strong scaling and propose three solutions to overcome the limitations of existing GPU and interconnect architectures to benefit from such transfers. First, I will detail PROACT, a joint compile and runtime system that transparently fine-tunes inter-G...