Perf-Sat: Runtime Detection of Performance Saturation for GPGPU Applications

Mihir Awatramani
Joseph Zambreno
Diane Rover

Publication date

April 2020

Abstract

Graphics Processing Units (GPUs) achieve latency tolerance by exploiting massive amounts of thread-level parallelism. Each core executes several hundred to a few thousand simultaneously active threads. The work scheduler tries to maximize the number of active threads on each core by launching threads until at least one of the required resources is fully utilized. The rationale is that more threads give the thread scheduler more opportunities to hide memory latency and thus result in better performance. In this work, we show that launching the maximum number of threads is not always necessary to achieve the best performance. Applications have an optimal thread count at which performance saturates. Increasing th...
