Achieving performance portability across parallel accelerator architectures

Kofsky, Stephen

Publication date

May 2013

Abstract

Parallel programming requires a significant amount of developer effort, and creating optimized parallel code is even more time-consuming. In the end, tuned parallel codes typically perform well only on a single architecture, or even a single microarchitecture. This thesis focuses on SPMD code written in CUDA, noting that programs must obey a number of constraints to achieve high performance on an NVIDIA GPU. Under such constraints, source-level optimizations can improve the performance of CUDA code on Rigel, a MIMD accelerator architecture currently under development. Source-level optimizations can produce code for Rigel that runs significantly faster than naïve translations. In some cases, benchmarks run nearly four times faster, rivaling the perf...

