Using Fermi architecture knowledge to speed up CUDA and OpenCL programs

Yuri Torres
Arturo Gonzalez-Escribano
Diego R. Llanos

Publication date

January 2012

DOI

10.1109/ispa.2012.92

Abstract

NVIDIA graphics processing units (GPUs) are playing an important role as general-purpose programming devices. Implementing parallel codes that exploit the GPU hardware architecture is a task for experienced programmers. The choice of threadblock size and shape is one of the most important user decisions when a parallel problem is coded, because the threadblock configuration has a significant impact on the global performance of the program. While in the CUDA parallel programming model it is always necessary to specify the threadblock size and shape, the OpenCL standard also offers an automatic mechanism to make this delicate decision. In this paper we present a study of these criteria for the Fermi architecture, introducing a general appro...
