In this paper, we studied the NVIDIA GPU architecture characteristics relevant to the SGEMM routine and the potential peak performance of SGEMM on the Fermi GPU. Guided by this analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT), and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on a GTX 580 Fermi card. We also described how to use native assembly language directly in CUDA runtime source code.
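As a hedged illustration of that last point, the sketch below shows one common way to run a kernel whose SASS was assembled outside nvcc: package it as a .cubin and load it from host code through the CUDA driver API. The file name sgemm_fermi.cubin, the kernel name sgemm_nn_kernel, its parameter list, and the 64x64 tile per 256-thread block are placeholder assumptions for illustration, not the routine described in this paper.

    /* Minimal sketch (not the paper's actual code): launching a hand-assembled
       SGEMM kernel by loading its .cubin with the CUDA driver API. */
    #include <cuda.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHECK(call)                                               \
        do {                                                          \
            CUresult err_ = (call);                                   \
            if (err_ != CUDA_SUCCESS) {                               \
                fprintf(stderr, "CUDA driver error %d at %s:%d\n",    \
                        (int)err_, __FILE__, __LINE__);               \
                exit(EXIT_FAILURE);                                   \
            }                                                         \
        } while (0)

    int main(void)
    {
        const int N = 1024;                    /* square matrices for simplicity */
        size_t bytes = (size_t)N * N * sizeof(float);

        CUdevice   dev;
        CUcontext  ctx;
        CUmodule   mod;
        CUfunction sgemm;

        CHECK(cuInit(0));
        CHECK(cuDeviceGet(&dev, 0));
        CHECK(cuCtxCreate(&ctx, 0, dev));

        /* Hypothetical names for a cubin built from hand-written SASS
           and the kernel it contains. */
        CHECK(cuModuleLoad(&mod, "sgemm_fermi.cubin"));
        CHECK(cuModuleGetFunction(&sgemm, mod, "sgemm_nn_kernel"));

        CUdeviceptr dA, dB, dC;
        CHECK(cuMemAlloc(&dA, bytes));
        CHECK(cuMemAlloc(&dB, bytes));
        CHECK(cuMemAlloc(&dC, bytes));
        /* ... copy A and B to the device with cuMemcpyHtoD, initialize C ... */

        int   m = N, n = N, k = N;
        float alpha = 1.0f, beta = 0.0f;
        void *args[] = { &dA, &dB, &dC, &m, &n, &k, &alpha, &beta };

        /* Assumed blocking: each 256-thread block computes a 64x64 tile of C. */
        CHECK(cuLaunchKernel(sgemm,
                             N / 64, N / 64, 1,   /* grid dimensions  */
                             256, 1, 1,           /* block dimensions */
                             0, NULL, args, NULL));
        CHECK(cuCtxSynchronize());

        CHECK(cuMemFree(dA));
        CHECK(cuMemFree(dB));
        CHECK(cuMemFree(dC));
        CHECK(cuModuleUnload(mod));
        CHECK(cuCtxDestroy(ctx));
        return 0;
    }

The driver API is used here only because it is the standard mechanism for loading externally produced SASS; embedding the assembly directly in runtime-API source code, as summarized above, would follow the procedure described in the body of the paper.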