Strassen’s Matrix Multiplication on GPUs

Junjie Li
Sanjay Ranka
Sartaj Sahni

Publication date

January 2011

DOI

10.1109/icpads.2011.130

Publisher

IEEE

Abstract

We provide efficient single- and double-precision GPU (Graphics Processing Unit) implementations of Strassen’s matrix multiplication algorithm as well as of Winograd’s variant of this algorithm. The single-precision implementations of these two algorithms are compared analytically using three metrics: the arithmetic count, the number of device-memory transactions, and the volume of data moved from device memory to the multiprocessors. Our analysis indicates that, for 16384 × 16384 matrices, our single-precision implementation of Strassen’s algorithm limited to four levels of recursion reduces the number of arithmetic operations by 41.3%, the number of transactions by 33.7%, and the volume by 29.2% relative to the best known GPU implementation of the classical n^3 matrix multiplication algorit...
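To make the arithmetic-count claim concrete, the following is a minimal Python sketch, not the authors' analysis. It assumes the textbook operation counts: 2n^3 - n^2 operations for the classical algorithm, and 7 half-size products plus 18 half-size matrix additions per Strassen level (15 additions for Winograd's variant). The function names `classical_ops`, `strassen_ops`, and `reduction` are hypothetical and chosen only for illustration.

```python
def classical_ops(n):
    """Operations for the classical n x n product (assumed count):
    n^3 multiplications plus n^2 * (n - 1) additions = 2n^3 - n^2."""
    return 2 * n**3 - n**2


def strassen_ops(n, levels, adds_per_level=18):
    """Operations for Strassen-style recursion limited to `levels` levels.

    Each level performs 7 products of (n/2) x (n/2) blocks and
    `adds_per_level` block additions (18 for Strassen, 15 for Winograd's
    variant). Below the recursion limit the classical count is assumed.
    """
    if levels == 0:
        return classical_ops(n)
    half = n // 2
    return 7 * strassen_ops(half, levels - 1, adds_per_level) + adds_per_level * half * half


def reduction(n, levels, adds_per_level=18):
    """Fractional reduction in operations relative to the classical algorithm."""
    return 1.0 - strassen_ops(n, levels, adds_per_level) / classical_ops(n)


if __name__ == "__main__":
    n, levels = 16384, 4
    print(f"Strassen, {levels} levels: {100 * reduction(n, levels):.1f}% fewer operations")
    print(f"Winograd, {levels} levels: {100 * reduction(n, levels, 15):.1f}% fewer operations")
```

Under these assumptions, the Strassen figure for n = 16384 and four levels comes out close to the 41.3% quoted in the abstract. The transaction and data-volume figures depend on the GPU kernels themselves and are not modeled here.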
