SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems

Manojkumar Krishnan
Jarek Nieplocha

Publication date

January 2004

Abstract

This paper describes a novel parallel algorithm that implements a dense matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon’s algorithm. It is suitable for clusters and scalable shared memory systems. The approach differs from other parallel matrix multiplication algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing. Experimental results on clusters (IBM SP, Linux-Myrinet) and shared memory systems (SGI Altix, Cray X1) demonstrate consistent performance advantages over pdgemm from the ScaLAPACK/PBLAS suite, the leading implementation of the parallel matrix multiplication algorithms used today. In the best case on the SGI ...
