The performance of a parallel matrix-matrix multiplication routine with the same functionality as DGEMM from BLAS3 was tested for different numbers of nodes on a 32-node iPSC/860. The routine was then tuned for maximum performance on this particular computer system. Small changes to the original code led to substantially higher performance, and in all tested configurations there is a critical matrix size n ≈ 50·np, where np is the number of processors, above which Intel's non-blocking isend is more efficient than the blocking csend. This shows that special tuning for a single machine pays off for large matrices.
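The reported crossover point scales linearly with the processor count. A minimal sketch of the stated rule n ≈ 50·np (an illustration of the formula as given, not a measurement, and the helper name is ours):

```python
def crossover_size(num_procs: int) -> int:
    """Approximate matrix size n above which the non-blocking isend
    was reported to beat the blocking csend: n ≈ 50 * np."""
    return 50 * num_procs

# On the full 32-node machine the crossover lies near n ≈ 1600;
# on 8 nodes it drops to n ≈ 400.
for p in (8, 16, 32):
    print(p, crossover_size(p))
```

So the larger the partition of the machine, the larger the matrices must be before switching to non-blocking sends pays off.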