Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made about the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS: on the CM-200, blocking improves performance by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements ...
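The abstract does not spell out the systolic algorithm itself. As a rough illustration of the general idea, the sketch below simulates Cannon's algorithm, a well-known systolic scheme in which the product matrix stays in place while the operands are shifted cyclically past it. It is not claimed to be the algorithm of the paper. It assumes square matrices with one element per simulated processor, whereas the paper makes no assumption about operand shape; the names `cannon_matmul` and `naive_matmul` are hypothetical.

```python
def naive_matmul(A, B):
    """Reference triple-loop product, used only to check the sketch."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def cannon_matmul(A, B):
    """Simulate Cannon's algorithm on an n x n torus of processors.

    Assumption: A and B are square n x n lists of lists, one element per
    simulated processor. C[i][j] never moves; only A and B are shifted.
    """
    n = len(A)
    # Initial skew: row i of A is rotated left by i positions,
    # column j of B is rotated up by j positions.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Local multiply-accumulate on every processor.
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]
        # Systolic step: rotate A left by one, B up by one.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(cannon_matmul(A, B) == naive_matmul(A, B))
```

In a blocked variant each simulated processor would hold a submatrix rather than a single element, and the local multiply-accumulate would become a local matrix-matrix product.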
A number of parallel formulations of dense matrix multiplication algorithm have been developed. For ...
We present an efficient parallel implementation of matrix-vector multiplication on a binary ...
This paper investigates a possible optimization of some linear algebra problems which can be s...
For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place ...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
We describe a subset of the level-1, level-2, and level-3 BLAS implemented for each node of the Conn...
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
Many optimizations (of programs with loops) used in parallelizing compilers and systolic array desig...
During the last half-decade, a number of research efforts have centered around developing software f...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
Parallel computing on networks of workstations is intensively used in some application areas such a...
Parallel computing on networks of workstations is intensively used in some application areas such a...
Using super-resolution techniques to estimate the direction that a signal arrived at a radio receive...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...