This paper is concerned with the consequences for matrix computations of having a rather large number of general purpose processors, say ten or twenty thousand, connected in a network in such a way that a processor can communicate only with its immediate neighbors. Certain communication tasks associated with most matrix algorithms are defined and formulas developed for the time required to perform them under several communication regimes. The results are compared with the times for a nominal $n^3$ floating point operations. The results suggest that it is possible to use a large number of processors to solve matrix problems at a relatively fine granularity, provided fine grain communication is available. Additional figures are available a...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
This paper describes two models of the cost of data movement in parallel numerical algorithms. One m...
We study the effect of limited communication throughput on parallel computation in a setting...
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
In this paper we propose a new approach to the study of the communication requirements of distribute...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...
The use of an appropriate methodology for calculating the communication cost, time complexity and pe...
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and...
In this paper we present an efficient dense matrix multiplication algorithm for distributed memory ...
This paper initiates the study of communication complexity when the processors have limited work spa...
Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposin...
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and h...
Sparse matrix operations dominate the cost of many scientific applications. In parallel, the perform...