This report considers the problem of writing data-distribution-independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads on distributed-memory parallel computers. The functionality and execution time of DDI programs are independent of the initial data distribution. First, modular mappings, which can be used to derive many equally optimal and functionally equivalent programs, are briefly reviewed. Relations between modular mappings and input data distributions are then established. These relations form the basis of a systematic approach to deriving DDI programs, which is illustrated for matrix-matrix multiplication (c = a × b). Conditions on data distributions that correspond to an optimal modular ...
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processor...
This paper presents an architecture-independent method for performing BMMC permutations on multiproc...
This paper describes techniques for translating out-of-core programs written in a data parallel lang...
Many optimizations (of programs with loops) used in parallelizing compilers and systolic array desig...
Data-partition and migration for efficient communication in distributed memory architectures are cri...
Implementing linear algebra kernels on distributed memory parallel computers raises the proble...
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...
Dynamic redistribution of arrays is required very often in programs on distributed memory parallel c...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Indep...
This paper addresses the problem of partitioning data for distributed memory machines (multicomputer...
An important problem facing parallelizing compilers for distributed memory MIMD machines is that of ...
Appropriate data distribution has been found to be critical for obtaining good performance on Distri...
Massively Parallel Processor systems provide the required computational power to solve most large sc...
Distributed-memory multicomputers, such as the Intel iPSC/860, the Intel Paragon, the IBM SP-1/SP-2...
In this document, we describe two strategies of distribution of computations that can be used to imp...