We provide performance models for several primitive operations on data structures distributed over memory units interconnected by a Boolean cube network. In particular, we model single-source and multiple-source concurrent broadcasting or reduction, concurrent gather and scatter operations, shifts along several axes of multidimensional arrays, and emulation of butterfly networks. We also show how the processor configuration, data aggregation, and the encoding of the address space affect the performance of two important basic computations: the multiplication of arbitrarily shaped matrices and the Fast Fourier Transform. Finally, we give an example of the performance behavior of local matrix operations for a processor with a single path to l...
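As an illustration of the single-source broadcast mentioned in the abstract, the sketch below builds a communication schedule on a Boolean n-cube in which every node that already holds the data forwards it across one cube dimension per step, so all 2^n nodes are reached in n steps. This is the textbook spanning-binomial-tree pattern and is only an assumption about what a basic broadcast looks like; it is not the performance model of the paper, and the function name `broadcast_schedule` and its parameters are hypothetical.

```python
def broadcast_schedule(n, source=0):
    """Return, for each of the n steps, the (sender, receiver) pairs of a
    one-to-all broadcast on a Boolean n-cube with 2**n nodes.

    Hypothetical helper for illustration: in step d, every node that
    already holds the data sends it to its neighbor across dimension d.
    """
    have = {source}          # nodes that currently hold the data
    schedule = []
    for d in range(n):
        # Neighbors across dimension d differ from the sender in bit d only.
        step = [(node, node ^ (1 << d)) for node in sorted(have)]
        have.update(receiver for _, receiver in step)
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    for d, step in enumerate(broadcast_schedule(3)):
        print(f"step {d}: {step}")
```

The number of informed nodes doubles in every step, which is why n steps suffice for 2^n nodes. A reduction can be obtained by running the same schedule in reverse, with receivers sending partial results back to their senders.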
In this paper we present an efficient dense matrix multiplication algorithm for distributed memory ...
This chapter describes the Decomposable Bulk Synchronous Parallel (D-BSP) model of computation, as ...
This paper examines the performance of distributed-shared-memory systems based on the Simultaneous O...
General analytic models for the performance analysis of various unique and redundant path circuit-sw...
95 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 1988. Multiprocessor systems offer t...
Parallel computing on networks of workstations is intensively used in some application areas such a...
Detailed algorithms for all-to-all broadcast and reduction are given for arrays mapped by binary or ...
We discuss communication algorithms relevant for neural network modeling on distributed memory concu...
We discuss some techniques for preserving locality of reference in index spaces when mapped to memor...
Evaluating the performance of large distributed applications is an important and non-trivial task. W...
VLSI communication networks are wire-limited. The cost of a network is not a function of the...
We study, using analytic models and simulation, the performance of the multifrontal methods on distr...
Due to advances in fiber optics and VLSI technology, interconnection networks that allow simultaneou...
The design and implementation of distributed systems is helped by the availability of design pattern...
The authors study the performance of multiprocessor systems employing multiple buses as ...