Loosely coupled MIMD architectures do not suffer from memory contention; hence large numbers of processors may be utilized. The main problem, however, is how to partition data and programs in order to exploit the available parallelism. In this paper we show that efficient schemes for automatic data/program partitioning and synchronization may be employed if single assignment is used. Using simulations of program loops common to scientific computations (the Livermore Loops), we demonstrate that only a small fraction of data accesses are remote, and thus the degradation in network performance due to multiprocessing is minimal.
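The partitioning idea in the abstract above can be sketched concretely. Under single assignment, each array element is written exactly once, so contiguous index blocks can be assigned to processors ("owner computes") with no write conflicts and no synchronization inside a loop. The sketch below is an illustration, not the paper's scheme; the kernel is Livermore Loop 1 (the "hydro fragment"), and the names `block_partition` and `partitioned_hydro` are hypothetical.

```python
# Sketch (assumption: not taken from the paper): owner-computes block
# partitioning of a single-assignment loop, illustrated on Livermore
# Loop 1 (hydro fragment): x[k] = q + y[k]*(r*z[k+10] + t*z[k+11]).

def hydro_fragment(y, z, q, r, t):
    """Sequential reference version of the kernel."""
    return [q + y[k] * (r * z[k + 10] + t * z[k + 11]) for k in range(len(y))]

def block_partition(n, p):
    """Split indices 0..n-1 into p contiguous blocks, one per processor."""
    size = (n + p - 1) // p
    return [range(i * size, min((i + 1) * size, n)) for i in range(p)]

def partitioned_hydro(y, z, q, r, t, p):
    n = len(y)
    x = [0.0] * n
    for block in block_partition(n, p):   # each block could run on its own node
        for k in block:
            # Single assignment: x[k] is written exactly once, by its owner,
            # so the blocks need no synchronization among themselves.
            x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
    return x
```

Because the reads of `y` and `z` near a block boundary mostly fall inside the owner's block, only a small fraction of accesses would be remote on a real distributed machine, which is the effect the abstract's simulations measure.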
In this paper we present a unified approach for compiling programs for Distributed-Memory Multiproce...
In this paper, we present an efficient framework for intraprocedural performance based program parti...
On shared memory parallel computers (SMPCs) it is natural to focus on decomposing the computation (...
Abstract. The message-passing paradigm is now widely accepted and used mainly for inter-process comm...
170 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1986. Since the mid-1970s, vector ...
Modern, high performance reconfigurable architectures integrate on-chip, distributed block RAM modul...
minimized. This approach has been implemented as part of a compiler called Paradigm, which accepts...
Grantor: University of Toronto. Scalable shared memory multiprocessors are becoming increasi...
In order to reduce remote memory accesses on CC-NUMA multiprocessors, we present an interprocedural ...
We present compiler optimization techniques for explicitly parallel programs that communicate thro...
Communication overhead in multiprocessor systems, as exemplified by cache coherency traffic and glob...
The general problem studied is that of segmenting or partitioning programs for distribution across a...
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1993. Simultaneously published...
In order to achieve viable parallel processing three basic criteria must be met: (1) the system must...
This paper addresses the problem of partitioning data for distributed memory machines (multicomputer...