Distributed-memory message-passing machines deliver scalable performance but are difficult to program. Shared-memory machines, on the other hand, are easier to program, but obtaining scalable performance with a large number of processors is difficult. Recently, scalable machines based on logically shared, physically distributed memory have been designed and implemented. While some performance issues, such as parallelism and locality, are common to different parallel architectures, issues such as data distribution are unique to specific architectures. One of the most important challenges compiler writers face is the design of compilation techniques that work well on a variety of architectures. In this paper, we propose an algorithm that ...
Many applications are memory intensive and thus are bounded by memory latency and bandwidth. While i...
Grantor: University of Toronto. This dissertation proposes and evaluates compiler techniques...
Over the past two decades, tremendous progress has been made in both the design of parallel architect...
This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider ...
Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines...
We present compiler optimization techniques for explicitly parallel programs that communicate thro...
Distributed-memory multicomputers, such as the Intel iPSC/860, the Intel Paragon, and the IBM SP-1/SP-2...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
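The abstract above motivates compiler work on the processor–memory speed gap. A standard technique in this area (illustrative here, not necessarily the one this paper proposes) is loop tiling, or blocking, which restructures a loop nest so that it reuses a small block of data while it is still resident in cache. A minimal sketch, with the block size `B` chosen arbitrarily for illustration:

```python
# Hedged sketch of loop tiling (blocking) for matrix multiply.
# The computation is unchanged; only the iteration order is restructured
# so that each B x B block of a, b, and c is reused while cached.

def matmul_tiled(a, b, B=2):
    """C = A * B for square row-major matrices, with blocked loops."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, B):
        for kk in range(0, n, B):
            for jj in range(0, n, B):
                # work within one block triple that fits in cache
                for i in range(ii, min(ii + B, n)):
                    for k in range(kk, min(kk + B, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + B, n)):
                            c[i][j] += aik * b[k][j]
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
assert matmul_tiled(a, b, B=1) == [[19, 22], [43, 50]]
```

In a compiled language the blocked order cuts cache misses substantially for matrices larger than the cache; in Python the version serves only to show the transformation itself.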
Effective memory hierarchy utilization is critical to the performance of modern multiprocessor archi...
Many parallel languages presume a shared address space in which any portion of a computation can acc...
Power consumption and fabrication limitations are increasingly playing significant roles in the desi...
We present a unified approach to locality optimization that employs both data and control transforma...
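The abstract above pairs data transformations with control transformations for locality. As a hedged illustration of that distinction (not the paper's actual algorithm), the sketch below computes column sums of a row-major array three ways: naively with a strided inner loop, after loop interchange (a control transformation), and after transposing the array layout (a data transformation). All three produce the same result; the latter two make the inner loop unit-stride.

```python
# Illustrative sketch: the same column-sum computation under a control
# transformation (loop interchange) and a data transformation (transpose).

def col_sums_naive(a):
    """Column sums with a stride-n inner loop: poor spatial locality
    for a row-major array."""
    n_rows, n_cols = len(a), len(a[0])
    sums = [0] * n_cols
    for j in range(n_cols):          # outer loop over columns
        for i in range(n_rows):      # inner loop strides across rows
            sums[j] += a[i][j]
    return sums

def col_sums_interchanged(a):
    """Same computation after loop interchange: the inner loop is now
    unit-stride over each row (control transformation)."""
    n_rows, n_cols = len(a), len(a[0])
    sums = [0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            sums[j] += a[i][j]
    return sums

def col_sums_transposed(a):
    """Same computation after changing the data layout, keeping the
    original loop structure (data transformation)."""
    at = [list(col) for col in zip(*a)]   # layout change: a[i][j] -> at[j][i]
    n_cols, n_rows = len(at), len(at[0])
    sums = [0] * n_cols
    for j in range(n_cols):
        for i in range(n_rows):
            sums[j] += at[j][i]           # unit-stride in the inner loop
    return sums

a = [[1, 2], [3, 4], [5, 6]]
assert col_sums_naive(a) == col_sums_interchanged(a) == col_sums_transposed(a)
```

A unified framework must choose between such options per array and per loop nest, since a layout that helps one nest can hurt another that accesses the same array.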
In this thesis, we explore the use of software distributed shared memory (SDSM) as a target communic...
This work identifies practical compiling techniques for scalable shared memory machines. For this, w...