This document describes a set of new techniques for improving the efficiency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared memory) multiprocessors. The key component of this scheme is a data-flow framework that exploits information about array access patterns and about the cache coherence protocol to predict, at compile time, the sets of array references that are likely to cause coherence activity at run time. The information derived from the data-flow framework can be used to aid in applying a variety of optimizations, including better prefetching for coherence misses, more precise application of exclusive-mode prefetching, and better prefetching for false-shared...
The cache coherence maintenance problem has been the major obstacle in using private cache memory to...
Shared-memory multiprocessors are becoming increasingly popular as a high-performance, easy to progr...
Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen ...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache c...
In this paper, we present compiler algorithms for detecting references to stale data in shared-memory...
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
This dissertation presents a systematic approach to reduction of cache coherence overhead in shared-...
Although it is convenient to program large-scale multiprocessors as though all processors shared acc...
Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the t...
Emerging multiprocessor architectures such as chip multiprocessors, embedded architectures, and mas...
Prefetching, i.e., exploiting the overlap of processor computations with data accesses, is one of s...
Memory latency is becoming an increasingly important performance bottleneck as the gap between processor ...
Data used by parallel programs can be divided into classes, based on how threads access it. For di...
A cache coherence protocol for a multiprocessor system. Each processor in the system has...